Apache Iceberg: Open Format for Data Warehouse and AI Agents


Bauplan•5K followers

1mo

One question we got during our last webinar: "Is the underlying data warehouse Iceberg? What are the options?"

Short answer: yes. Apache Iceberg is the open-format handshake between Bauplan and the rest of your stack. Your data never moves: it stays in your storage layer, your existing source of truth. No migration, no lock-in. That turns out to matter quite a bit when AI agents are running dozens of experiments in parallel.

In this webinar, we show exactly how that works end to end, together with our friends at Recce:
→ An agent builds a user segmentation pipeline from scratch, in full branch isolation.
→ A second agent adds bot detection to that same pipeline.
→ Recce's review agent compares the branches, surfaces schema diffs and lineage impact, and generates an auditable merge report.

Zero production risk. Full human oversight. Structured workflow.

Trusting AI agents with your data is one of the hardest unsolved problems in data engineering. This is how we're solving it.

Full recording 👇 https://lnkd.in/g4FpT8Nc

Trusting AI with Your Data: Safe Automation from Branch to Production

https://www.youtube.com/


More Relevant Posts

Amy Hodler

GraphGeeks•6K followers
2w
Is your AI missing the "Ground Truth"? 🧩

I caught up with Wes Madrigal, CEO of Kurve, at ODSC West to talk about why the world of metadata and foreign keys is actually the future of Generative AI. I learned that despite the hype, 80% of AI work is still stuck in the data discovery phase.

Wes breaks down how Kurve acts as a developer tool that automates the extraction of primary and foreign keys on data lakes. By building a relationship metadata graph, they let users navigate complex data as a graph traversal problem rather than a manual data munging nightmare.

As Wes puts it: "Text to SQL falls short without facts, without robust foreign keys... the reality is you need robust facts, just like humans do."

Why this matters for us GraphGeeks:
- Graph for Data Prep: By treating data preparation as a graph traversal problem, we can automate the manual merging and aggregation that usually slows down analytics.
- The Fact Gap: Relational metadata is making a comeback as the essential ground truth that AI agents need to function reliably.
- True ML automation: It's about the entire end-to-end pipeline, from discovery to model, including data relationships, not just tuning parameters.

Between the technical deep dives, this was another incredible hallway conversation at the Open Data Science Conference (ODSC)! Special thanks to Bryce Merkl Sasaki for being the pro behind the camera and capturing these conversations live from the floor. 🎥

🎧 Listen on the go: https://lnkd.in/gpvE7Qv8
🎬 Watch the full Graph Chat interview here: https://lnkd.in/gXH9Ni-g

Graph Chat: Automating Data Discovery with Wes Madrigal

https://www.youtube.com/

TO THE NEW

165,208 followers
3w
It’s time to stop viewing data governance as a "checkbox" and start seeing it as a competitive advantage.

In his latest article featured in Express Computer, Jitender Punia, our Principal Technical Architect for Data Analytics, explores how the absence of a robust data governance framework can quickly turn AI models from assets into liabilities.

Read the full feature in our Newsroom: https://lnkd.in/gQ2vQQB5

Do you see data governance as a business enabler or a speed bump in AI adoption? Share your thoughts in the comments below.

Narinder Kumar | Vinayak . | Anmol Kalra

#GenerativeAI #AIStrategy #DataQuality #DataGovernance
Jenny Kim

Equinix•1K followers
4w
Why do so many AI projects stall after the pilot phase? I love the insights Glenn Dekhayser and Ravit Jain shared about how siloed systems, unstructured data, and infrastructure gaps hold enterprises back—and what can be done to fix it. https://lnkd.in/gZhHcHWu
Jennifer Busfield

CData Software•1K followers
6d Edited
A lot of AI conversations focus on models, but the real issue often shows up earlier in the pipeline. If your data is delayed, incomplete, or stuck in legacy systems, the outputs will be too. No amount of tuning fixes stale inputs.

Jess Ramos lays this out well here, especially around the impact of batch pipelines and missed refreshes.

If this is something your team is working through, register for the live event: https://bit.ly/4sAJHtO
📅 April 21
⏰ 10 am ET / 7 am PT

The conversation will cover how to move from batch to incremental approaches, when CDC (change data capture) makes sense, and how to prioritize modernization without breaking what already works. ⬇️

AI Isn’t the Problem. Your Data Pipeline Is.

linkedin.com
Aidapt

226 followers
2w
Sigma just launched AI Agents that run on your data warehouse. no code. your data. your rules. the BI dashboard is officially on life support.