DuckDB's DuckLake 1.0 dropped yesterday, bringing lots of cool stuff and improvements. As it turns out, I haven't been paying close enough attention to how the project was evolving. With much less noise surrounding it after the pretty hyped launch, I thought it had quietly shrunk to just a fun toy the bird people made.
I was very wrong. Before getting into what 1.0 brings to the table, let's take a second to appreciate that Apache Spark can now work with DuckLake. Remember when people sarcastically commented that the only thing supporting DuckLake was DuckDB? I 'member. DataFusion supports it now, too. Neat!
More importantly, 1.0 codifies the team's effort to stay Iceberg-compatible at the data level. There's even the possibility to shallow-copy migrate tables from DuckLake to Iceberg and vice versa. Personally, I always thought it would be easier, and arguably more proper, to use views to mimic the Iceberg metadata structure directly in the DuckLake catalog, and to contribute a Trojan-horse Iceberg spec to the engines: one that actually does SQL lookups instead of GETs on S3. Anyway, I dig the direction the team has taken. But maybe someone can tell me why they decided not to do it that way?
But what really made me feel warm inside is data inlining. I think the first time I saw an implementation of this (or rather, something similar) was RisingWave's streaming to Iceberg through Postgres-as-a-buffer. In both cases the concept is the same: instead of creating a new Parquet file for each insert or update, the catalog database stores that data in a conventional table until it is flushed. Flushing writes a single Parquet file for the entire batch. The same goes for deletions. And you can configure your buffer size to better fit the ingestion pattern. Tuned accordingly, this helps with the small-files problem.
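To make the buffer-then-flush idea concrete, here's a minimal Python sketch of the mechanism, independent of DuckLake's actual internals. The class name `InliningBuffer` and the `flush_threshold` parameter are my own illustrative inventions, not DuckLake's API; the lists simply stand in for the catalog's inlined-data table and the Parquet files on object storage.

```python
# Sketch of the inlining concept: new rows accumulate in a buffer
# (standing in for a conventional table in the catalog database),
# and a whole batch is flushed as a single "file" once a threshold
# is hit, instead of one file per insert.

class InliningBuffer:
    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # rows to accumulate per flush
        self.buffer = []          # stand-in for the catalog's inlined-data table
        self.files_written = []   # each entry represents one Parquet file

    def insert(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            # One file for the entire batch, not one per insert.
            self.files_written.append(list(self.buffer))
            self.buffer.clear()

# 1000 single-row inserts yield 10 files instead of 1000.
buf = InliningBuffer(flush_threshold=100)
for i in range(1000):
    buf.insert({"id": i})
print(len(buf.files_written))  # -> 10
```

Raising the threshold trades read freshness in the base table format for fewer, larger files, which is exactly the knob you'd tune to your ingestion pattern.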
Future of approachable lakehouse is looking bright!