Your AI coding agent knows SQL. But it does not know streaming SQL. And that gap shows up in production.

The one thing nobody is talking about: agents write code that runs but silently underperforms in stream processing pipelines. They reach for date_trunc when TUMBLE is the right choice. They use CREATE TABLE when CREATE SOURCE is what the pipeline needs. They set up CDC without sharing the source across materialized views, paying the ingestion cost twice. These are not syntax errors. They are architectural mistakes that only become visible under load.

RisingWave just shipped something that fixes this: Agent Skills.

Here is what it includes:
→ A core reference: covers the Source to Materialized View to Sink pattern, port configs, background DDL, MCP server setup, and 100+ monitoring tools
→ 14 best practice rules across 5 categories: schema design, materialized views, streaming SQL, sink configuration, and performance optimization

Some rules that stood out to me:
→ Share one Kafka source across multiple materialized views. Not one source per view. Big difference in cost.
→ Place watermarks at the source, not downstream. Your windows will actually close.
→ Use a two-step CDC pattern: one shared source, multiple downstream views.

It works with Claude Code, Cursor, GitHub Copilot, Windsurf, and 15 other agents. Install it with one command:

npx skills add risingwavelabs/agent-skills

If you build real-time pipelines with AI tools, this makes your agent a much better collaborator from day one.

Read the full blog: https://lnkd.in/dYfJseEQ
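To make the shared-source and watermark rules concrete, here is a minimal RisingWave SQL sketch (source, topic, and column names are illustrative; adjust the connector properties to your cluster):

```sql
-- One shared Kafka source, with the watermark declared at the source
-- so downstream windows can actually close.
CREATE SOURCE orders (
    order_id BIGINT,
    amount DOUBLE PRECISION,
    order_time TIMESTAMP,
    WATERMARK FOR order_time AS order_time - INTERVAL '5 seconds'
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;

-- Multiple materialized views fan out from the same source,
-- so the ingestion cost is paid once, not per view.
CREATE MATERIALIZED VIEW revenue_per_minute AS
SELECT window_start, SUM(amount) AS revenue
FROM TUMBLE(orders, order_time, INTERVAL '1 minute')
GROUP BY window_start;

CREATE MATERIALIZED VIEW orders_per_minute AS
SELECT window_start, COUNT(*) AS order_count
FROM TUMBLE(orders, order_time, INTERVAL '1 minute')
GROUP BY window_start;
```

Note the contrast with date_trunc: TUMBLE is a table function that assigns each row to a window, so the engine can emit and retire windows incrementally as the watermark advances instead of recomputing on every query.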
RisingWave
Software Development
San Francisco, California 14,307 followers
The live data company. Powering humans and agents with what's happening now.
About us
Talk to us: https://risingwave.com/slack.
- Website: http://www.risingwave.com/
- Industry: Software Development
- Company size: 51-200 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2021
Products
RisingWave
Event Stream Processing (ESP) Software
RisingWave is an event stream processing and management platform. It offers a unified experience for real-time data ingestion, stream processing, data persistence, and low-latency serving.
Locations
- Primary: 95 3rd St, 2nd Floor, San Francisco, California 94103, US
- 16 Collyer Quay, Downtown Core, Central Region 049318, SG
Updates
25 pull requests. One week. Here's what changed in RisingWave.

Most changelogs are boring. This one isn't, and one change deserves special attention: you can now pass secrets directly into function call arguments. Before this, RisingWave supported secrets in connector definitions but not in user-defined functions. So if your UDF needed an API key, you were either hardcoding it or working around it. That gap is now closed.

Here's the rest of what shipped this week (April 6-12):
→ jsonb_agg(*) wildcard support: aggregate entire rows into JSON without listing every column
→ Configurable join cache eviction: tune memory behavior per job instead of living with defaults
→ Vnode key stats for materialized views: finally see data skew across streaming fragments
→ CSV and XML encoding for file sinks: writing to S3/GCS in tabular format is now possible (POC)
→ Iceberg hardening: type mismatch fixes and primary key restrictions that prevent silent bugs

And looking ahead to v2.9, the team is pushing hard on Iceberg table maintenance: garbage collection, compaction memory protection, and manifest rewrites. This is what production-ready Iceberg integration actually looks like.

Full breakdown: https://lnkd.in/dWRMTkVY

What streaming or Iceberg challenge are you dealing with right now? Drop it below for us! 👇
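A quick sketch of what the jsonb_agg(*) wildcard saves you (table and column names are illustrative; see the release notes for the exact semantics):

```sql
-- Hypothetical events table.
CREATE TABLE events (user_id INT, action VARCHAR, ts TIMESTAMP);

-- Previously, each column had to be spelled out with jsonb_build_object.
-- With the wildcard, the entire row is aggregated into a JSON array:
CREATE MATERIALIZED VIEW user_event_log AS
SELECT user_id, jsonb_agg(*) AS event_history
FROM events
GROUP BY user_id;
```

When a new column is added to events, the view logic does not need to enumerate it by hand, which is the point of the wildcard form.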
Most teams stream data into their lakehouse. But they still rely on separate tools for querying, backfills, and observability. More tools. More complexity. More latency.

RisingWave v2.8 changes that with a unified streaming and batch engine built for modern data and AI workloads.

How RisingWave v2.8 helps:

→ Iceberg queries: Query Iceberg tables directly with the built-in Apache DataFusion engine. Run batch SQL including joins, aggregates, and window functions without Spark or Trino.
→ Snapshot backfill now default: Bootstrap large materialized views quickly with bulk loading and incremental updates. Includes rate limiting, cancellation, and serverless backfill.
→ Per-job configuration: Tune individual pipelines without impacting others.
→ Iceberg v3 and schema evolution: Add columns without rebuilding pipelines. Use delete vectors for faster updates and efficient reads. Supports IAM roles, JDBC catalogs, Google authentication, and automatic cleanup.
→ Streaming vector search: Run real-time similarity search and RAG pipelines inside RisingWave.
→ Watermark TTL: Automatically clean up old state to reduce memory and speed up checkpoints.
→ Snowflake as a source: Ingest and join historical data with real-time streams.
→ Adaptive parallelism: Automatically scale compute based on workload, with separate scaling for backfill and streaming jobs.
→ Better observability: Job-level CPU profiling, improved Grafana dashboards, and richer diagnostics. Track backfill progress and detect slow DDL operations.
→ CDC improvements: Configurable queue sizes, binlog lag monitoring, and PostGIS support. Reset sources easily and avoid publication conflicts.

Your pipelines stay simple. Your data stays fresh. Your system scales with you. One engine for streaming and batch. Less infrastructure. Faster insights.

Building a streaming lakehouse? Join the community: go.risingwave.com/slack
Read the release blog: https://lnkd.in/dYhSKvEN
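The batch-on-Iceberg idea in one hedged sketch (the table name lakehouse.orders is hypothetical, and how the Iceberg catalog gets registered depends on your setup):

```sql
-- Batch SQL over an Iceberg table, no Spark or Trino in the path.
-- Assumes an Iceberg table named lakehouse.orders is already
-- accessible from RisingWave.
SELECT
    customer_id,
    COUNT(*)    AS order_count,
    SUM(amount) AS lifetime_value
FROM lakehouse.orders
GROUP BY customer_id
ORDER BY lifetime_value DESC
LIMIT 10;
```

The same engine that maintains streaming materialized views serves this ad-hoc batch query, which is what "unified streaming and batch" means in practice.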
We’re excited to be part of Kafka and Friends: Streaming + AI on May 12 in San Francisco, an event exploring how streaming infra is shaping the future of agentic AI.

At RisingWave, we believe the live data layer is the foundation for the next generation of agentic AI systems. That’s something Yingjun Wu will cover in his talk, “Designing Real-Time Systems for the Age of Agents.” In many agentic scenarios, low-latency, incrementally updated data pipelines become critical. Modern streaming architectures enable these agents to react, adapt, and operate in real time. That's why we’re building exciting products on top of RisingWave for those agentic AI use cases.

There will also be a panel discussion on The Future of Kafka + AI, with leaders from StreamNative, Redpanda Data, Aiven, and RisingWave exploring where streaming meets AI applications.

The event will also include these talks:
→ Sijie Guo: Streaming as the Backbone of Autonomous AI Agents
→ Filip Yonov: Your Kafka Topics Already Know What They Are (live demo)

Join us if you’re building in streaming, AI, or both. It’s going to be a great evening of ideas, demos, and conversations. Thanks to Hugh Evans for all the great work behind the scenes! 🙌

👉 Register here: https://luma.com/ub9sq0u5
See you there! 👋
Meet our RisingWave team at Iceberg Summit! 👋

At RisingWave, we’re building the streaming layer for the Iceberg lakehouse. With native Apache Iceberg support, RisingWave lets you:
✅ Create and manage Iceberg tables directly, with Postgres simplicity
✅ Read and write external Iceberg tables
✅ Support MoR, CoW, compaction, and REST catalogs
✅ Bridge real-time streams with Iceberg data lakes

Come find us at our booth and meet our team members: Yingjun Wu, Rayees Pasha, and Zach Taapken! 👋
Agentic AI is moving from demos to production. Most data stacks weren't built for it.

Agents need fresh data. Consistent data. At sub-second speed. Most stacks leave them with stale, bloated, or conflicting context. RisingWave fixes this with a live streaming data layer purpose-built for agentic AI.

The problem: agents are only as good as the context they receive. Every decision depends on data that is fresh, distilled, and consistent.

How RisingWave helps:
→ Streaming materialized views keep data live to the second
→ SQL transforms raw events into clean, agent-ready context
→ Native vector(n) type powers real-time RAG and semantic search
→ FLAT and HNSW indexes for exact and approximate nearest neighbor search

Your agents get the right context. At the right time. Every time.

Building an agentic data layer? Join the community: go.risingwave.com/slack
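As a rough sketch of what the vector pieces could look like in SQL (table and column names are illustrative, and the distance-operator syntax follows pgvector-style conventions as an assumption; check the RisingWave docs for the released syntax):

```sql
-- A table of document chunks with embeddings for RAG.
CREATE TABLE doc_chunks (
    chunk_id BIGINT PRIMARY KEY,
    content  VARCHAR,
    embedding VECTOR(3)  -- native vector(n) type; real embeddings use more dimensions
);

-- Retrieve the nearest neighbors for a query embedding.
-- The <-> operator here is a pgvector-style assumption, not confirmed syntax.
SELECT chunk_id, content
FROM doc_chunks
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'::vector
LIMIT 5;
```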
Apache Iceberg has become the default lakehouse table format, but scaling updates still forces a painful trade-off. Streaming and CDC pipelines need fast writes. BI dashboards need fast, predictable reads. So most stacks make you pick a side.

RisingWave solves this with configurable per-table write modes for Apache Iceberg, so your lakehouse adapts to your workload, not the other way around. Here's what you need to know 👇

Why Iceberg needs write modes:
Iceberg never edits files in place. Every update or delete must either rewrite existing files or track changes separately. That's the CoW vs MoR trade-off.

Copy-on-Write (CoW): Rewrites the full data file on every change.
✅ Fast, clean reads, no merging needed
✅ Ideal for dashboards and interactive analytics
⚠️ Higher write latency

Merge-on-Read (MoR): Default in RisingWave. Keeps base files untouched, appends small delta/delete files.
✅ Fast writes, great for CDC and streaming
✅ Lower storage cost
⚠️ Reads must merge base + deltas until compaction runs

Where it applies in RisingWave:
Both modes work for:
→ Iceberg sinks: RisingWave writing to externally managed Iceberg tables
→ Internal Iceberg tables: Tables created and managed inside RisingWave
Configured per table in SQL with a single property: write_mode

Don't forget compaction! MoR keeps writes fast by deferring cleanup. Compaction periodically merges deltas back into base files to keep reads efficient.
→ Iceberg sinks: enable compaction explicitly
→ Internal tables: compaction is on by default

When to use which?
Use MoR when ingest speed is the priority: CDC, streaming, frequent updates.
Use CoW when read latency must be predictable: dashboards, batch refreshes, ad-hoc analytics.
Most teams run both: MoR for raw streams, CoW for curated analytics tables.

The bottom line: With CoW + MoR in RisingWave, you get a streaming compute engine and Iceberg writer in one, with full compatibility across Spark, Trino, and DuckDB. Your lakehouse fits your workload, not the other way around.

Building a streaming lakehouse with RisingWave? Join the community: go.risingwave.com/slack

#ApacheIceberg #Lakehouse #StreamProcessing #DataEngineering #RisingWave
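To make the per-table property concrete, here is a hedged sketch of a MoR Iceberg sink (the connector property names besides write_mode, and the exact accepted values, are illustrative; check the sink docs for your version):

```sql
-- Sink a materialized view to an Iceberg table in merge-on-read mode.
CREATE SINK orders_to_iceberg FROM orders_mv WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'order_id',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'analytics',
    table.name = 'orders',
    write_mode = 'merge-on-read'  -- or 'copy-on-write' for read-heavy tables
);
```

Because the setting is per table, a raw CDC stream and a curated dashboard table can coexist in the same pipeline with different write modes.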
Why Apache Iceberg is the future of data lakes.

In the past, data lakes didn’t fail because of storage. They failed because tables were never really "tables". Hive-style lakes rely on file paths, partitions, and external coordination, which breaks when you have:
→ multiple writers
→ multiple engines
→ changing schemas
→ petabyte-scale metadata

Apache Iceberg fixes this by bringing real table semantics to object storage:
→ ACID transactions (safe concurrent writes)
→ Time travel and rollback (snapshots)
→ Fast planning at scale (manifests and metadata indexing)
→ Schema evolution (add or rename columns without rewrites)
→ Hidden partitioning (no manual partition traps)
→ Multi-engine interoperability (Spark, Flink, Trino, RisingWave, etc.)

Iceberg turns your lake from a pile of files and scripts into a transactional, warehouse-like platform.

If your lake needs:
→ Strong consistency
→ Streaming + batch
→ Multiple engines
→ Long-term evolution
Then build your data lake with Apache Iceberg.

Want to build a streaming lakehouse? RisingWave lets you build one with Postgres simplicity that natively supports Apache Iceberg.
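Schema evolution and time travel are easiest to see in SQL. This sketch uses Spark-SQL-style clauses, since the exact syntax varies by engine, and the table name and snapshot ID are illustrative:

```sql
-- Schema evolution: adding a column is a metadata-only change,
-- no data files are rewritten.
ALTER TABLE lake.orders ADD COLUMN discount DOUBLE;

-- Time travel: query the table as of an earlier snapshot ID...
SELECT * FROM lake.orders VERSION AS OF 4348247150956219000;

-- ...or as of a point in time.
SELECT * FROM lake.orders TIMESTAMP AS OF '2025-04-01 00:00:00';
```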
How SHOPLINE delivered customer-facing real-time order analytics with RisingWave.

It sounds obvious: just make real-time analytics fast. SHOPLINE didn’t just do that. They rethought the entire architecture. And that decision matters more than you think.

An in-depth look at their real-time analytics strategy, covering:
→ Architecture decisions
→ Trade-offs
→ System design evolution
→ Operational impact

SHOPLINE is a commerce platform where analytics directly impact merchant decisions. So the expectation is simple: “Why not compute analytics in real time on the database?”

They tried. Their system evolved into a Lambda setup with separate batch and real-time layers. Their decision: do not scale Lambda further. Unify everything into one continuous transformation layer.

Real-time analytics is not just about fast queries. It affects data consistency, latency, cost, and operational complexity. Lambda created duplication: every metric existed twice, once in batch and once in real time. This slowed iteration and increased maintenance.

At the same time, analytics became user-facing. Dashboards and APIs depended on fresh data, but heavy joins and aggregations were still running on the application database. As load grew, latency became unpredictable and impacted user experience. Cost also increased: the application database handled both transactional and analytical workloads, creating scaling pressure.

So SHOPLINE changed the approach. Do not compute at query time. Pre-compute continuously.

They adopted RisingWave as a unified SQL transformation layer for both streaming data and historical backfill. Raw data is ingested, then cleaned and joined, and finally transformed into serving-ready aggregates. These results are continuously synced to application databases and data warehouses. The same logic applies to all data. No duplication.

This changes how the system behaves. Metrics are always ready. Queries are lightweight. Systems are more stable. Result: predictable latency, lower cost, simpler architecture.

The impact was clear. Operationally, they removed dual pipelines and reduced complexity. Engineering moved to SQL-based development with reusable transformations, enabling faster iteration. On the serving side, pre-computed data eliminated heavy query-time work and improved responsiveness. API latency dropped by 76.7%.

The takeaway: SHOPLINE did not just optimize queries. They removed expensive queries. Because real-time analytics is about where computation happens.

Their choice:
→ Unify batch and streaming
→ Pre-compute continuously
→ Serve instantly
Focus on simplicity, consistency, and performance.

👉 Read the full blog: https://lnkd.in/dutpT5Pq

Ron Xing 邢华
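The "pre-compute continuously, serve instantly" pattern can be sketched in a few lines of RisingWave SQL (the source, view, and sink names are illustrative, not taken from SHOPLINE's actual pipeline, and the JDBC sink properties may differ by version):

```sql
-- Clean, join, and aggregate raw events once, continuously,
-- instead of at query time.
CREATE MATERIALIZED VIEW order_metrics AS
SELECT
    o.merchant_id,
    COUNT(*)      AS order_count,
    SUM(o.amount) AS gross_revenue
FROM orders o
JOIN merchants m ON o.merchant_id = m.id
GROUP BY o.merchant_id;

-- Continuously sync serving-ready aggregates to the application
-- database, so API queries become cheap point lookups.
CREATE SINK order_metrics_sink FROM order_metrics WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://appdb:5432/shop?user=rw',
    table.name = 'order_metrics',
    type = 'upsert'
);
```

This is the structural shift the post describes: the heavy joins run once in the streaming layer, and the serving path only reads pre-computed rows.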