Recce’s cover photo
Recce

Recce

Data Infrastructure and Analytics

Helping data teams preview, validate, and ship data changes with confidence.

About us

Recce helps modern data teams preview, validate, and ship data changes with confidence. By turning pull requests into structured, context-rich reviews, Recce makes it easy to spot meaningful changes, verify intent and impact, and reduce cognitive load for authors and reviewers alike. Curate reproducible checklists that compare data across environments — so you can catch what matters, skip what doesn’t, and align your team before merging. Accelerate development, cut down on manual QA, and bring visibility, verifiability, and velocity to your data workflows.

Website
https://datarecce.io
Industry
Data Infrastructure and Analytics
Company size
2-10 employees
Headquarters
San Francisco
Type
Privately Held
Specialties
dbt, Modern Data Stack, code review, Data Engineering, SQL, Data Lineage, Query Diff, Lineage Diff, and Data Model Diff

Updates

  • Commits on the Recce repo went from 20-50 a week to 100-200+ after Claude Code adoption in November. And that's not even the interesting number. What changed is how code moves through review. A feature that used to ship in 2-3 large PRs now arrives in 17 small commits, each individually testable. The Axios-to-Fetch migration was 46 files and ~1,800 lines. Pre-agents, that was a week of work. With Claude Code: a day and a half, in small commits that could be reviewed as they landed. Reviewing forward changes things. Smaller, more isolated changes make problems easier to catch and easier to roll back. The velocity is real. But the commit structure is what actually matters. Read about how Claude Code changed the Recce development lifecycle -> link in comments. #ClaudeCode #AIAgents #DevWorkflow #SoftwareEngineering

  • Recce reposted this

    On episode 10 of Data Renegades Podcast, CL Kao (Recce) and Dori Wilson sit down with Toby Mao (Fivetran) to talk about designing data systems that are easy to change, the realities of building for real customers versus FAANG-scale assumptions, and how AI is transforming the speed of development. The episode highlights why simplicity, architecture, and iteration matter more than ever. Tune in! https://hubs.ly/Q049NWZn0

  • Recce's MCP server now includes an `impact_analysis` tool that verifies downstream data changes. When a PR modifies a model, agents can see which downstream models actually have different data — not just which ones appear in the lineage. Also new: the Recce Agent now detects non-data PRs and skips analysis entirely. No wasted agent runs, no false-positive noise in PR reviews. And we removed Axios entirely after the npm supply-chain compromise. Recce was never affected, but we eliminated the dependency to reduce future risk. Details: https://lnkd.in/eh5BxiDE #dbt #DataQuality #MCP
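The downstream check described above can be sketched in a few lines: walk the lineage graph from the changed model, then keep only the descendants whose data actually differs between environments. All names below are illustrative, not Recce's actual MCP API — a toy lineage dict and precomputed per-model data fingerprints stand in for the real tool.

```python
# Illustrative sketch of lineage-aware impact analysis (hypothetical code,
# not Recce's implementation). Given a lineage DAG and per-model data
# fingerprints from two environments, report only downstream models whose
# data actually changed -- not everything that merely appears in the lineage.

from collections import deque

def downstream_models(lineage, changed):
    """All models reachable from the changed models in the lineage DAG."""
    seen, queue = set(), deque(changed)
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def impacted_models(lineage, changed, base_fingerprints, pr_fingerprints):
    """Downstream models whose data fingerprint differs across environments."""
    return sorted(
        m for m in downstream_models(lineage, changed)
        if base_fingerprints.get(m) != pr_fingerprints.get(m)
    )

# Toy example: orders feeds revenue and sessions; only revenue's data changed.
lineage = {"orders": ["revenue", "sessions"]}
base = {"revenue": "a1", "sessions": "b2"}
pr = {"revenue": "ff", "sessions": "b2"}
print(impacted_models(lineage, ["orders"], base, pr))  # ['revenue']
```

The point of the second filter is exactly the distinction the post draws: `sessions` is downstream in the lineage, but its data is unchanged, so it is excluded from the impact report.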

  • A wiki is something you look at when you want a reference. A shared AI system is something you work through. The difference shows up over time. Style guides get written down and slowly stop getting used. Knowledge drifts. Review cycles catch the gaps. When the same guidelines live inside the workflow, the opposite happens. Every time someone writes a blog post or files a bug, they're running through the shared context. Outdated entries get noticed. Gaps get filled. The knowledge stays current because it's part of the work, not adjacent to it. We packaged the shared knowledge at Recce into a Claude Code plugin: voice guidelines, product context, terminology, step-by-step workflows. One install. Every session starts from the same source of truth. The post on how we built it is in the comments. #ClaudeCode #AIProductivity #DataEngineering #AnalyticsEngineering

  • "AI in a way is going to create this wave of what I would describe as radical accountability for creators of software projects where you do not have to accept things being mediocre, things being bad anymore." Wes McKinney built pandas and Apache Arrow. Now he is shipping new open source projects with AI agents and documenting what credibility looks like when the cost of building drops to near zero. This was our conversation with Wes on the Data Renegades Podcast. Full episode in comments. #AIcodingagent #radicalaccountability #dataengineering #DataRenegades #opensource

  • Wes McKinney coined a development methodology by accident: book-driven development. While writing "Python for Data Analysis," he kept hitting empty sections in the book that required features pandas did not have yet. So he would stop writing, go build the feature, then come back and finish the chapter. "Pandas in 2011 was essentially book-driven development, quite literally." The book and the library were built in tandem. The result: the most-read Python data book and the most-used Python data library, each making the other better. Check out the full conversation on the Data Renegades Podcast wherever you catch your favorite podcasts. #pandasPython #opensource #bookdrivendevelopment #dataengineering #DataRenegades

  • Pandas was not built like a database. Wes McKinney knew that. He shipped it anyway. "If I had taken three years longer to get something useful into the market, to do things the right way, it would have been too late." People needed to read CSV files. They needed basic data wrangling. The fact that pandas was architecturally imperfect did not matter because it solved real problems at the right moment. The "perfect" version (something like Polars) came 10 years later, when the scale actually demanded it. Shipping early with known limitations created the most-used Python data library in the world. Check out the full conversation on the Data Renegades Podcast wherever you listen to your favorite podcasts. #pandasPython #startuplessons #dataengineering #DataRenegades #shipearly

  • AI coding tools generate wrong SQL all the time. Not syntax errors. Logic errors where the query runs, the numbers look plausible, and the dashboard updates without complaint. The fix isn't a smarter model. It's giving the model the context it needs, when it needs it. AI skills are markdown files that encode domain knowledge, workflows, and guardrails into AI coding tools. No framework. No SDK. Structured text in a repo, version-controlled like dbt docs or YAML configs. The real power is the loop. Code guided by domain rules. Review catches what the code got wrong. Handoff captures the fix into persistent context. Updated skills make the next session smarter. Every cycle compounds. At Recce, one aggregation bug turned into a permanent rule the system now enforces automatically. Seven columns across three models, fixed once and remembered forever. Dori Wilson broke down the full framework at March's Data Debug SF, from skill anatomy to the self-improving loop to scaling skills into team-wide plugins. Full writeup on our blog. Link in comments. #DataEngineering #AnalyticsEngineering #AI #dbt
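A minimal sketch of what such a skill file might look like, assuming structured markdown checked into the repo. The filename, headings, and rules below are hypothetical illustrations of the pattern, not Recce's actual skill content:

```markdown
<!-- Hypothetical skill file: skills/aggregation-guardrails.md -->
# Aggregation guardrails

## When this applies
Any model that aggregates user activity into metrics.

## Rules
- User-level metrics count distinct `user_id`, never raw event rows.
- Session metrics count `session_id`; do not reuse user counts.
- When a rule is corrected in review, update this file in the same PR.

## Workflow
1. Before writing an aggregation, state which grain the metric uses.
2. After review feedback, append the fix here so the next session inherits it.
```

Because the file is plain markdown under version control, the "handoff captures the fix" step is just a commit: the corrected rule ships alongside the code change that prompted it.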

  • Text-to-SQL works well on benchmarks. It falls apart on real schemas. On our Data Renegades Podcast ep 9, Wes McKinney was at the CIDR database conference where Michael Stonebraker presented the Beaver benchmark, a text-to-SQL eval built on actual institutional database schemas from inside MIT. Frontier models struggled to generate correct SQL, even though MIT is home to many of the researchers who trained those models. The core problem: LLMs do not understand the subtleties of table relationships. Is this a one-to-one join or one-to-many? Should this metric count distinct users or total sessions? The join semantics and counting logic that determine whether a dashboard shows the right numbers are exactly where models break down. Wes sees semantic modeling as the answer. The idea: predefine table relationships and metric logic so LLMs generate queries within understood boundaries instead of free-forming SQL across complex schemas. Tools like Malloy (Lloyd Tabb's successor to LookML) are the current best implementation of this approach. Without guardrails? "Otherwise, people are just going to be pointing Claude Code at their production databases and blowing their feet off basically." Benn Stancil (founder of Mode Analytics and Data Renegades guest on ep 5) calls it the "vibe and verify" revolution. Analytics and data engineering resist full vibe coding because there are too many subtleties where models fail silently. Wes agrees. And he sees this as job security for data practitioners who understand their schemas deeply. Listen wherever you catch your favorite podcasts. #texttoSQL #AI #dataanalytics #semanticmodeling #dataengineering #DataRenegades

  • Can AI coding agents build something as intricate as Apache Arrow? Not yet, says co-creator Wes McKinney on the Data Renegades Podcast. "Arrow is a project that has the intricacy of a fine Swiss watch. There's a lot of very small details that were created very painstakingly over a long period of time." Wes McKinney uses AI agents daily. He runs parallel Claude Code sessions and shipped two new open source projects in the last month. But he draws a sharp line at core infrastructure: file formats, processing engines, metadata management. These require deliberation and architectural nuance that current agents lack. Data infrastructure remains one of AI's hardest frontiers. Listen wherever you catch your favorite podcasts. #ApacheArrow #datainfrastructure #AIlimits #dataengineering #DataRenegades

Funding

Recce: 1 total round

Last round

Pre-seed

US$4.0M

See more info on Crunchbase