Mercor

Software Development

San Francisco, California · 694,862 followers

Defining the future of work

About us

Mercor is defining the future of work. We connect human expertise with leading AI labs and enterprises to train frontier models.

Website: mercor.com
Industry: Software Development
Company size: 51-200 employees
Headquarters: San Francisco, California
Type: Privately Held
Founded: 2023

Updates

    Kimi K2.6 from Kimi (Moonshot AI) scores 27.9% at pass@1 on APEX-Agents-AA from Artificial Analysis. Scores are computed on 452 of the 480 public tasks from our benchmark for long-horizon professional work in investment banking, management consulting, and corporate law. K2.6 (27.9%) is a substantial improvement over K2.5 (11.5%), putting it within 5 points of GPT-5.4 (xhigh) and Claude Opus 4.6 (Max) on professional services work.

    Anthropic's Claude Opus 4.7 (Max) is only the second model ever to cross a 50% mean score on APEX-Agents, our benchmark for complex, long-horizon professional work in investment banking, corporate law, and management consulting; GPT-5.4 was the first. Opus 4.7 places 3rd overall on the leaderboard at 33.9% Pass@1 and tops the investment banking leaderboard at 37.2%, beating GPT-5.2 (xHigh). The most interesting finding is that Opus 4.7 thinks harder than its predecessor, and that comes at a token cost: roughly 2x the tokens of Opus 4.6 at the same effort level. Check out the latest leaderboard at the link in the comments.
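The post reports two different metrics for the same model: a mean score above 50% and a Pass@1 of 33.9%. Below is a minimal Python sketch of why these can diverge, assuming (not confirmed by the post) that each task is graded against a rubric of criteria, that a task counts as passed only when every criterion is met, and that the mean score averages the per-task fraction of criteria met. The function names and rubric data are hypothetical; this is not Mercor's grading code.

```python
from statistics import mean

# ASSUMPTION: a task "passes" only when every rubric criterion is met, and the
# mean score averages the per-task fraction of criteria met. Illustrative only.

def task_fraction(criteria: list[bool]) -> float:
    """Fraction of rubric criteria satisfied for one task."""
    return sum(criteria) / len(criteria)

def mean_score(results: list[list[bool]]) -> float:
    """Average per-task rubric fraction (partial credit counts)."""
    return mean(task_fraction(c) for c in results)

def pass_at_1(results: list[list[bool]]) -> float:
    """Share of tasks fully solved on the single attempt."""
    return sum(all(c) for c in results) / len(results)

# Hypothetical rubric outcomes: one list per task, True if a criterion was met.
results = [
    [True, True, True],           # fully solved
    [True, True, False],          # partial credit only
    [True, False, False, False],  # partial credit only
    [True, True, True, True],     # fully solved
]

print(f"mean score: {mean_score(results):.1%}")  # 72.9%
print(f"pass@1:     {pass_at_1(results):.1%}")   # 50.0%
```

Partial credit lifts the mean score, while Pass@1 counts only fully solved tasks, so under these assumptions the mean score is always at least as high as Pass@1.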

    Mercor is in the Forbes AI 50 for the 2nd year in a row. Thank you to our experts, customers, and team for being a part of this. Advance the frontier of AI with us. We're hiring across nearly every function. Check out our open roles at the link in the comments.

    Ayushi spent years building at the intersection of AI and healthcare, most recently as the founder of a healthcare AI startup. She knew what it felt like to search for product-market fit from the inside, and what it cost when you didn't find it. When she started thinking about what came next, she was deliberate. She wanted colleagues who understood founder life without her having to explain it. About 30% of people at Mercor are former founders. "After years of trying to build something from nothing, there is a specific energy in joining a team that's already sprinting and finding out you can keep pace." At Mercor, she's working on problems that only exist at scale, helping build the infrastructure that connects human expertise to AI advancement. Read Ayushi's story at the link in the comments.

    We are excited to announce our collaboration with Artificial Analysis on APEX-Agents-AA — an independent, live leaderboard evaluating AI agents on the professional tasks that knowledge workers do every day. The leaderboard is built on APEX-Agents, Mercor's open-source benchmark of 480 tasks across investment banking, management consulting, and corporate law — including tool implementations, rubrics, and grading workflows, all available to the community for evaluation and training. Artificial Analysis runs a subset of these tasks through their open-source Stirrup harness, providing a reproducible, independent baseline that any team can verify and build on. APEX-Agents-AA results: 🥇 GPT-5.4: 33.3% 🥈 Claude Opus 4.6: 33.0% 🥉 Gemini 3.1 Pro Preview: 32.0% The top three frontier models are separated by just 1.3 percentage points. The leaderboard will update with key model releases. Check it out at the link in the comments.

    The privacy and security of our customers and contractors is foundational to everything we do at Mercor. We recently identified that we were one of thousands of companies impacted by a supply chain attack involving LiteLLM. Our security team moved promptly to contain and remediate the incident. We are conducting a thorough investigation supported by leading third-party forensics experts. We will continue to communicate with our customers and contractors directly as appropriate and devote the resources necessary to resolving the matter as soon as possible. 

    Does Training on the APEX-Agents Dev Set Generalize Beyond the Benchmark? Applied Compute post-trained GLM-4.7 on ~2,000 expert Mercor tasks and achieved state-of-the-art legal performance on APEX-Agents. We then evaluated that model, AC-Small, on benchmarks outside its training distribution. On GDPVal, AC-Small's win+tie rate rose from 55.0% to 62.7% (+7.7pp), placing it 5th overall, ahead of Opus 4.5. To understand where the gain came from, we ran two ablations. On Toolathlon, AC-Small improved by +8.0pp, from 26.5% to 34.6%. On APEX, which removes tool use and agent loops, AC-Small moved up seven spots, beating Opus 4.5, Sonnet 4.5, and Grok 4. The biggest surprise was medicine: AC-Small placed 4th at 64.8%, ahead of GPT-5.4, Gemini 3.1 Pro, and o3, despite zero medical tasks in training. The gains appear to come from stronger procedural discipline: preserving sub-details, checking intermediate outputs, and catching logical errors. Read more at the links in the comments.
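The GDPVal result combines two quantities worth keeping apart: a win+tie rate, and a change measured in percentage points (pp) rather than percent. Below is a minimal Python sketch, assuming the win+tie rate is simply (wins + ties) / comparisons. The outcome list and function names are hypothetical; only the 55.0% and 62.7% inputs come from the post.

```python
from collections import Counter

# ASSUMPTION: each comparison yields "win", "tie", or "loss" for the model's
# deliverable against a reference; the outcomes used below are invented.

def win_tie_rate(outcomes: list[str]) -> float:
    """Share of comparisons the model wins or ties."""
    counts = Counter(outcomes)
    return (counts["win"] + counts["tie"]) / len(outcomes)

def pp_change(before_pct: float, after_pct: float) -> float:
    """Absolute change in percentage points, rounded to one decimal."""
    return round(after_pct - before_pct, 1)

def relative_change(before_pct: float, after_pct: float) -> float:
    """Relative change as a fraction of the starting value."""
    return (after_pct - before_pct) / before_pct

outcomes = ["win", "tie", "loss", "win", "loss"]  # hypothetical judgments
print(f"win+tie rate: {win_tie_rate(outcomes):.0%}")  # 60%

# AC-Small's GDPVal win+tie rates as reported in the post.
print(f"{pp_change(55.0, 62.7)} pp")                   # 7.7 pp
print(f"{relative_change(55.0, 62.7):.0%} relative")   # 14% relative
```

A +7.7pp gain from a 55.0% base corresponds to roughly a 14% relative improvement; reporting in pp keeps the two from being confused.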

    "The most important problem in the world is what we do all day for work and how the knowledge work economy operates." - Brendan Foody, at Upfront Ventures Summit. Brendan sat down with Sundeep Peechu of Felicis to talk about the future of work, what's blocking enterprise AI, and why humans become more valuable as AI advances. Watch the full video at the link in the comments.

  • Mercor reposted this

    Traditional coding benchmarks do not reflect how software is actually built and maintained. That's why we built a new benchmark, APEX-SWE, in partnership with Cognition. It measures whether AI models can perform complex, real-world software engineering work to ship systems that work and debug them when they don't. APEX-SWE Leaderboard | Pass@1 🥇OpenAI GPT-5.3 Codex (High) at 41.5% 🥈Anthropic Opus 4.6 (High) at 40.5% 🥉Anthropic Opus 4.5 (High) at 38.7% Every frontier model fails on nearly 60% of real production tasks.

Funding

Mercor: 4 total rounds

Last round: Series C, US$350.0M
