We just open-sourced LLMesh.
For the past year we have been building an AI orchestration platform for product teams. At that scale, you can't run every inference call through a cloud API. The cost is wrong, the privacy is wrong, and the latency is wrong. So we built our own infrastructure: a distributed inference broker that pools local hardware into a single endpoint.
The problem it solves is simple. Your app points at localhost. It works. You push to staging. It breaks. You switch to a cloud API, start paying for tokens you didn't want to pay for, and send data you didn't intend to share.
LLMesh sits between your application and your compute. Your app always hits one endpoint. Your hardware — laptops, GPU boxes, workstations — connects from wherever it is. It's OpenAI- and Anthropic-API compatible, so it's a drop-in replacement. No config changes across environments. No data leaving your infrastructure.
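Concretely, "drop-in" means an OpenAI-style chat completion is just JSON POSTed to `<base_url>/chat/completions`, so moving between local, staging, and prod is only a base-URL change. A minimal sketch of that idea — the broker address and model name below are placeholders, not LLMesh's actual defaults:

```python
import json

def chat_request(base_url, model, prompt):
    """Build the URL and body for an OpenAI-compatible chat completion.

    Swapping providers (cloud API, localhost, an LLMesh broker) only
    changes base_url — the request shape stays identical.
    """
    url = f"{base_url.rstrip('/')}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

# Same app code in every environment; only the endpoint differs:
url, body = chat_request("http://llmesh.internal:8080/v1", "llama3", "hi")
print(url)  # http://llmesh.internal:8080/v1/chat/completions
```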
Think of it as nginx for LLM inference.
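To make the analogy concrete: in nginx terms, the broker plays the role of an upstream pool with a single listener in front of it. This is only the mental model, not LLMesh's actual config — the hostnames and ports are illustrative:

```nginx
# Illustrative analogy only: a pool of inference workers behind one endpoint.
upstream llm_workers {
    server laptop.local:11434;   # e.g. an Ollama instance
    server gpu-box.local:8000;   # e.g. a vLLM instance
}

server {
    listen 8080;
    location /v1/ {
        proxy_pass http://llm_workers;
    }
}
```

The difference is that LLMesh's workers dial out to the broker from wherever they are, rather than the broker needing a static list of reachable addresses.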
We open-sourced it because the defensive moat on infrastructure isn't deep enough to justify keeping it closed. The value is in the community, the feedback, and the credibility. MIT licensed. Zero vendor lock-in.
If you're running Ollama, vLLM, or MLX locally and managing multiple machines, this is for you.
https://llmesh.net
https://lnkd.in/g3RQd-Mq
Very very cracked team 😎