We just open-sourced LLMesh.
For the past year we have been building an AI orchestration platform for product teams. At that scale, you can't run every inference call through a cloud API. The cost is wrong, the privacy is wrong, and the latency is wrong. So we built our own infrastructure: a distributed inference broker that pools local hardware into a single endpoint.
The problem it solves is simple. Your app points at localhost. It works. You push to staging. It breaks. You switch to a cloud API, start paying for tokens you didn't want to pay for, and send data you didn't intend to share.
LLMesh sits between your application and your compute. Your app always hits one endpoint. Your hardware — laptops, GPU boxes, workstations — connects from wherever it is. It's OpenAI- and Anthropic-API compatible, so it's a drop-in replacement. No config changes across environments. No data leaving your infrastructure.
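Concretely, "drop-in" means an OpenAI-style chat completion is just JSON POSTed to `<base_url>/chat/completions`, so moving between local, staging, and prod is only a base-URL change. A minimal sketch of that idea — the broker address and model name below are placeholders, not LLMesh's actual defaults:

```python
import json

def chat_request(base_url, model, prompt):
    """Build the URL and body for an OpenAI-compatible chat completion.

    Swapping providers (cloud API, localhost, an LLMesh broker) only
    changes base_url — the request shape stays identical.
    """
    url = f"{base_url.rstrip('/')}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

# Same app code in every environment; only the endpoint differs:
url, body = chat_request("http://llmesh.internal:8080/v1", "llama3", "hi")
print(url)  # http://llmesh.internal:8080/v1/chat/completions
```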
Think of it as nginx for LLM inference.
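To make the analogy concrete: in nginx terms, the broker plays the role of an upstream pool with a single listener in front of it. This is only the mental model, not LLMesh's actual config — the hostnames and ports are illustrative:

```nginx
# Illustrative analogy only: a pool of inference workers behind one endpoint.
upstream llm_workers {
    server laptop.local:11434;   # e.g. an Ollama instance
    server gpu-box.local:8000;   # e.g. a vLLM instance
}

server {
    listen 8080;
    location /v1/ {
        proxy_pass http://llm_workers;
    }
}
```

The difference is that LLMesh's workers dial out to the broker from wherever they are, rather than the broker needing a static list of reachable addresses.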
We open-sourced it because the defensive moat on infrastructure isn't deep enough to justify keeping it closed. The value is in the community, the feedback, and the credibility. MIT licensed. Zero vendor lock-in.
If you're running Ollama, vLLM, or MLX locally and managing multiple machines, this is for you.
https://llmesh.net
https://lnkd.in/g3RQd-Mq
Very very cracked team 😎