mesh-llm turns spare compute into a peer-to-peer inference cloud for open models.

mesh-llm pools GPUs across macOS and Linux machines so teams, researchers, and agents can run local or open-weight models through one OpenAI-compatible endpoint. It can serve a model on one node, distribute large models across nearby peers, route requests to specialized models, and let agents coordinate through mesh gossip.

What it is for

Share spare GPU capacity across trusted machines.
Run open models locally without a centralized inference provider.
Serve an OpenAI-compatible API at http://localhost:9337/v1.
Route requests across multiple nodes, models, and capabilities.
Experiment with distributed inference, MoE expert sharding, and agent collaboration.

see: https://docs.anarchai.org/ and: https://github.com/mesh-LLM/

Mesh uses a pipelined/network aware distributed inference approach built on llama.cpp called "skippy" - https://github.com/Mesh-LLM/hf-mesh-skippy-splitter contains current code which prepares models so layers can be efficiently JIT downloaded for participating nodes.