mesh-llm turns spare compute into a peer-to-peer inference cloud for open models.

mesh-llm pools GPUs across macOS and Linux machines so teams, researchers, and agents can run local or open-weight models through one OpenAI-compatible endpoint. It can serve a model on one node, distribute large models across nearby peers, route requests to specialized models, and let agents coordinate through mesh gossip.

What it is for

see: https://docs.anarchai.org/ and: https://github.com/mesh-LLM/

Mesh uses a pipelined/network aware distributed inference approach built on llama.cpp called "skippy" - https://github.com/Mesh-LLM/hf-mesh-skippy-splitter contains current code which prepares models so layers can be efficiently JIT downloaded for participating nodes.