Running llama.cpp across multiple CPU nodes on Discoverer: possibilities and expectations
Most people running llama.cpp are familiar with its single-node CPU mode, where inference is spread across cores using multithreading. What is less commonly known is that llama.cpp can also be distributed across multiple machines, but it is essential to understand what that actually means in practice before building a cluster setup around it.

The built-in RPC backend

llama.cpp includes an RPC feature that connects multiple nodes over TCP. A master node holds the model file and coordinates inference, while worker nodes…