llama-launcher

// overview

llama-launcher turns a hand-tuned llama.cpp server into something you drive with a single command. Rather than editing a systemd unit and restarting by hand every time you want a different model or a longer context, you keep your setups as named profiles in one config.json and switch between them with lcpp use <profile>.

A profile is just a set of llama-server flags — model, context size, GPU layers, KV-cache type, batch sizes, chat template. Applying one writes those flags to the service's environment file, restarts the server, and waits for /health to come back green before reporting success. If the new profile can't load — a bad flag, not enough VRAM — it rolls back to the last working one on its own.
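The apply-then-verify loop described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `apply_profile` and its three injected callables (`write_env`, `restart`, `is_healthy`) are hypothetical names, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def apply_profile(
    new: str,
    last_good: str,
    write_env: Callable[[str], None],
    restart: Callable[[], None],
    is_healthy: Callable[[], bool],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
) -> str:
    """Apply `new`; fall back to `last_good` if the server never gets healthy.

    Returns the name of the profile that ends up active.
    All parameter names here are assumptions made for illustration.
    """

    def activate(name: str) -> bool:
        write_env(name)  # persist the profile's flags for the service
        restart()        # e.g. restart the systemd --user unit
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if is_healthy():  # e.g. GET /health answered 200
                return True
            time.sleep(poll_s)
        return False

    if activate(new):
        return new
    # The new profile never came up: roll back to the last working one.
    if activate(last_good):
        return last_good
    raise RuntimeError(f"neither {new!r} nor {last_good!r} came up healthy")
```

Passing the side effects in as callables keeps the rollback logic testable without a running server; a real caller would plug in the file write, the service restart, and an HTTP check against /health.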

Everything runs as a per-user systemd service, so the daily loop never touches root: install with pipx, seed the unit once, then switch, restart, and tail logs as yourself. It stays a thin, transparent wrapper around your build — lcpp show <profile> always prints the exact command it would run — not a model registry or a proxy.

// how it works

service model

A systemd --user unit wrapping llama-server — no root in the loop, and it comes up at boot via lingering.

profiles

Named flag sets in config.json. lcpp use <profile> writes them to the unit's EnvironmentFile and restarts, gating on /health with automatic rollback.
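As a rough picture of what "writing a profile to the EnvironmentFile" could look like, the sketch below turns a profile dict into llama-server arguments and a single environment-file line. The profile key names, the `LLAMA_ARGS` variable, and both function names are assumptions for illustration; the real config.json schema is not shown here.

```python
# Hypothetical profile keys mapped to llama-server long options.
FLAG_FOR_KEY = {
    "model": "--model",
    "ctx_size": "--ctx-size",
    "gpu_layers": "--n-gpu-layers",
    "cache_type_k": "--cache-type-k",
    "cache_type_v": "--cache-type-v",
    "batch_size": "--batch-size",
    "ubatch_size": "--ubatch-size",
    "chat_template": "--chat-template",
}


def profile_to_args(profile: dict) -> list[str]:
    """Flatten a profile into an argv-style list, preserving key order."""
    args: list[str] = []
    for key, value in profile.items():
        if key not in FLAG_FOR_KEY:
            raise KeyError(f"unknown profile key: {key}")
        text = str(value)
        # Simplification: this sketch does not attempt any quoting,
        # so values containing whitespace are rejected outright.
        if any(ch.isspace() for ch in text):
            raise ValueError(f"value for {key!r} contains whitespace")
        args += [FLAG_FOR_KEY[key], text]
    return args


def env_file_line(profile: dict, var: str = "LLAMA_ARGS") -> str:
    """Render the arguments as one VAR=value line (variable name assumed)."""
    return f"{var}={' '.join(profile_to_args(profile))}\n"
```

For example, `profile_to_args({"model": "/models/q.gguf", "ctx_size": 8192})` returns `["--model", "/models/q.gguf", "--ctx-size", "8192"]`.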

autotune

Reads GGUF metadata + free VRAM to suggest a safe context size and GPU-layer split; --fit stays on as the runtime safety net, so an over-ambitious context degrades instead of OOM-ing.
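To make the idea concrete, here is a back-of-the-envelope version of a context-size suggestion. It is a simplified sketch, not the tool's actual heuristic: it assumes every layer sits on the GPU, ignores compute buffers beyond a fixed headroom, and the function names and the 1 GiB headroom are invented for the example. The layer count, KV-head count, and head dimension would come from the GGUF metadata.

```python
from typing import Optional

# Bytes per cached element: f16 is 2 bytes; q8_0 and q4_0 store blocks of
# 32 elements in 34 and 18 bytes respectively.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_bytes_per_token(
    n_layers: int, n_kv_heads: int, head_dim: int, cache_type: str = "f16"
) -> float:
    """KV-cache cost of one token: a K and a V vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * BYTES_PER_ELEM[cache_type]


def suggest_ctx(
    free_vram_bytes: int,
    weights_bytes: int,
    per_token_bytes: float,
    headroom_bytes: int = 1 << 30,  # assumed safety margin (1 GiB)
    step: int = 1024,
    max_ctx: Optional[int] = None,
) -> int:
    """Largest context (rounded down to `step`) whose KV cache fits."""
    budget = free_vram_bytes - weights_bytes - headroom_bytes
    if budget <= 0:
        return 0
    ctx = int(budget // per_token_bytes) // step * step
    return min(ctx, max_ctx) if max_ctx is not None else ctx
```

With 32 layers, 8 KV heads, and a head dimension of 128, an f16 cache costs 131072 bytes (128 KiB) per token. Given 12 GiB free, 5 GiB of weights, and the 1 GiB headroom, the remaining 6 GiB budget yields a suggestion of 49152 tokens.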

watch daemon

Polls for running games — a Steam launch, a Prism/Minecraft instance, any non-llama GPU process — and frees the GPU while they run (shrink to a lighter profile or stop the server outright), then restores the previous profile on exit.
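The shrink-and-restore behaviour boils down to a small state machine, sketched below under assumptions: `WatchState`, `step`, and the profile name "light" are hypothetical, and how a running game is detected is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatchState:
    active: str                  # profile currently applied
    saved: Optional[str] = None  # profile to restore once the game exits


def step(
    state: WatchState, game_running: bool, light_profile: str = "light"
) -> Optional[str]:
    """Advance one poll tick; return the profile to switch to, or None."""
    if game_running and state.saved is None:
        # A game just appeared: remember where we were and shrink.
        state.saved = state.active
        state.active = light_profile
        return light_profile
    if not game_running and state.saved is not None:
        # The game is gone: put the previous profile back.
        state.active, state.saved = state.saved, None
        return state.active
    return None  # nothing changed since the last poll
```

Remembering the previous profile only on the first detection means repeated polls during a long gaming session do not overwrite it with the lighter one.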

backends

Drives the llama-server already on your PATH (CUDA in practice; ROCm-aware, with an rocm-smi telemetry fallback) — it runs the binary, it doesn't build it. Profile flags are verified against the build you have, and lcpp doctor warns when an option has drifted out of your binary's --help.
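One plausible way to detect flag drift is to collect every option name that appears in the binary's --help text and compare it against the flags a profile uses. The sketch below assumes that approach; the function names and the regular expression are illustrative and may not match how lcpp doctor actually works.

```python
import re

# An option name: one or two dashes, a letter, then letters/digits/dashes.
# The lookbehind skips hyphens inside ordinary words such as "non-llama".
_FLAG = re.compile(r"(?<![\w-])(--?[A-Za-z][\w-]*)")


def flags_in_help(help_text: str) -> set[str]:
    """Every option name mentioned anywhere in the help output."""
    return set(_FLAG.findall(help_text))


def drifted(profile_flags: list[str], help_text: str) -> list[str]:
    """Flags used by a profile that the binary's help no longer lists."""
    known = flags_in_help(help_text)
    return [flag for flag in profile_flags if flag not in known]
```

The help text itself could be captured with `subprocess.run(["llama-server", "--help"], capture_output=True, text=True)`; any flag returned by `drifted` is a candidate for a warning.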

// screenshots

Screenshot: lcpp --help output, listing 25+ commands across service control, profiles, and diagnostics.

// stack

Python · Typer · systemd · llama.cpp · pipx

one runtime dependency · 63 tests · ruff + mypy clean

// links

GitHub mirror ↗