Before = gross input, split into After (sent upstream) + Removed (stripped). Output drawn to the same scale.
How much of your provider rate/usage window is consumed and when it resets.
Per-period activity (not cumulative): Before/After volume, Removed tokens, and the reduction-% trend.
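The relationship between these figures can be sketched in a few lines. This is a minimal illustration, assuming the reduction-% is Removed divided by Before; the function name is hypothetical and not part of BugMunch or Headroom.

```python
def reduction_percent(before: int, after: int) -> float:
    """Share of gross input tokens that was stripped, as a percentage.

    Assumes Before = After + Removed, as described above.
    """
    if before <= 0:
        return 0.0  # no traffic in this period, so nothing to reduce
    removed = before - after
    return 100.0 * removed / before


# 12,000 tokens in, 9,000 sent upstream: a quarter was removed.
print(reduction_percent(12_000, 9_000))
```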
The actual path each request takes: your agent, then BugMunch's relay, then Headroom, then the model provider. If a provider you didn't expect shows up here, something's routing your traffic somewhere it shouldn't.
The most recent calls Headroom handled: which client sent them, where they were sent, the model, token reduction, and whether the prefix cache hit.
How one user turn fans out into several agent requests, like sub-tasks, parallel tool calls, and retries, with cumulative token reduction across the whole turn. Grouped by Headroom's turn_id.
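A rough sketch of that grouping, assuming each request record carries a turn_id plus Before and After token counts. The field names "before" and "after" are illustrative only; the real Headroom stats may use different keys.

```python
from collections import defaultdict


def group_by_turn(requests):
    """Sum Before/After tokens per turn_id and add the turn's reduction %."""
    totals = defaultdict(lambda: {"requests": 0, "before": 0, "after": 0})
    for req in requests:
        turn = totals[req["turn_id"]]
        turn["requests"] += 1
        turn["before"] += req["before"]
        turn["after"] += req["after"]
    for turn in totals.values():
        removed = turn["before"] - turn["after"]
        # Guard against empty turns to avoid dividing by zero.
        turn["reduction_pct"] = 100.0 * removed / turn["before"] if turn["before"] else 0.0
    return dict(totals)


# Hypothetical sample: turn t1 fans out into two requests, t2 into one.
sample = [
    {"turn_id": "t1", "before": 1000, "after": 800},
    {"turn_id": "t1", "before": 3000, "after": 2200},
    {"turn_id": "t2", "before": 500, "after": 500},
]
turns = group_by_turn(sample)
```

Summing tokens first and computing the percentage once per turn weights large requests more heavily than averaging per-request percentages would.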
Coding agents BugMunch has actually observed routing through Headroom (from each request's client tag). If an agent you use isn't here, it isn't being compressed yet. Generate its setup below.
Point any coding agent at Headroom, on this machine or a remote one. Pick the agent, set where Headroom is reachable, and apply the generated setup. Using something that isn't in the list, or a local model? Pick Custom / Local LLM.
http://headroom.lan:8787 or an SSH-tunnelled http://127.0.0.1:8787. Defaults to this relay's upstream.
Add Headroom's memory/retrieval MCP server to Claude Code or Claude Desktop. Generates the config you paste in.
Per-model usage & cost (any provider). Falls back to request counts until cost data is present.
The provider usage components your bill is based on, as measured by Headroom. A huge “gross tokens” number is mostly cheap cache reads.
What your provider prices against, as measured by Headroom. This is not an actual invoice. Big gross token totals are mostly cache reads, billed around 10 percent of fresh input.
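To see why a large gross total can overstate the bill, here is a small worked example. It assumes cache reads are billed at exactly 10 percent of fresh input, which the text above only gives as an approximation; the function name and token counts are made up for illustration.

```python
CACHE_READ_RATIO = 0.10  # assumption: "around 10 percent" of the fresh-input rate


def fresh_equivalent_tokens(fresh_input: int, cache_reads: int) -> float:
    """Express a mixed token total as the fresh-input tokens it would cost."""
    return fresh_input + cache_reads * CACHE_READ_RATIO


# Hypothetical window: 1,000,000 gross tokens, 90% of them cache reads.
gross = 100_000 + 900_000
billed_like = fresh_equivalent_tokens(100_000, 900_000)
print(gross, round(billed_like))
```

Under these assumptions the million gross tokens cost about the same as 190,000 fresh input tokens.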
Dollar savings only. Compression $ and prefix-cache $ have different bases from the token figures above and from each other.
These are Headroom's estimates at the provider's rates. They are not your provider's actual bill, and not a bill from BugMunch or Headroom. Compression and prefix cache dollars are separate effects with different bases.
How many tokens each layer removed: compression vs CLI filtering vs RTK.
RTK (Rust Token Killer) is the shell output rewriter Headroom bundles. It trims things like git diffs, ls output, and installer spew before the model sees them. Its bar moves only when it is actually rewriting CLI output. A zero can mean little shell output came through, or that Headroom is set to a different context tool such as lean-ctx. Compression is the proxy-side layer that handles everything else.
Headroom's on-box stores: compression cache, semantic cache, request log, batch context. Entries, size, and hit rates per store.
Estimate cost for a hypothetical task. Rates pre-fill from a model's observed usage where available, and you can edit any field. It is pure arithmetic; nothing is sent anywhere.
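The arithmetic behind such an estimate is roughly the following. This is a sketch, not BugMunch's actual calculator: the function name is hypothetical, and the rates in the example are placeholders rather than any provider's real prices.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_mtok: float, output_rate_per_mtok: float) -> float:
    """Dollar estimate from token counts and per-million-token rates."""
    total = input_tokens * input_rate_per_mtok + output_tokens * output_rate_per_mtok
    return total / 1_000_000


# Placeholder rates: $3 per million input tokens, $15 per million output tokens.
cost = estimate_cost(200_000, 20_000, 3.0, 15.0)
print(f"${cost:.2f}")
```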
If you've turned on usage logging (Settings, under "usage logging"), BugMunch writes a small JSONL or CSV line every interval. Drop one of those files here to scrub back through it as charts and a table: token use, cost, and reduction over time. It reads the file right in the browser, so you can do it on a laptop with nothing else running.
.jsonl / .csv usage log…
BugMunch is a dashboard for Headroom, a proxy that sits between your coding agent and the model provider and compresses the context before it's sent. Headroom does the saving. BugMunch just reads its stats and shows you what happened: how many tokens got stripped, what it saved you, where your requests actually went, and how it's trending.
It's a single page served by a small Python relay. The relay reads stats from Headroom: counts and timings, not your API keys or message content.
You need Headroom running first. It's the thing BugMunch reads from. Then point BugMunch at it and open the page:
HEADROOM_URL=http://127.0.0.1:8787 python3 server.py # then open http://127.0.0.1:8081
Nothing shows up until traffic flows through Headroom, so point an agent at it. The Agents tab generates the exact line for your shell. For Claude Code it's just:
export ANTHROPIC_BASE_URL=http://127.0.0.1:8787
Headroom on another machine? Set HEADROOM_URL to wherever it lives (or tunnel to it) and restart. There's no build step and nothing to install beyond Python.
Local models work too. Headroom can route to local backends (Ollama, LM Studio, vLLM, anything OpenAI-compatible) the same way it routes to a cloud provider. You don't configure anything in BugMunch for this: it reads providers and models straight from Headroom's stats, so a local model just shows up in By model and Routing once traffic flows. The one thing to expect is that most local servers don't publish a usage quota, so the Quota & limits panel may sit empty for them.
Config: everything's optional. Copy config.sample.json to config.json next to server.py, or use env vars (env wins). It covers bind/port, upstream, branding, logging, and read-only extra endpoints.
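The "env wins" precedence can be pictured like this. It is only a sketch of the lookup order (environment variable, then config.json, then a default); the helper name and the "upstream" config key are assumptions, not the relay's actual code or schema.

```python
import json
import os
from pathlib import Path


def resolve(key, env_name, config_path="config.json", default=None, env=None):
    """Return a setting, preferring the environment over config.json."""
    env = os.environ if env is None else env
    if env_name in env:
        return env[env_name]  # env vars win
    path = Path(config_path)
    if path.is_file():
        config = json.loads(path.read_text(encoding="utf-8"))
        if key in config:
            return config[key]
    return default  # everything is optional, so fall back quietly


# "upstream" is a hypothetical key name; HEADROOM_URL is the documented env var.
upstream = resolve("upstream", "HEADROOM_URL", default="http://127.0.0.1:8787")
```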
Access: BugMunch has no login of its own. By default, run it on loopback, over a tunnel, or behind a firewall. It refuses to bind to an open network address unless you set BUGMUNCH_ALLOW_OPEN=1. For real multi-user access, front it with your own reverse-proxy login (oauth2-proxy, Authelia, Caddy, Cloudflare Access...) and set BUGMUNCH_AUTH=forward-auth. It then trusts the identity header your proxy injects, and only when the request comes from a proxy IP you pin. Don't expose it without one of those.
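The forward-auth trust rule amounts to a check like the one below. This is a hedged sketch of the idea, not the relay's implementation: the function name is hypothetical, and the X-Forwarded-User header is an assumption (use whatever header your proxy actually injects).

```python
import ipaddress


def trusted_identity(peer_ip, headers, pinned_proxy_ips, header_name="X-Forwarded-User"):
    """Return the proxy-supplied identity, or None if it must not be trusted."""
    peer = ipaddress.ip_address(peer_ip)
    pinned = {ipaddress.ip_address(ip) for ip in pinned_proxy_ips}
    if peer not in pinned:
        return None  # anyone else could forge the header, so ignore it
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == header_name.lower() and value.strip():
            return value.strip()
    return None


# Only the pinned proxy at 10.0.0.5 is allowed to vouch for a user.
print(trusted_identity("10.0.0.5", {"X-Forwarded-User": "alice"}, ["10.0.0.5"]))
print(trusted_identity("203.0.113.9", {"X-Forwarded-User": "mallory"}, ["10.0.0.5"]))
```

Pinning the proxy IP matters because the header is just text: a client that reaches the relay directly could set it to any name.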
Addons: drop a .js/.css into addons/, list it in config.json, and use the window.BugMunch API (register_panel, on_data, plus format helpers). It loads same-origin under the strict CSP, no core edits needed.
Add ?demo to the URL, or open it with no backend reachable, and it loads the sample fixtures in demo/. That's how the hosted demo runs.
HEADROOM_URL. Check the status dot in the header.
A small, free dashboard for Headroom, the local LLM context-compression proxy. It reads Headroom's stats and shows what is actually happening. By SparkBugz, under the MIT license.
A stdlib Python 3 relay (no pip install) plus vanilla JavaScript. These numbers are computed live from the files on disk.
Click any ? for details on what a setting does.
HEADROOM_URL env var. To watch a Headroom on another machine, point the relay there and restart.
Headroom can trim CLI/shell output with RTK (Rust Token Killer) before it reaches the model. To turn it on, install RTK where Headroom runs and set Headroom's context tool to rtk (see the RTK and Headroom docs).
Ship each usage snapshot somewhere durable, either a local file or a remote HTTP collector. Configure it here and apply the generated setup. The relay on the host does the actual shipping.
BUGMUNCH_LOG_TOKEN line with a placeholder. Put your own key in on the host and in the matching collector. BugMunch never handles it. Use an https collector so a key is never sent over plain http.
Grab the numbers for a spreadsheet or a backup, or wipe the settings this browser is holding.
Turn panels on or off. Disabled modules are hidden everywhere and skipped on refresh. Saved in this browser.