BugMunch, a visualizer for Headroom


Getting started?

Before tokens: 0 (cumulative · sum of all requests)
Removed tokens: 0 (during compression)
After tokens: 0 (net input sent upstream)
Output tokens: 0 (generated by model)
Token reduction %: 0.0% (tokens.savings_percent)
Requests: 0
Cache-hit rate: 0% (cached / total)
Total cost avoided: $0.00 (Headroom estimate · not a bill)
Avg context / request: 0 (≈ context-window size; totals are cumulative across requests)

Token flow (this session)

Before = gross input, split into After (sent upstream) + Removed (stripped). Output drawn to the same scale.
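The split described above is simple arithmetic, and a minimal sketch may make it concrete. The function name and dictionary keys here are hypothetical, and the reduction formula (Removed divided by Before) is an assumption about how the percentage is derived, not something read from Headroom's code.

```python
def token_flow(before: int, after: int, output: int = 0) -> dict:
    """Split gross input into sent and removed tokens (illustrative only)."""
    if before < 0 or after < 0 or after > before:
        raise ValueError("expected 0 <= after <= before")
    removed = before - after
    # Guard the empty-session case, where every tile reads zero.
    reduction_pct = 100.0 * removed / before if before else 0.0
    # Output is drawn to the same scale as Before, so express it as a ratio.
    output_scale = output / before if before else 0.0
    return {
        "before": before,
        "after": after,
        "removed": removed,
        "reduction_pct": round(reduction_pct, 1),
        "output_scale": output_scale,
    }
```

For example, token_flow(12000, 4500) gives 7500 removed tokens and a reduction of 62.5.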

Output (to scale of Before)

Quota & limits (subscription window)

How much of your provider rate/usage window is consumed and when it resets.

History ?

Per-period activity (not cumulative): Before/After volume, Removed tokens, and the reduction-% trend.

Chart legend: Before, After, Removed tokens, Token reduction %

Where your commands go (live destination)

The actual path each request takes: your agent, then BugMunch's relay, then Headroom, then the model provider. If a provider you didn't expect shows up here, something's routing your traffic somewhere it shouldn't.

Live request feed (recent requests)

The most recent calls Headroom handled: which client sent them, where they were sent, the model, token reduction, and whether the prefix cache hit.

Orchestration (grouped by turn)

How one user turn fans out into several agent requests, like sub-tasks, parallel tool calls, and retries, with cumulative token reduction across the whole turn. Grouped by Headroom's turn_id.
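As a rough illustration of the grouping, the sketch below rolls a list of request records up by turn_id. Only turn_id comes from the text above; the before_tokens and after_tokens field names are hypothetical stand-ins for whatever Headroom actually reports.

```python
from collections import defaultdict


def group_by_turn(requests):
    """Sum token counts per turn_id (field names other than turn_id are assumed)."""
    turns = defaultdict(lambda: {"requests": 0, "before": 0, "after": 0})
    for req in requests:
        turn = turns[req["turn_id"]]
        turn["requests"] += 1
        turn["before"] += req["before_tokens"]
        turn["after"] += req["after_tokens"]

    summary = {}
    for turn_id, turn in turns.items():
        removed = turn["before"] - turn["after"]
        pct = 100.0 * removed / turn["before"] if turn["before"] else 0.0
        summary[turn_id] = {**turn, "removed": removed, "reduction_pct": round(pct, 1)}
    return summary
```

A turn with one sub-task and one retry would show up as a single entry with requests set to 2 and the reduction computed over both calls together.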

Proxy health?

Detected agents (seen in live traffic)

Coding agents BugMunch has actually observed routing through Headroom (from each request's client tag). If an agent you use isn't here, it isn't being compressed yet. Generate its setup below.

Wrap an agent through Headroom?

Point any coding agent at Headroom, on this machine or a remote one. Pick the agent, set where Headroom is reachable, and apply the generated setup. Using something that isn't in the list, or a local model? Pick Custom / Local LLM.

For a remote setup, use the host/IP where Headroom listens, e.g. http://headroom.lan:8787 or an SSH-tunnelled http://127.0.0.1:8787. Defaults to this relay's upstream.

Install the Headroom MCP server?

Add Headroom's memory/retrieval MCP server to Claude Code or Claude Desktop. Generates the config you paste in.

By model?

Per-model usage & cost (any provider). Falls back to request counts until cost data is present.

Billing basis: the usage your provider prices against (provider usage object)

What your provider prices against, as measured by Headroom: the usage components your bill is based on. This is not an actual invoice. A huge “gross tokens” total is mostly cache reads, which are billed at around 10 percent of the fresh-input rate.
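To see why a large gross total can still be cheap, here is a small worked calculation. The function and parameter names are hypothetical, the rate is an example figure, and the 0.10 ratio simply encodes the "around 10 percent" rule of thumb; real provider pricing varies.

```python
def billed_input_cost(fresh_tokens, cache_read_tokens, fresh_rate_per_mtok,
                      cache_read_ratio=0.10):
    """Estimate input cost when cache reads are billed at a fraction of fresh input."""
    fresh_cost = fresh_tokens / 1_000_000 * fresh_rate_per_mtok
    # Cache reads use the same per-token rate, scaled by the assumed ratio.
    cache_cost = cache_read_tokens / 1_000_000 * fresh_rate_per_mtok * cache_read_ratio
    return fresh_cost + cache_cost


# One million gross tokens, 90% of them cache reads, at an example $3 per million:
estimate = billed_input_cost(100_000, 900_000, 3.0)
naive = 1_000_000 / 1_000_000 * 3.0  # what the gross total would cost at full rate
```

Under these assumptions the estimate comes to roughly $0.57, against $3.00 if every gross token were billed as fresh input.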

Cost avoided (dollars, not tokens)

Dollar savings only. Compression $ and prefix-cache $ have different bases from the token figures above and from each other.

Compression $: $0.00 (from stripping tokens before the call)
Prefix-cache discount $: $0.00 (provider prefix-cache pricing)
Total cost avoided: $0.00 (compression + cache)

These are Headroom's estimates at the provider's rates. They are not your provider's actual bill, and not a bill from BugMunch or Headroom. Compression and prefix cache dollars are separate effects with different bases.

Savings by source (token counts)

How many tokens each layer removed: compression vs CLI filtering vs RTK.

Chart legend: Compression, CLI filtering, RTK

RTK (Rust Token Killer) is the shell-output rewriter Headroom bundles. It trims things like git diffs, ls output, and installer spew before the model sees them. Its bar moves only when it is actually rewriting CLI output. A zero can mean that little shell output came through, or that Headroom is set to a different context tool such as lean-ctx. Compression is the proxy-side layer that handles everything else.

Memory store (Headroom caches)

Headroom's on-box stores: compression cache, semantic cache, request log, batch context. Entries, size, and hit rates per store.

What-if calculator (sandbox)

Estimate cost for a hypothetical task. Rates pre-fill from a model's observed usage where available, and you can edit any field. It is pure arithmetic; nothing is sent anywhere.

Log player (replay a saved log)

If you've turned on usage logging (Settings, under "usage logging"), BugMunch writes a small JSONL or CSV line every interval. Drop one of those files here to scrub back through it as charts and a table: token use, cost, and reduction over time. It reads the file right in the browser, so you can do it on a laptop with nothing else running.
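If you would rather read one of those JSONL logs outside the browser, a sketch like the following could be a starting point. The field names (ts, before_tokens, after_tokens) are hypothetical; check a real log line for the actual keys before relying on this.

```python
import json


def read_usage_log(lines):
    """Parse JSONL lines into dicts, skipping blanks and unparseable lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line that was cut off mid-write
        if isinstance(row, dict):
            rows.append(row)
    return rows


def reduction_series(rows):
    """Return (timestamp, reduction %) pairs; key names are assumptions."""
    series = []
    for row in rows:
        before = row.get("before_tokens", 0)
        after = row.get("after_tokens", 0)
        pct = 100.0 * (before - after) / before if before else 0.0
        series.append((row.get("ts"), round(pct, 1)))
    return series
```

Passing an open file object works as well as a list, since both yield one line at a time.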

Choose or drop a .jsonl / .csv usage log…

What this is

BugMunch is a dashboard for Headroom, a proxy that sits between your coding agent and the model provider and compresses the context before it's sent. Headroom does the saving. BugMunch just reads its stats and shows you what happened: how many tokens got stripped, what it saved you, where your requests actually went, and how it's trending.

It's a single page served by a small Python relay. The relay reads stats from Headroom: counts and timings, not your API keys or message content.

Run it

You need Headroom running first. It's the thing BugMunch reads from. Then point BugMunch at it and open the page:

HEADROOM_URL=http://127.0.0.1:8787 python3 server.py
# then open http://127.0.0.1:8081

Nothing shows up until traffic flows through Headroom, so point an agent at it. The Agents tab generates the exact line for your shell. For Claude Code it's just:

export ANTHROPIC_BASE_URL=http://127.0.0.1:8787

Headroom on another machine? Set HEADROOM_URL to wherever it lives (or tunnel to it) and restart. There's no build step and nothing to install beyond Python.
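Before starting the relay against a remote Headroom, it can help to confirm the host and port are reachable at all. The sketch below only opens a TCP connection; it does not call any Headroom endpoint, and the helper names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def host_port(url):
    """Extract (host, port) from an http(s) URL, filling in the default port."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_reachable with the same value you plan to put in HEADROOM_URL, for example http://127.0.0.1:8787, tells you whether something is listening there. It says nothing about whether that something is Headroom.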

Local models work too. Headroom can route to local backends (Ollama, LM Studio, vLLM, anything OpenAI-compatible) the same way it routes to a cloud provider. You don't configure anything in BugMunch for this: it reads providers and models straight from Headroom's stats, so a local model just shows up in By model and Routing once traffic flows. The one thing to expect is that most local servers don't publish a usage quota, so the Quota & limits panel may sit empty for them.

The tabs

The numbers, decoded

Config, access & addons

Config: everything's optional. Copy config.sample.json to config.json next to server.py, or use env vars (env wins). It covers bind/port, upstream, branding, logging, and read-only extra endpoints.
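The "env wins" rule can be pictured as a two-step merge: load the file if it exists, then let environment variables overwrite matching keys. This is a sketch of that idea, not the relay's actual loader. The "upstream" key name and the mapping table are hypothetical; only HEADROOM_URL and config.json come from the text above.

```python
import json
import os
from pathlib import Path

# Hypothetical mapping from config keys to the env vars that override them.
ENV_OVERRIDES = {"upstream": "HEADROOM_URL"}


def load_config(path="config.json", env=None):
    """Load optional JSON config, then apply env overrides (env wins)."""
    env = os.environ if env is None else env
    config = {}
    config_path = Path(path)
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    for key, var in ENV_OVERRIDES.items():
        if var in env:
            config[key] = env[var]
    return config
```

With no file and no env vars set, this returns an empty dict, which matches the idea that everything is optional.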

Access: BugMunch has no login of its own. By default, run it on loopback, over a tunnel, or behind a firewall. It refuses to bind openly on a network unless you set BUGMUNCH_ALLOW_OPEN=1. For real multi-user access, front it with your own reverse-proxy login (oauth2-proxy, Authelia, Caddy, Cloudflare Access...) and set BUGMUNCH_AUTH=forward-auth. It then trusts the identity header your proxy injects, but only when the request comes from a proxy IP you pin. Don't expose it without one of those.
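The forward-auth rule (trust the identity header, but only from a pinned proxy IP) might look roughly like this. It is an illustrative sketch, not BugMunch's code: the function name is hypothetical, and X-Forwarded-User is an assumed header name, so use whatever your proxy actually injects.

```python
import ipaddress


def trusted_identity(peer_ip, headers, pinned_proxy_ips,
                     header_name="X-Forwarded-User"):
    """Return the injected identity only if the peer is a pinned proxy, else None."""
    try:
        peer = ipaddress.ip_address(peer_ip)
    except ValueError:
        return None
    pinned = {ipaddress.ip_address(ip) for ip in pinned_proxy_ips}
    if peer not in pinned:
        # Anyone else could forge the header, so ignore it entirely.
        return None
    wanted = header_name.lower()
    for name, value in headers.items():
        if name.lower() == wanted and value.strip():
            return value.strip()
    return None
```

The key design point is the order of checks: the source address is verified before the header is even read, so a client that connects directly cannot claim an identity.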

Addons: drop a .js/.css into addons/, list it in config.json, and use the window.BugMunch API (register_panel, on_data, plus format helpers). Addons load same-origin under the strict CSP, and no core edits are needed.

Demo, export & troubleshooting

About BugMunch

A small, free dashboard for Headroom, the local LLM context-compression proxy. It reads Headroom's stats and shows what is actually happening. By SparkBugz, under the MIT license.

Footprint

A stdlib Python 3 relay (no pip install) plus vanilla JavaScript. These numbers are computed live from the files on disk.


  

Settings

Click any ? for details on what a setting does.

How often the dashboard re-polls Headroom. Default 5.
The Headroom address is set server-side via the HEADROOM_URL env var. To watch a Headroom on another machine, point the relay there and restart.

Security & remote access?

Shell-output rewriting (RTK)

Headroom can trim CLI/shell output with RTK (Rust Token Killer) before it reaches the model. To turn it on, install RTK where Headroom runs and set Headroom's context tool to rtk (see the RTK and Headroom docs).

Usage logging (destination)

Ship each usage snapshot somewhere durable, either a local file or a remote HTTP collector. Configure it here and apply the generated setup. The relay on the host does the actual shipping.

Want the collector authenticated? Don't enter a key here. The generated setup below includes a BUGMUNCH_LOG_TOKEN line with a placeholder. Put your own key in on the host and in the matching collector. BugMunch never handles it. Use an https collector so a key is never sent over plain http.
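The "https only when a key is involved" advice can be enforced in code on the sending side. The sketch below builds, but does not send, a POST request for a snapshot. The function name, the JSON body shape, and the Bearer authorization scheme are all assumptions; your collector defines what it actually expects.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request


def build_log_request(collector_url, snapshot, token=None):
    """Build a POST request for a usage snapshot, refusing a token over plain http."""
    scheme = urlsplit(collector_url).scheme
    if token and scheme != "https":
        raise ValueError("refusing to send a token over plain http")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed scheme; match whatever your collector checks.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(snapshot).encode("utf-8")
    return Request(collector_url, data=body, headers=headers, method="POST")
```

On the host, the token would come from the environment (the BUGMUNCH_LOG_TOKEN line in the generated setup) rather than being typed into the dashboard.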

Export & reset?

Grab the numbers for a spreadsheet or a backup, or wipe the settings this browser is holding.

Purge clears this browser's BugMunch settings only. Your Headroom data is separate and stays put.

Modules?

Turn panels on or off. Disabled modules are hidden everywhere and skipped on refresh. Saved in this browser.