● BugMunch: a visualizer for Headroom


Getting started?

Before tokens

cumulative · sum of all requests

Removed tokens

during compression

After tokens

net input sent upstream

Output tokens

generated by model

Token reduction %

0.0%

tokens.savings_percent

Requests

Cache-hit rate

cached / total

Total cost avoided

$0.00

Headroom estimate · not a bill

Avg context / request

≈ context-window size (totals are cumulative across requests)

Token flow this session?

Before = gross input, split into After (sent upstream) + Removed (stripped). Output drawn to the same scale.

Output (to scale of Before)

Quota & limits subscription window?

How much of your provider rate/usage window is consumed and when it resets.

History ?

Per-period activity (not cumulative): Before/After volume, Removed tokens, and the reduction-% trend.

Before After

Removed tokens

Token reduction %

Where your commands go live destination?

The actual path each request takes: your agent, then BugMunch's relay, then Headroom, then the model provider. If a provider you didn't expect shows up here, something's routing your traffic somewhere it shouldn't.

Live request feed recent requests?

The most recent calls Headroom handled: which client sent them, where they were sent, the model, token reduction, and whether the prefix cache hit.

Orchestration grouped by turn?

How one user turn fans out into several agent requests, like sub-tasks, parallel tool calls, and retries, with cumulative token reduction across the whole turn. Grouped by Headroom's turn_id.
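As a rough illustration of the per-turn grouping described above, here is a minimal Python sketch. The record shape (dicts with "turn_id", "before", and "removed" keys) is an assumption made for the example; only the turn_id concept comes from Headroom, and the real stats payload may look different.

```python
from collections import defaultdict

def group_by_turn(requests):
    """Group request records by turn_id and total their token counts.

    Each record is assumed (hypothetically) to carry "turn_id",
    "before" and "removed" keys.
    """
    turns = defaultdict(lambda: {"requests": 0, "before": 0, "removed": 0})
    for req in requests:
        turn = turns[req["turn_id"]]
        turn["requests"] += 1
        turn["before"] += req["before"]
        turn["removed"] += req["removed"]
    # Reduction is computed over the whole turn, not averaged per request.
    for turn in turns.values():
        before = turn["before"]
        turn["reduction_pct"] = 100.0 * turn["removed"] / before if before else 0.0
    return dict(turns)

# Made-up sample: turn t1 fans out into two requests, t2 is a single call.
sample = [
    {"turn_id": "t1", "before": 1000, "removed": 400},
    {"turn_id": "t1", "before": 3000, "removed": 600},
    {"turn_id": "t2", "before": 500, "removed": 0},
]
turns = group_by_turn(sample)
```

Summing Before and Removed first and dividing once keeps a large sub-request from being outweighed by several small ones, which a simple average of per-request percentages would not do.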

Proxy health?

Detected agents seen in live traffic?

Coding agents BugMunch has actually observed routing through Headroom (from each request's client tag). If an agent you use isn't here, it isn't being compressed yet. Generate its setup below.

Wrap an agent through Headroom?

Point any coding agent at Headroom, on this machine or a remote one. Pick the agent, set where Headroom is reachable, and apply the generated setup. Using something that isn't in the list, or a local model? Pick Custom / Local LLM.

Agent

Shell / OS

Headroom address (host:port reachable from the agent's machine)

For a remote setup, use the host/IP where Headroom listens, e.g. http://headroom.lan:8787 or an SSH-tunnelled http://127.0.0.1:8787. Defaults to this relay's upstream.

Install the Headroom MCP server?

Add Headroom's memory/retrieval MCP server to Claude Code or Claude Desktop. Generates the config you paste in.

Target

By model?

Per-model usage & cost (any provider). Falls back to request counts until cost data is present.

Billing basis: the usage your provider prices against PROVIDER usage object?

The provider usage components your bill is based on, as measured by Headroom. A huge “gross tokens” number is mostly cheap cache reads.

What your provider prices against, as measured by Headroom. This is not an actual invoice. Big gross token totals are mostly cache reads, which are billed at around 10 percent of the fresh-input rate.
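To see why a large gross total can be misleading, here is a small sketch that weights cache reads at the roughly 10 percent figure quoted above. The function name and the sample token counts are invented for illustration, cache writes are left out entirely, and real provider pricing may differ.

```python
# Assumption: cache reads cost about 10% of fresh input, per the note above.
CACHE_READ_WEIGHT = 0.10

def fresh_input_equivalent(fresh_tokens, cache_read_tokens,
                           cache_read_weight=CACHE_READ_WEIGHT):
    """Express billed input as an equivalent number of fresh input tokens.

    Cache-write tokens are ignored in this sketch.
    """
    return fresh_tokens + cache_read_tokens * cache_read_weight

# Made-up session: 50k fresh tokens and 950k cache reads.
gross_tokens = 50_000 + 950_000
equivalent = fresh_input_equivalent(50_000, 950_000)
```

In this invented case the gross figure is 1,000,000 tokens, but the billed input is closer to what 145,000 fresh tokens would cost.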

Cost avoided DOLLARS, not tokens?

Dollar savings only. Compression $ and prefix-cache $ have different bases from the token figures above and from each other.

Compression $

$0.00

from stripping tokens before the call

Prefix-cache discount $

$0.00

provider prefix-cache pricing

Total cost avoided

$0.00

compression + cache

These are Headroom's estimates at the provider's rates. They are not your provider's actual bill, and not a bill from BugMunch or Headroom. Compression and prefix cache dollars are separate effects with different bases.

Savings by source TOKEN counts?

How many tokens each layer removed: compression vs CLI filtering vs RTK.

Compression CLI filtering RTK

RTK (Rust Token Killer) is the shell-output rewriter Headroom bundles. It trims things like git diffs, ls output, and installer spew before the model sees them. Its bar moves only when it is actually rewriting CLI output. A zero can mean little shell output came through, or that Headroom is set to a different context tool such as lean-ctx. Compression is the proxy-side layer that handles everything else.

Memory store Headroom caches?

Headroom's on-box stores: compression cache, semantic cache, request log, batch context. Entries, size, and hit rates per store.

What-if calculator sandbox?

Estimate cost for a hypothetical task. Rates pre-fill from a model's observed usage where available, and you can edit any field. It is pure arithmetic; nothing is sent anywhere.

Model

Input tokens

Output tokens

Compression reduction %

Cache-hit rate %

Input $ / 1M tok

Output $ / 1M tok
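The arithmetic behind a calculator with these fields could look something like the sketch below. This is not BugMunch's actual formula. It assumes the cache-hit rate applies to the share of input tokens sent upstream, and that cache reads cost about 10 percent of the fresh input rate; all names are hypothetical.

```python
def what_if_cost(input_tokens, output_tokens, reduction_pct, cache_hit_pct,
                 input_rate_per_m, output_rate_per_m, cache_read_weight=0.10):
    """Estimate task cost in dollars with and without compression/caching.

    cache_read_weight is an assumption (cache reads at ~10% of fresh input).
    """
    # Tokens left after compression strips reduction_pct of the input.
    sent = input_tokens * (1 - reduction_pct / 100)
    # Assumed: cache_hit_pct of the sent tokens are served as cache reads.
    cached = sent * cache_hit_pct / 100
    fresh = sent - cached
    input_cost = (fresh + cached * cache_read_weight) * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    # Baseline: every input token billed as fresh, no compression.
    baseline = input_tokens * input_rate_per_m / 1_000_000 + output_cost
    total = input_cost + output_cost
    return {"total": total, "baseline": baseline, "avoided": baseline - total}

# Made-up task: 1M input tokens, 100k output, 50% reduction, 80% cache hits.
estimate = what_if_cost(1_000_000, 100_000, 50, 80, 3.0, 15.0)
```

With these invented inputs the estimate comes to about $1.92 against a baseline of about $4.50. Output cost is the same in both, since neither compression nor the prefix cache touches generated tokens.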

Log player replay a saved log?

If you've turned on usage logging (Settings, under "usage logging"), BugMunch writes a small JSONL or CSV line every interval. Drop one of those files here to scrub back through it as charts and a table: token use, cost, and reduction over time. It reads the file right in the browser, so you can do it on a laptop with nothing else running.
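If you would rather inspect a saved JSONL log outside the browser, a few lines of Python are enough. The field names below ("ts", "before", "removed") are placeholders, not the real log schema, so check them against a line from your own file before relying on this.

```python
import io
import json

def read_jsonl(stream):
    """Parse one JSON object per non-empty line."""
    rows = []
    for line in stream:
        line = line.strip()
        if line:
            rows.append(json.loads(line))
    return rows

# Stand-in for an opened log file; the keys are hypothetical.
sample_log = io.StringIO(
    '{"ts": 1, "before": 1000, "removed": 250}\n'
    '\n'
    '{"ts": 2, "before": 4000, "removed": 2000}\n'
)
rows = read_jsonl(sample_log)

# Reduction % per snapshot, guarding against empty intervals.
trend = [
    (row["ts"], 100.0 * row["removed"] / row["before"] if row["before"] else 0.0)
    for row in rows
]
```

For a real file, swap the StringIO stand-in for open(path, encoding="utf-8").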

Choose or drop a .jsonl / .csv usage log…

What this is

BugMunch is a dashboard for Headroom, a proxy that sits between your coding agent and the model provider and compresses the context before it's sent. Headroom does the saving. BugMunch just reads its stats and shows you what happened: how many tokens got stripped, what it saved you, where your requests actually went, and how it's trending.

It's a single page served by a small Python relay. The relay reads stats from Headroom: counts and timings, not your API keys or message content.

Run it

You need Headroom running first. It's the thing BugMunch reads from. Then point BugMunch at it and open the page:

HEADROOM_URL=http://127.0.0.1:8787 python3 server.py
# then open http://127.0.0.1:8081

Nothing shows up until traffic flows through Headroom, so route an agent through it. The Agents tab generates the exact line for your shell. For Claude Code it's just:

export ANTHROPIC_BASE_URL=http://127.0.0.1:8787

Headroom on another machine? Set HEADROOM_URL to wherever it lives (or tunnel to it) and restart. There's no build step and nothing to install beyond Python.

Local models work too. Headroom can route to local backends (Ollama, LM Studio, vLLM, anything OpenAI-compatible) the same way it routes to a cloud provider. You don't configure anything in BugMunch for this: it reads providers and models straight from Headroom's stats, so a local model just shows up in By model and Routing once traffic flows. The one thing to expect is that most local servers don't publish a usage quota, so the Quota & limits panel may sit empty for them.

The tabs

Dashboard: the at-a-glance view of token reduction, cost avoided, cache-hit rate, the token-flow bar, and history over time.
Routing: where your requests actually go (agent, relay, Headroom, provider), a live feed of recent calls, and the per-turn view that groups one prompt's fan-out of sub-requests together.
Agents: which agents BugMunch sees in your traffic, copy-paste setup to route each one through Headroom, the Headroom MCP install, and a per-model breakdown.
Cost: the dollar side, meaning what compression and the prefix cache each saved, your provider quota/limits, and a what-if calculator for hypothetical tasks.
Cache: the token mechanics, meaning the provider usage object your bill is based on, which layer stripped tokens, and Headroom's own caches.
Replay: load a usage log you saved earlier and scrub back through it.
Settings: theme, refresh rate, turning panels on/off, export/backup, logging, and the footprint/trust readout.

The numbers, decoded

Before / Removed / After / Output: Before is the gross input your agent sent. Removed is what compression stripped. After is what actually went upstream (Before minus Removed). Output is what the model generated. These are cumulative across every request, not a single call.
Token reduction %: Removed divided by Before. The honest token number, separate from any dollar figure.
Cache-hit rate: how often the prefix cache was reused. Cache reads are billed at roughly a tenth of fresh input, so a high hit rate is where most of the savings actually come from.
Billing basis (Cache tab): the provider's own usage object: fresh/uncached tokens, cache reads, and cache writes (5-minute and 1-hour). A scary-looking "gross tokens" number is mostly cheap cache reads.
Compression $ vs prefix-cache $: two different effects with different math. BugMunch keeps them apart instead of mashing them into one "savings" number, and uses Headroom's own dollar figures rather than guessing. These are estimates, not your provider's actual bill.
Orchestration / turn (Routing tab): real agents fan one prompt out into many sub-requests. Headroom tags each with a turn id, so BugMunch can group a whole turn and show the cumulative reduction across it.
Cache busts: you paid the cache-write premium and the entry got thrown out before it was reused. Wasted money, worth watching.

Config, access & addons

Config: everything's optional. Copy config.sample.json to config.json next to server.py, or use env vars (env wins). It covers bind/port, upstream, branding, logging, and read-only extra endpoints.
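The "env wins" precedence can be pictured with a small sketch: defaults first, then config.json, then environment variables on top. This is not the relay's actual loader. The config key names are hypothetical; only HEADROOM_URL and the default addresses appear elsewhere in this document.

```python
import json
import os

# Hypothetical key names; the values mirror the defaults shown under "Run it".
DEFAULTS = {"headroom_url": "http://127.0.0.1:8787", "port": 8081}
# Which env var overrides which key. Only HEADROOM_URL is taken from the docs.
ENV_MAP = {"headroom_url": "HEADROOM_URL"}

def read_config_file(path):
    """Return the parsed config file, or an empty dict if it is absent."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def resolve_config(file_values, env):
    """Layer defaults, then file values, then env vars (env wins)."""
    config = dict(DEFAULTS)
    config.update(file_values)
    for key, var in ENV_MAP.items():
        if var in env:
            config[key] = env[var]
    return config

config = resolve_config(read_config_file("config.json"), os.environ)
```

Keeping resolve_config free of file and environment access makes the precedence easy to check with plain dicts.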

Access: BugMunch has no login of its own. By default, run it on loopback, over a tunnel, or behind a firewall. It refuses to bind to an open network address unless you set BUGMUNCH_ALLOW_OPEN=1. For real multi-user access, front it with your own reverse-proxy login (oauth2-proxy, Authelia, Caddy, Cloudflare Access...) and set BUGMUNCH_AUTH=forward-auth. It then trusts the identity header your proxy injects, but only from a proxy IP you pin. Don't expose it without one of those.
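The idea behind forward-auth with a pinned proxy IP can be sketched as follows: the identity header is believed only when the connection comes from the proxy you named. The header name, the IP, and the function are all hypothetical here and are not BugMunch's actual implementation.

```python
import ipaddress

# Hypothetical values: use whatever your proxy injects and wherever it runs.
IDENTITY_HEADER = "X-Forwarded-User"
TRUSTED_PROXY_IPS = {ipaddress.ip_address("10.0.0.5")}

def identity_from_request(peer_ip, headers):
    """Return the user name, or None if the request should not be trusted.

    peer_ip is the address of the TCP peer, not anything read from headers.
    headers is a plain dict here, so the lookup is case-sensitive.
    """
    if ipaddress.ip_address(peer_ip) not in TRUSTED_PROXY_IPS:
        # Anyone else could have forged the header, so ignore it.
        return None
    user = headers.get(IDENTITY_HEADER, "").strip()
    return user or None

trusted = identity_from_request("10.0.0.5", {"X-Forwarded-User": "ada"})
spoofed = identity_from_request("10.0.0.9", {"X-Forwarded-User": "ada"})
```

The peer-IP check is what makes the header meaningful: without it, any client that can reach the port could claim to be any user.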

Addons: drop a .js/.css into addons/, list it in config.json, and use the window.BugMunch API (register_panel, on_data, plus format helpers). It loads same-origin under the strict CSP, no core edits needed.

Demo, export & troubleshooting

Demo: add ?demo to the URL, or open it with no backend reachable, and it loads the sample fixtures in demo/. That's how the hosted demo runs.
Export / backup: Settings has Export JSON (a full snapshot), Export requests (CSV), and Purge (clears this browser's settings only; your Headroom data is untouched).
Everything's zero? No traffic has gone through Headroom yet, or BugMunch is pointed at the wrong HEADROOM_URL. Check the status dot in the header.
A panel looks empty: some panels depend on data your Headroom build/plan exposes (quota windows, per-layer savings). That's expected, not a bug.
Console mentions a blocked inline script: that's the strict CSP blocking a browser extension, not BugMunch (the page ships no inline scripts). It's harmless.

About BugMunch

A small, free dashboard for Headroom, the local LLM context-compression proxy. It reads Headroom's stats and shows what is actually happening. By SparkBugz, under the MIT license.

Footprint

A stdlib Python 3 relay (no pip install) plus vanilla JavaScript. These numbers are computed live from the files on disk.

Settings

Click any ? for details on what a setting does.

Theme

Refresh interval (seconds)?

How often the dashboard re-polls Headroom. Default 5.

Upstream Headroom URL?

Set server-side via the HEADROOM_URL env var. To watch a Headroom on another machine, point the relay there and restart.

Security & remote access?

Shell-output rewriting (RTK)

Headroom can trim CLI/shell output with RTK (Rust Token Killer) before it reaches the model. To turn it on, install RTK where Headroom runs and set Headroom's context tool to rtk (see the RTK and Headroom docs).

Usage logging destination?

Ship each usage snapshot somewhere durable, either a local file or a remote HTTP collector. Configure it here and apply the generated setup. The relay on the host does the actual shipping.

Destination

Host OS / service manager?

Local file path

Collector URL?

Want the collector authenticated? Don't enter a key here. The generated setup below includes a BUGMUNCH_LOG_TOKEN line with a placeholder. Set your own key on the host and in the matching collector. BugMunch never handles it. Use an https collector so a key is never sent over plain http.

Format

Snapshot interval (seconds)?

Export & reset?

Grab the numbers for a spreadsheet or a backup, or wipe the settings this browser is holding.

Purge clears this browser's BugMunch settings only. Your Headroom data is separate and stays put.

Modules?

Turn panels on or off. Disabled modules are hidden everywhere and skipped on refresh. Saved in this browser.
