Caveman: Cut Claude Code's Token Bill Without Losing the Plot
A skill that makes Claude terse on purpose. Around 65% fewer output tokens, full technical accuracy, and a set of commands for commits, reviews, and memory compression.
AI & Web Consultant · June 7, 2026
Claude is verbose by default. It explains, it hedges, it writes three sentences where one would do. That politeness is nice to read and expensive to pay for, because every extra word adds output tokens to your bill.
Caveman is the fix. It is a Claude skill with one job: make the model terse on purpose while keeping every bit of technical substance.
How it works
Caveman injects compression rules into the agent so it answers in tight, fragment-style language instead of full prose. You pick how hard it pushes.
- lite for a gentle trim
- full for the default everyday mode
- ultra when you want it as compact as possible
- wenyan for a classical Chinese variant, if you want to go extreme
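To make the idea of "injecting compression rules" concrete, here is a minimal sketch of how a level name could map to an instruction string that gets added to the agent's prompt. The rule wording, the `LEVEL_RULES` table, and the `build_instructions` function are all hypothetical illustrations, not Caveman's actual rules or code; only the four level names come from the list above.

```python
# Hypothetical rule text per level. The real skill's wording lives in its repo;
# only the level names (lite, full, ultra, wenyan) are taken from the article.
LEVEL_RULES = {
    "lite": "Trim filler and pleasantries. Keep full sentences.",
    "full": "Answer in short fragments. Drop hedging where meaning survives.",
    "ultra": "Use the fewest words possible. Abbreviations and symbols allowed.",
    "wenyan": "Answer in terse classical Chinese.",
}

# Assumed guard rail: terseness must never drop technical substance.
ACCURACY_RULE = "Never omit code, commands, numbers, or error messages."


def build_instructions(level: str = "full") -> str:
    """Return the compression instructions for a level (illustrative only)."""
    if level not in LEVEL_RULES:
        valid = ", ".join(sorted(LEVEL_RULES))
        raise ValueError(f"unknown level {level!r}; expected one of: {valid}")
    return f"{LEVEL_RULES[level]} {ACCURACY_RULE}"


if __name__ == "__main__":
    print(build_instructions("ultra"))
```

The point of the sketch is the shape: one small rule per level, plus a fixed rule that protects the technical content no matter how hard the compression pushes.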
It also ships handy commands. /caveman-commit writes concise commit messages, /caveman-review gives one-line PR feedback, /caveman-compress rewrites bloated memory files, and /caveman-stats shows what you have saved in real dollars.
Why I keep it on
On a long session, output tokens add up fast. Caveman quietly cuts the bill without making the answers worse. I still get the fix, just without the essay wrapped around it.
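For a rough sense of what that means in money, the arithmetic can be sketched in a few lines. This is a back-of-the-envelope estimate, not how /caveman-stats computes its numbers. The 65% reduction is the approximate figure quoted at the top of this post; the session size and the price per million output tokens are placeholder assumptions, so substitute your own usage and your model's current rate.

```python
def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a given number of output tokens."""
    return tokens / 1_000_000 * price_per_million


def estimated_savings(baseline_tokens: int, reduction: float,
                      price_per_million: float) -> dict:
    """Estimate cost before and after cutting output tokens by `reduction`."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    terse_tokens = round(baseline_tokens * (1 - reduction))
    before = output_cost(baseline_tokens, price_per_million)
    after = output_cost(terse_tokens, price_per_million)
    return {
        "tokens_before": baseline_tokens,
        "tokens_after": terse_tokens,
        "cost_before": before,
        "cost_after": after,
        "saved": before - after,
    }


if __name__ == "__main__":
    # Assumed inputs: 2M output tokens, $15 per million (placeholder price).
    result = estimated_savings(2_000_000, reduction=0.65, price_per_million=15.0)
    print(f"saved about ${result['saved']:.2f}")
```

With those placeholder inputs, 2 million output tokens drop to roughly 700,000, and a $30 output bill falls to about $10.50. The saving scales linearly, so heavier sessions save proportionally more.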
It is the rare optimization with no real downside. You lose the words you did not need and keep the ones you did.
Getting it
Once installed, it activates automatically in Claude Code, or you can trigger it manually with /caveman. See the GitHub repo for the current install steps.
Built by Julius Brussee. A simple, honest win for anyone running Claude Code all day.