Remember your first roaming bill shock? Two weeks in Dubai, you come home, and suddenly you're staring at a 1,000-euro phone bill instead of the usual 30. Same phone. Same behavior. Completely different billing model.
That's exactly what's happening to every company in the world right now. Your CTOs are sitting at the kitchen table thinking: "We pay 30 dollars a month for Copilot licenses." And then someone quietly opens the API invoice. It's not 30 dollars. It's 1,500. Per employee. Per month.
Andrej Karpathy — OpenAI co-founder, ex-Tesla AI chief — just put it bluntly in a recent post:
"90% of your AI bill is for context you never actually need." Imagine you're building a house for 100,000 dollars. The contractor says: "Malcolm, that'll be 1 million." — "Why 10× more?" — "Well, the context..."
That's what your company is doing with every single AI query.
📚 How we got here
2022-2023: Prompt Engineering. Salaries 200,000-500,000 dollars. "Please and thank you," "think step by step," Chain of Thought. Some of it still works today.
2024: The "Prompt Engineer" job title disappears. Karpathy introduces Context Engineering — the delicate art of giving the AI the right information in the right context window.
2026: We now need Prompt Engineering 2.0 — not for better answers, but for answers that are 10× cheaper.
🔧 Eight measurable token levers nobody in mid-market uses
Chunking — split large documents into semantic chunks instead of burning 100 PDFs in one query
Grab-before-Fetch — tell the AI exactly which book to pull from the library instead of letting it read 100
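Prompt Caching: with a stable instruction prefix, every cache hit is billed at about 10% of the normal input price (Anthropic). The first cache write costs slightly more than an uncached request (about 125% of the base input price); every reuse costs about 10%. On a 17-page compliance brief that is a massive lever.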
Skill.MD / Agent.MD — work instructions for the AI. Karpathy did the math: without Skill.MD = 4 dollars per session. With Skill.MD = 30 cents. Factor 13.
Compaction — manually compact long sessions yourself, don't wait for the AI to do it. Works in Claude Code, Codex, etc.
Model Routing: Haiku at about $5 per 1M output tokens (classification, formatting), Sonnet at about $15 (code review), Opus at $25+ (architecture). Don't drive the Bugatti to the grocery store.
Change your default model — your devs have the most expensive model set as default. Sonnet is enough in 85% of cases.
Auto-Context-Loading + Prompt-Audits by a second AI = automatic context-bloat killer
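To make the Prompt Caching lever concrete, here is a minimal Python sketch of the arithmetic. The two multipliers, the token count, the request count, and the $3 per 1M input tokens are all assumptions for illustration; check them against the provider's current pricing page. The sketch also assumes every request arrives while the cached prefix is still alive.

```python
# Assumed multipliers relative to the base input-token price.
# Verify both against the provider's current pricing before relying on them.
CACHE_WRITE_MULTIPLIER = 1.25  # first request writes the prefix to the cache
CACHE_READ_MULTIPLIER = 0.10   # later requests read the cached prefix


def prefix_cost(prefix_tokens: int, requests: int,
                price_per_mtok: float, cached: bool) -> float:
    """Input cost in dollars for sending the same prefix `requests` times."""
    if requests <= 0:
        return 0.0
    base = prefix_tokens / 1_000_000 * price_per_mtok  # one uncached send
    if not cached:
        return base * requests
    # One cache write, then (requests - 1) cache reads.
    return base * (CACHE_WRITE_MULTIPLIER
                   + CACHE_READ_MULTIPLIER * (requests - 1))


# Hypothetical workload: a 12,000-token brief queried 200 times.
without_cache = prefix_cost(12_000, 200, 3.0, cached=False)
with_cache = prefix_cost(12_000, 200, 3.0, cached=True)
print(f"without caching: ${without_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
```

The absolute dollar amounts matter less than the ratio: the more often the same prefix is reused, the closer the bill for that prefix gets to the cache-read rate.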
🚦 The electricity-bill analogy for your board
Private life: 20-dollar lightbulb. If you leave it on 24 hours, it doesn't matter. Electric bill 800 or 850 — who cares.
Now scale it up: factory floor. 50,000 lights. Three-shift operation. Plus machines, server room. Suddenly 5 million dollars in electricity. That's your AI bill in 2026. You spent two years buying AI without installing the meter.
If I walk in as a consultant and say "1-million-dollar project to optimize your prompts" and you go from 5 million to 500,000? That's a factor of 10. Out of 4.5 million in savings, I'd happily take 1 million.
📟 Cloud-Meter — the physical electricity meter for your AI
Someone built a small cube with a touchscreen that displays in real time how much money he's burning on tokens. Sits on the desk next to the laptop. GitHub repo, viral on TikTok. A human built a literal power meter for AI because he can't grasp how much he's spending in the abstract.
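The same metering idea can be sketched in software. This is not how the Cloud-Meter project works (its internals are not covered here); it is a hypothetical `TokenMeter` class that adds up token counts and converts them to dollars. The prices and token counts are assumed values for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenMeter:
    """Running total of token spend, like an electricity meter for API calls."""
    input_price_per_mtok: float   # dollars per 1M input tokens (assumed)
    output_price_per_mtok: float  # dollars per 1M output tokens (assumed)
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the token counts of one request to the running total."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def dollars(self) -> float:
        """Total spend so far at the configured prices."""
        return (self.input_tokens * self.input_price_per_mtok
                + self.output_tokens * self.output_price_per_mtok) / 1_000_000


# Hypothetical session: two requests at $3 input / $15 output per 1M tokens.
meter = TokenMeter(input_price_per_mtok=3.0, output_price_per_mtok=15.0)
meter.record(input_tokens=40_000, output_tokens=2_000)
meter.record(input_tokens=55_000, output_tokens=1_500)
total_tokens = meter.input_tokens + meter.output_tokens
print(f"{total_tokens:,} tokens, ${meter.dollars:.4f} so far")
```

In practice the token counts would come from the usage figures that an API response reports per request, so the meter needs no estimation of its own.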
🎯 Three Monday actions
1. Subscription Audit: Claude Code + Codex + Cursor + Lovable Pro + ChatGPT Plus + Gemini all running in parallel? Have an AI list every duplicate spend. At werchota.ai we save thousands monthly by subscribing fast and canceling fast.
2. Build Skill.MDs: The moment you do a process twice, write a Skill.MD. We have a GitHub Skill Repository at werchota — every skill = better quality + 13× fewer tokens.
3. Change the default model: Open Claude / Codex / Cursor, switch the default model to Sonnet (or smaller). You'll hit "max out" less often — and you can work much longer per session.
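Action 3 and the Model Routing lever can be sketched as a simple lookup table. The tier names and per-1M-token prices are the illustrative figures from these notes. The task categories, the `ROUTES` mapping, and the example workload are hypothetical, and a real router would need its own way to classify incoming tasks.

```python
# Illustrative per-1M-token prices as quoted in these notes (not a price list).
PRICE_PER_MTOK = {"haiku": 5.0, "sonnet": 15.0, "opus": 25.0}

# Hypothetical mapping from task type to the cheapest tier that handles it.
ROUTES = {
    "classification": "haiku",
    "formatting": "haiku",
    "code_review": "sonnet",
    "architecture": "opus",
}
DEFAULT_TIER = "sonnet"  # the "change your default model" action


def pick_tier(task_type: str) -> str:
    """Return the routed tier, falling back to the default for unknown tasks."""
    return ROUTES.get(task_type, DEFAULT_TIER)


def workload_cost(workload: dict[str, int], route: bool) -> float:
    """Dollars for {task_type: tokens}, either routed or all on the top tier."""
    total = 0.0
    for task_type, tokens in workload.items():
        tier = pick_tier(task_type) if route else "opus"
        total += tokens / 1_000_000 * PRICE_PER_MTOK[tier]
    return total


# Hypothetical monthly workload in tokens per task type.
workload = {
    "classification": 2_000_000,
    "formatting": 1_000_000,
    "code_review": 1_000_000,
    "architecture": 500_000,
}
print(f"everything on the top tier: ${workload_cost(workload, route=False):.2f}")
print(f"routed by task type:        ${workload_cost(workload, route=True):.2f}")
```

The design choice here is to default to the mid tier and escalate only for named task types, instead of defaulting to the most expensive model and hoping people downgrade.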
💬 The question every board needs to answer
"How much does one token cost us?"
Your CFO knows the electricity bill. Knows the gold price. Knows the price of gasoline. Knows the price of milk at the supermarket. They don't know the token price. And they don't yet know they should know it.
That's the new language we have to learn. AI-language. First mover wins.
⏱️ Timestamps
00:00 — Cold open: The 1,000-dollar Dubai roaming bill
03:30 — Two worlds: private flat-rate vs. enterprise API
06:00 — Karpathy: 90% of your AI bill is wasted context
08:30 — Retro: Prompt Engineering 2022 → Context Engineering 2024 → Prompt Engineering 2.0
13:00 — Chunking + Grab-before-Fetch
16:00 — Prompt Caching: 10% instead of 100%
19:00 — Skill.MD / Agent.MD — Factor 13
22:00 — Compaction
25:00 — Electricity bill analogy: 5M in token costs with no meter
28:00 — Cloud-Meter — the physical token meter
30:00 — Model Routing: Haiku / Sonnet / Opus — Skoda, Ferrari, Bugatti
33:00 — Three Monday actions: Subscription Audit, Skill.MDs, Default Model
37:00 — The question for every board: "How much does one token cost us?"
🎙️ About the Host
Malcolm Werchota runs AI adoption programs for companies across Europe. After 15+ years at Novartis and Schlumberger, today's focus: AI without the bullshit. Lecturer at ESADE and HSLU. Studied in Leoben.
🚀 Resources for Executives
📚 Chief AI Academy — AI for Decision Makers
👥 AI Leadership Community
🌐 werchota.ai
📬 Contact
LinkedIn: linkedin.com/in/malcolmwerchota
Email: [email protected]
📰 Sources
Andrej Karpathy — recent X/Twitter post on Context Engineering & Skill.MD factor 13
Anthropic — Prompt Caching Pricing (10/90 split)
Anthropic — Model pricing Haiku / Sonnet 4.6 / Opus 4.7
GitHub — Cloud-Meter open-source project (viral on TikTok)
Werchota.ai — internal Skill Repository & Subscription Audit workflow
Tags: #PromptEngineering #ContextEngineering #Karpathy #Anthropic #Claude #ClaudeCode #Codex #Tokens #AICost #PromptCaching #SkillMD #ModelRouting #CFO #CTO #werchota #ChiefAIAcademy #TheAICookbookShow