44 Hidden Flags Inside Claude Code

3 April 2026

Today we’re starting a bit differently. Some food for thought — the kind that’s sometimes needed in the AI arms race we’re all living through right now.

Earlier this week at the NextGen AI Conference there was one talk that’s hard to shake. Most of the program was what you’d expect — new models, new tools, people demoing things that’ll be outdated in three months. Good stuff, genuinely exciting. But this one was different.

It was about the people behind the data labeling. Kenyan workers hired to train ChatGPT’s content filters. They earned about $2 an hour, pulled 20-hour shifts, and spent that time labeling the content that AI models need to learn to filter out — graphic images of violence, murders, child abuse, pornography, the worst things humans do to each other. The psychological damage is real — workers have reported PTSD, nightmares, lasting trauma.

If you follow AI ethics, this story may already be familiar, but most readers won’t have heard it before. And once you start digging, you learn how the system is designed. The big AI companies don’t employ these workers directly. They outsource through chains of smaller subcontractors, layer after layer, which conveniently shields them from responsibility for what happens to the people at the bottom. It’s structured so that no one is accountable. And that’s exactly why these stories don’t reach us.

Watch the 60 Minutes investigation — it’s about 15 minutes and worth every one of them.

There aren’t easy answers here. Yes, there’s irony in writing this in an AI newsletter. That’s part of the point.

Anyway. Here’s what happened in AI this week.

Topic of the Week

The Claude Code Source Map Leak

You’ve probably seen the headlines all week — and read at least three different breakdowns. So rather than rehash what you already know, here’s a clean summary plus the details most coverage buried or got wrong.

On March 31, someone at Anthropic shipped v2.1.88 of the Claude Code npm package without adding *.map to .npmignore. The result: a 59.8 MB JavaScript source map file went out to the public registry, exposing roughly 512,000 lines of TypeScript source code. Within hours, mirrors were up across GitHub. The initial reports flagged 35 hidden feature flags; the actual count turned out to be 44.
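A pre-publish check can catch this class of mistake. The sketch below is a hypothetical guard, not Anthropic's release tooling: it walks a package directory and lists every *.map file that would ship with it. The function name find_source_maps and the directory layout in the usage note are assumptions made for illustration.

```python
from pathlib import Path


def find_source_maps(package_dir: str) -> list[str]:
    """Return the relative paths of all *.map files under package_dir, sorted."""
    root = Path(package_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.map")  # matches names such as cli.js.map
        if path.is_file()
    )
```

Run against a build folder, find_source_maps("dist") returns an empty list when nothing would leak. A CI step could fail the release whenever the list is non-empty.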

The discoveries inside are more interesting than the leak itself:

– KAIROS: an unreleased background agent that stays alive between sessions and can act on its own (monitor GitHub, send notifications). Named after the Greek concept of “the right moment.” Anthropic is clearly thinking about AI that doesn’t wait for you to ask.
– Undercover Mode: when Anthropic employees use Claude Code on public repos, this hides all traces, with no “Co-Authored-By” tags and no mentions of internal tools or unreleased models. Stealth mode for dogfooding in the wild.
– Buddy: yes, someone built a Tamagotchi pet system inside Claude Code. Collectible creatures with rarity tiers and shiny variants. Not shipped, but fully built. Someone at Anthropic had fun with that one.
– WTF Telemetry: a file called userPromptKeywords.ts watches for frustration words like “wtf,” “omfg,” “dammit” and logs them. No way to opt out of just this; it’s all telemetry or nothing. The most debated find by far.
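To make the frustration-keyword idea concrete, here is a minimal sketch of whole-word keyword matching. It is not the leaked userPromptKeywords.ts logic, which is not reproduced here. The word list contains only the three examples quoted above, and the name frustration_hits is hypothetical.

```python
import re

# Only the three words quoted in coverage of the leak; the real list is not reproduced.
FRUSTRATION_WORDS = ("wtf", "omfg", "dammit")

# \b anchors keep the match to whole words, so substrings inside longer words are ignored.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in FRUSTRATION_WORDS) + r")\b",
    re.IGNORECASE,
)


def frustration_hits(prompt: str) -> list[str]:
    """Return matched frustration words in order of appearance, lowercased."""
    return [match.group(1).lower() for match in _PATTERN.finditer(prompt)]
```

For example, frustration_hits("WTF, why is this failing? dammit") returns ["wtf", "dammit"]. A prompt without any of the listed words returns an empty list.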

Anthropic’s official response was brief: “No sensitive customer data or credentials were involved. Release packaging issue caused by human error, not a security breach.” Technically accurate — this wasn’t a hack, it was a missing line in a config file. But the real takeaway isn’t about security. It’s about what the hidden feature flags reveal: Anthropic is building toward persistent, autonomous agents that run in the background, and they’re already instrumenting frustration signals to improve the experience. The leak is embarrassing; what it shows about the roadmap is genuinely fascinating.

Reddit | VentureBeat | The Register

What the community did with it

The internet didn’t just read the code; it got to work. Someone extracted the full multi-agent orchestration system (coordinator mode, tool routing, team management) and packaged it as an open-source framework that works with any LLM, which drew 742 upvotes on launch. With the source of both Claude Code and Codex now readable, people also did a proper architectural comparison: Claude Code is an interactive copilot (it plans, asks for confirmation, executes step by step, and exposes 17 programmable hooks for governance), while Codex is an autonomous executor (delegate a task and it runs end to end). Safety works differently too: Codex locks things down at the OS kernel level, Claude Code at the application layer. The biggest differentiator? Claude Code’s Agent Teams: sub-agents that each get their own context window and git worktree, and can message each other mid-task.

Separately, someone reverse-engineered the binary and found two cache bugs that silently inflate API costs by 10-20x. Bug one: a string replacement used for billing tracking can accidentally break the cache prefix. Bug two: --resume misses the cache entirely. Max 5x users saw their allowance shrink from about 8 hours of work to 1 hour; Max 20x users saw usage jump from 21% to 100% in a single prompt. Workarounds: use npx instead of a global install, avoid --resume, and some users report that downgrading to v2.1.34 helps. Anthropic’s Lydia Hallie confirmed they’re actively investigating.
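Why does one string replacement matter so much? Prompt caching generally reuses only the part of a request that is identical, from the start, to an earlier request. The toy model below illustrates that idea. It is not Anthropic's cache implementation, and the segment names are hypothetical.

```python
def shared_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading segments that two requests have in common."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # everything after the first difference is a cache miss
        count += 1
    return count


# Hypothetical request segments; real caches work on tokens, not labels.
first_request = ["system", "billing:abc", "tools", "history", "turn-1"]
next_unchanged = first_request + ["turn-2"]
next_rewritten = ["system", "billing:xyz", "tools", "history", "turn-1", "turn-2"]

print(shared_prefix_len(first_request, next_unchanged))  # 5
print(shared_prefix_len(first_request, next_rewritten))  # 1
```

In the rewritten request, a change in the second segment leaves only one reusable segment, so nearly the whole prompt is billed as fresh input even though most of it is unchanged.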

Multi-agent extraction | CC vs Codex | Cache bugs

Fresh Papers

“Terminal Agents Suffice for Enterprise Automation”
ServiceNow Research | Read the paper

Everyone’s building MCP tool stacks right now. Custom tools for every API, elaborate integrations, carefully orchestrated pipelines. ServiceNow’s research team just tested that approach against something much simpler: an agent that writes and runs code in a terminal. Across 729 tasks on real enterprise platforms (ServiceNow, GitLab, ERPNext), the terminal agent matched web-browsing agents at 5-9x lower cost — and blew past the MCP approach entirely. ServiceNow’s own platform exposed 93 MCP tools, and agents using them still couldn’t complete basic tasks like ordering from the service catalog. The best MCP setup topped out at 55% success. Meanwhile, Claude Sonnet running terminal commands hit 72.7% at $0.56 per task, compared to $3.29 for the web agent doing the same work.
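As a quick sanity check on those figures, the per-task prices quoted above can be compared directly. This is only arithmetic on the two numbers in this summary; the helper name cost_ratio is an assumption.

```python
def cost_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the expensive option costs per task."""
    return expensive / cheap


TERMINAL_COST = 0.56  # USD per task, Claude Sonnet terminal agent (figure quoted above)
WEB_COST = 3.29       # USD per task, web agent (figure quoted above)

ratio = cost_ratio(WEB_COST, TERMINAL_COST)
print(f"{ratio:.1f}x")                                    # 5.9x
print(f"${WEB_COST - TERMINAL_COST:.2f} saved per task")  # $2.73 saved per task
```

The result, roughly 5.9x, sits inside the 5-9x range reported for the benchmark as a whole.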

Two findings stand out beyond the headline. First, throwing documentation at agents doesn’t automatically help — reference-style API docs actually misled them. Only task-oriented guides (step-by-step “here’s how to do X”) improved performance. Second, when terminal agents saved successful task solutions as reusable “recipes” for later, accuracy went up 3.6-5.8 percentage points and costs dropped 17-44%. Skills that compound over time beat tools that don’t learn.

The practical takeaway: before building another custom integration layer, consider whether a capable coding agent with API access already solves your problem. The paper suggests that for a surprising range of enterprise tasks, it does — faster, cheaper, and with less maintenance overhead. That’s worth keeping in mind next time someone pitches a 50-tool MCP server as the answer.

New Models

TurboQuant (Google Research)
Training-free compression that squeezes KV cache down to 3 bits with negligible quality loss. The community then adapted it for model weights too. Bottom line: Qwen3.5-27B now fits on a $400 RTX 5060 Ti with 16GB VRAM — and people are running it on a MacBook Air. The blog post got 12M views; the arXiv paper has 2 citations. Says a lot. Google Research | Reddit
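A rough back-of-the-envelope calculation shows why low-bit weights make a 16GB card plausible. The sketch assumes exactly 27 billion parameters and counts weight storage only, ignoring the KV cache, activations, and quantization metadata. The bit widths are illustrative and not taken from the TurboQuant paper's weight results.

```python
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 27e9  # assumed parameter count for a "27B" model

for bits in (16, 4, 3):
    print(f"{bits:>2} bits: {weight_gb(PARAMS, bits):.1f} GB")
# 16 bits: 54.0 GB
#  4 bits: 13.5 GB
#  3 bits: 10.1 GB
```

At 16 bits the weights alone are far beyond 16GB; at 3 to 4 bits they fit with some headroom for the cache and activations.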

Gemma 4 (Google DeepMind)
Four new open-weight models, now under Apache 2.0. The 31B dense model hits 89.2% on AIME 2026 and a 2,150 Codeforces Elo rating. The sleeper hit: E4B runs on a T4 GPU and still pulls 42.5% on AIME. Multimodal, native reasoning, runs on a Raspberry Pi. Google DeepMind | Reddit

Claude Mythos (Anthropic) — teaser
A leaked model tier called “Capybara,” sitting above Opus. “Dramatically higher” scores on coding, reasoning, and cybersecurity. Plans and executes autonomously across systems. No pricing, no release date, “very expensive to serve.” We’ll cover it when it ships. Fortune

Tools of the Week

Cline Kanban is a standalone app for CLI-agnostic multi-agent orchestration. It gives you a Kanban board where every card is a live agent task. Set up dependency linking so when a parent task completes, dependent tasks auto-kick-off. Each task gets its own terminal and git worktree. Works with Claude Code, Codex, Cline, and others. Install with npm i -g cline — local-first, no cloud needed. Cline Kanban
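The dependency linking described above amounts to running tasks in topological order. The sketch below shows that idea with Python's standard graphlib module. It is not Cline Kanban's implementation, and the card names and the function task_waves are hypothetical.

```python
from graphlib import TopologicalSorter


def task_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; a task starts only after all its parents finish.

    dependencies maps each task to the set of tasks it waits for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the cards depend on each other in a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose parents are done
        waves.append(ready)
        sorter.done(*ready)  # marking them finished unlocks their dependents
    return waves


# Hypothetical board: two cards wait on the design, the PR waits on both.
board = {
    "implement": {"design-api"},
    "write-tests": {"design-api"},
    "open-pr": {"implement", "write-tests"},
}
print(task_waves(board))
# [['design-api'], ['implement', 'write-tests'], ['open-pr']]
```

Tasks in the same wave have no dependency on each other, which is what lets each one run in parallel in its own terminal and worktree.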

AI at Tenvalleys

This week’s internal pick: /ideate — a skill that turns five ML architectures into structured creative thinking modes. Breed and cross-pollinate ideas (Evolutionary), refine rough drafts from noise to clarity (Diffusion), stress-test proposals through adversarial attack loops (GAN), sharpen positioning by defining what something is NOT (Contrastive), or compress a complex argument down to one sentence (Distillation).

It’s been useful for the kind of writing that needs sharpening rather than starting from scratch — hero-section copy, positioning statements, the line that has to do a lot of work in a small space. Fifteen minutes inside the skill produces a tighter result than fifteen minutes of staring at a blank page.

If you’re building a library of in-house AI skills your team will actually use, or you’d like to see how /ideate works in practice — reach out at contact@tenvalleys.com.

Must-See

“The AI Doc: Or How I Became an Apocaloptimist” — a new full-length documentary (1h 43min) directed by Daniel Roher, who won the Oscar for “Navalny,” and produced by the team behind “Everything Everywhere All At Once.” It features Sam Altman, Dario and Daniela Amodei, and tackles the big question head-on: is AI the collapse of humanity, or our ticket to the cosmos? Sitting at 8.2 on IMDb and 87% on Rotten Tomatoes, it’s currently in US theaters only (Focus Features, since March 27) but coming to Apple TV later this year. Worth putting on your watchlist for when it hits streaming.

See you next week.

Prepared at Tenvalleys — a delivery-first AI engineering partner — by Nikola Powałka. Feedback? Email us at contact@tenvalleys.com or reach out on LinkedIn.