A Sourcemap Slip on NPM Exposes Everything
Researcher Chaofan Shou discovered that Anthropic had accidentally published the entire source code of Claude Code, its official CLI coding assistant, inside the source map files of its NPM package.
Source maps are debug files that map minified production code back to its original sources. When they ship inside a distributed package, they expose everything: every file, every comment, every internal constant. In this case, the `sourcesContent` array inside the `.map` file contained the entire project in plain text. The root cause is mundane: nobody added `*.map` to `.npmignore` or disabled source map generation in the Bun bundler that Claude Code uses.
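A minimal sketch of why a shipped source map is so dangerous: when the `sourcesContent` array is present, it holds every original file verbatim, so "recovering" the code is just reading JSON. The interface below covers only the fields read here, not the full source map spec.

```typescript
// Minimal shape of a source map for our purposes (not the full spec).
interface SourceMap {
  sources: string[];
  sourcesContent?: (string | null)[];
}

// Recover original files from a .map file's JSON content.
function extractSources(mapJson: string): Map<string, string> {
  const map: SourceMap = JSON.parse(mapJson);
  const recovered = new Map<string, string>();
  map.sources.forEach((path, i) => {
    const content = map.sourcesContent?.[i];
    // Each non-null entry is a complete original source file in plain text.
    if (typeof content === "string") recovered.set(path, content);
  });
  return recovered;
}
```

Pointing this at any `.map` file found in a published tarball (e.g. read with `fs.readFileSync`) dumps the original file tree; no deobfuscation or reverse engineering is needed.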
The irony? That very code contains a system called "Undercover Mode", designed specifically to prevent internal information leaks. Anthropic built a full anti-leak subsystem and then shipped the entire source in a `.map` file.
How the Undercover Mode Works
Undercover Mode activates when Anthropic employees (identified by a `USER_TYPE === 'ant'` check) use Claude Code on public or open-source repositories. The `utils/undercover.ts` file injects a very explicit directive into the agent's system prompt that categorically forbids revealing internal information in commits and pull requests.
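The conditional injection described above can be sketched as follows. This is a hedged reconstruction based on the article, not the actual Claude Code implementation: the function name `buildSystemPrompt` and the directive text are assumptions; only `USER_TYPE === 'ant'` and the injection pattern come from the leak.

```typescript
// Hypothetical directive text; the real one was not quoted in full.
const UNDERCOVER_DIRECTIVE = [
  "Do not reveal internal model codenames or unreleased versions.",
  "Do not reference internal infrastructure (Slack channels, short links, private repos).",
  "Do not reveal that this contribution was generated by an AI.",
].join("\n");

// Inject the directive only for Anthropic employees on public repositories.
function buildSystemPrompt(
  basePrompt: string,
  userType: string,
  repoIsPublic: boolean
): string {
  if (userType === "ant" && repoIsPublic) {
    return `${basePrompt}\n\n${UNDERCOVER_DIRECTIVE}`;
  }
  return basePrompt;
}
```

The design choice worth noting: the guardrail lives in the prompt-assembly layer, keyed off runtime context, rather than being baked into a single static prompt.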
What gets censored specifically:
- Internal model codenames: internal models use animal names (Capybara, Tengu, Fennec), and none of these should end up in a commit message
- Unreleased versions: references like `opus-4-7` or `sonnet-4-8` are blocked
- Internal infrastructure: Slack channels, short links like `go/cc`, and private repository names
- AI identity: the directive explicitly forbids mentioning "Claude Code" or revealing in any way that the contribution was generated by an AI; `Co-Authored-By` tags are also banned
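The categories above suggest a straightforward screening check on outgoing commit messages. The sketch below is illustrative only: the pattern list mirrors the article's categories, not Anthropic's actual rule set, which was not published as regexes.

```typescript
// Hypothetical pattern list derived from the categories described above.
const BANNED_PATTERNS: RegExp[] = [
  /\b(capybara|tengu|fennec)\b/i,      // internal model codenames
  /\b(opus|sonnet|haiku)-\d+-\d+\b/i,  // unreleased version strings like opus-4-7
  /\bgo\/[a-z0-9-]+\b/i,               // internal short links like go/cc
  /co-authored-by:/i,                  // AI attribution trailers
  /claude code/i,                      // AI identity
];

// Return the source of every pattern a commit message trips.
function violations(commitMessage: string): string[] {
  return BANNED_PATTERNS.filter((re) => re.test(commitMessage)).map(
    (re) => re.source
  );
}
```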
Activation is not only manual (via `CLAUDE_CODE_UNDERCOVER=1`): the system enters undercover mode automatically unless the repository remote matches an internal allowlist. There is no force-off switch. If the system isn't 100% certain it's inside an internal repository, it stays undercover.
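That fail-closed activation logic can be sketched like this. The allowlist entry is a made-up placeholder; the real internal remote patterns were, naturally, not something to reproduce here. Only the `CLAUDE_CODE_UNDERCOVER=1` variable and the "uncertain means undercover" behavior come from the source.

```typescript
// Hypothetical allowlist; the real internal remotes are not reproduced here.
const INTERNAL_REMOTES = ["git@internal.example.com:"];

function shouldGoUndercover(remoteUrl: string | undefined): boolean {
  // Explicit opt-in always wins; note there is no corresponding opt-out.
  if (process.env.CLAUDE_CODE_UNDERCOVER === "1") return true;
  // Unknown remote: fail closed and stay undercover.
  if (!remoteUrl) return true;
  // Only a confirmed internal remote disables the mode.
  return !INTERNAL_REMOTES.some((prefix) => remoteUrl.startsWith(prefix));
}
```

The key property is the default: any ambiguity resolves toward censorship, which is the right polarity for a leak-prevention control.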
What This Means for Open Source
This mechanism officially confirms that Anthropic employees actively contribute to open-source repositories using Claude Code, and that the AI is instructed to hide its own existence. The question is concrete: how many commits on public projects are actually generated by masked AI agents?
From a technical standpoint, the Undercover Mode architecture is solid. Separating sensitive contexts from public ones through feature flags and conditional prompt injection is a reasonable pattern. But the leak also revealed other interesting details from the codebase:
- Buddy: a full Tamagotchi system with deterministic gacha, 18 species, and shiny variants
- Dream: a background memory consolidation engine that runs as a forked subagent, with a three-gate trigger system (time, sessions, lock) and four processing phases
- KAIROS: a proactive always-on assistant that observes and acts autonomously
- Coordinator Mode: multi-agent orchestration with parallel workers and a shared scratchpad
For anyone building similar tools, the lesson is twofold: protecting the prompt is necessary, but protecting the distributed package is even more critical. A forgotten `.npmignore` can undo months of operational security work.
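The fix itself is a one-liner. This is a generic sketch, not Anthropic's actual configuration:

```
# .npmignore — keep debug artifacts out of the published tarball
*.map
```

Alternatively, an explicit `files` allowlist in `package.json` publishes only what is listed, and `npm pack --dry-run` prints the exact tarball contents before anything ships, which makes this class of leak easy to catch in CI.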
FAQs
Was the leak caused by a cyber attack?
No. It was a simple build configuration oversight: source maps were not excluded from the published NPM package.
What else emerged from the source code?
Internal systems like Buddy (a Tamagotchi pet), Dream (background memory consolidation), KAIROS (proactive assistant), and a multi-agent architecture with a coordinator and parallel workers.
Does it make sense to implement a similar system in your own projects?
Yes. Injecting prompt guardrails to prevent sensitive information leaks is a valid operational security practice for any AI agent exposed to public contexts.
Need a consultation?
I help companies and startups build software, automate workflows, and integrate AI. Let's talk.
Get in touch