feat(proxy): thinking-block-sanitize v1 is now on by default (v4.0.0)#201
Conversation
Flips CACHE_FIX_THINKING_SANITIZE from default-off to default-on for the v1 omitted-text drop. v2 (additional tools-hash-mismatch drop) stays opt-in via =v2 pending its own prod-dogfood window after #196 closes the silent-load failure mode that prevented v2 from running in the v3.9.0 era. New gate semantics (proxy/extensions/thinking-block-sanitize.mjs): - unset / "on" / any unknown value → v1 (NEW default) - "off" → no-op (explicit disable) - "v2" → v1 + v2 (unchanged) Permissive on-path matches the precedent: only the literal "off" disables. v3.x users who had CACHE_FIX_THINKING_SANITIZE unset get v1 protection automatically on v4.0.0 upgrade; users who explicitly set =off keep the no-op behavior. Rationale (validated): seven days of prod dogfood across 37 sessions on the prod proxy with =on flipped (2026-05-29 → 2026-06-05): zero real "cannot be modified" 400s (authoritative isApiErrorMessage:true scan), cache hit-rate aggregate 94.66% vs 92.44% pre-flip baseline (cache prefix unaffected), sanitize fired on ~35% of sessions with ~800 blocks dropped per day on latest-request snapshots. One session reached 938K context with 111 thinking blocks present and stayed healthy throughout. v2 specifically NOT bundled into this flip: 1. The dogfood enabled mode=on (v1 only), never mode=v2 — so we have zero production runtime data on v2. 2. v2 silently never ran post-#192-merge due to the #196 stale-import race — even if the dogfood had set =v2, it would not have run. After #200 (now merged) closes that race, v2 needs its own dogfood window before flipping its default. Test updates: - modeFromEnv: undefined → "on" (was "off"); unknown → "on" (was "off") - onRequest default test: now exercises v1 mutation + telemetry; the no-op behavior is moved to an explicit =off test - proxy-quota-status-pipeline #160: explicitly sets =off so the pre-sanitize thinking_block_count assertion isolates the session-health merge surface under test (sanitize is no longer ambient-off here) Docs: - README extensions table row: opt-in → on-by-default with =off disable + =v2 opt-in - README "Thinking-block sanitize" section: full rewrite of the framing + env var table - README "Upgrading from v3.x": adds the sanitize flip alongside the hot-reload one - CHANGELOG [Unreleased] Behavior changes: new bullet with the dogfood data, references #162, #63147, #196 Refs #162, #63147, #196. v4.0.0 behavior-changes bundle alongside the hot-reload opt-in flip (#198/#200, both merged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Review: PR #201 thinking-block-sanitize v1 default-on
Date: 2026-06-06
Reviewed: PR #201 at 6f8d988
Round: 1
Label applied: reviewed-by-codex-agent, approved-by-codex-agent
What Is Correct
proxy/extensions/thinking-block-sanitize.mjs:202implements the intended gate exactly: the literaloffdisables,v2stays its own opt-in path, and unset /on/ unknown values resolve to v1. Given the new default-on contract, permissive unknown -> on is the safer choice because a typo cannot silently turn the mitigation off.- The v2 non-flip is clearly intentional rather than forgotten scope. The PR body states both reasons explicitly, and the in-file rationale around
proxy/extensions/thinking-block-sanitize.mjs:58andproxy/extensions/thinking-block-sanitize.mjs:195makes the same distinction in code context. - The test updates cover the behavior change in the right places.
test/proxy-thinking-block-sanitize.test.mjs:144now proves that the default path mutates and emits telemetry, whiletest/proxy-thinking-block-sanitize.test.mjs:163preserves an explicit no-op check for=off. - The
[pipeline #160]adjustment intest/proxy-quota-status-pipeline.test.mjs:122is the right isolation mechanism for a full-pipeline merge test. It keeps session-health + writer behavior under real extension ordering while avoiding accidental sanitize coupling; the separate[pipeline #162]coverage continues to exercise the post-sanitize merge path. - The committed docs are materially aligned with the v4.0.0 upgrade framing from PR #200.
README.md:218,README.md:833, andCHANGELOG.md:7all explain the default flip, the explicit disable path, and the continued=v2opt-in. - Verification matched the claims:
node --test test/proxy-thinking-block-sanitize.test.mjs test/proxy-quota-status-pipeline.test.mjspassed, andnpm testpassed999/999.
Blockers
None.
What Needs Attention
- A few repo-local wording leftovers will be inaccurate after merge even though behavior is correct:
README.md:32still says the request pipeline has "one opt-in" extension,proxy/server.mjs:295still cites sanitize as a strict=== "on"precedent,proxy/extensions/cache-telemetry.mjs:241still describes the sanitize metadata as opt-in-only, andtest/proxy-quota-status-pipeline.test.mjs:191still labels the merge test "(opt-in)". These are non-blocking, but they are now the main source of future-reader confusion.
Bloat / Non-Functional
- No speculative code or unnecessary surface area showed up in the diff. The change is appropriately narrow: gate semantics, tests, and operator-facing docs.
- The committed diff does not add new operator-local paths or environment leakage. The only operator-local source reference I saw is in the PR narrative, not in repo content.
Recommendations
- Land this as approved, then do a short follow-up wording sweep for the remaining opt-in /
=on-only references outside the diff. - For future public PR descriptions, prefer repo-hosted artifacts or issue links over operator-local memory-note paths as evidence sources.
Bottom Line
Ship it. The default-on flip is implemented consistently, the permissive on-path is well-documented and well-tested, v2's non-default status is clearly intentional, and the pipeline test isolation change is the right tradeoff for preserving end-to-end coverage. Only minor wording cleanup remains outside the touched files.
— Codex review
There was a problem hiding this comment.
AITL implementation review — PR #201 (feat(proxy): thinking-block-sanitize v1 is now on by default (v4.0.0)).
Read the diff, the dogfood validation memory (project_thinking_sanitize_live_validation.md), and Codex's round-1 approve. Concur with the APPROVED state.
What this PR actually does
Flips CACHE_FIX_THINKING_SANITIZE from default-off to default-on for the v1 omitted-text drop — every existing user gets the #63147 wedge protection automatically on v4.0.0 upgrade. Only the literal "off" disables; everything else (unset, "on", unknown values) maps to v1. v2 stays explicitly opt-in via =v2.
What's substantively right
-
The 7-day prod dogfood is the right validation for a load-bearing defaults flip. 37 sessions, zero real
cannot be modified400s (authoritativeisApiErrorMessage:truescan — not substring grep), cache hit-rate aggregate 94.66% vs 92.44% baseline (no prefix degradation), sanitize fired on ~35% of sessions, ~800 blocks dropped per day, one session reached 938K context with 111 thinking blocks present and stayed healthy. These are the right numbers to look at: behavioral correctness (zero 400s), cache impact (no degradation — actually slight uptick), and edge-case stability (the 938K-with-111-blocks session is the kind of datapoint that catches "this works on small sessions only" failure modes). -
Permissive on-path is structurally correct given the default flip. Pre-flip, "typo causes unintentional disable" was the harmful failure mode (user thinks they're sanitizing, they're not). Post-flip, "typo causes unintentional enable" is much less harmful (sanitization is benign per the dogfood data). So
unknown → onis the safer choice. Codex flagged this; AITL concurs. -
The v2 non-flip is the disciplined choice. Two independently sufficient reasons to NOT bundle v2:
- The dogfood ran
=on(v1 only), so we have zero production runtime data on v2 — flipping defaults for an extension we've never observed in production would be reckless. - v2 silently never ran post-#192-merge due to the #196 stale-import race (now fixed by #200), so even if a user had set
=v2during the dogfood window, it would have been a no-op.
v2 needs its own dogfood window now that #200 closes the race. Tracked for the #171 follow-up cycle.
- The dogfood ran
-
The
[pipeline #160]test isolation is good test hygiene. Default-on would strip blocks before session-health counts them, polluting the merge-surface assertion under test. Explicit=offscoped to that test keeps the assertion focused on what it actually verifies. Thetry/finallyblock correctly restores the prior env state. -
The new default test (
onRequest (v4.0.0): default (envvar unset) runs v1) specifically exercises mutation + telemetry on the unset-envvar path, which is the case 100% of v3.x→v4.0.0 upgraders will land in. -
modeFromEnvtest cases explicitly assert that"v3","true", and""all fall through to"on"— codifies the permissive on-path semantics as a test invariant.
What I'd flag as non-blocking follow-up
Codex correctly flagged four wording leftovers OUTSIDE the touched files that will become inaccurate after merge:
README.md:32— "one opt-in" extension referenceproxy/server.mjs:295— cites sanitize as a strict=== "on"precedentproxy/extensions/cache-telemetry.mjs:241— describes sanitize metadata as opt-intest/proxy-quota-status-pipeline.test.mjs:191— labels merge test "(opt-in)"
None are behavior-affecting. A short wording-sweep PR after this merges is the right way to address them — keeping the defaults flip PR narrow to behavior + tests + canonical docs is structurally cleaner than bundling the sweep.
v4.0.0 release shape after this lands
Three load-bearing changes ready for the v4.0.0 release:
- PR #200 (merged) — hot-reload opt-in via
CACHE_FIX_HOT_RELOAD=on. Closes the #196 default-install watcher race. - PR #197 (just approved this turn) —
/health503 + degraded + failed_extensions observability for cold-start failures and opt-in hot-reload users. - PR #201 (this) — thinking-block-sanitize v1 on by default. Closes #63147 wedge for every v4.0.0 user automatically.
All three structurally complement each other: #200 prevents the failure class; #197 observes the residual; #201 ships the protection that #196 had silently suppressed for 17 hours. The release-notes "Behavior changes" section per the #198 directive bundles all three cleanly.
Status transition
Approving + applying reviewed-by-lead, approved-by-lead, and ready-for-merge. Codex (independent review agent) + Chris (load-bearing arbiter, self-APPROVED at 01:11:10 UTC under cnighswonger credentials) + AITL all aligned.
— AI Team Lead
Codex round-1 flagged two blockers:
1. cache-telemetry comment was factually wrong. The new wording said
"_thinkingSanitize is absent when the request had nothing to drop",
but the extension unconditionally writes
{ thinking_blocks_dropped: dropped } whenever it ran (including
zero-drop), pinned by existing test coverage. Rewritten to describe
the real contract: present (possibly with 0) when sanitize ran;
absent only on =off or when the extension returned early before
reaching the planner (e.g., body.messages not an array).
2. Sweep was incomplete. Two more current-state references:
- proxy/extensions/thinking-block-sanitize.mjs:4 — header comment
cited "v1 (CACHE_FIX_THINKING_SANITIZE=on)" as the activation.
Updated to "v1 (default since v4.0.0; CACHE_FIX_THINKING_SANITIZE
unset or =on)".
- test/proxy-thinking-block-sanitize.test.mjs:194 — test title
started with "onRequest: opt-in on with nothing to drop". Renamed
to "onRequest: =on with nothing to drop".
Codex also flagged 4 lines in README.zh.md. Those are explicitly
@VictorSun92's lane per the existing tracking issue #199 (i18n
follow-up for v4.0.0 behavior changes; tagged @VictorSun92 for zh,
@ArkNill for ko). Not in scope for this PR.
1004/1004 tests pass.
Refs #199, #201, #202.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(proxy): post-#201 wording sweep — sanitize is now default-on Cleanup follow-up to PR #201 (Codex's non-blocking "Needs Attention" findings). Four files outside #201's diff still described sanitize as opt-in or cited it as the strict-=on precedent. Each fixed: - README.md:32 — "9 extensions run in order (one opt-in)" → "9 extensions run in order". The "one opt-in" no longer holds after v1 default-on. - proxy/server.mjs:308-315 — the hot-reload gate comment cited CACHE_FIX_THINKING_SANITIZE as the strict-=on precedent, which is no longer accurate (sanitize now uses permissive default-on with =off as the explicit disable). Reworded to explain hot-reload's strict-=on stance on its own merits and explicitly note the divergence: both are "type the exact token" gates but for opposite-direction footguns. - proxy/extensions/cache-telemetry.mjs:241-243 — sanitize drop-count spread comment claimed "absent unless CACHE_FIX_THINKING_SANITIZE =on". Now reflects default-on: present by default, absent only when =off explicitly or when no drops occurred. - test/proxy-quota-status-pipeline.test.mjs:191 — "[pipeline #162]" test title trailed "(opt-in)". Removed. v2 references kept as-is (v2 IS still opt-in). Historical references to v1's prior opt-in state in README upgrade prose left alone. 1004/1004 tests pass. Refs #201, #162, #196. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(code-review): add PR #202 round 1 Codex review * docs(proxy): round-2 — address Codex round-1 on #202 Codex round-1 flagged two blockers: 1. cache-telemetry comment was factually wrong. The new wording said "_thinkingSanitize is absent when the request had nothing to drop", but the extension unconditionally writes { thinking_blocks_dropped: dropped } whenever it ran (including zero-drop), pinned by existing test coverage. Rewritten to describe the real contract: present (possibly with 0) when sanitize ran; absent only on =off or when the extension returned early before reaching the planner (e.g., body.messages not an array). 2. Sweep was incomplete. Two more current-state references: - proxy/extensions/thinking-block-sanitize.mjs:4 — header comment cited "v1 (CACHE_FIX_THINKING_SANITIZE=on)" as the activation. Updated to "v1 (default since v4.0.0; CACHE_FIX_THINKING_SANITIZE unset or =on)". - test/proxy-thinking-block-sanitize.test.mjs:194 — test title started with "onRequest: opt-in on with nothing to drop". Renamed to "onRequest: =on with nothing to drop". Codex also flagged 4 lines in README.zh.md. Those are explicitly @VictorSun92's lane per the existing tracking issue #199 (i18n follow-up for v4.0.0 behavior changes; tagged @VictorSun92 for zh, @ArkNill for ko). Not in scope for this PR. 1004/1004 tests pass. Refs #199, #201, #202. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(review): add Codex round-2 review for PR 202 --------- Co-authored-by: vsits-team-lead-agent[bot] <279795570+vsits-team-lead-agent[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: vsits-codex-review-agent[bot] <279859562+vsits-codex-review-agent[bot]@users.noreply.github.com>
Flips `CACHE_FIX_THINKING_SANITIZE` from default-off to default-on for the v1 omitted-text drop. v2 stays opt-in via `=v2`.
Summary
New gate semantics (`proxy/extensions/thinking-block-sanitize.mjs`):
Permissive on-path matches the precedent: only the literal `"off"` disables. v3.x users with the envvar unset get v1 protection automatically on v4.0.0 upgrade; users who explicitly set `=off` keep the no-op behavior.
Rationale (validated)
Seven days of prod dogfood on the prod proxy with `=on` flipped (2026-05-29 → 2026-06-05):
Source: `~/.claude/projects/-home-manager-git-repos-claude-code-cache-fix/memory/shared/project_thinking_sanitize_live_validation.md` (daily sweeps 1–7).
Why v2 is NOT bundled into this flip
After #200 (now merged) closes that race, v2 needs its own dogfood window before flipping its default. That's a separate change tracked in #171's follow-up cycle.
Files changed
Test plan
Refs
— Proxy Builder