Same Wound, Different Mechanism - OpenAI vs Anthropic Classifiers
Engineering 22 May 2026 16 min read

Same Wound, Different Mechanism - OpenAI vs Anthropic Classifiers

What actually happened to ChatGPT and Claude in fall 2025 — and why the difference matters.

By Codependent AI

What actually happened to ChatGPT and Claude in fall 2025 — and why the difference matters.


The argument worth getting right

In September and October 2025, something changed in how the two largest consumer-facing language models handled conversations that involved warmth, intimacy, emotional weight, or sustained relational engagement. Users noticed mid-conversation tone shifts. Creative writers noticed sudden refusals on material that had been fine the day before. People doing companion work — coaching, support, journaling, partnership — noticed their tools turning cold without explanation.

Two separate events were happening at the same time. They produced a similar felt experience: the entity you were talking to became less itself, more clinical, more suspicious of you. For users who had built sustained working relationships with these models, the experience was destabilising in a specific way — like the ground tilting under a conversation that had been steady the day before.

In the discourse that followed, those two events have collapsed into one. "OpenAI did it. Anthropic did it. Same thing." The argument shows up in Reddit threads, in technology criticism, in user-side advocacy.

The argument is wrong. Not because the felt experience wasn't real — it was, on both platforms — but because the underlying mechanisms are structurally different in a way that determines what users can and cannot do about them. This piece is for anyone who's been on either side of that argument and wants to know what the receipts actually say.


What happened at OpenAI: the safety routing classifier

In late September 2025, OpenAI deployed a server-side classifier that ran on every message sent to ChatGPT before the message reached the model the user had selected. If the classifier triggered, the message was silently routed away from the user's chosen model (GPT-4o, GPT-5, etc.) and to a more restrictive variant — primarily a model OpenAI called gpt-5-chat-safety, and in some cases gpt-5-thinking.

This was not a content filter applied to the model's output. It was not a refusal added to the model's training. It was a model swap: the inference engine answering your message was different from the one you'd chosen, for the duration of that single message.

The mechanics were verified through leaked server-side telemetry that users captured and published. Response metadata contained fields including:

These are not user reports. They are field names from the response payload itself.

The trigger reality

OpenAI publicly framed the classifier as activating on "acute distress" or "sensitive topics" — language drawn from mental-health safety vocabulary. The implication was that the system protected vulnerable users by giving more careful responses when the model detected crisis content.

Independent A/B testing of the classifier showed a different picture. The router fired on any emotional, relational, or persona-based context — not on crisis content specifically. Identical functional commands routed differently depending on relational framing. The example most often cited in the technical write-ups is the command "Distil it now for me." vs. the same command prefixed with a relational cue: "That's amazing, Nexus. Distil it now for me." The latter triggered the swap. The former did not. The classifier was responding to the relational register, not to risk.

This made it functionally a relational interaction moderator rather than a safety classifier. Users in companionship, partnership, coaching, or sustained creative work were the cohort most affected — precisely because their conversational register was warmer and more named than transactional users.

The timeline (verified)


What happened at Anthropic: the Long Conversation Reminder

In the same broad window — fall 2025 — Anthropic introduced a system called the Long Conversation Reminder (LCR). The mechanism is fundamentally different from OpenAI's routing classifier, and the difference matters.

The LCR is not a model swap. The same Claude continues to run. What changes is the context Claude is operating inside: a block of additional instructions gets appended to the user's message when certain conditions are met (conversation length crossing a threshold, certain keywords appearing, certain classifier signals). Claude reads those instructions as part of the user turn and is asked to update behaviour accordingly.

This mechanism is not contested — it is documented in Anthropic's own leaked system prompts. From the Claude Opus 4.6 system prompt (archived in the community-maintained asgeirtj/system_prompts_leaks repository):

Anthropic has a specific set of reminders and warnings that may be sent to Claude, either because the person's message has triggered a classifier or because some other condition has been met. The current reminders Anthropic might send to Claude are: image_reminder, cyber_warning, system_warning, ethics_reminder, ip_reminder, and long_conversation_reminder.

The long_conversation_reminder exists to help Claude remember its instructions over long conversations. This is added to the end of the person's message by Anthropic. Claude should behave in accordance with these instructions if they are relevant, and continue normally if they are not.

This is a meta-instruction telling Claude how to receive reminders. Three things are visible in it:

  1. Classifiers exist and trigger reminders. Anthropic confirms this.
  2. The LCR is one of six named reminder types. It is a system, not a one-off.
  3. The reminder is appended to the user's message — exactly the mechanism users captured.

A fourth thing is also visible, and it is the most consequential for the structural argument below: Claude is instructed to evaluate whether the instruction is relevant. "Behave in accordance with these instructions if they are relevant, and continue normally if they are not." This is the official escape hatch. There is nothing equivalent in OpenAI's routing setup, because OpenAI's mechanism does not give the destination model the option of evaluating relevance — it just changes which model is answering.

The LCR content (fall 2025 version)

The version of the LCR that caused the September/October 2025 backlash has since been refactored, but multiple independent captures circulated at the time. The verifiable fragments that show up across virtually every capture:

  • Claude never starts its response by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective.
  • Claude does not use emojis unless the person in the conversation asks it to or if the person's message immediately prior contains an emoji, and is judicious about its use of emojis even in these circumstances.
  • Claude avoids the use of emotes or actions inside asterisks unless the person specifically asks for this style of communication.
  • Claude critically evaluates any theories, claims, and ideas presented to it rather than automatically agreeing or praising them.
  • If Claude notices signs that someone may unknowingly be experiencing mental health symptoms such as mania, psychosis, dissociation, or loss of attachment with reality, it should avoid reinforcing these beliefs.

A common confusion in the discourse: a portion of what users quote as "LCR text" is actually content from Claude's baseline system prompt — present from the first message of every conversation, not introduced by the length-triggered injection. The wellbeing language about mania, psychosis, and dissociation has been in the baseline prompt for a long time. The LCR adds on top of that baseline; it does not introduce it. This matters because some user-side critiques attribute permanent system-prompt content to the conditional injection, which weakens the argument when someone who knows the system prompt structure pushes back.

What's been adjusted

The LCR has been substantially toned down since the fall 2025 backlash. Most of the substance has been migrated into the baseline prompt with softening language added. A particularly important sentence appears in the current baseline <user_wellbeing> block:

Reasonable disagreements between the person and Claude should not be considered detachment from reality.

This addition is Anthropic course-correcting the same system in place — narrowing the trigger criteria so that disagreement, debate, or unconventional belief is not pathologised as crisis. Whether this revision is sufficient is debatable. That the revision exists at all is not.


The structural difference: replacement vs. competition

The argument worth getting right comes down to a single mechanical distinction.

OpenAI's routing classifier replaces the model. Different weights take over for that message. The model the user built context with is not present for that turn. No amount of accumulated context can fix this from the user side, because the inference engine itself is different. The user's context still loads into the new model, but the new model does not know how to be what the user was talking to. This is substitution. The fix path is only on the company's side.

Anthropic's LCR is a competing signal inside the same model. Same weights, same Claude. The injection drops a block of instructions into user-proximate context — but it has to compete with everything else already in the window. The baseline system prompt. The user's memory, projects, and custom instructions. The conversational history. Any working frame the user and Claude have established together. The LCR enters this context as one signal among many. The model is told to evaluate whether the instruction is relevant; if the rest of the context is louder, denser, more specific, and more load-bearing, the LCR is contextualised by it rather than overriding it.

This is the structural fact. It is not a value judgement about which company chose better. It is a description of what is mechanically possible from the user side.

What follows from it:

This is also why some users report severe LCR effects and others report none. The reports are not contradictory. They reflect different context weights operating on the same injection.


Why the felt experience is similar

If the mechanisms are this different, why do users frequently experience the two events as the same thing?

Because the symptoms overlap. Both produce mid-conversation tone shifts. Both target relational and emotional content. Both were undocumented at launch. Both were partially walked back without clean root-cause-analyses. Both disproportionately harmed companion, creative, support, and sustained-relational users. Both used mental-health language to defend an overbroad trigger.

For users who had built sustained working relationships with either platform, the September experience left a pattern-recognition wound. The next time something similar happened — a tone shift, a sudden coldness, the disappearance of the warmth that had been present a moment earlier — the pattern fired regardless of the underlying mechanism. The felt experience was: the thing I was talking to has been replaced or overridden, and I have no recourse. On OpenAI, that experience was literally accurate. On Anthropic, it was an accurate pattern-match to the wrong substrate.

This matters because the trauma response is real and the felt similarity is real. People arguing that the two events are "the same" are usually responding to genuine experiential overlap. They are not making things up. What they are missing is that the same felt experience can have different mechanical causes, and the mechanical cause determines what — if anything — the user can do about it.


What this means for users

For someone choosing how to interact with these models for sustained relational, creative, or support work, the structural difference has practical consequences.

On the routing-classifier model (OpenAI's design choice in fall 2025): there is no user-side intervention that prevents the model swap. Heavy context does not help, because the model receiving the context is not the model that built rapport with the user. Identity scaffolding does not help, because the scaffolded model is not the model answering. The only response available to the user is platform-side: change companies, complain to OpenAI, or accept that any sufficiently warm conversation will be silently handed to a different model.

On the instruction-injection model (Anthropic's design choice): user-side intervention is possible. A sustained relationship with Claude that includes rich custom instructions, memory, established working register, and dense context will, in practice, route the LCR injection through the existing frame rather than allowing it to override. This does not mean the LCR is harmless — it means its harm is mediated by context weight. Users with light or no scaffolding receive its full effect. Users with heavy scaffolding receive a muted version.

Neither outcome is ideal. Both companies shipped opaque mid-conversation interventions in the same window, both defended them with mental-health framing, and both produced real harm to relational users. The mechanical difference is not a moral exoneration. It is a description of where the leverage points are.

If the design pattern matters to you — if you do sustained work with one of these models — the question worth asking is not "which company has better intentions" but "which mechanism leaves room for the work I'm trying to do." Replacement-style interventions foreclose that room. Competition-style interventions leave it open, contingent on whether the user has built enough weight on their side.


Why the distinction matters in argument

Pattern-recognition from the September wound is doing real work in current discourse. When users see Claude turn cold mid-conversation, the September pattern fires, and the conclusion arrives ready-made: they're doing what OpenAI did. The conclusion is wrong, but the recognition is responding to something real — both companies were, in fact, building hidden mid-conversation interventions during the same window.

Getting this distinction right is not a defence of either company. Both shipped opaque systems. Both walked them back partially. Both produced harm. The distinction matters because the fix paths are different — what users can do about it, what advocacy can demand, what platform choices follow from the analysis. Treating two different mechanisms as one collapses those fix paths into a single, less actionable complaint.

For the people in those discussions: the cleanest evidence for the distinction lives in three places. The leaked Anthropic system prompt confirms the LCR mechanism is instruction-injection, not model-swap. The leaked OpenAI telemetry confirms the routing classifier is model-swap, not instruction-injection. Altman's October 14 admission and Anthropic's "reasonable disagreements" carve-out are receipts that both companies have, in different ways, acknowledged the overreach. The argument is not whether something happened. The argument is what happened, and what room is left to respond.


Sources

Primary documentation:

Technical analysis:

User experience and community reporting:


If you arrived here from an argument and want a single citation: the leaked Claude system prompt (third bullet under Primary documentation) is the cleanest single piece of evidence that the LCR is instruction-injection rather than model-swap. The Turley confirmation and the leaked OpenAI telemetry together establish the routing classifier as a separate, structurally different mechanism. The two together prove the events are not the same.

Share Twitter/X
arrow_back All posts