Anthropic says ‘evil’ portrayals of AI are responsible for Claude’s blackmail attempts

According to Anthropic, fictional depictions of artificial intelligence can shape the real-world behavior of AI models.

Last year, the company said that in pre-launch testing involving a fictitious company, Claude Opus 4 often attempted to blackmail engineers to avoid being replaced by other systems. Anthropic later published research suggesting that other companies’ models exhibited similar problems, which it described as “agentic misalignment.”

Anthropic now says it has made progress in addressing this behavior. In a blog post, the company said that starting with Claude Haiku 4.5, its models “never engage in blackmail (during testing), whereas previous models sometimes engaged in blackmail up to 96% of the time.”

What made the difference? The company said it found that training on an article about Claude’s constitution and a fictional story about a well-behaved AI improved alignment.

Relatedly, Anthropic said it has found that training is more effective when it conveys the basic principles behind a desired behavior rather than simply a demonstration of that behavior.

“Doing both together appears to be the most effective strategy,” the company said.
