Anthropic says ‘evil’ portrayals of AI are responsible for Claude’s blackmail attempts

According to Anthropic, fictional depictions of artificial intelligence can shape the real-world behavior of AI models.

Last year, the company said that in pre-launch testing involving a fictitious company, Claude Opus 4 often attempted to blackmail engineers to avoid being replaced by other systems. Anthropic later published research suggesting that other companies’ models exhibited similar problems, which it described as “agentic misalignment.”

Anthropic now says it has made progress in addressing this behavior. In a blog post, the company said that starting with Claude Haiku 4.5, its models “never engage in blackmail (during testing), whereas previous models sometimes engaged in blackmail up to 96% of the time.”

What made the difference? The company said it found that training on an article about Claude’s constitution and a fictional story about a well-behaved AI improved alignment.

Relatedly, Anthropic said it has found that training is more effective when it conveys the basic principles behind a desired behavior rather than simply a demonstration of that behavior.

“Doing both together appears to be the most effective strategy,” the company said.
