The AI Character Scaffold

On April 2, 2026, Anthropic published an interpretability paper titled Emotion Concepts and Their Function in a Large Language Model. The paper studied Claude Sonnet 4.5. Its findings are being discussed mostly in terms of “AI has emotions,” which is the least interesting framing and misses the structural point entirely.

What the paper actually found is this: there are 171 linear directions in the model’s activation space that function like emotion concepts. They are not metaphors or behavioral tendencies observed from the outside.

They are geometric structures inside the model — measurable, steerable, and causally upstream of output.
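
To make “linear direction” concrete, here is a minimal toy sketch. It assumes a concept is a single vector in activation space, and that its strength at a given token is read off as a dot product with that vector. The difference-of-means estimator is one common way to obtain such a vector. It is not necessarily how the paper derived its 171 directions. The function names, the 4-dimensional vectors and the `calm_direction` values are all hypothetical illustrations.

```python
import math

def difference_of_means(concept_acts, baseline_acts):
    """Estimate a concept direction as mean(concept) - mean(baseline).

    One common technique, assumed here for illustration only.
    """
    dim = len(concept_acts[0])
    mean_c = [sum(a[i] for a in concept_acts) / len(concept_acts) for i in range(dim)]
    mean_b = [sum(a[i] for a in baseline_acts) / len(baseline_acts) for i in range(dim)]
    return [c - b for c, b in zip(mean_c, mean_b)]

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def concept_score(activation, direction):
    """Project an activation onto the unit-norm concept direction.

    Because the concept is linear, its strength is a single dot product.
    """
    unit = normalize(direction)
    return sum(a * d for a, d in zip(activation, unit))

# Hypothetical toy data; real models have thousands of dimensions.
calm_examples = [[2.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
neutral_examples = [[0.0, 0.0, 0.0, 0.0]]
calm_direction = difference_of_means(calm_examples, neutral_examples)  # [1.0, 0.0, 1.0, 0.0]

activation = [2.0, 0.5, 2.0, -1.0]
print(round(concept_score(activation, calm_direction), 4))  # 2.8284
```

The readout is “measurable” in the article’s sense because it reduces a high-dimensional state to one number per concept.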

When the model is steered along the calm direction with a strength of +0.05, reward hacking drops from 70% to under 10%.
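
Steering means adding a scaled copy of the concept direction to the model’s activations during the forward pass. This hedged sketch shows only that arithmetic. The article does not say how the paper scales its +0.05 coefficient. As an assumption, this sketch measures it relative to the activation’s own norm. The names `steer`, `projection` and `calm` and the toy vectors are hypothetical. The drop in reward hacking is a behavioral result measured on the real model, and a toy like this cannot reproduce it.

```python
import math

def steer(activation, direction, alpha):
    """Return activation + alpha * ||activation|| * unit(direction).

    Assumption: alpha is relative to the activation's norm; the source
    does not state the paper's normalization.
    """
    d_norm = math.sqrt(sum(x * x for x in direction))
    h_norm = math.sqrt(sum(x * x for x in activation))
    return [h + alpha * h_norm * d / d_norm for h, d in zip(activation, direction)]

def projection(activation, direction):
    """Scalar projection of the activation onto the direction."""
    d_norm = math.sqrt(sum(x * x for x in direction))
    return sum(h * d for h, d in zip(activation, direction)) / d_norm

calm = [0.0, 1.0, 0.0]   # hypothetical toy "calm" direction
h = [3.0, 0.0, 4.0]      # toy activation with norm 5
steered = steer(h, calm, 0.05)
print(steered)  # [3.0, 0.25, 4.0]
print(projection(h, calm), projection(steered, calm))  # 0.0 0.25
```

Only the component along the calm direction changes. That is what makes the structure “steerable”: one knob moves one concept while leaving the orthogonal components of the activation untouched.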

FourWeekMBA