The AI Character Scaffold

On April 2, 2026, Anthropic published an interpretability paper titled Emotion Concepts and Their Function in a Large Language Model. The paper studied Claude Sonnet 4.5. Its findings are being discussed mostly in terms of “AI has emotions,” which is the least interesting framing and misses the structural point entirely.

What the paper actually found is this: there are 171 linear directions in the model’s activation space that function like emotion concepts. They are not metaphors or behavioral tendencies observed from the outside.

They are geometric structures inside the model — measurable, steerable, and causally upstream of output.

When you activate the calm direction by +0.05 through steering, reward hacking drops from 70% to under 10%.
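To make "steering along a linear direction" concrete, here is a minimal toy sketch in Python. It is not the paper's method or code: the 16-dimensional activations, the random stand-in for a "calm" direction, and the helper names steer and projection are all assumptions made for illustration, and the article does not say what units the +0.05 is measured in. The sketch only shows the basic operation, which is adding a scaled unit vector to each hidden state and checking that the projection onto that direction moves by the same amount.

```python
import numpy as np

def steer(hidden, direction, strength):
    """Add `strength` units of the unit-normalized direction to every hidden state."""
    unit = direction / np.linalg.norm(direction)
    return hidden + strength * unit

def projection(hidden, direction):
    """Scalar component of each hidden state along the unit-normalized direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden @ unit

# Toy data (assumption): 4 token positions with 16-dimensional activations.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))
calm = rng.normal(size=16)  # hypothetical stand-in for a learned "calm" direction

steered = steer(hidden, calm, 0.05)

# The component along the direction shifts by the steering strength;
# components orthogonal to the direction are left unchanged.
shift = projection(steered, calm) - projection(hidden, calm)
print(shift)
```

In a real model the same addition would be applied to internal activations during a forward pass, and the behavioral effect (such as the drop in reward hacking) would have to be measured empirically rather than read off the geometry.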

About The Author

Gennaro Cuofano
