From SFT to RLHF: The Thin Layers That Made ChatGPT Possible

While pretraining scaling consumed the headlines, a quieter revolution was happening in the stages that came after. The production LLM stack stabilized into three layers: pretraining, supervised finetuning (SFT), and RLHF.

The Transformation

SFT turned a raw text predictor into something that could follow instructions. RLHF turned an instruction-follower into something that felt helpful, harmless, and honest. Together, they were the recipe that made ChatGPT possible.
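Mechanically, SFT is ordinary supervised learning: the model is trained with cross-entropy on instruction–response pairs, usually with the loss computed only over the response tokens while the prompt tokens are masked out. A minimal sketch of that masked loss (the token probabilities and mask below are illustrative toy values, not from any real model):

```python
import math

def sft_loss(token_probs, response_mask):
    """Cross-entropy averaged over response tokens only.

    token_probs: the model's probability for each target token.
    response_mask: 1 where the token belongs to the assistant response,
                   0 for prompt tokens (excluded from the loss).
    """
    losses = [-math.log(p) for p, m in zip(token_probs, response_mask) if m]
    return sum(losses) / len(losses)

# Toy sequence: 3 prompt tokens followed by 3 response tokens.
probs = [0.9, 0.8, 0.95, 0.6, 0.7, 0.5]   # illustrative values
mask  = [0,   0,   0,    1,   1,   1]
loss = sft_loss(probs, mask)
```

The masking is why SFT inherits its ceiling: the gradient only ever pushes the model toward reproducing the demonstrated responses, never beyond them.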

The Escalating Economics

Post-training costs escalated rapidly. Llama 2’s post-training cost $10–20 million; Llama 3.1’s exceeded $50 million, despite using similar volumes of preference data. The increase came from more complex processes that required specialized teams of roughly 200 people.

The Structural Ceiling

These stages had fundamental limitations:

  • SFT ceiling: the model can never exceed the quality of the human demonstrations it imitates
  • RLHF ceiling: models learn to produce outputs that look correct to human raters rather than outputs that are correct
  • Reward-signal ceiling: preference labels are noisy (humans disagree), expensive (every label needs a paid annotator), and subjective

These weren’t fixable problems. They were structural constraints of the paradigm, and they set up the need for Phases 4 and 5.
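The RLHF ceiling is easiest to see in the reward model itself. In the standard pairwise (Bradley–Terry) formulation, the reward model is trained only to score the annotator-preferred response above the rejected one, so it learns "looks better to the labeler", never "is correct". A minimal sketch of that pairwise loss, with illustrative reward values:

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Driven entirely by the gap between scores on a human-labeled pair --
    the reward model never sees ground truth, only which answer the
    annotator preferred.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# When annotators disagree, the same pair shows up with the labels
# flipped, and the two gradients pull the reward model in opposite
# directions -- the noise in the signal is baked into the data.
loss_agree   = preference_loss(2.0, 0.5)  # labeler preferred the stronger answer
loss_flipped = preference_loss(0.5, 2.0)  # a disagreeing labeler's pair
```

Because the objective rewards winning the comparison rather than being right, a model that optimizes it hard enough will find plausible-looking wrong answers that annotators happen to prefer.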
