
The Edge-to-Cloud Continuum Replaces the Cloud-First Hierarchy
The conventional AI architecture assumed a simple flow: train in the cloud, infer in the cloud, occasionally push small models to the edge. What is emerging instead is a continuum — a distributed inference fabric in which workloads route dynamically between device, edge, and cloud based on latency requirements, privacy needs, and cost optimization.
Amon illustrated this with a deceptively simple example at Davos. If you’re walking with someone and you ask your glasses, “Who is this person?”, the answer can’t be “Hold on, let me think — keep walking.” The person walks by, and you’ve missed the moment. That inference must happen instantly, on-device, in the glasses.
But the architectural implication is deeper than latency. The shift from pre-training to post-training to test-time compute creates fundamentally different infrastructure requirements at each stage:
- Pre-training needs massive centralized GPU clusters
- Post-training needs flexible inference platforms
- Test-time compute needs real-time, often on-device inference
Qualcomm’s “Hybrid AI” layer (disaggregated serving, multi-model orchestration, edge-to-cloud routing) is architecturally optimized for the test-time scaling era. The company that owns silicon at every point on the compute continuum, from the 5 W earbud to the 160 kW data center rack, can optimize inference routing in ways that single-layer competitors cannot.
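To make the routing idea concrete, here is a minimal sketch of what a hybrid routing policy could look like. This is not Qualcomm’s implementation; the tier names, thresholds, and the `InferenceRequest` fields are hypothetical, chosen only to illustrate how a latency budget, privacy constraints, and marginal cost might decide where an inference request runs.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DEVICE = "on-device NPU"      # e.g., glasses, phone, AI PC
    EDGE = "edge node"            # e.g., regional inference point
    CLOUD = "cloud GPU cluster"   # centralized, highest capacity


@dataclass
class InferenceRequest:
    latency_budget_ms: float             # how long the user can wait
    privacy_sensitive: bool              # must the data stay local?
    tokens: int                          # rough proxy for workload size
    device_capacity_tokens: int = 4_000  # what the local model can handle (assumed)


def route(req: InferenceRequest) -> Tier:
    """Pick an inference tier from latency, privacy, and cost constraints.

    The ordering of the checks encodes the article's argument: privacy and
    tight latency budgets pin work to the device; only workloads that exceed
    local capacity justify paying for edge or cloud inference.
    """
    if req.privacy_sensitive:
        return Tier.DEVICE
    if req.latency_budget_ms < 100:           # the "who is this person?" class of queries
        return Tier.DEVICE
    if req.tokens <= req.device_capacity_tokens:
        return Tier.DEVICE                    # zero marginal cost wins by default
    if req.latency_budget_ms < 1_000:
        return Tier.EDGE                      # too big for the device, too urgent for the cloud
    return Tier.CLOUD                         # large, latency-tolerant jobs only


if __name__ == "__main__":
    examples = [
        ("glasses query", InferenceRequest(latency_budget_ms=50, privacy_sensitive=True, tokens=200)),
        ("document summary", InferenceRequest(latency_budget_ms=5_000, privacy_sensitive=False, tokens=3_000)),
        ("batch analysis", InferenceRequest(latency_budget_ms=60_000, privacy_sensitive=False, tokens=200_000)),
    ]
    for name, request in examples:
        print(f"{name}: {route(request).value}")
```

In practice the thresholds would be tuned or learned per workload, but the ordering of the checks is the strategic point: the default destination becomes the device, and the cloud handles only what the device cannot.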
Dell CTO John Roese puts it simply: “AI is increasingly living closer to where the data and users are.” IDC predicts AI use cases will spur edge computing spend to nearly $378B by 2028.
The SaaS Economics Revolution
One of Amon’s most underappreciated points at Davos wasn’t about hardware — it was about software economics. He described a scenario: “If you’re a SaaS company and you say, I’m going to have an agent within my application, and every time I run it, I’m paying for a GPU in the cloud… well, how good is my SaaS model?”
Summarize a document? You can send it to the cloud and pay for inference. Or you can run the model locally on a Snapdragon-powered AI PC — for free, since the compute is already included in the hardware.
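A quick back-of-envelope calculation makes the margin point concrete. Every figure below is a hypothetical assumption, not actual cloud or SaaS pricing; the sketch only shows how per-token inference fees can eat into a flat subscription price, while on-device inference carries no marginal cost.

```python
# Back-of-envelope sketch of the SaaS margin argument above.
# All figures are hypothetical placeholders, not vendor pricing.

CLOUD_PRICE_PER_1M_TOKENS = 5.00     # assumed blended $ per 1M tokens of cloud inference
TOKENS_PER_AGENT_RUN = 8_000         # assumed prompt + completion per agent invocation
RUNS_PER_USER_PER_MONTH = 300        # assumed agent usage per seat
SEAT_PRICE_PER_MONTH = 20.00         # assumed SaaS subscription price per seat

cloud_cost_per_run = TOKENS_PER_AGENT_RUN / 1_000_000 * CLOUD_PRICE_PER_1M_TOKENS
cloud_cost_per_seat = cloud_cost_per_run * RUNS_PER_USER_PER_MONTH
margin_share = cloud_cost_per_seat / SEAT_PRICE_PER_MONTH

print(f"Cloud inference cost per run:  ${cloud_cost_per_run:.4f}")
print(f"Cloud inference cost per seat: ${cloud_cost_per_seat:.2f}/month")
print(f"Share of a ${SEAT_PRICE_PER_MONTH:.0f} seat consumed: {margin_share:.0%}")
print("On-device inference per seat:  $0 marginal (compute already in the hardware)")
```

Even under these illustrative assumptions, per-run cloud fees consume a meaningful share of a flat subscription, which is exactly the margin pressure Amon is pointing at.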
This has cascading implications across the entire software industry:
- If on-device inference becomes powerful enough for most agent interactions, SaaS economics shift from cloud-compute-intensive to edge-compute-native
- On-device inference turns hardware the user already owns into a zero-marginal-cost inference engine, so tasks like document summarization effectively run locally for free
- Cloud inference becomes optional, not mandatory
The Value Chain Shifts
Today’s model: training → cloud inference → application → user.
Emerging model: training → on-device inference → application → user.
The cloud inference step shrinks. The on-device chip becomes the critical chokepoint. And the company providing that chip — at scale, across billions of devices — captures value that currently flows to cloud compute providers.
For those building AI strategy: the companies most disrupted by this shift are not hardware makers but cloud providers whose inference revenue depends on SaaS companies paying per token. When inference moves to the edge, that revenue moves with it.
This is part of a comprehensive analysis. Read the full version on The Business Engineer.