The 5 Layers of the AI Cake.

And Every Layer is Under Pressure.

The AI conversation tends to collapse into one thing: models. Who has the best one, who's releasing the next one, who's benchmarking higher than whom.

But at HumanX 2026 in San Francisco, the opening night session "AI is a 5 Layer Cake" zoomed out. Moderated by HumanX CEO Stefan Weitz, the panel brought together three people building at different layers of the AI stack: Bryan Catanzaro (VP of Applied Deep Learning Research at NVIDIA), Lin Qiao (Co-founder & CEO of Fireworks AI), and Denis Yarats (Co-founder & CTO of Perplexity). The framework they worked through (NVIDIA's five-layer AI stack: energy, chips, infrastructure, models, and applications) revealed something the headlines usually miss: AI doesn't work unless every layer works together.

Compute Is Intelligence. And It's Constrained.

Catanzaro set the tone early. At NVIDIA, the relationship between compute and intelligence isn't a talking point; it's the operating thesis. He outlined four scaling laws currently driving demand for compute: pre-training (bigger models, more tokens), post-training (more diverse environments), deployment (more thinking time at inference), and agents (everyone becoming "the CEO of their own company full of AIs").

But the supply side isn't keeping up.

"We are in a very constrained environment," Catanzaro said. "The amount of compute that we can build as a society is constrained by all sorts of things." That constraint means efficiency and intelligence are now the same problem. Getting more out of every watt, every chip, every cycle isn't just an engineering challenge, it's the path to smarter systems.

Designing for a Future That Doesn't Exist Yet

One of the session's most compelling threads was the challenge of building hardware for a software landscape that changes every six months. Catanzaro was candid: "We're always on the brink of disaster."

NVIDIA's approach is co-design: working across the full stack from transistors to algorithms, modeling future workloads, and building its own open models (Nemotron), partly to teach itself what AI actually needs. But that approach requires conviction. "You can't approach this problem with a portfolio," Catanzaro said. "We're going to put all of our eggs in one basket, and it either is going to go great or it's going to go terrible. And we're voting for great."

There's also a Darwinian element at play. Researchers tend to build AI that works well on available hardware. NVIDIA pushes hardware toward where AI is going. Over time, the winners are the things that intersect. "The community — we're all kind of in it together," Catanzaro said.

Inference Is Where the Real Business Happens

Lin Qiao brought the infrastructure perspective. Fireworks AI started with inference by choice, and the reasoning was pointed. Every customer comes in asking for one thing: best quality, lowest latency, or lowest cost. "Turns out they need all three," she said. "Three-dimensional optimization is the way to go."
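
To make the three-way trade-off concrete, here is a minimal Python sketch. It was not shown at the session. It assumes each deployment option can be scored on quality, latency, and cost, and it keeps only the options that no other option beats on all three at once (a Pareto front). The `Deployment` class, the option names, and every number are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """A hypothetical serving option scored on three axes."""
    name: str
    quality: float      # higher is better (e.g. an eval score in [0, 1])
    latency_ms: float   # lower is better
    cost_per_1k: float  # lower is better (dollars per 1,000 requests)


def dominates(a: Deployment, b: Deployment) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    at_least_as_good = (
        a.quality >= b.quality
        and a.latency_ms <= b.latency_ms
        and a.cost_per_1k <= b.cost_per_1k
    )
    strictly_better = (
        a.quality > b.quality
        or a.latency_ms < b.latency_ms
        or a.cost_per_1k < b.cost_per_1k
    )
    return at_least_as_good and strictly_better


def pareto_front(options: list[Deployment]) -> list[Deployment]:
    """Keep the options that no other option dominates."""
    return [o for o in options if not any(dominates(other, o) for other in options)]


# Made-up numbers, for illustration only.
candidates = [
    Deployment("large-model", quality=0.92, latency_ms=900.0, cost_per_1k=4.00),
    Deployment("small-model", quality=0.78, latency_ms=120.0, cost_per_1k=0.30),
    Deployment("tuned-small-model", quality=0.90, latency_ms=150.0, cost_per_1k=0.45),
    Deployment("misconfigured", quality=0.75, latency_ms=950.0, cost_per_1k=4.50),
]

front = pareto_front(candidates)
front_names = [d.name for d in front]
print(front_names)
```

In this made-up data only the misconfigured option drops out. Choosing among the remaining three still depends on which axis the application weights most, which is one way to read the claim that customers end up needing all three.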

The infrastructure challenge in 2026 is more complex than most people realize. Model architectures are growing more complicated. NVIDIA is shipping three new chips a year. And the explosion of agentic applications means developers need to scale from product-market fit to millions of users almost overnight — without scaling into bankruptcy.

That last point hit hard. "Scale into bankruptcy is not just a notion for startups," Qiao said. "It's even worse for incumbents, because they already have a huge amount of traffic. Once they scale, their CFO is like, no, we cannot do this."

Her prediction for the future: millions of models, not one model to rule them all. Every application, every company, every enterprise should have its own model — trained on the private data that represents their real competitive moat.

Private Data Is the Real Moat

Denis Yarats from Perplexity reinforced this from the application layer. The era of agentic AI has compressed the timeline from ideation to production from quarters to days. Anyone can screenshot an app and have Claude Code rebuild it. So where's the defensibility?

"The private data locked inside your company — those are the real moat," Qiao said. And most of it hasn't been activated yet. The public internet and labeled datasets that trained foundation models represent a small fraction of the world's data. The majority sits inside enterprises, untouched.

For Yarats, the unlock isn't just about data — it's about designing for where the models are going, not where they are. "Over the last three years, that's been a common mistake," he said. "You have to build for what is going to be like six or twelve months ahead. Never what is now. Because if you do that, you're always going to be behind."

The Stack Is Unbalanced — And That's OK (For Now)

When Weitz asked whether the AI stack is balanced, the panel converged on a clear answer: not yet, but that's expected.

Catanzaro's take: "We need a lot more AI applications." He pointed to Perplexity as an example of what happens when someone thinks deeply about how to help people with AI rather than just exposing a model. The application layer is where the most value is still waiting to be created.

Qiao took a lifecycle view. Pre-product-market fit, model quality is everything. Post-product-market fit, infrastructure efficiency becomes the bottleneck. The two phases require fundamentally different priorities — and confusing them is how companies either stall or go broke.

Yarats added a dimension most people aren't thinking about yet: background compute. "Right now, the way we interact with AI is reactive — you ask a task and it goes and works. But there is an untapped dimension where AI just works in the background." Analyzing every Slack message, building knowledge bases, predicting what you need before you ask. That's where most compute will eventually be spent.

The Biggest Waste Right Now

In the session's final moment, Weitz asked what effort across the stack will matter least in 12 months. Yarats didn't hesitate: "Over-optimizing on the current capabilities of the model." Building harnesses and workarounds for today's limitations is a trap. The models are moving too fast. Build for the future or get left behind.

The Bottom Line

The AI stack doesn't work if any one layer fails. Energy constrains chips. Chips constrain infrastructure. Infrastructure constrains what models can do at scale. And none of it matters if applications don't deliver real value to real users.

The "5 Layer Cake" session at HumanX 2026 made the case that AI's next chapter isn't about any single breakthrough — it's about the entire stack maturing together. The companies that understand all five layers, and build with the connections between them in mind, will be the ones that define what comes next.

Watch the full session on-demand.
