AI Supercomputers In A Nutshell

AI supercompute — as explored in the economics of AI compute infrastructure — rs are ultrafast processors that can manage and interpret quantities of data on an enormous scale.

Table of Contents

Understanding AI supercomputers

Supercomputers are those with levels of performance that far exceed conventional computers, laptops, and other consumer devices. This enables them to process vast quantities of data and, importantly, derive important insights from it.

By extension, AI supercomputers are those that can run the next generation of AI algorithms. They are comprised of hundreds of thousands of individual processors, a specialized network, and a significant amount of storage. Since there are so many processors, each performs only a small amount of the work and communicates with the others to increase processing speed.

While AI supercomputers may seem complex, standard operating systems like Linux manage application, network, and scheduling tasks. But with densely populated circuit boards, they tend to run hot and require an extensive cooling system with circulating refrigerant and forced air that dissipates heat.

The supercomputer market is predicted to experience a CAGR of 9.5% until 2026 based on the increased adoption of cloud computing and related technologies.

This increase will also be driven by a need for systems that can handle vast datasets to train and operate AI models. According to OpenAI, the computing power required to train such models has been doubling every 3.4 months.

Why do AI supercomputers matter in the context of the current AI paradigm?

The transformer architecture completely shifted the AI paradigm, finally transforming AI from narrow to specialized.

Yet, for that change to become commercially viable and valuable, it needed to scale; how?

By enabling large language model — as explored in the intelligence factory race between AI labs — s to be trained on a massive amount of data, by breaking those down into billions of parameters, and by training them for long-enough to create a general-purpose AI engine, able to be then customized.

This whole process required a massive amount of computation and not any computation. These large language models would scale via a transformer architecture that required parallel computing, achieved via a special kind of chip called GPU.

Thus, a bunch of powerful GPUs, organized around a specific architecture, optimized for parallel computing on the cloud, enabled large language models like GPT, first and tools like ChatGPT to become viable.

Indeed, underlying OpenAI’s GPT models and ChatGPT, there is the Microsoft Azure AI Supercomputer, on which Microsoft has spent billions to consolidate since 2019.

The AI Supercomputer is a critical piece of the puzzle to understanding the OpenAI business model.

how-does-openai-make-money — OpenAI has built the foundational layer of the AI industry. With large generative models like GPT-3 and DALL-E, OpenAI offers API access to businesses that want to develop applications on top of its foundational models while being able to plug these models into their products and customize these models with proprietary data and additional AI features. On the other hand, OpenAI also released ChatGPT, developing around a freemium model. Microsoft also commercializes opener products through its commercial partnership.

Today, the AI business architecture is comprised of three key paradigms:

The software paradigm (from very narrow and cumbersome to general and highly trainable).
The hardware paradigm (from CPUs to GPUs)
And the business paradigm (the three layers of AI).

How can AI supercomputers manage heavy workloads?

There are three core components.

Circuits

Very small wire connections mean the circuit board can be loaded with more power when compared to those used in a standard desktop PC. This allows for arithmetic and logical operations to be interpreted and executed in a sequential fashion.

Nodes

AI supercomputers have numerous CPUs to facilitate rapid computational speed. Each of these CPUs (nodes) has 10 to 12 cores and there are often thousands of nodes within an architecture. Work performance is often in the trillions of cycles per second range.

Processing

AI supercomputers run multiple workloads simultaneously with parallel processing. Since thousands of tasks are performed at once, the work is completed in a matter of milliseconds.

As a result, companies can train faster and more accurate AI models with precision. They can also apply key insights to processes, test more scenarios, and ultimately, advance the industries in which they operate.

AI supercomputer examples

Microsoft

Microsoft build a supercomputer for OpenAI in 2020 as part of its substantial investment in the company. Designed for OpenAI’s machine learning research, Microsoft’s unnamed supercomputer has 285,000 CPU cores, 10,000 GPUs, and some 400 gigabits per second of network connectivity.

The supercomputer is hosted in Azure and was seen as the first step in making powerful AI models available for other developers and organizations to build upon.

Nvidia

Nvidia’s Cambridge-1 was launched in July 2021 and was dubbed the most powerful supercomputer in the United Kingdom. It would primarily be used by the nation’s top scientists and health professionals to facilitate the digital biology revolution.

The company noted that its AI supercomputer could be incorporated into nanotechnologies to better understand dementia. Alternatively, it could be used to improve the accuracy of identifying disease-causing variations in human gene sequences.

Key takeaways

AI supercomputers are ultrafast processors that can manage and interpret quantities of data on an enormous scale. To run the next generation of AI algorithms, they are comprised of hundreds of thousands of individual processors, a specialized network, and a sizeable amount of storage.
The supercomputer market is predicted to experience a CAGR of 9.5% until 2026 based on the increased uptake of cloud technology and the need for systems that can handle vast datasets to train and operate AI.
Three notable AI supercomputer examples include Meta’s AI Research SuperCluster (RSC), Nvidia’s Cambridge-1, and Microsoft’s unnamed supercomputer built specifically for machine learning research at OpenAI.

Key Highlights

AI Supercomputers Overview:
- AI supercomputers are high-performance processors designed to handle vast amounts of data and execute complex AI algorithms.
- They consist of numerous individual processors, a specialized network, and substantial storage capacity.
- These supercomputers utilize parallel computing, where multiple tasks are executed simultaneously to achieve rapid processing speeds.
Importance in AI Paradigm:
- AI supercomputers play a crucial role in enabling the training and operation of large language models like GPT-3.
- The transformer architecture, used in large language models, requires significant computation, often achieved using specialized chips like GPUs.
- Microsoft Azure AI Supercomputer underpins OpenAI’s GPT models and ChatGPT.
Components and Functionality:
- Circuits: AI supercomputers use small wire connections for efficient power distribution, enabling sequential execution of operations.
- Nodes: Each supercomputer has numerous CPUs (nodes) with multiple cores, providing rapid computational speed.
- Parallel Processing: Supercomputers execute multiple tasks simultaneously, allowing for quick completion of complex workloads.
Advantages and Applications:
- AI supercomputers enable faster and more accurate training of AI models.
- Companies can gain valuable insights, test scenarios, and advance various industries.
AI Supercomputer Examples:
- Meta’s AI Research SuperCluster (RSC): Aimed to be the world’s fastest supercomputer, focused on training models in computer vision and NLP for Metaverse development.
- Microsoft’s Supercomputer: Built for OpenAI with 285,000 CPU cores, 10,000 GPUs, and hosted in Azure. Supports AI model development.
- Nvidia’s Cambridge-1: Dubbed the most powerful supercomputer in the UK, used by scientists and health professionals for medical and biological research.
Market Predictions:
- The AI supercomputer market is projected to experience a Compound Annual Growth Rate (CAGR) of 9.5% until 2026.
- Growth is driven by increased cloud technology adoption and the need for handling large datasets for AI training.