Nvidia CEO Jensen Huang Presents the Next-Generation “Blackwell” Chip Family at GTC

Nvidia co-founder and CEO Jensen Huang, left, held up the new Blackwell GPU chip to compare it to its predecessor, the H100 “Hopper.”

Nvidia

Nvidia CEO Jensen Huang on Monday led the AI chipmaker’s first in-person GPU Technology Conference (GTC) since the COVID-19 pandemic, in San Jose, California, and introduced the company’s new flagship chip design, code-named “Blackwell.”

Many consider GTC to be the “Woodstock of AI” or the “Lollapalooza of AI.” “I hope you realize that this is not a concert,” Huang said at the start, to loud applause. He called out the numerous partners and customers in attendance.

“Michael Dell is sitting right there,” Huang said, noting that the Dell founder and CEO was in the audience.

Also: The AI startup Cerebras introduces the WSE-3, the largest chip to date for generative AI

Huang emphasized the computing power required to train the large language models of generative AI, or GenAI. A model with trillions of parameters, combined with training data consisting of trillions of “tokens,” or word fragments, would require “30 billion quadrillion floating-point operations,” or 30 billion petaFLOPs, Huang noted. “If you had a petaFLOPS GPU, it would take you 30 billion seconds to calculate and train that model – 30 billion seconds is about 1,000 years.”

“I’d like to do it sooner, but it’s worth it — that’s usually my answer,” Huang quipped.
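To put those numbers in context, here is a minimal back-of-the-envelope sketch in Python; it assumes the idealized case of a single GPU sustaining exactly one petaFLOPS with perfect utilization, which no real training run achieves:

```python
# Huang's figures: ~30 billion petaFLOPs of total work to train a
# trillion-parameter model on trillions of tokens.
TOTAL_WORK_PFLOPs = 30e9        # total operations, in petaFLOPs (not a rate)

rate_pflops = 1                 # hypothetical GPU sustaining 1 petaFLOPS
seconds = TOTAL_WORK_PFLOPs / rate_pflops
years = seconds / (60 * 60 * 24 * 365)

print(f"{seconds:.0e} seconds ~= {years:,.0f} years")
# 3e+10 seconds ~= 951 years, roughly the 1,000 years Huang cites
```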

[Image: nvidia-2024-huang-and-large-Language-Models.png]

Huang opened his talk with an overview of the growing size of AI workloads, noting that training the largest models on a single petaFLOPS GPU would take 30 billion seconds, or about 1,000 years.

Nvidia

Nvidia’s H100 GPU, the current state-of-the-art chip, delivers about 2,000 trillion floating-point operations per second, or 2,000 TFLOPS. A thousand TFLOPS equals one petaFLOPS, so the H100 and its sibling, the H200, manage only a couple of petaFLOPS each, a pace nowhere near the 30 billion petaFLOPs of total work Huang described.

Also: Making GenAI more efficient with a new chip

“What we need is bigger GPUs – we need much, much bigger GPUs,” he said.

Blackwell, previously known in the industry as “Hopper Next,” delivers 20 petaFLOPS per GPU. The chips ship in an eight-way system on an “HGX” board.

Using “quantization,” a reduced-precision form of arithmetic in which each value in a neural network is represented with fewer bits, here a four-bit format called “FP4,” the chip can reach up to 144 petaFLOPS in an HGX system.
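To illustrate the idea behind quantization, here is a minimal Python sketch of a simple symmetric 4-bit integer scheme; the function names are illustrative, and the integer mapping is an assumption made for clarity, since Blackwell’s FP4 is a 4-bit floating-point format rather than the toy scheme shown here:

```python
import numpy as np

def quantize_4bit(x: np.ndarray):
    """Map float32 values onto 15 signed levels (4 bits) with one shared scale."""
    scale = np.abs(x).max() / 7.0                        # symmetric range -7..+7
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the 4-bit codes."""
    return q.astype(np.float32) * scale

weights = np.random.randn(8).astype(np.float32)
q, scale = quantize_4bit(weights)
print("original:     ", np.round(weights, 3))
print("reconstructed:", np.round(dequantize(q, scale), 3))
```

The trade-off is the point: each value becomes coarser, but because every number occupies only 4 bits, the chip’s math units can push through correspondingly more of them per cycle than at 8-, 16- or 32-bit precision.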

The chip has 208 billion transistors, Huang said, and is built with a custom semiconductor manufacturing process at Taiwan Semiconductor Manufacturing Co. known as “4NP.” That’s more than double the 80 billion transistors of the Hopper GPUs.

[Image: nvidia-blackwell-architecture-image-cropped]

The Nvidia Blackwell GPU increases floating-point operations per second tenfold and more than doubles the transistor count of the previous “Hopper” generation. Nvidia touts the chip’s ability to run large language models 25 times faster.

Nvidia

Blackwell can run large generative AI language models with a trillion parameters 25 times faster than previous chips, Huang said.

Also: For the age of AI PCs, here comes a new speed test

The chip is named after David Harold Blackwell, who, according to Nvidia, “was a mathematician who specialized in game theory and statistics and was the first Black scholar inducted into the National Academy of Sciences.”

The Blackwell chip uses a new version of NVLink, Nvidia’s high-speed interconnect, which delivers 1.8 terabytes per second to each GPU. A dedicated part of the chip, which Nvidia calls a “RAS engine,” maintains the chip’s “reliability, availability and serviceability,” and a collection of decompression circuits speeds up tasks such as database queries.
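For a sense of scale, here is a small illustrative Python calculation of what 1.8 terabytes per second means at trillion-parameter size; the model size and the 4-bit storage format are assumptions made for this example, and real transfers add protocol overhead:

```python
# Rough, idealized transfer-time estimate over NVLink at 1.8 TB/s per GPU.
PARAMS = 1e12                  # hypothetical 1-trillion-parameter model
BYTES_PER_PARAM = 0.5          # 4-bit (FP4) storage: half a byte per value
NVLINK_BYTES_PER_SEC = 1.8e12  # 1.8 TB/s

model_bytes = PARAMS * BYTES_PER_PARAM
seconds = model_bytes / NVLINK_BYTES_PER_SEC
print(f"~{model_bytes / 1e9:.0f} GB of weights moved in ~{seconds:.2f} s")
# ~500 GB of weights moved in ~0.28 s
```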

Amazon Web Services, Dell, Google, Meta, Microsoft, OpenAI, Oracle, Tesla and xAI are among the early adopters of Blackwell.

As with its predecessors, two Blackwell GPUs can be combined with one of Nvidia’s “Grace” microprocessors into a combined chip, dubbed the “GB200 Grace Blackwell Superchip.”

[Image: nvidia-gb200-grace-blackwell-superchip-copy]

As with the predecessor Hopper GPUs, two Blackwell GPUs can be combined with one of Nvidia’s “Grace” microprocessors to create a combined chip called the “GB200 Grace Blackwell Superchip.”

Nvidia

Thirty-six of the Grace CPUs and 72 of the Blackwell GPUs can be combined into a rack-based computer, which Nvidia calls the “GB200 NVL72,” capable of 1,440 petaFLOPS, a rate that brings the 30 billion petaFLOPs of work Huang cited much closer to reach.
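Extending the earlier back-of-the-envelope sketch with the rates quoted in this article shows how the rack changes the picture; as before, this assumes perfectly sustained throughput, which real workloads never reach:

```python
# Idealized time to finish Huang's 30 billion petaFLOPs of training work
# at the throughput figures quoted in this article.
TOTAL_WORK_PFLOPs = 30e9
SECONDS_PER_DAY = 86_400

for name, rate_pflops in [("1-petaFLOPS GPU", 1),
                          ("Blackwell GPU (20 PFLOPS)", 20),
                          ("GB200 NVL72 rack (1,440 PFLOPS)", 1_440)]:
    days = TOTAL_WORK_PFLOPs / rate_pflops / SECONDS_PER_DAY
    print(f"{name:32s} ~{days:,.0f} days")
# 1-petaFLOPS GPU                  ~347,222 days
# Blackwell GPU (20 PFLOPS)        ~17,361 days
# GB200 NVL72 rack (1,440 PFLOPS)  ~241 days
```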

A new system built around the chips, the DGX SuperPOD, combines “tens of thousands” of the Grace Blackwell superchips, scaling the available operations per second further still.

Also: Nvidia is expanding its “superchip” Grace Hopper with faster memory for AI

In addition to Blackwell, Nvidia made several other announcements:

  • New generative AI algorithms that expand “cuLitho,” Nvidia’s existing library of semiconductor-design algorithms, named for the photolithography step used in the semiconductor manufacturing process. The GenAI code generates an initial “photomask” for lithography, which can then be refined using conventional methods, speeding up the creation of such photomasks by 100%. TSMC and chip-design software maker Synopsys are integrating cuLitho and the new GenAI functions into their technologies.
  • A new line of network switches and network interface cards: the “Quantum-X800 InfiniBand,” based on the InfiniBand technology developed by Nvidia’s Mellanox unit, and the “Spectrum-X800 Ethernet,” based on the Ethernet networking standard. Both deliver 800 billion bits per second, or 800 Gbit/s. According to Nvidia, the switches and NICs are “optimized for trillion-parameter GPU computing” so the network can keep pace with the chips’ floating-point throughput.
  • A catalog of 25 “microservices,” cloud-based application-container software pre-built for individual applications, including custom AI models, based on Nvidia’s “NIM” container software, which in turn is part of the company’s AI Enterprise offering. The company describes the programs as a “standardized way to run custom AI models optimized for Nvidia’s CUDA installed base of hundreds of millions of GPUs across clouds, data centers, workstations and PCs.” The microservices include a bundle of life-sciences tasks, some specializing in “generative biology,” chemistry and “molecular prediction,” as well as “inference,” the generation of predictions, for a growing collection of models in imaging, medical technology, drug discovery and digital health. The microservices will be available on Dell and other vendors’ systems, as well as on public cloud services such as AWS, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure, and can be tested on Nvidia’s own cloud service.
  • Earth-2, a separate microservice designed as a “digital twin” to simulate extreme weather conditions, is said to “provide alerts and updated forecasts in seconds, compared to minutes or hours with traditional CPU-driven modeling.” The technology is based on a generative AI model developed by Nvidia called “CorrDiff,” which can produce “12.5 times higher resolution images” of weather patterns “than current numerical models, 1,000 times faster and 3,000 times more energy efficient.” The Weather Company is an early adopter of the technology.
[Image: Earth-2]

A high-resolution image from Earth-2, Nvidia’s “digital twin” simulation of extreme weather conditions, generated by the “CorrDiff” generative AI model.

Nvidia

Also: How Apple’s AI advances could make or break the iPhone 16

In addition to its own product and technology announcements, Nvidia announced several initiatives with partners:

  • A collaboration with Oracle for “sovereign AI” to run AI programs locally, “within the secure premises of a country or organization.”
  • A new supercomputer for Amazon AWS based on DGX systems with Blackwell chips called “Ceiba”.
  • A partnership with Google Cloud to extend the JAX programming framework to Nvidia chips and “expand access to large-scale LLM training for the broader ML community.”

For more news, visit the Nvidia newsroom.

You can watch the entire keynote speech as a replay on YouTube.
