The NVIDIA H100 Tensor Core GPU is a ninth-generation data center GPU that delivers unprecedented performance, scalability, and security for large-scale AI and HPC workloads. It features the NVIDIA Hopper architecture, a dedicated Transformer Engine, and the NVLink Switch System that enables exascale computing and trillion-parameter AI. In this article, we will provide an overview of the NVIDIA H100 Tensor Core GPU, its features, and its technical details.
NVIDIA H100 Tensor Core GPU Overview
The NVIDIA H100 Tensor Core GPU is designed to address the challenges and opportunities of the following domains:
Accelerated Computing
The H100 GPU offers an order-of-magnitude performance leap over the previous-generation NVIDIA A100 Tensor Core GPU for AI and HPC applications. It supports a wide range of precision modes, including FP64, TF32, FP16, BF16, FP8, and INT8, to optimize performance and efficiency for different workloads. It also supports mixed-precision computing, which combines lower-precision arithmetic with higher-precision accumulation and scaling, to accelerate training and inference of deep neural networks.
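To make the mixed-precision idea concrete, here is a minimal PyTorch sketch of the standard automatic-mixed-precision recipe (assuming PyTorch with CUDA support on an H100 or other recent NVIDIA GPU; the model, sizes, and learning rate are placeholders, not NVIDIA-recommended values):

```python
import torch

# Let FP32 matmuls use TF32 Tensor Core math (one of the precision modes above).
torch.backends.cuda.matmul.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                   # loss scaling guards against FP16 underflow

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Lower-precision compute with higher-precision accumulation, as described above.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```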
Large Language Model Inference
The H100 GPU includes a dedicated Transformer Engine that accelerates inference for large language models (LLMs) by up to 30X over the A100 GPU. The Transformer Engine combines fourth-generation Tensor Cores with software that analyzes each layer of a Transformer network and dynamically chooses between FP8 and 16-bit precision, rescaling tensors so that accuracy is preserved while throughput is maximized. For LLMs up to around 175 billion parameters, the H100 NVL PCIe form factor pairs H100 GPUs with an NVLink bridge, which allows easy scaling of LLMs across multiple GPUs in mainstream data center servers.
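As an illustration of how the FP8 path of the Transformer Engine is exposed to developers, here is a minimal sketch using NVIDIA's Transformer Engine library for PyTorch (assuming the transformer_engine package is installed and an FP8-capable GPU such as the H100 is present; the layer and batch sizes are arbitrary placeholders):

```python
import torch
import transformer_engine.pytorch as te

# A single Transformer-style projection layer built from a Transformer Engine module.
layer = te.Linear(4096, 4096, bias=True).cuda()

x = torch.randn(32, 4096, device="cuda")

# Inside fp8_autocast the matmul runs on FP8 Tensor Cores; the library tracks
# per-tensor scaling factors so that accuracy is preserved.
with te.fp8_autocast(enabled=True):
    y = layer(x)

print(y.shape, y.dtype)   # activations come back in a higher-precision format
```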
Enterprise AI
The H100 GPU is compatible with the NVIDIA AI Enterprise software suite, which simplifies AI adoption for enterprises. The NVIDIA AI Enterprise software suite is a comprehensive set of AI frameworks, tools, and libraries that are optimized and certified for NVIDIA GPUs and VMware vSphere. It enables enterprises to build and deploy AI applications such as chatbots, recommendation engines, vision AI, and more on mainstream servers with H100 GPUs.
Features of the NVIDIA H100 Tensor Core GPU
The NVIDIA H100 Tensor Core GPU offers the following features to enable secure, transformational, and high-performance AI and HPC:
Secure Workloads
The H100 GPU supports second-generation NVIDIA Multi-Instance GPU (MIG) technology, which partitions a single GPU into as many as seven fully isolated instances, each with its own memory, cache, and compute cores, so that multiple users and applications can share one GPU with hardware-enforced quality of service and isolation. MIG enables secure and efficient utilization of GPU resources in cloud, edge, and enterprise environments. The H100 GPU can also be paired with the NVIDIA BlueField®-3 DPU, which offloads and accelerates networking, storage, and security functions from the CPU to the DPU, enhancing data center security and performance.
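As a sketch of how a workload is pinned to a single MIG instance (assuming MIG has already been enabled and partitioned with nvidia-smi; the UUID shown is a placeholder), a process simply selects the instance through CUDA_VISIBLE_DEVICES before initializing its framework:

```python
import os

# Placeholder MIG device UUID; list the real ones with `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch

# The process now sees only the selected MIG slice as CUDA device 0.
print(torch.cuda.device_count())       # 1
print(torch.cuda.get_device_name(0))   # reports the underlying H100 GPU
```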
Transformational AI Training
The H100 GPU features fourth-generation Tensor Cores and a Transformer Engine with FP8 precision that provide up to 4X faster training over the A100 GPU for GPT-3 (175B) models. It also supports the NVIDIA NVLink Switch System, which connects up to 256 H100 GPUs with 900 GB/s of GPU-to-GPU interconnect bandwidth per GPU, enabling exascale computing and trillion-parameter AI. Beyond a single NVLink domain, the H100 GPU can be combined with NVIDIA Quantum-2 InfiniBand networking (NDR, 400 Gb/s), which accelerates communication across nodes, and with NVIDIA Magnum IO software, which optimizes data movement and storage for AI and HPC workloads.
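To show how a training job actually spans many H100 GPUs over NVLink and InfiniBand, here is a minimal PyTorch DistributedDataParallel sketch (a generic pattern rather than an NVIDIA-specific API; it assumes launch via torchrun, and the model and sizes are placeholders):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
dist.init_process_group(backend="nccl")    # NCCL moves data over NVLink and InfiniBand
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])   # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 4096, device="cuda")
target = torch.randn(32, 4096, device="cuda")

loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()                             # gradients are all-reduced across all GPUs
optimizer.step()
dist.destroy_process_group()
```

Launched with, for example, torchrun --nproc_per_node=8 train.py, each process drives one H100; NCCL routes gradient traffic over NVLink within a node and over InfiniBand between nodes.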
Real-Time Deep Learning Inference
The H100 GPU delivers up to 3X higher performance for real-time deep learning inference over the A100 GPU, thanks to its improved Tensor Cores, Transformer Engine, and memory bandwidth. The H100 GPU supports NVIDIA Triton™ Inference Server, which simplifies the deployment and management of AI models across multiple frameworks and platforms. The H100 GPU also supports NVIDIA Riva (formerly Jarvis), a fully accelerated conversational AI framework that enables natural language understanding, speech recognition, and speech synthesis on H100 GPUs.
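For context on how a model hosted behind Triton Inference Server is called, here is a small client-side sketch using the tritonclient Python package (the server address, model name, and tensor names/shapes are illustrative assumptions that must match the deployed model's configuration):

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server that is already serving models (address is illustrative).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Input and output names, shapes, and dtypes must match the model's config.pbtxt.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("input__0", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch)
outputs = [httpclient.InferRequestedOutput("output__0")]

result = client.infer(model_name="resnet50", inputs=inputs, outputs=outputs)
print(result.as_numpy("output__0").shape)
```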
High-Performance Computing
The H100 GPU delivers up to 3X higher performance for HPC applications over the A100 GPU, thanks to its higher double-precision (FP64) throughput, memory bandwidth, and interconnect. The H100 GPU supports NVIDIA CUDA®, the parallel computing platform and programming model that enables developers to harness the power of GPUs for HPC. The H100 GPU also supports the NVIDIA HPC SDK, a comprehensive suite of compilers, libraries, and tools for HPC development and optimization on NVIDIA GPUs.
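As a small example of the CUDA programming model driven from Python (here via Numba's CUDA JIT, which is just one of several ways to write CUDA kernels; the array sizes are arbitrary), a SAXPY kernel looks like this:

```python
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    # Each CUDA thread computes one element of the result.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)

d_x, d_y = cuda.to_device(x), cuda.to_device(y)
d_out = cuda.device_array_like(x)

threads = 256
blocks = (n + threads - 1) // threads
saxpy[blocks, threads](np.float32(2.0), d_x, d_y, d_out)   # launch the kernel on the GPU
print(d_out.copy_to_host()[:4])
```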
Data Analytics
The H100 GPU delivers up to 2X higher performance for data analytics over the A100 GPU, thanks to its improved memory bandwidth and interconnect. The H100 GPU supports NVIDIA RAPIDS™, a collection of open-source libraries and APIs that enable GPU-accelerated data preparation, machine learning, and visualization for data analytics. The H100 GPU also supports the RAPIDS Accelerator for Apache Spark, which transparently GPU-accelerates Apache Spark 3.x so that data processing and analytics scale quickly on H100 GPUs.
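As an example of GPU-accelerated data preparation with RAPIDS, the following cuDF sketch (assuming the cudf package is installed; the column names and data are made up) mirrors the familiar pandas API while running on the GPU:

```python
import cudf

# Build a small GPU DataFrame; in practice this would come from cudf.read_csv or read_parquet.
df = cudf.DataFrame({
    "store": ["a", "b", "a", "c", "b", "a"],
    "sales": [10.0, 7.5, 3.2, 12.1, 8.8, 4.4],
})

# The group-by aggregation executes entirely on the GPU.
summary = df.groupby("store").agg({"sales": ["sum", "mean"]})
print(summary)
```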
Technical Details of the NVIDIA H100 Tensor Core GPU
The NVIDIA H100 Tensor Core GPU is based on the NVIDIA Hopper architecture and is built around the GH100 processor. Like previous NVIDIA data center GPUs, GH100 is organized as a hierarchy of GPU Processing Clusters (GPCs), Texture Processing Clusters (TPCs), and Streaming Multiprocessors (SMs), backed by a large L2 cache and HBM3 memory stacks connected through a wide on-package memory interface. Each SM contains fourth-generation Tensor Cores alongside FP32 and FP64 units, and the GPU communicates with host CPUs over PCIe Gen5 and with other GPUs over fourth-generation NVLink.
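A quick way to inspect some of these details on a system with an H100 installed is to query the device properties from Python (a generic PyTorch query rather than an NVIDIA-specific tool; the exact figures reported depend on the H100 variant):

```python
import torch

props = torch.cuda.get_device_properties(0)
print(props.name)                            # e.g. an H100 SXM or PCIe board
print(props.multi_processor_count)           # number of streaming multiprocessors (SMs)
print(props.total_memory / 1024**3)          # HBM3 capacity in GiB
print(torch.cuda.get_device_capability(0))   # (9, 0) for the Hopper architecture
```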
The technical details of the NVIDIA H100 Tensor Core GPU are as follows:
Hopper Architecture
The H100 GPU is built with 80 billion transistors using a cutting-edge TSMC 4N process custom-tailored for NVIDIA’s accelerated computing needs. The full GH100 chip contains 144 SMs; the H100 SXM5 product ships with 132 SMs enabled, a 50 MB L2 cache, and 80 GB of HBM3 memory delivering more than 3 TB/s of memory bandwidth. The H100 GPU exposes 18 fourth-generation NVLink links, each with 50 GB/s of bidirectional bandwidth, resulting in a total of 900 GB/s of GPU-to-GPU interconnect bandwidth per GPU. The H100 GPU also supports PCIe Gen5, which provides 128 GB/s of bidirectional host-to-GPU bandwidth (64 GB/s in each direction).
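The headline bandwidth figures follow from simple per-link arithmetic, sketched below (the link counts and per-link rates are the ones quoted in the paragraph above):

```python
# Fourth-generation NVLink: 18 links per H100, 50 GB/s bidirectional per link.
nvlink_links = 18
per_link_gb_s = 50
print(nvlink_links * per_link_gb_s)   # 900 GB/s of GPU-to-GPU bandwidth

# PCIe Gen5 x16: 32 GT/s per lane x 16 lanes ≈ 64 GB/s in each direction.
pcie_per_direction_gb_s = 64
print(2 * pcie_per_direction_gb_s)    # 128 GB/s of bidirectional host bandwidth
```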
Transformer Engine
The H100 GPU includes a dedicated Transformer Engine that accelerates both training and inference of large Transformer models such as LLMs. Rather than being a separate block of execution units, the Transformer Engine combines the fourth-generation Tensor Cores present in every SM with software that analyzes the numerical range of each layer's tensors and dynamically chooses between FP8 and 16-bit precision, rescaling values so that accuracy is preserved. Because FP8 halves storage and doubles math throughput relative to FP16, this per-layer precision management is a major contributor to the H100's training and inference speedups on Transformer workloads.
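In the Transformer Engine library for PyTorch, this per-layer FP8 behavior is controlled by a "recipe". The sketch below shows one way to configure it (assuming the transformer_engine package; the margin and history-length values are arbitrary illustrative choices, not recommendations):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID format: E4M3 FP8 for forward tensors, E5M2 FP8 for gradients.
# Scaling factors are derived from a short history of per-tensor maxima.
fp8_recipe = recipe.DelayedScaling(
    margin=0,
    fp8_format=recipe.Format.HYBRID,
    amax_history_len=16,
    amax_compute_algo="max",
)

layer = te.LayerNormLinear(1024, 1024).cuda()   # a fused Transformer building block
x = torch.randn(32, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```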
Trillion-Parameter Language Model
The H100 GPU can scale up to 256 GPUs with the NVLink Switch System, enabling exascale computing and trillion-parameter AI. The NVLink Switch System extends fourth-generation NVLink beyond a single server by connecting H100 GPUs through third-generation NVSwitch chips; each NVSwitch provides 64 NVLink ports at 50 GB/s per port, for 3.2 TB/s of switch throughput, and can accelerate collective operations such as all-reduce inside the network itself with NVLink SHARP. A two-level switch topology joins up to 256 H100 GPUs into a single NVLink domain with 57.6 TB/s of all-to-all bandwidth, so every GPU can reach every other GPU at the full 900 GB/s NVLink rate. This is what makes training and serving trillion-parameter-scale models practical, delivering unprecedented performance and scalability for AI and HPC workloads.
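To give a rough sense of scale, the back-of-the-envelope arithmetic below (purely illustrative assumptions: 16-bit weights split evenly across GPUs, ignoring optimizer state, activations, and any replication) shows how a trillion-parameter model maps onto a 256-GPU NVLink domain:

```python
params = 1_000_000_000_000        # one trillion parameters
bytes_per_param = 2               # FP16/BF16 weights (FP8 would halve this)
gpus = 256                        # one NVLink Switch System domain
hbm_per_gpu_gb = 80               # H100 HBM3 capacity

total_weights_tb = params * bytes_per_param / 1e12
weights_per_gpu_gb = params * bytes_per_param / gpus / 1e9

print(total_weights_tb)                                 # 2.0 TB of weights overall
print(round(weights_per_gpu_gb, 1))                     # ~7.8 GB of weights per GPU
print(round(weights_per_gpu_gb / hbm_per_gpu_gb, 3))    # fraction of each 80 GB GPU used by weights alone
```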