Introduction to NVIDIA DGX H100/H200 Systems — NVIDIA DGX H100/H200 User Guide 1 documentation (2024)

The NVIDIA DGX™ H100/H200 systems are universal systems purpose-built for all AI infrastructure and workloads, from analytics to training to inference. The DGX H100/H200 systems are built on eight NVIDIA H100 Tensor Core GPUs or eight NVIDIA H200 Tensor Core GPUs.

Hardware Overview

DGX H100/H200 Component Descriptions

The NVIDIA DGX H100 (640 GB)/H200 (1,128 GB) systems include the following components.

Table 1. Component Description

  • GPU:

    • For H100: 8 x NVIDIA H100 GPUs that provide 640 GB total GPU memory

    • For H200: 8 x NVIDIA H200 GPUs that provide 1,128 GB total GPU memory

  • CPU: 2 x Intel Xeon 8480C PCIe Gen5 CPUs with 56 cores each, 2.0/2.9/3.8 GHz (base/all-core turbo/max turbo)

  • NVSwitch: 4 x 4th generation NVLinks that provide 900 GB/s GPU-to-GPU bandwidth

  • Storage (OS): 2 x 1.92 TB (each) NVMe M.2 SSDs in a RAID 1 array

  • Storage (data cache): 8 x 3.84 TB (each) NVMe U.2 SEDs in a RAID 0 array

  • Network (cluster) card: 4 x OSFP ports for 8 x NVIDIA® ConnectX®-7 Single Port InfiniBand Cards. Each card provides the following speeds:

    • InfiniBand (default): Up to 400 Gbps

    • Ethernet: 400GbE, 200GbE, 100GbE, 50GbE, 40GbE, 25GbE, and 10GbE

  • Network (storage and in-band management) card: 2 x NVIDIA® ConnectX®-7 Dual Port Ethernet Cards. Each card provides the following speeds:

    • Ethernet (default): 400GbE, 200GbE, 100GbE, 50GbE, 40GbE, 25GbE, and 10GbE

    • InfiniBand: Up to 400 Gbps

  • System memory (DIMM): 2 TB using 32 x DIMMs

  • BMC (out-of-band system management): 1 GbE RJ45 interface; supports Redfish, IPMI, SNMP, KVM, and Web user interface

  • System management interfaces (optional): Dual port 100GbE in slot 3 and 10 GbE RJ45 interface

  • Power supply: 6 x 3.3 kW
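The aggregate figures in Table 1 can be cross-checked with simple arithmetic. The following Python sketch derives per-GPU memory, total CPU cores, usable storage capacity, and aggregate cluster-network bandwidth from the listed values. The usable-capacity rules are the textbook RAID 0 (striping) and RAID 1 (mirroring) results and ignore filesystem and metadata overhead; the helper name `raid_usable_tb` is hypothetical.

```python
def raid_usable_tb(level: int, drives: int, size_tb: float) -> float:
    """Usable capacity for RAID 0 or RAID 1, ignoring filesystem/metadata overhead."""
    if level == 0:
        return drives * size_tb   # striping: drive capacities add up
    if level == 1:
        return size_tb            # mirroring: every drive holds the same data
    raise ValueError(f"unsupported RAID level: {level}")


# Values taken from Table 1.
GPU_COUNT = 8
per_gpu_memory_gb = {"H100": 640 / GPU_COUNT, "H200": 1128 / GPU_COUNT}
cpu_cores = 2 * 56
os_usable_tb = raid_usable_tb(1, drives=2, size_tb=1.92)
cache_usable_tb = raid_usable_tb(0, drives=8, size_tb=3.84)
cluster_bw_gbps = 8 * 400  # 8 single-port ConnectX-7 cards at up to 400 Gb/s each

print(per_gpu_memory_gb)          # {'H100': 80.0, 'H200': 141.0}
print(cpu_cores)                  # 112
print(os_usable_tb)               # 1.92
print(round(cache_usable_tb, 2))  # 30.72
print(cluster_bw_gbps / 1000)     # 3.2 (Tb/s)
```

The RAID 0 data cache trades redundancy for capacity and throughput: all eight drives contribute space, but losing any one drive loses the array.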

Mechanical Specifications

Table 2. Mechanical Specifications

  • Form factor: 8U rackmount

  • Height: 14” (356 mm)

  • Width: 19” (482.3 mm) max

  • Depth: 35.3” (897.1 mm) max

  • System weight: 287.6 lbs (130.45 kg) max
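The imperial and metric values in Table 2 are related by standard conversions: one rack unit (U) is 1.75 inches, so an 8U chassis is 14 inches tall. The Python sketch below reproduces the height and weight conversions using the exact definitions of the inch (25.4 mm) and the pound (0.45359237 kg).

```python
IN_TO_MM = 25.4          # exact definition of the inch
LB_TO_KG = 0.45359237    # exact definition of the avoirdupois pound
RACK_UNIT_IN = 1.75      # height of one rack unit (U)

height_in = 8 * RACK_UNIT_IN                 # 8U form factor
height_mm = round(height_in * IN_TO_MM)      # rounded to whole millimetres
weight_kg = round(287.6 * LB_TO_KG, 2)       # maximum system weight

print(height_in)   # 14.0
print(height_mm)   # 356
print(weight_kg)   # 130.45
```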

Power Specifications

The DGX H100/H200 system contains six power supplies with balanced distribution of the power load.

Table 3. Power Specifications

  • Input: 200-240 volts AC

  • Maximum system power: 10.2 kW

  • Specification for each power supply: 3300 W @ 200-240 V, 16 A, 50-60 Hz
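The power figures can be sanity-checked with a short calculation. Assuming each supply can deliver its full 3,300 W rating (no derating for altitude or temperature is modeled), the four supplies left after setting two aside for redundancy provide 13.2 kW, above the 10.2 kW maximum listed in Table 3, while three supplies provide 9.9 kW, below it. That is consistent with the reduced-performance behavior described for three powered PSUs in the next subsection, although the guide does not state the mechanism. The constant and function names below are illustrative.

```python
PSU_RATING_W = 3300      # per-supply rating from Table 3
PSU_COUNT = 6
REDUNDANT_PSUS = 2       # 4+2 redundancy
SYSTEM_MAX_W = 10_200    # 10.2 kW maximum from Table 3


def capacity_w(active_psus: int) -> int:
    """Total rated output of the given number of powered supplies."""
    return active_psus * PSU_RATING_W


installed = capacity_w(PSU_COUNT)                       # nameplate total
non_redundant = capacity_w(PSU_COUNT - REDUNDANT_PSUS)  # capacity with 2 held in reserve
three_psus = capacity_w(3)                              # e.g. after losing one power feed

print(installed, non_redundant, three_psus)  # 19800 13200 9900
print(non_redundant >= SYSTEM_MAX_W)         # True
print(three_psus >= SYSTEM_MAX_W)            # False
```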

Support for PSU Redundancy and Continuous Operation

The system includes six power supply units (PSU) configured for 4+2 redundancy.

Refer to the following additional considerations:

  • If a PSU fails, troubleshoot the cause and replace the failed PSU immediately.

  • To replace faulty PSUs, ensure that the system is idle or shut down the system before installing operational PSUs.

  • If three PSUs lose power as a result of a data center issue or power distribution unit failure, the system continues to function, but at a reduced performance level.

  • If only three PSUs have power, shut down the system before replacing an operational PSU.

  • The system only boots if at least three PSUs are operational. If fewer than three PSUs are operational, only the BMC is available.

  • Do not operate the system with PSUs depopulated.
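The considerations above can be summarized as a function of how many PSUs have power, which is handy in monitoring scripts that alert on PSU loss. This Python sketch encodes them. Treating four or more powered PSUs as normal operation is an interpretation of the 4+2 configuration, and the zero-PSU case is added for completeness rather than taken from this guide; the enum and function names are hypothetical.

```python
from enum import Enum


class PowerState(Enum):
    NO_POWER = "no power"
    BMC_ONLY = "BMC only; the system will not boot"
    REDUCED = "running at reduced performance"
    NORMAL = "normal operation"


def expected_state(powered_psus: int) -> PowerState:
    """Map the number of PSUs with power to the behavior described above."""
    if not 0 <= powered_psus <= 6:
        raise ValueError("the system has six PSUs")
    if powered_psus == 0:
        return PowerState.NO_POWER   # assumption: nothing is powered
    if powered_psus < 3:
        return PowerState.BMC_ONLY   # fewer than three: only the BMC is available
    if powered_psus == 3:
        return PowerState.REDUCED    # three: runs, but at reduced performance
    return PowerState.NORMAL         # four or more: assumed normal (4+2 redundancy)


for n in range(7):
    print(n, expected_state(n).value)
```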

DGX H100/H200 Locking Power Cord Specification

The DGX H100/H200 system is shipped with a set of six (6) locking power cords that have been qualified for use with the DGX H100/H200 system to ensure regulatory compliance.

Warning

To avoid electric shock or fire, only use the NVIDIA-provided power cords to connect power to the DGX H100/H200. For more details, refer to Electrical Precautions.

Power Cord Specification

  • Electrical: 250 VAC, 20 A

  • Plug standard: C19/C20

  • Dimension: 1200 mm length

  • Compliance:

    • Cord: UL62, IEC60227

    • Connector/plug: IEC60320-1

Using the Locking Power Cords

This section provides information about how to use the locking power cords.

Locking and Unlocking the PDU Side

Power Distribution Unit side

  • To INSERT, push the cable into the PDU socket.

  • To REMOVE, press the clips together and pull the cord out of the socket.

Locking/Unlocking the PSU Side (Cords with Twist-Lock Mechanism)

Power Supply (System) side - Twist locking

  • To INSERT or REMOVE the cord, make sure the cable is UNLOCKED, then push it into or pull it out of the socket.

Environmental Specifications

Here are the environmental specifications for your DGX H100/H200 system.

  • Operating temperature: 5° C to 30° C (41° F to 86° F)

  • Relative humidity: 20% to 80% non-condensing

  • Airflow: 1105 CFM front-to-back @ 80% fan PWM

  • Heat output: 38,557 BTU/hr
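When planning cooling for a site, it can help to express the heat output in kilowatts and to check measured room conditions against the listed ranges. The Python sketch below converts BTU/hr to kW (1 W is approximately 3.412142 BTU/hr) and tests an ambient temperature and relative humidity against the operating envelope. The function names are hypothetical, and the check does not model condensation, which must be ruled out separately.

```python
BTU_PER_HR_PER_W = 3.412142  # 1 W is approximately 3.412142 BTU/hr


def btu_hr_to_kw(btu_hr: float) -> float:
    """Convert a heat load in BTU/hr to kilowatts."""
    return btu_hr / BTU_PER_HR_PER_W / 1000


def within_operating_envelope(temp_c: float, rh_percent: float) -> bool:
    """True if ambient conditions fall inside the listed operating ranges.

    Condensation is not modeled; the non-condensing requirement must be
    checked separately.
    """
    return 5 <= temp_c <= 30 and 20 <= rh_percent <= 80


print(round(btu_hr_to_kw(38_557), 1))     # 11.3
print(within_operating_envelope(25, 45))  # True
print(within_operating_envelope(32, 45))  # False
```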

Front Panel Connections and Controls

This section provides information about the front panel, connections, and controls of the DGX H100/H200 system.

With a Bezel

Here is an image of the DGX H100/H200 system with a bezel.

  • Power button: Press to turn the DGX H100/H200 system On or Off. The power button LED indicates the following states:

    • Green flashing (1 Hz): Standby (BMC booted)

    • Green flashing (4 Hz): POST in progress

    • Green solid On: Power On

  • ID button: Press to have the blue LED turn On or blink (configurable through the BMC) as an identifier during servicing. This also causes an LED on the back of the unit to flash as an identifier during servicing.

  • Fault LED: Amber On indicates that the system or a component has faulted.

With the Bezel Removed

Here is an image of the DGX H100/H200 system without a bezel.

Important

Refer to the section First Boot Setup for instructions on how to properly turn the system on or off.

Rear Panel Modules

Here is an image that shows the rear panel modules on the DGX H100/H200 system.

Motherboard Connections and Controls

Here is an image that shows the motherboard connections and controls in a DGX H100/H200 system.

Table 4. Motherboard Controls

  • Power button: Press to turn the system On or Off.

  • ID LED button: Blinks when the ID button is pressed from the front of the unit, as an aid in identifying the unit needing servicing.

  • BMC reset button: Press to manually reset the BMC.

See Network Connections, Cables, and Adaptors for details on the network connections.

Motherboard Tray Components

Here is an image that shows the motherboard tray components in a DGX H100/H200 system.

GPU Tray Components

Here is an image of the GPU tray components in a DGX H100/H200 system.

Network Connections, Cables, and Adaptors

This section provides information about network connections, cables, and adaptors.

Network Ports

Here is an image that shows the network ports on a DGX H100/H200 system.

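Table 5. Network Port Mapping

For each port, the list gives the PCI bus address, the default and optional port designations (interface names), and the RDMA device.

  • OSFP1P1: PCI bus dc:00.0, default ibp220s0, optional enp220s0np0, RDMA mlx5_11

  • OSFP1P2: PCI bus 9a:00.0, default ibp154s0, optional enp154s0np0, RDMA mlx5_6

  • OSFP2P1: PCI bus ce:00.0, default ibp206s0, optional enp206s0np0, RDMA mlx5_10

  • OSFP2P2: PCI bus c0:00.0, default ibp192s0, optional enp192s0np0, RDMA mlx5_9

  • OSFP3P1: PCI bus 4f:00.0, default ibp79s0, optional enp79s0np0, RDMA mlx5_4

  • OSFP3P2: PCI bus 40:00.0, default ibp64s0, optional enp64s0np0, RDMA mlx5_3

  • OSFP4P1: PCI bus 5e:00.0, default ibp94s0, optional enp94s0np0, RDMA mlx5_5

  • OSFP4P2: PCI bus 18:00.0, default ibp24s0, optional enp24s0np0, RDMA mlx5_0

  • Slot1 P1: PCI bus aa:00.0, default ibp170s0f0, optional enp170s0f0np0, RDMA mlx5_7

  • Slot1 P2: PCI bus aa:00.1, default enp170s0f1np1, optional ibp170s0f1np1, RDMA mlx5_8

  • Slot2 P1: PCI bus 29:00.0, default ibp41s0f0, optional enp41s0f0np0, RDMA mlx5_1

  • Slot2 P2: PCI bus 29:00.1, default enp41s0f1np1, optional ibp41s0f1np1, RDMA mlx5_2

  • Slot3 P1: PCI bus 82:00.0, default ens6f0, optional N/A, RDMA irdma0

  • Slot3 P2: PCI bus 82:00.1, default ens6f1, optional N/A, RDMA irdma1

  • On-board: PCI bus 0b:00.0, default eno3, optional N/A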
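On a running system, the RDMA-device-to-PCI-address pairs in Table 5 can be compared against what the kernel reports. On Linux, each entry under /sys/class/infiniband/ has a `device` link that resolves to the PCI function directory, whose name is the full PCI address including the domain (for example 0000:18:00.0). The Python sketch below assumes that sysfs layout and the default PCI domain; the `EXPECTED` dictionary transcribes the RDMA and PCI bus values from Table 5, and the function names are hypothetical. The sysfs root is a parameter so the function can be exercised against a fake directory tree.

```python
import os
from pathlib import Path

# RDMA device -> PCI bus address, transcribed from Table 5.
EXPECTED = {
    "mlx5_0": "18:00.0", "mlx5_1": "29:00.0", "mlx5_2": "29:00.1",
    "mlx5_3": "40:00.0", "mlx5_4": "4f:00.0", "mlx5_5": "5e:00.0",
    "mlx5_6": "9a:00.0", "mlx5_7": "aa:00.0", "mlx5_8": "aa:00.1",
    "mlx5_9": "c0:00.0", "mlx5_10": "ce:00.0", "mlx5_11": "dc:00.0",
    "irdma0": "82:00.0", "irdma1": "82:00.1",
}


def rdma_pci_map(sysfs_root: str = "/sys") -> dict:
    """Return {rdma_device: bus_address} as reported by sysfs."""
    result = {}
    ib_dir = Path(sysfs_root) / "class" / "infiniband"
    if not ib_dir.is_dir():
        return result  # no RDMA devices registered (or not a Linux host)
    for dev in sorted(ib_dir.iterdir()):
        # The 'device' link resolves to the PCI function directory, e.g. .../0000:18:00.0
        pci_addr = os.path.basename(os.path.realpath(dev / "device"))
        # Drop the PCI domain prefix ("0000:") to match the table's notation.
        if pci_addr.count(":") == 2:
            pci_addr = pci_addr.split(":", 1)[1]
        result[dev.name] = pci_addr
    return result


def mismatches(expected: dict, actual: dict) -> dict:
    """Devices whose reported address differs from the table (None means not found)."""
    return {name: actual.get(name)
            for name, addr in expected.items()
            if actual.get(name) != addr}


if __name__ == "__main__":
    print(mismatches(EXPECTED, rdma_pci_map()))
```

An empty result means every listed RDMA device was found at the address given in Table 5.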

Compute and Storage Networking

Network Modules

  • New form factor for aggregate PCIe network devices

  • Consolidates four ConnectX-7 networking cards into a single device

  • Two networking modules are installed on the interposer board

  • The interposer board connects to the CPUs on one end and to the GPU tray on the other

  • DensiLink cables are used to go directly from the ConnectX-7 networking cards to the OSFP connectors at the back of the system

Each DensiLink cable has two ports, one from each ConnectX-7 card.

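Table 6. Network Modules

For each OSFP port, the list gives the ConnectX device, the network module/CPU, the GPU, the default interface, and the RDMA device.

  • OSFP1P1: CX0, network module/CPU 1, GPU 7, default ibp220s0, RDMA mlx5_11

  • OSFP1P2: CX1, network module/CPU 1, GPU 4, default ibp154s0, RDMA mlx5_6

  • OSFP2P1: CX2, network module/CPU 1, GPU 6, default ibp206s0, RDMA mlx5_10

  • OSFP2P2: CX3, network module/CPU 1, GPU 5, default ibp192s0, RDMA mlx5_9

  • OSFP3P1: CX2, network module/CPU 0, GPU 2, default ibp79s0, RDMA mlx5_4

  • OSFP3P2: CX3, network module/CPU 0, GPU 1, default ibp64s0, RDMA mlx5_3

  • OSFP4P1: CX0, network module/CPU 0, GPU 3, default ibp94s0, RDMA mlx5_5

  • OSFP4P2: CX1, network module/CPU 0, GPU 0, default ibp24s0, RDMA mlx5_0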
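Table 6 associates each OSFP port with one GPU. When launching one process per GPU, it can be useful to look up the network device attached to that process's GPU. The Python sketch below inverts the table into a GPU-indexed lookup. The values are transcribed from Table 6, with GPU indexes assumed to cover 0 through 7 exactly once; the names `PORT_MAP` and `gpu_to_nic` are hypothetical, and how the result is consumed (for example, by a launcher script) depends on your scheduler and communication library.

```python
# OSFP port -> (ConnectX device, GPU index, default interface, RDMA device), from Table 6.
PORT_MAP = {
    "OSFP1P1": ("CX0", 7, "ibp220s0", "mlx5_11"),
    "OSFP1P2": ("CX1", 4, "ibp154s0", "mlx5_6"),
    "OSFP2P1": ("CX2", 6, "ibp206s0", "mlx5_10"),
    "OSFP2P2": ("CX3", 5, "ibp192s0", "mlx5_9"),
    "OSFP3P1": ("CX2", 2, "ibp79s0", "mlx5_4"),
    "OSFP3P2": ("CX3", 1, "ibp64s0", "mlx5_3"),
    "OSFP4P1": ("CX0", 3, "ibp94s0", "mlx5_5"),
    "OSFP4P2": ("CX1", 0, "ibp24s0", "mlx5_0"),
}


def gpu_to_nic(port_map: dict = PORT_MAP) -> dict:
    """Invert the table: GPU index -> (OSFP port, default interface, RDMA device)."""
    inverted = {}
    for port, (_cx, gpu, iface, rdma) in port_map.items():
        if gpu in inverted:
            raise ValueError(f"GPU {gpu} is mapped to more than one port")
        inverted[gpu] = (port, iface, rdma)
    return inverted


if __name__ == "__main__":
    for gpu, (port, iface, rdma) in sorted(gpu_to_nic().items()):
        print(f"GPU {gpu}: {port} {iface} {rdma}")
```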

BMC Port LEDs

The BMC RJ-45 port has two LEDs.

The LED on the left indicates the speed. Solid green indicates the speed is 100M. Solid amber indicates the speed is 1G.

The LED on the right is green and flashes to indicate activity.

Supported Network Cables and Adaptors

The DGX H100/H200 system is not shipped with network cables or adaptors. You will need to purchase supported cables or adaptors for your network.

The ConnectX-7 firmware determines which cables and adaptors are supported. For a list of cables and adaptors compatible with the NVIDIA ConnectX cards installed in the DGX H100/H200 system, do the following:

  1. Visit the NVIDIA Adapter Firmware Release page.

  2. Click the ConnectX model and select the corresponding firmware included in the DGX H100/H200 system.

  3. From the left Topics pane, select the Validated and Supported Cables and Switches topic.

DGX H100/H200 System Topology

The following figure shows the DGX H100/H200 system topology.

DGX OS Software

The DGX H100/H200 system comes pre-installed with a DGX software stack incorporating the following components:

  • An Ubuntu server distribution with supporting packages.

  • The following system management and monitoring software:

    • NVIDIA System Management (NVSM)

      Provides active health monitoring and system alerts for NVIDIA DGX nodes in a data center. It also provides simple commands for checking the health of the DGX H100/H200 system from the command line.

    • Data Center GPU Management (DCGM)

      This software enables node-wide administration of GPUs and can be used for cluster and data-center level management.

  • DGX H100/H200 system support packages.

  • The NVIDIA GPU driver

  • Docker Engine

  • NVIDIA Container Toolkit

  • NVIDIA Networking OpenFabrics Enterprise Distribution for Linux (MOFED)

  • NVIDIA Networking Software Tools (MST)

  • cachefilesd (daemon for managing cache data storage)
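A quick way to confirm that the pre-installed stack is in place is to check that each component's command-line entry point is on PATH. The Python sketch below uses `shutil.which` from the standard library. The command names (nvsm, dcgmi, nvidia-smi, docker, nvidia-ctk, ofed_info, mst, cachefilesd) are the usual executables for these components but are assumptions here; adjust them for your DGX OS release. cachefilesd is typically installed under /usr/sbin, which may not be on a non-root user's PATH.

```python
import shutil

# Component -> command-line entry point (assumed names; adjust for your release).
TOOLS = {
    "NVSM": "nvsm",
    "DCGM": "dcgmi",
    "NVIDIA GPU driver": "nvidia-smi",
    "Docker Engine": "docker",
    "NVIDIA Container Toolkit": "nvidia-ctk",
    "MOFED": "ofed_info",
    "MST": "mst",
    "cachefilesd": "cachefilesd",
}


def missing_tools(tools: dict = TOOLS) -> list:
    """Return the component names whose command is not found on PATH."""
    return [name for name, cmd in tools.items() if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("all present" if not missing else "missing: " + ", ".join(missing))
```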

Customer Support

Contact NVIDIA Enterprise Support for assistance in reporting, troubleshooting, or diagnosing problems with your DGX H100/H200 system. Also contact NVIDIA Enterprise Support for assistance in moving the DGX H100/H200 system.

  • For contracted Enterprise Support questions, you can send an email to [email protected].

  • For additional details about how to obtain support, go to NVIDIA Enterprise Support.

Our support team can help collect appropriate information about your issue and involve internal resources as needed.
