GPUs can now use PCIe-attached memory or SSDs to boost VRAM capacity — Panmnesia's CXL IP claims double-digit nanosecond latency (2024)

Modern GPUs for AI and HPC applications ship with a fixed amount of high-bandwidth memory (HBM) built into the device, which can limit performance on large workloads. New technology, however, will let companies expand GPU memory capacity by slotting in devices connected to the PCIe bus instead of relying solely on the memory built into the GPU — it even allows SSDs to be used for capacity expansion, too. Panmnesia, a company backed by South Korea's renowned KAIST research institute, has developed a low-latency CXL IP that could be used to expand GPU memory using CXL memory expanders.

The memory requirements of more advanced datasets for AI training are growing rapidly, which means that AI companies either have to buy new GPUs, use less sophisticated datasets, or use CPU memory at the cost of performance. Although CXL is a protocol that formally works on top of a PCIe link, thus enabling users to connect more memory to a system via the PCIe bus, the technology has to be recognized by an ASIC and its subsystem, so just adding a CXL controller is not enough to make the technology work, especially on a GPU.

Panmnesia faced challenges integrating CXL for GPU memory expansion due to the absence of a CXL logic fabric and subsystems that support DRAM and/or SSD endpoints in GPUs. In addition, GPU cache and memory subsystems do not recognize any expansions except unified virtual memory (UVM), which tends to be slow.

To address this, Panmnesia developed a CXL 3.1-compliant root complex (RC) equipped with multiple root ports (RPs) that support external memory over PCIe, and a host bridge with a host-managed device memory (HDM) decoder that connects to the GPU's system bus. The HDM decoder, responsible for managing the address ranges of system memory, essentially makes the GPU's memory subsystem 'think' it is dealing with system memory, when in reality the subsystem is accessing PCIe-connected DRAM or NAND. That means either DDR5 or SSDs can be used to expand the GPU memory pool.
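Conceptually, an HDM decoder is a range check: it routes each physical address either to on-package memory or to a CXL-attached device behind a root port. The sketch below (in Python, with illustrative base addresses and capacities that are not Panmnesia's actual layout) shows the idea:

```python
# Toy model of host-managed device memory (HDM) address decoding.
# Base addresses and capacities are illustrative only.

HBM_BASE, HBM_SIZE = 0x0, 16 * 2**30                   # 16 GiB on-package HBM
CXL_BASE, CXL_SIZE = HBM_BASE + HBM_SIZE, 64 * 2**30   # 64 GiB CXL DRAM/SSD

def decode(addr: int) -> str:
    """Route a physical address to its backing memory, as an HDM decoder would."""
    if HBM_BASE <= addr < HBM_BASE + HBM_SIZE:
        return "HBM"   # served by the GPU's built-in memory
    if CXL_BASE <= addr < CXL_BASE + CXL_SIZE:
        return "CXL"   # forwarded through a root port to the expander
    raise ValueError("address not mapped")

print(decode(0x1000))               # HBM
print(decode(HBM_BASE + HBM_SIZE))  # CXL
```

Because the check happens in the memory subsystem rather than in software, the GPU's ordinary loads and stores need no special handling; the decoder alone decides which addresses leave the package.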

The solution (based on a custom GPU and marked as CXL-Opt) underwent extensive testing, showing two-digit nanosecond round-trip latency (compared to 250ns for prototypes developed by Samsung and Meta, marked as CXL-Proto in Panmnesia's graphs), including the time needed for protocol conversion between standard memory operations and CXL flit transmissions, according to Panmnesia. It has been successfully integrated into both memory expanders and GPU/CPU prototypes at the hardware RTL level, demonstrating its compatibility with various computing hardware.

As tested by Panmnesia, UVM performs the worst among all tested GPU kernels due to overhead from host runtime intervention during page faults and transferring data at the page level, which often exceeds the GPU's needs. In contrast, CXL allows direct access to expanded storage via load/store instructions, eliminating these issues.
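The granularity gap is the core of the difference: a UVM page fault migrates an entire page even when the kernel needs only a few bytes, while CXL load/store traffic moves data at cache-line granularity. A back-of-the-envelope comparison (typical sizes, not measured figures):

```python
# Worst-case bytes moved when a kernel touches `accesses` scattered 8-byte
# values: page-granularity migration (UVM-style) vs cache-line loads (CXL-style).
PAGE = 4096   # common UVM migration granularity, bytes
LINE = 64     # common cache-line size, bytes

def bytes_moved(accesses: int, granularity: int) -> int:
    # Assume every access lands in a distinct page or line (worst case).
    return accesses * granularity

ratio = bytes_moved(1_000_000, PAGE) // bytes_moved(1_000_000, LINE)
print(ratio)  # 64, i.e. UVM can move 64x more data than the kernel needs
```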

Consequently, CXL-Proto executes GPU kernels in 1.94 times less time than UVM, and Panmnesia's CXL-Opt reduces execution time by a further 1.66 times, with an optimized controller achieving two-digit nanosecond latency and minimizing read/write latency. The same pattern is evident in another figure, which displays IPC values recorded during GPU kernel execution: it reveals that Panmnesia's CXL-Opt achieves performance 3.22 times and 1.65 times faster than UVM and CXL-Proto, respectively.
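The quoted figures are mutually consistent: the overall 3.22x gain over UVM is simply the product of the two intermediate speedups. A quick check:

```python
# Cross-check the speedups quoted by Panmnesia.
proto_vs_uvm = 1.94   # CXL-Proto over UVM
opt_vs_proto = 1.66   # CXL-Opt over CXL-Proto
print(round(proto_vs_uvm * opt_vs_proto, 2))  # 3.22
```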

In general, CXL support can do a lot for AI/HPC GPUs, but performance is a big question. Additionally, whether companies like AMD and Nvidia will add CXL support to their GPUs remains to be seen. If the approach of using PCIe-attached memory for GPUs does gather steam, only time will tell if the industry heavyweights will use IP blocks from companies like Panmnesia or simply develop their own tech.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.


Comments from the forums

  • usertests

    Make our day, bring back the "SSG".

    Reply

  • hotaru251

    while this would help Nvidia's stinginess on vram (especially lower end) factoring in the cost it would likely be cheaper just getting the next sku higher that comes w/ more vram...

    Reply

  • bit_user

    The article said:

    Although CXL is a protocol that formally works on top of a PCIe link, thus enabling users to connect more memory to a system via the PCIe bus, the technology has to be recognized by an ASIC and its subsystem

    This is confusing and wrong.

    CXL and PCIe share the same PHY specification. Where they diverge is at the protocol layer. CXL is not simply a layer atop PCIe. The slot might be the same, but you have to configure the CPU to treat it as a CXL slot instead of a PCIe slot. That obviously requires the CPU to have CXL support, which doesn't exist in consumer CPUs. Not sure if the current Xeon W or Threadrippers support it, actually, but they could.

    Reply

  • Notton

    I don't see the article talking about bandwidth. Does it not matter for the expected AI workload?
    (I assume no one would use this to game on, except youtubers)

    Reply

  • nightbird321

    hotaru251 said:

    while this would help Nvidia's stinginess on vram (especially lower end) factoring in the cost it would likely be cheaper just getting the next sku higher that comes w/ more vram...

    This is definitely aimed at very expensive pro models with maxed out VRAM already. The cost of another expansion card would definitely not be cost effective versus the next sku with more vram.

    Reply

  • bit_user

    Notton said:

    I don't see the article talking about bandwidth.

    Because the PHY spec is the same as PCIe, the bandwidth calculations should be roughly the same. CXL 1.x and 2.x are both based on the PCIe 5.0 PHY, meaning ~4 GB/s per lane (per direction). So, a x4 memory expansion would have an upper limit of ~16 GB/s in each direction.
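    That per-lane figure follows from the PCIe 5.0 signaling rate and its 128b/130b encoding (a rough estimate that ignores protocol overhead such as flit headers):

```python
# Rough per-direction bandwidth for a PCIe 5.0 / CXL 1.x-2.x link.
def lane_gbps(gt_per_s: float = 32, encoding: float = 128 / 130) -> float:
    # 32 GT/s per lane, 128b/130b encoded, 8 bits per byte.
    return gt_per_s * encoding / 8

print(round(lane_gbps(), 2))      # 3.94 GB/s per lane
print(round(4 * lane_gbps(), 1))  # 15.8 GB/s for a x4 link
```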

    Notton said:

    Does it not matter for the expected AI workload?

    Depends on which part. If you look at the dependence of high-end AI training GPUs on HBM, bandwidth is obviously an issue. That's not to say that you need uniformly fast access, globally. There are techniques for processing chunks of data which might be applicable for offloading some of it to a slower memory, such as the way Nvidia uses the Grace-attached LPDDR5 memory in their Grace-Hopper configuration. You could also just use the memory expansion for holding training data, which is far lower bandwidth than access to the weights.

    Notton said:

    (I assume no one would use this to game on, except youtubers)

    Consumer GPUs (and by this I mean anything with a display connector on it - even the workstation branded stuff) don't support CXL, so it's not even an option. Even if it were, you'd still be better off just using system memory. Where this sort of memory expansion starts to make sense is at scale.

    Reply

  • DiegoSynth

    nightbird321 said:

    This is definitely aimed at very expensive pro models with maxed out VRAM already. The cost of another expansion card would definitely not be cost effective versus the next sku with more vram.

    We don't know if they will actually have more VRAM, and even so, knowing Nvidia, they will add something like 2GB, which is quite @Nal.
    Nevertheless, if the GPU can only access the factory-designated amount due to bandwidth limitations, then we are cooked anyway.
    One way or another, it's a nice approach to remediate the lack, but still very theoretical and subject to many dubious factors.

    Reply

  • bit_user

    usertests said:

    Make our day, bring back the "SSG".

    CXL makes it somewhat obsolete.

    The reason why they had to integrate a SSD into a GPU is that PCIe created all sorts of headaches and hurdles for trying to have one device talk directly to another. With CXL, those problems are supposedly all sorted out.

    Reply

  • bit_user

    DiegoSynth said:

    ... knowing Nvidia, they will add something like 2GB, which is quite ...

    LOL. You just @ -referenced a user named Nal. Try putting tags around it, next time.

    Reply

  • razor512

    I wish video card makers would just add a second pool of RAM that would use SODIMM modules. For example, imagine if a card like the RTX 4080 had 2 SODIMM slots on the back of the card for extra RAM. While it would be a slower pool of RAM at around 80-90 GB/s compared to the 736 GB/s of the VRAM, it would still be useful.

    Video card makers already have experience with using and prioritizing 2 separate memory pools on the same card; for example, the GTX 970 had a 3.5GB pool at 225-256 GB/s and a second 512MB pool at around 25-27 GB/s depending on clock speed. If a game used that 512MB pool, the performance hit was not much, as the card and drivers at least knew enough to not shove throughput-intensive data/workloads into that second pool.

    If they could do the same but with 2 DDR5 SODIMM slots, then users could have a second pool of up to about 96GB, and it would have far fewer performance hits than using shared system memory, which tops out at a real-world throughput of around 24-25 GB/s on a PCIe 4.0 x16 connection that also has to share bandwidth with other GPU tasks, thus not well suited for pulling double duty.

    Reply
