Measure GPU Performance - MATLAB & Simulink Example (2024)

This example shows how to measure some of the key performance characteristics of a GPU.

GPUs can be used to speed up certain types of computations. However, GPU performance varies widely between different GPU devices. In order to quantify the performance of a GPU, three tests are used:

  • How quickly can data be sent to the GPU or read back from it?

  • How fast can the GPU kernel read and write data?

  • How fast can the GPU perform computations?

After measuring these, the performance of the GPU can be compared to the host CPU. This provides a guide as to how much data or computation is required for the GPU to provide an advantage over the CPU.

Setup

gpu = gpuDevice();
fprintf('Using an %s GPU.\n', gpu.Name)
Using an NVIDIA RTX A5000 GPU.
sizeOfDouble = 8; % Each double-precision number needs 8 bytes of storage
sizes = power(2, 14:28);

Testing host/GPU bandwidth

The first test estimates how quickly data can be sent to and read from the GPU. Because the GPU is plugged into the PCI bus, this largely depends on how fast the PCI bus is and how many other things are using it. However, there are also some overheads that are included in the measurements, particularly the function call overhead and the array allocation time. Since these are present in any "real world" use of the GPU, it is reasonable to include these.

In the following tests, memory is allocated and data is sent to the GPU using the gpuArray function. Memory is allocated and data is transferred back to host memory using gather.

Note that the GPU used in this test supports PCI Express® version 4.0, which has a theoretical bandwidth of 1.97 GB/s per lane. For the 16-lane slots used by NVIDIA® compute cards this gives a theoretical 31.52 GB/s.
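The per-lane figure can be reproduced with a few lines of arithmetic. The sketch below assumes the PCI Express 4.0 signalling rate of 16 GT/s per lane and 128b/130b line encoding; neither value is stated in the example itself.

```matlab
% Theoretical PCI Express 4.0 bandwidth (assumed link parameters).
transferRate = 16e9;           % assumed: 16 GT/s per lane
encodingEfficiency = 128/130;  % assumed: 128b/130b line encoding
numLanes = 16;                 % x16 slot, as described in the text

% One transfer carries one bit per lane; divide by 8 to convert to bytes.
perLaneGBs = transferRate * encodingEfficiency / 8 / 1e9;
totalGBs = numLanes * perLaneGBs;

fprintf('Per lane: %.2f GB/s, x16 slot: %.2f GB/s\n', perLaneGBs, totalGBs)
```

The unrounded total is roughly 31.5 GB/s; the 31.52 GB/s quoted above comes from multiplying the rounded per-lane value of 1.97 GB/s by 16.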

sendTimes = inf(size(sizes));
gatherTimes = inf(size(sizes));
for ii=1:numel(sizes)
    numElements = sizes(ii)/sizeOfDouble;
    hostData = randi([0 9], numElements, 1);
    gpuData = randi([0 9], numElements, 1, 'gpuArray');
    % Time sending to GPU
    sendFcn = @() gpuArray(hostData);
    sendTimes(ii) = gputimeit(sendFcn);
    % Time gathering back from GPU
    gatherFcn = @() gather(gpuData);
    gatherTimes(ii) = gputimeit(gatherFcn);
end
sendBandwidth = (sizes./sendTimes)/1e9;
[maxSendBandwidth,maxSendIdx] = max(sendBandwidth);
fprintf('Achieved peak send speed of %g GB/s\n',maxSendBandwidth)
Achieved peak send speed of 9.5407 GB/s
gatherBandwidth = (sizes./gatherTimes)/1e9;
[maxGatherBandwidth,maxGatherIdx] = max(gatherBandwidth);
fprintf('Achieved peak gather speed of %g GB/s\n',max(gatherBandwidth))
Achieved peak gather speed of 4.1956 GB/s

On the plot below, the peak for each case is circled. With small data set sizes, overheads dominate. With larger amounts of data the PCI bus is the limiting factor.

semilogx(sizes, sendBandwidth, 'b.-', sizes, gatherBandwidth, 'r.-')
hold on
semilogx(sizes(maxSendIdx), maxSendBandwidth, 'bo-', 'MarkerSize', 10);
semilogx(sizes(maxGatherIdx), maxGatherBandwidth, 'ro-', 'MarkerSize', 10);
grid on
title('Data Transfer Bandwidth')
xlabel('Array size (bytes)')
ylabel('Transfer speed (GB/s)')
legend('Send to GPU', 'Gather from GPU', 'Location', 'NorthWest')
hold off

[Figure: Data Transfer Bandwidth. Transfer speed (GB/s) against array size (bytes), for sending to the GPU and gathering from the GPU.]

Testing memory intensive operations

Many operations do very little computation with each element of an array and are therefore dominated by the time taken to fetch the data from memory or to write it back. Functions such as ones, zeros, nan, and true only write their output, whereas functions like transpose and tril both read and write but do no computation. Even simple operators like plus, minus, and mtimes with a scalar operand do so little computation per element that they are bound only by the memory access speed.

The function plus performs one memory read and one memory write for each floating point operation. It should therefore be limited by memory access speed and provides a good indicator of the speed of a read+write operation.

memoryTimesGPU = inf(size(sizes));
for ii=1:numel(sizes)
    numElements = sizes(ii)/sizeOfDouble;
    gpuData = randi([0 9], numElements, 1, 'gpuArray');
    plusFcn = @() plus(gpuData, 1.0);
    memoryTimesGPU(ii) = gputimeit(plusFcn);
end
memoryBandwidthGPU = 2*(sizes./memoryTimesGPU)/1e9;
[maxBWGPU, maxBWIdxGPU] = max(memoryBandwidthGPU);
fprintf('Achieved peak read+write speed on the GPU: %g GB/s\n',maxBWGPU)
Achieved peak read+write speed on the GPU: 659.528 GB/s

Now compare it with the same code running on the CPU.

memoryTimesHost = inf(size(sizes));
for ii=1:numel(sizes)
    numElements = sizes(ii)/sizeOfDouble;
    hostData = randi([0 9], numElements, 1);
    plusFcn = @() plus(hostData, 1.0);
    memoryTimesHost(ii) = timeit(plusFcn);
end
memoryBandwidthHost = 2*(sizes./memoryTimesHost)/1e9;
[maxBWHost, maxBWIdxHost] = max(memoryBandwidthHost);
fprintf('Achieved peak read+write speed on the host: %g GB/s\n',maxBWHost)
Achieved peak read+write speed on the host: 71.0434 GB/s
% Plot CPU and GPU results.
semilogx(sizes, memoryBandwidthGPU, 'b.-', ...
    sizes, memoryBandwidthHost, 'r.-')
hold on
semilogx(sizes(maxBWIdxGPU), maxBWGPU, 'bo-', 'MarkerSize', 10);
semilogx(sizes(maxBWIdxHost), maxBWHost, 'ro-', 'MarkerSize', 10);
grid on
title('Read+write Bandwidth')
xlabel('Array size (bytes)')
ylabel('Speed (GB/s)')
legend('GPU', 'Host', 'Location', 'NorthWest')
hold off

[Figure: Read+write Bandwidth. Speed (GB/s) against array size (bytes), for the GPU and the host.]

Comparing this plot with the data-transfer plot above, it is clear that GPUs can typically read from and write to their memory much faster than they can get data from the host. It is therefore important to minimize the number of host-GPU or GPU-host memory transfers. Ideally, programs should transfer the data to the GPU, then do as much with it as possible while on the GPU, and bring it back to the host only when complete. Even better would be to create the data on the GPU to start with.
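To get a feel for how much work is needed to pay for a round trip, the peak rates printed above can be plugged into a rough cost model. The sketch below is an illustration only: it assumes large arrays running at the peak rates, a chain of memory-bound element-wise operations like plus, and no overlap between transfer and computation. The variable names are introduced here and do not appear in the example.

```matlab
% Peak rates reported by the tests above, in GB/s.
sendGBs    = 9.5407;   % host -> GPU
gatherGBs  = 4.1956;   % GPU -> host
gpuMemGBs  = 659.528;  % GPU read+write
hostMemGBs = 71.0434;  % host read+write

% Costs in nanoseconds per byte of array data (1/(GB/s) = ns/byte).
transferCost  = 1/sendGBs + 1/gatherGBs; % send once, gather once
hostCostPerOp = 2/hostMemGBs;            % one read + one write per element
gpuCostPerOp  = 2/gpuMemGBs;

% Smallest number of chained operations for which the GPU path
% (transfer + compute) is no slower than staying on the host.
breakEvenOps = ceil(transferCost / (hostCostPerOp - gpuCostPerOp));
fprintf('Break-even at about %d element-wise operations\n', breakEvenOps)
```

Under these assumptions the GPU needs on the order of 14 memory-bound operations on the same data before the transfer cost is recovered, which is why keeping data on the GPU between operations matters.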

Testing computationally intensive operations

For operations where the number of floating-point computations performed per element read from or written to memory is high, the memory speed is much less important. In this case the number and speed of the floating-point units is the limiting factor. These operations are said to have high "computational density".

A good test of computational performance is a matrix-matrix multiply. For multiplying two N×N matrices, the total number of floating-point calculations is

$\mathrm{FLOPS}(N) = 2N^3 - N^2$.

Two input matrices are read and one resulting matrix is written, for a total of $3N^2$ elements read or written. This gives a computational density of $(2N - 1)/3$ FLOP/element. Contrast this with plus as used above, which has a computational density of 1/2 FLOP/element.
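The operation count follows from the definition of the matrix product, assuming the straightforward algorithm: each of the $N^2$ output elements is a dot product of length $N$, which takes $N$ multiplications and $N-1$ additions. A short derivation, with $N = 4096$ (the largest matrix in the test that follows) as a worked value:

```latex
\begin{align*}
\mathrm{FLOPS}(N) &= N^2 \bigl[ N + (N - 1) \bigr] = 2N^3 - N^2 \\
\text{density}(N) &= \frac{2N^3 - N^2}{3N^2} = \frac{2N - 1}{3} \\
\text{density}(4096) &= \frac{8191}{3} \approx 2730 \ \text{FLOP/element}
\end{align*}
```

At that size the density is several thousand times that of plus, so memory speed has little influence on the result.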

sizes = power(2, 12:2:24);
N = sqrt(sizes);
mmTimesHost = inf(size(sizes));
mmTimesGPU = inf(size(sizes));
for ii=1:numel(sizes)
    % First do it on the host
    A = rand( N(ii), N(ii) );
    B = rand( N(ii), N(ii) );
    mmTimesHost(ii) = timeit(@() A*B);
    % Now on the GPU
    A = gpuArray(A);
    B = gpuArray(B);
    mmTimesGPU(ii) = gputimeit(@() A*B);
end
mmGFlopsHost = (2*N.^3 - N.^2)./mmTimesHost/1e9;
[maxGFlopsHost,maxGFlopsHostIdx] = max(mmGFlopsHost);
mmGFlopsGPU = (2*N.^3 - N.^2)./mmTimesGPU/1e9;
[maxGFlopsGPU,maxGFlopsGPUIdx] = max(mmGFlopsGPU);
fprintf(['Achieved peak calculation rates of ', ...
    '%1.1f GFLOPS (host), %1.1f GFLOPS (GPU)\n'], ...
    maxGFlopsHost, maxGFlopsGPU)
Achieved peak calculation rates of 354.4 GFLOPS (host), 414.0 GFLOPS (GPU)

Now plot it to see where the peak was achieved.

semilogx(sizes, mmGFlopsGPU, 'b.-', sizes, mmGFlopsHost, 'r.-')
hold on
semilogx(sizes(maxGFlopsGPUIdx), maxGFlopsGPU, 'bo-', 'MarkerSize', 10);
semilogx(sizes(maxGFlopsHostIdx), maxGFlopsHost, 'ro-', 'MarkerSize', 10);
grid on
title('Double precision matrix-matrix multiply')
xlabel('Matrix size (numel)')
ylabel('Calculation Rate (GFLOPS)')
legend('GPU', 'Host', 'Location', 'NorthWest')
hold off

[Figure: Double precision matrix-matrix multiply. Calculation rate (GFLOPS) against matrix size (numel), for the GPU and the host.]

Conclusions

These tests reveal some important characteristics of GPU performance:

  • Transfers from host memory to GPU memory and back are relatively slow.

  • A good GPU can read/write its memory much faster than the host CPU can read/write its memory.

  • Given large enough data, GPUs can perform calculations much faster than the host CPU.

It is notable that in each test quite large arrays were required to fully saturate the GPU, whether limited by memory or by computation. GPUs provide the greatest advantage when working with millions of elements at once.

More detailed GPU benchmarks, including comparisons between different GPUs, are available in GPUBench on the MATLAB® Central File Exchange.

See Also

gpuArray | gputimeit

Related Topics

  • Measure and Improve GPU Performance
