APU on Linux

AGI for everyone everywhere!

'Make' a 16 GB VRAM GPU

⚠️

When there is discrete GPU in the build: AMD APU + AMD discrete GPU system may not able to utilize the Radeon™ Graphics. AMD APU or AMD APU + discrete Nvidia GPU works.

ℹ️

Following were tested on both 4600G and 5600G. 5700G should also work. Products details.

hardware requirements:

APU (older or newer generations than 4600G, 5600G, 5700G have not been tested.)
AM4 motherboard which supports the APU,
The system is recommended to have at least 32 GB RAM installed. It may not allow to allocate 16 GB RAM if less than 32 GB RAM is available.

My hardware in test:

MSI B450-A PRO MAX.
AMD 4600g (with stock cooler)
2x16GB RAM (32GB is enough unless larger RAM needed, e.g. GPT-J-6B loading)
2x16GB RAM Dominator
2TB crucial P1 ssd
650W PSU

Motherboard BIOS setting for APU iGPU

Motherboard brand, BIOS version, ROCm version may affect your mileage. Below is a summary of the tested systems.

Motherboard Brand	BIOS version	Results	Notes
MSI B450-A PRO MAX	E7B86AMS.MC0	working	tested latest BIOS doesn't work (not able to detect gpu)
ASUS ROG Strix B450-A	Latest	working
Gigabyte B450M DS3H Wifi AM4	Latest	working

Allocate 16GB system RAM to the APU in BIOS

Enter the BIOS mode after pressing power on button. A common way to enter BIOS setup is to press a specific key or a combination of keys during the startup process, before the Windows logo appears. The key or keys may differ depending on your computer, but some of the most common ones are F1, F2, F10, F12, Del, or Esc.

Within motherboard BIOS, enable ‘Above 4G encoding’ and auto Re-Size BAR support. There are setting to choose primary graphic, igpu or discrete PCIE gpu. Choose igpu. There is also a mode for 'GAME' or 'FORCE'. 'Force' mode allows you to customize the dedicated iGPU VRAM. There are multiple options to set frame buffer size: AUTO,1,2,4,8,16 GB. Please try to upgrade your BIOS version if you can not see the 8 or 16 GB options. Choose 16 GB to allocate 16 GB dedicated RAM to the iGPU.

To verify, in your terminal,

checkvram.sh

sudo dmesg | grep drm | grep amdgpu

example output:

[ 2.949962] [drm] amdgpu version: 5.18.13
[ 2.968548] [drm] amdgpu: 16384M of VRAM memory ready
[ 2.968549] [drm] amdgpu: 15946M of GTT memory ready

ROCm installation

AMD ROCm (Radeon Open Compute platform) is a software platform designed to enable developers to create high-performance computing applications that can harness the full potential of AMD GPUs (Graphics Processing Units) and CPUs (Central Processing Units). It is an open-source platform that provides a wide range of tools, libraries, and support for developing applications in a variety of programming languages, including C++, Python, and Fortran.

The primary goal of ROCm is to provide a unified and consistent programming environment for both AMD GPUs and CPUs, which enables developers to create high-performance applications that can take full advantage of the performance capabilities of both types of processors. With support for advanced features such as HIP (Heterogeneous-compute Interface for Portability) and OpenCL (Open Computing Language), ROCm provides a flexible and powerful platform for creating a wide range of applications, from machine learning and data analytics to scientific computing and gaming.

Overall, AMD ROCm is a powerful and versatile platform that is well-suited for developers who want to create high-performance computing applications that can run on both GPUs and CPUs, and it is widely used in the scientific computing and machine learning communities.

ℹ️

ROCm is supported on Linux platform (Ubuntu/Debian, RHEL/CentOS, SLES/OpenSUSE). Recently AMD is adding supports to Windows, but it is not mature yet.

Installation instructions on Ubuntu

To install ROCm for APU, the official documentation should be followed. We have tested ROCm 5.4.0 and ROCm 5.6.0.

For video turorial for installation of ROCm, please check out the video tutorials (opens in a new tab).

Below is instructions for installing ROCm 5.4.0 (opens in a new tab) on Ubuntu (20.04.5 LTS). For other Linux distributions and other ROCm versions, please check the official documentation (opens in a new tab).

1 Install required tools and packages

sudo apt-get update
sudo apt-get install libnuma-dev libncurses5
sudo apt-get install wget gnupg2 gawk curl
sudo reboot

2 Setting Permissions for Groups

This step is needed so add any current user can access GPU resources.

sudo usermod -a -G render $LOGNAME
sudo usermod -a -G video $LOGNAME

3 Check if kernel headers and development packages are installed on the system:`

sudo dpkg -l | grep linux-headers
sudo dpkg -l | grep linux-modules-extra

4 Add the gpg key:

curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/rocm-keyring.gpg

5 Add the AMDGPU repository:

echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/amdgpu/5.4/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/amdgpu.list

6 Install the kernel-mode driver and reboot the system:

sudo apt-get update
sudo apt install amdgpu-dkms
sudo reboot

7 Add the ROCm repository:

echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/5.4 focal main' | sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt-get update

8 Install the single-version ROCm meta-packages:

sudo apt install rocminfo rocm-smi rocm-device-libs
sudo apt install rocm-dev

9 Add environment varibles Add following to your ~/.bashrc file so they can persists:

export HSA_OVERRIDE_GFX_VERSION=9.0.0 #see below for comments
export LD_LIBRARY_PATH=/opt/rocm-5.4.0/lib

ℹ️

The HSA_OVERRIDE_GFX_VERSION=9.0.0 is ciritical for APU GPU to work with ROCm.

Verification

You may consider ROCm installation successful if GPUs are listed with the following commands:

/opt/rocm-5.4.0/bin/rocminfo 
# this works if the environmental variables have been set correctly 
rocminfo

An example output for the APUGPU, notice the Name is "gfx900":

*******                  
Agent 2                  
*******                  
  Name:                    gfx900                             
  Uuid:                    GPU-XX                             
  Marketing Name:                                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
  Chip ID:                 5686(0x1636)                       
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1900                               
  BDFID:                   2816                               
  Internal Node ID:        1                                  
  Compute Unit:            7                                  
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16777216(0x1000000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

Check ROCm version:

apt show rocm-libs -a

Troubleshooting

There may be rare cases that hardware caused the error. For example, I encountered following error:

hsa_status_error_memory_aperture_violation: the agent attempted to access memory beyond the largest legal address

It turned out that I have additional environment variables for ROCm. Remove them then the error went away.

When a discrete AMD GPU installed, when using igpu, it gives error:

Segmentation fault (core dumped) It seems that the APU will not work properly in such builds (tested on 5700XT or 6700XT). But if the discrete GPU is not AMD brand, the APU GPU can work properly.

Create python virtual environment

Virtual environments are recommended. Anaconda or Miniconda can be used to create virtual Python environment. Refer to the official documentation for the installation.

Create an environment called py39torchamd and activate it:

conda create -n py39torchamd python==3.9  
conda activate py39torchamd

Machine Learning, Deep Learning and Artificial Intellegence

tensorflow

It is recommend to create a separate python virtual environment for tensorflow.

Create an environment called py39torchamd and activate it:

conda create -n py39tfrocm python==3.9
conda activate py39tfrocm

Install some dependencies for Ubuntu:

sudo apt install rocm-libs hipcub miopen-hip          
sudo apt install rccl
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/rccl/lib  
sudo ldconfig

Install Tensorflow-ROCm

pip3 install --user tensorflow-rocm --upgrade

Verification You can check all device list using following code:

python -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"

It should list the GPU device.

Pytorch

To release the full potential of AMD ROCm platform, ROCm compiled Pytorch is available. Pytorch 2.0 ROCm version may not be stable at the time (May 2023), so Pytorch 1.13.1 is used here.

Install Pytorch-RoCm Use following script to install:

# ensure the virtual environment is activated 
conda activate py39torchamd
pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 torchaudio==0.13.1+rocm5.2 --extra-index-url https://download.pytorch.org/whl/rocm5.2

2 Verification To verify the successful installationa of Pytorch-ROCm, run following script in virtual environment. It should return 'True'.

python -c "import torch; print(torch.cuda.is_available())"

Apu Windows