APU on Linux
'Make' a 16 GB VRAM GPU
When there is discrete GPU in the build: AMD APU + AMD discrete GPU system may not able to utilize the Radeon™ Graphics. AMD APU or AMD APU + discrete Nvidia GPU works.
Following were tested on both 4600G and 5600G. 5700G should also work. Products details.
hardware requirements:
- APU (older or newer generations than 4600G, 5600G, 5700G have not been tested.)
- AM4 motherboard which supports the APU,
- The system is recommended to have at least 32 GB RAM installed. It may not allow to allocate 16 GB RAM if less than 32 GB RAM is available.
My hardware in test:
- MSI B450-A PRO MAX.
- AMD 4600g (with stock cooler)
- 2x16GB RAM (32GB is enough unless larger RAM needed, e.g. GPT-J-6B loading)
- 2x16GB RAM Dominator
- 2TB crucial P1 ssd
- 650W PSU
Motherboard BIOS setting for APU iGPU
Motherboard brand, BIOS version, ROCm version may affect your mileage. Below is a summary of the tested systems.
Motherboard Brand | BIOS version | Results | Notes |
---|---|---|---|
MSI B450-A PRO MAX | E7B86AMS.MC0 | working | tested latest BIOS doesn't work (not able to detect gpu) |
ASUS ROG Strix B450-A | Latest | working | |
Gigabyte B450M DS3H Wifi AM4 | Latest | working |
Allocate 16GB system RAM to the APU in BIOS
Enter the BIOS mode after pressing power on button. A common way to enter BIOS setup is to press a specific key or a combination of keys during the startup process, before the Windows logo appears. The key or keys may differ depending on your computer, but some of the most common ones are F1, F2, F10, F12, Del, or Esc.
Within motherboard BIOS, enable ‘Above 4G encoding’ and auto Re-Size BAR support. There are setting to choose primary graphic, igpu or discrete PCIE gpu. Choose igpu. There is also a mode for 'GAME' or 'FORCE'. 'Force' mode allows you to customize the dedicated iGPU VRAM. There are multiple options to set frame buffer size: AUTO,1,2,4,8,16 GB. Please try to upgrade your BIOS version if you can not see the 8 or 16 GB options. Choose 16 GB to allocate 16 GB dedicated RAM to the iGPU.
To verify, in your terminal,
sudo dmesg | grep drm | grep amdgpu
example output:
[ 2.949962] [drm] amdgpu version: 5.18.13
[ 2.968548] [drm] amdgpu: 16384M of VRAM memory ready
[ 2.968549] [drm] amdgpu: 15946M of GTT memory ready
ROCm installation
AMD ROCm (Radeon Open Compute platform) is a software platform designed to enable developers to create high-performance computing applications that can harness the full potential of AMD GPUs (Graphics Processing Units) and CPUs (Central Processing Units). It is an open-source platform that provides a wide range of tools, libraries, and support for developing applications in a variety of programming languages, including C++, Python, and Fortran.
The primary goal of ROCm is to provide a unified and consistent programming environment for both AMD GPUs and CPUs, which enables developers to create high-performance applications that can take full advantage of the performance capabilities of both types of processors. With support for advanced features such as HIP (Heterogeneous-compute Interface for Portability) and OpenCL (Open Computing Language), ROCm provides a flexible and powerful platform for creating a wide range of applications, from machine learning and data analytics to scientific computing and gaming.
Overall, AMD ROCm is a powerful and versatile platform that is well-suited for developers who want to create high-performance computing applications that can run on both GPUs and CPUs, and it is widely used in the scientific computing and machine learning communities.
ROCm is supported on Linux platform (Ubuntu/Debian, RHEL/CentOS, SLES/OpenSUSE). Recently AMD is adding supports to Windows, but it is not mature yet.
Installation instructions on Ubuntu
To install ROCm for APU, the official documentation should be followed. We have tested ROCm 5.4.0 and ROCm 5.6.0.
For video turorial for installation of ROCm, please check out the video tutorials (opens in a new tab).
Below is instructions for installing ROCm 5.4.0 (opens in a new tab) on Ubuntu (20.04.5 LTS). For other Linux distributions and other ROCm versions, please check the official documentation (opens in a new tab).
1 Install required tools and packages
sudo apt-get update
sudo apt-get install libnuma-dev libncurses5
sudo apt-get install wget gnupg2 gawk curl
sudo reboot
2 Setting Permissions for Groups
This step is needed so add any current user can access GPU resources.
sudo usermod -a -G render $LOGNAME
sudo usermod -a -G video $LOGNAME
3 Check if kernel headers and development packages are installed on the system:`
sudo dpkg -l | grep linux-headers
sudo dpkg -l | grep linux-modules-extra
4 Add the gpg key:
curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/rocm-keyring.gpg
5 Add the AMDGPU repository:
echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/amdgpu/5.4/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/amdgpu.list
6 Install the kernel-mode driver and reboot the system:
sudo apt-get update
sudo apt install amdgpu-dkms
sudo reboot
7 Add the ROCm repository:
echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/5.4 focal main' | sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt-get update
8 Install the single-version ROCm meta-packages:
sudo apt install rocminfo rocm-smi rocm-device-libs
sudo apt install rocm-dev
9 Add environment varibles Add following to your ~/.bashrc file so they can persists:
export HSA_OVERRIDE_GFX_VERSION=9.0.0 #see below for comments
export LD_LIBRARY_PATH=/opt/rocm-5.4.0/lib
The HSA_OVERRIDE_GFX_VERSION=9.0.0 is ciritical for APU GPU to work with ROCm.
Verification
You may consider ROCm installation successful if GPUs are listed with the following commands:
/opt/rocm-5.4.0/bin/rocminfo
# this works if the environmental variables have been set correctly
rocminfo
An example output for the APUGPU, notice the Name is "gfx900":
*******
Agent 2
*******
Name: gfx900
Uuid: GPU-XX
Marketing Name:
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
Chip ID: 5686(0x1636)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1900
BDFID: 2816
Internal Node ID: 1
Compute Unit: 7
SIMDs per CU: 4
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16777216(0x1000000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx900:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Check ROCm version:
apt show rocm-libs -a
Troubleshooting
- There may be rare cases that hardware caused the error. For example, I encountered following error:
hsa_status_error_memory_aperture_violation: the agent attempted to access memory beyond the largest legal address
It turned out that I have additional environment variables for ROCm. Remove them then the error went away.
- When a discrete AMD GPU installed, when using igpu, it gives error:
Segmentation fault (core dumped) It seems that the APU will not work properly in such builds (tested on 5700XT or 6700XT). But if the discrete GPU is not AMD brand, the APU GPU can work properly.
Create python virtual environment
Virtual environments are recommended. Anaconda or Miniconda can be used to create virtual Python environment. Refer to the official documentation for the installation.
Create an environment called py39torchamd and activate it:
conda create -n py39torchamd python==3.9
conda activate py39torchamd
Machine Learning, Deep Learning and Artificial Intellegence
tensorflow
It is recommend to create a separate python virtual environment for tensorflow.
Create an environment called py39torchamd and activate it:
conda create -n py39tfrocm python==3.9
conda activate py39tfrocm
- Install some dependencies for Ubuntu:
sudo apt install rocm-libs hipcub miopen-hip
sudo apt install rccl
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/rccl/lib
sudo ldconfig
- Install Tensorflow-ROCm
pip3 install --user tensorflow-rocm --upgrade
- Verification You can check all device list using following code:
python -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()"
It should list the GPU device.
Pytorch
To release the full potential of AMD ROCm platform, ROCm compiled Pytorch is available. Pytorch 2.0 ROCm version may not be stable at the time (May 2023), so Pytorch 1.13.1 is used here.
- Install Pytorch-RoCm Use following script to install:
# ensure the virtual environment is activated
conda activate py39torchamd
pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 torchaudio==0.13.1+rocm5.2 --extra-index-url https://download.pytorch.org/whl/rocm5.2
2 Verification To verify the successful installationa of Pytorch-ROCm, run following script in virtual environment. It should return 'True'.
python -c "import torch; print(torch.cuda.is_available())"