XPU Verify Tool

The xpu-verify tool provides a comprehensive set of tests and automated fixes to help ensure that Intel® discrete GPUs have been set up correctly on Linux* operating systems (OS). The tool supports various distributions, such as Ubuntu* 20.04 and Ubuntu* 22.04, Red Hat Enterprise Linux* (RHEL) OS, and Fedora Linux* OS.

Prerequisites

Setup

  1. Log in to Intel® Tiber™ AI Cloud.

  2. In the left side menu, click Catalog > Hardware.

  3. In the Hardware tab, select an instance from the options.

     1. Check the filter for Bare metal type.

     2. Check the filter for GPU processor.

     3. Click Select to select a GPU compute instance. In this example, we use the Intel® Max Series GPU.

  4. Connect to your instance via SSH.

  5. Continue in the next section.

Installation

Clone the Intel GPU sanity tests repository and navigate to the directory:

git clone https://github.com/unrahul/xpu_verify && cd xpu_verify

Usage

For help, run:

./xpu_verify.sh help

Check System Setup

To check if the system is set up correctly for Intel discrete GPUs, run the script with the -c option:

./xpu_verify.sh -c

Fix System Setup

To fix and augment the system setup with essential tools and libraries for Intel discrete GPUs, run the script with the -f option:

./xpu_verify.sh -f

Upon successful completion, a dialog appears: Which services should be restarted? To accept all defaults, press the Tab key to navigate to OK and press Enter.

Check and Fix System Setup

To check and fix the system setup for Intel discrete GPUs, run the script with the -p option:

./xpu_verify.sh -p
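
The three options above differ only in the flag passed to the script. As a minimal sketch, a small wrapper function can map a readable mode name to the matching flag. The function name xpu_mode_flag and the mode names are hypothetical and are not part of the xpu_verify repository; only the -c, -f, and -p flags come from this page.

```shell
# Hypothetical helper (not part of xpu_verify): translate a readable
# mode name into the xpu_verify.sh option documented above.
xpu_mode_flag() {
    case "$1" in
        check)     echo "-c" ;;   # check the system setup
        fix)       echo "-f" ;;   # fix and augment the system setup
        check-fix) echo "-p" ;;   # check and fix the system setup
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Example, run from the cloned repository directory:
#   ./xpu_verify.sh "$(xpu_mode_flag check)"
```

An unknown mode name prints an error to stderr and returns a non-zero status, so a typo does not silently run the wrong action.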

AI Libraries Installation

To install specific AI packages with XPU support (e.g., openvino_xpu, pytorch_xpu, tensorflow_xpu, ai_xpu), run:

./xpu_verify.sh -i pkg1, pkg2,...

Supported Tests

You can perform the following tests.

Linux Kernel i915 Module and Graphics Microcode

This test checks if the Linux Kernel i915 module is loaded and the Graphics microcode for the GPU is loaded.

./check_device.sh
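
For a quick manual cross-check of the kernel module part of this test, the standard lsmod output can be inspected directly. This is only a sketch: it does not reproduce what check_device.sh does internally, it does not cover the microcode check, and the function name i915_loaded is an assumption made for illustration.

```shell
# Succeeds if a module named exactly "i915" appears in lsmod-style
# input, where the first column is the module name.
i915_loaded() {
    awk '$1 == "i915" { found = 1 } END { exit !found }'
}

# Usage on a live system:
#   lsmod | i915_loaded && echo "i915 is loaded"
```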

Check OS kernel and version

./check_os_kernel.sh
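
The same information can be read by hand from standard locations. The sketch below assumes the distribution ships an os-release file (true for the Ubuntu, RHEL, and Fedora releases named above); the function name os_id_version is hypothetical, and the script's own checks may differ.

```shell
# Print "<ID> <VERSION_ID>" from an os-release style file.
# Defaults to /etc/os-release; the file is sourced in a subshell so
# its variables do not leak into the calling shell.
os_id_version() {
    ( . "${1:-/etc/os-release}" && echo "${ID:-unknown} ${VERSION_ID:-unknown}" )
}

# Usage on a live system:
#   os_id_version    # distribution ID and version
#   uname -r         # running kernel release
```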

Compute Drivers

This test checks if the necessary Intel compute drivers are installed.

./check_compute_drivers.sh

GPU Devices Listing

This test verifies that sycl-ls can list the GPU devices for the OpenCL and Level-Zero backends. The oneAPI basekit is required for this test.

./syclls.sh --force
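
To inspect sycl-ls output by hand, one option is to count the lines that report a GPU device. The sketch below assumes each GPU entry contains a backend tag followed by ":gpu" in its bracketed prefix; the exact output format varies between oneAPI releases, so treat the pattern as an assumption. The function name count_sycl_gpus is hypothetical.

```shell
# Count lines of sycl-ls style input that mention a GPU device.
# Assumes GPU entries carry a ":gpu" tag in their bracketed prefix.
count_sycl_gpus() {
    # grep -c exits non-zero when the count is 0; keep the status clean.
    grep -c ':gpu' || true
}

# Usage on a live system (requires the oneAPI environment):
#   sycl-ls | count_sycl_gpus
```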

Check if Intel basekit is installed

./check_intel_basekit.sh

SYCL Programs Compilation

This test checks if SYCL programs can be compiled using icpx. The oneAPI basekit is required for this test.

./check_sycl.sh
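
Because this test depends on the oneAPI basekit, it can help to confirm first that the required commands are on the PATH. The following is a minimal sketch using the POSIX command -v builtin; the function name missing_tools is hypothetical and the check is independent of what check_sycl.sh does.

```shell
# Print the name of every listed command that is not found on PATH.
# Prints nothing when all commands are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage:
#   missing_tools icpx sycl-ls
```

If icpx is reported as missing even though the basekit is installed, the oneAPI environment has probably not been initialized in the current shell.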

Check scaling governor

./scaling_governor.sh
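
On Linux, the active CPU frequency scaling governor is exposed per CPU under sysfs, so it can also be read directly. The sketch below summarizes the governors in use; the function name list_governors is hypothetical, and which governor the script expects is not stated here.

```shell
# Summarize the scaling governors in use, one line per distinct
# governor, prefixed by the number of CPUs using it.
# The optional argument overrides the sysfs base directory.
list_governors() {
    base="${1:-/sys/devices/system/cpu}"
    cat "$base"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c
}

# Usage on a live system:
#   list_governors
```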

PyTorch and TensorFlow XPU Device Detection

This test checks if PyTorch* software and TensorFlow* software can detect the XPU device and run workloads using the device. Docker is required for this test.

Tip

The check_tensorflow.sh and check_pytorch.sh scripts only work with Docker images of the TensorFlow and PyTorch frameworks. Their main purpose is to verify that these frameworks can access the GPU. Results may differ depending on your Python environment.

For PyTorch:

./check_pytorch.sh

For TensorFlow:

./check_tensorflow.sh

Additional Checks

Check whether a network proxy is set up, print the proxy, remove the proxy settings, and restore the proxy settings:

./proxy_helper.sh
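
To see the current proxy configuration without changing it, the conventional proxy environment variables can be listed directly. This is a sketch only: the function name show_proxy is hypothetical, and proxy_helper.sh may look at other settings as well.

```shell
# Print the conventional proxy environment variables that are set
# (upper or lower case), or a short notice when none are set.
show_proxy() {
    env | grep -i -E '^(http|https|ftp|no)_proxy=' \
        || echo "no proxy variables set"
}

# Usage:
#   show_proxy
```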