# GPU Usage Guide

# 1. Install the NVIDIA GPU Driver

An NVIDIA driver must be installed before the GPU can be used:

  • Linux: version ≥ 525.60.13
  • Windows: version ≥ 528.33
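The minimum-version rule above can be checked programmatically. The following is a minimal sketch, not part of SmartJavaAI: the class `DriverVersionCheck` and its methods are hypothetical names, and the installed version is assumed to be obtained separately (for example from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`).

```java
final class DriverVersionCheck {
    // Minimum driver versions quoted in this guide.
    static final String MIN_LINUX = "525.60.13";
    static final String MIN_WINDOWS = "528.33";

    /** Compares dotted numeric versions; missing components count as 0. */
    static int compare(String a, String b) {
        String[] pa = a.trim().split("\\.");
        String[] pb = b.trim().split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True if the installed driver satisfies the minimum for the given OS. */
    static boolean meetsMinimum(String installed, boolean windows) {
        return compare(installed, windows ? MIN_WINDOWS : MIN_LINUX) >= 0;
    }

    public static void main(String[] args) {
        // Illustrative value only; pass the real driver version as the first argument.
        String installed = args.length > 0 ? args[0] : "535.104.05";
        System.out.println(installed + " ok on Linux: " + meetsMinimum(installed, false));
    }
}
```

Comparing component by component avoids the pitfalls of plain string comparison, where "99.1" would sort after "525.60.13".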

# 2. Install CUDA and cuDNN

The following versions are recommended:

  • CUDA: v12.4
  • cuDNN: v8.9.7

Download and installation links:

After installation, restart the computer, then verify that the installation succeeded:

nvcc -V

If the output contains v12.4, the installation succeeded.
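The same check can be automated from Java. This is a minimal sketch under stated assumptions: `NvccReleaseParser` is a hypothetical helper, `nvcc` is assumed to be on the PATH, and the output is assumed to contain a line of the form `release 12.4, V12.4.x`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NvccReleaseParser {
    // Matches the "release <major>.<minor>" fragment printed by nvcc -V.
    private static final Pattern RELEASE = Pattern.compile("release\\s+(\\d+\\.\\d+)");

    /** Extracts the CUDA release (e.g. "12.4") from nvcc output, if present. */
    static Optional<String> parseRelease(String nvccOutput) {
        Matcher m = RELEASE.matcher(nvccOutput);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True if the output reports exactly the expected release. */
    static boolean isExpected(String nvccOutput, String expected) {
        return parseRelease(nvccOutput).map(expected::equals).orElse(false);
    }

    public static void main(String[] args) throws Exception {
        // Throws IOException if nvcc is not installed or not on the PATH.
        Process p = new ProcessBuilder("nvcc", "-V").redirectErrorStream(true).start();
        String out = new String(p.getInputStream().readAllBytes());
        p.waitFor();
        System.out.println(isExpected(out, "12.4")
                ? "CUDA 12.4 detected"
                : "Unexpected nvcc output:\n" + out);
    }
}
```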

# 3. Add Offline GPU Dependencies (Recommended)

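This section applies only to models that run on the PyTorch engine.

By default, DJL downloads the GPU dependencies automatically, but the download is slow. Adding the Maven dependencies manually is recommended: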

# Windows (x86_64)

<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-native-cu124</artifactId>
    <classifier>win-x86_64</classifier>
    <version>2.5.1</version>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-jni</artifactId>
    <version>2.5.1-0.32.0</version>
    <scope>runtime</scope>
</dependency>

# Linux (x86_64)

<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-native-cu124</artifactId>
    <classifier>linux-x86_64</classifier>
    <version>2.5.1</version>
    <scope>runtime</scope>
</dependency>

<dependency>
    <groupId>ai.djl.pytorch</groupId>
    <artifactId>pytorch-jni</artifactId>
    <version>2.5.1-0.32.0</version>
    <scope>runtime</scope>
</dependency>

# 4. Configure System Environment Variables (Windows)

# Specify the GPU in Code

SmartJavaAI uses the CPU by default. To use the GPU, set the device type explicitly:

FaceModelConfig config = new FaceModelConfig();
config.setModelEnum(FaceModelEnum.RETINA_FACE); // face model
config.setDevice(DeviceEnum.GPU); // run on the GPU
FaceModel faceModel = FaceModelFactory.getInstance().getModel(config);

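On the first run, the program automatically extracts the native libraries and prints log lines like the ones below. An exception thrown afterwards is harmless at this stage: the only purpose of this step is to extract the dependency files to the cache path.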

 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/asmjit.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/c10.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/c10_cuda.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/caffe2_nvrtc.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cublas64_12.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cublasLt64_12.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudart64_12.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudnn64_9.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudnn_adv64_9.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudnn_cnn64_9.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudnn_engines_precompiled64_9.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudnn_engines_runtime_compiled64_9.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudnn_graph64_9.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudnn_heuristic64_9.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cudnn_ops64_9.dll to cache ...
 [main] INFO  ai.djl.pytorch.jni.LibUtils - Extracting pytorch/cu124/win-x86_64/cufft64_11.dll to cache ...

# Cache Directories

| OS | Cache directory |
| --- | --- |
| Windows | C:/Users/{user}/smartjavaai_cache |
| Linux | /root/smartjavaai_cache |
| macOS | /Users/{user}/smartjavaai_cache |
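The cache locations listed above all look like `smartjavaai_cache` under the current user's home directory (on Linux, `/root` is the home of the root user). Assuming that pattern holds, the cache root can be located from Java as sketched here; `CacheDirLocator` is a hypothetical helper, not a SmartJavaAI API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CacheDirLocator {
    // Directory name taken from the cache locations listed in this guide.
    static final String CACHE_DIR_NAME = "smartjavaai_cache";

    /** Assumption: the cache lives directly under the user's home directory. */
    static Path cacheRoot(String userHome) {
        return Paths.get(userHome, CACHE_DIR_NAME);
    }

    public static void main(String[] args) {
        Path root = cacheRoot(System.getProperty("user.home"));
        System.out.println(root + (Files.isDirectory(root) ? " exists" : " is missing"));
    }
}
```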

# Configuration Steps

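  1. Open the cache path and locate the directory:

    pytorch/2.5.1-20241113-cu124-win-x86_64

Note: if the directory pytorch/2.5.1-20241113-cu124-win-x86_64 is missing from the cache directory, check that the preceding steps have completed.

  2. Add this directory to the system PATH environment variable.
  3. Remove the existing CUDA entries from PATH to avoid conflicts.
  4. After changing the environment variables, restart your IDE or your computer.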
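The PATH changes above can be sanity-checked from inside the JVM, which also confirms that the IDE was restarted and picked up the new value. This is a minimal sketch: `PathEntryCheck` is a hypothetical helper, the cache is assumed to live under the user's home directory, and the string "NVIDIA GPU Computing Toolkit" is a heuristic based on the default CUDA install location on Windows.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class PathEntryCheck {
    /** Normalizes slashes, trailing separators, and case (Windows paths are case-insensitive). */
    static String normalize(String dir) {
        String s = dir.trim().replace('\\', '/');
        while (s.length() > 1 && s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s.toLowerCase(Locale.ROOT);
    }

    /** Splits a PATH-style value, dropping empty entries. */
    static List<String> entries(String pathValue, String separator) {
        List<String> result = new ArrayList<>();
        if (pathValue == null) {
            return result;
        }
        for (String e : pathValue.split(Pattern.quote(separator))) {
            if (!e.trim().isEmpty()) {
                result.add(e.trim());
            }
        }
        return result;
    }

    /** True if dir appears as an entry of the PATH-style value. */
    static boolean contains(String pathValue, String separator, String dir) {
        String wanted = normalize(dir);
        for (String e : entries(pathValue, separator)) {
            if (normalize(e).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    /** Heuristic: entries that point into the default CUDA toolkit install folder. */
    static List<String> cudaToolkitEntries(String pathValue, String separator) {
        List<String> hits = new ArrayList<>();
        for (String e : entries(pathValue, separator)) {
            if (normalize(e).contains("nvidia gpu computing toolkit")) {
                hits.add(e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        String dir = System.getProperty("user.home")
                + "/smartjavaai_cache/pytorch/2.5.1-20241113-cu124-win-x86_64";
        System.out.println("cache dir on PATH: " + contains(path, File.pathSeparator, dir));
        System.out.println("CUDA toolkit entries still on PATH: "
                + cudaToolkitEntries(path, File.pathSeparator));
    }
}
```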

Example screenshot:

  5. After completing the steps above, rerun the program. Example of a successful run:

# 5. GPU Guide for Seetaface6 Models

  • 1. Seetaface6 models require CUDA v11.6.2.
  • 2. Add CUDA to the system PATH environment variable.

After these two steps, the GPU mode of Seetaface6 works normally.

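# 6. GPU Guide for the OCR Module

The OCR module uses ONNX Runtime as its inference engine. After completing the GPU configuration in sections 1 to 4, two more changes are required to enable the GPU:

  • 1. Exclude the CPU build of onnxruntime.
  • 2. Add a dependency on onnxruntime_gpu.

Note: if the project also includes other SmartJavaAI modules, exclude the transitive onnxruntime (CPU build) dependency from every one of them. Otherwise, runtime conflicts may occur or GPU acceleration may silently stop working.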

<dependency>
   <groupId>cn.smartjavaai</groupId>
   <artifactId>smartjavaai-ocr</artifactId>
   <scope>runtime</scope>
   <exclusions>
      <exclusion>
         <groupId>com.microsoft.onnxruntime</groupId>
         <artifactId>onnxruntime</artifactId>
      </exclusion>
   </exclusions>
</dependency>
<dependency>
   <groupId>com.microsoft.onnxruntime</groupId>
   <artifactId>onnxruntime_gpu</artifactId>
   <version>1.20.0</version>
   <scope>runtime</scope>
</dependency>

# 7. Common Errors and Solutions

# Example error log 1:

ai.djl.engine.EngineException: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. 
This could be because the operator doesn't exist for this backend, 
or was omitted during the selective/custom build process (if using custom build). 
If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 
'aten::empty_strided' is only available for these backends: [CPU...].

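Cause: the installed CUDA/cuDNN versions do not match the required ones.

Solution: install the versions specified in section 2 of this guide.

# Example error log 2: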

Caused by: java.lang.UnsatisfiedLinkError: C:\Users\Administrator\smartjavaai_cache\pytorch\2.5.1-20241113-cu124-win-x86_64\torch_cuda.dll: Can't find dependent libraries
at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2437)
at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2494)
at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2694)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2624)
at java.base/java.lang.Runtime.load0(Runtime.java:765)
at java.base/java.lang.System.load(System.java:1852)
at ai.djl.pytorch.jni.LibUtils.loadNativeLibrary(LibUtils.java:379)
at ai.djl.pytorch.jni.LibUtils.loadLibTorch(LibUtils.java:195)
at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:82)
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53)
... 39 more

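Cause: the CUDA-related environment variables are not configured correctly.

Solution: follow section 4, "Configure System Environment Variables".

# Example error log 3: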

Caused by: java.lang.Exception: Compute device gpu has no memory device registered. Please call RegisterMemoryDevice firstly.
 at com.seeta.sdk.FaceDetector.construct(Native Method)
 at com.seeta.sdk.FaceDetector.<init>(FaceDetector.java:17)
 at com.seeta.pool.FaceDetectorPool$1.makeObject(FaceDetectorPool.java:37)
 at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:566)
 at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:306)
 at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:233)
 at cn.smartjavaai.face.model.facerec.SeetaFace6Model.extractFeatures(SeetaFace6Model.java:853)
 ... 29 more

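Cause: Seetaface6 failed to load its GPU dependency libraries.

Solution: upgrade to the latest SmartJavaAI release; older releases may have compatibility problems.

# Example error log 4: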

java.lang.UnsatisfiedLinkError: C:\Users\Administrator\smartjavaai_cache\seetaface6\tennis.dll: Can't find dependent libraries

Cause: a Seetaface6 model is in use, but CUDA is not installed or its version is wrong.

Solution: install CUDA v11.6.2 and add it to the system environment variables.
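The four cases above can be condensed into a small lookup that maps an exception message to its likely cause. This is only a sketch based on the messages quoted in this section: `GpuErrorHints` and its enum are hypothetical, and matching on message substrings is a heuristic that may miss variants of these errors.

```java
final class GpuErrorHints {
    enum Cause {
        CUDA_CUDNN_VERSION_MISMATCH,    // example log 1
        PYTORCH_DIR_NOT_ON_PATH,        // example log 2
        SEETAFACE6_GPU_LIBS_NOT_LOADED, // example log 3
        SEETAFACE6_CUDA_MISSING,        // example log 4
        UNKNOWN
    }

    /** Maps an exception message to the likely cause described in this guide. */
    static Cause classify(String message) {
        if (message == null) {
            return Cause.UNKNOWN;
        }
        boolean missingDeps = message.contains("Can't find dependent libraries");
        if (message.contains("aten::empty_strided") && message.contains("'CUDA' backend")) {
            return Cause.CUDA_CUDNN_VERSION_MISMATCH;
        }
        if (missingDeps && message.contains("torch_cuda.dll")) {
            return Cause.PYTORCH_DIR_NOT_ON_PATH;
        }
        if (message.contains("RegisterMemoryDevice")) {
            return Cause.SEETAFACE6_GPU_LIBS_NOT_LOADED;
        }
        if (missingDeps && message.contains("seetaface6")) {
            return Cause.SEETAFACE6_CUDA_MISSING;
        }
        return Cause.UNKNOWN;
    }

    public static void main(String[] args) {
        String msg = "Compute device gpu has no memory device registered. "
                + "Please call RegisterMemoryDevice firstly.";
        System.out.println(classify(msg));
    }
}
```

In practice the message would come from walking the `getCause()` chain of the thrown exception, since the relevant text is usually in a nested "Caused by" entry.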