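After a GPU machine reboots, the GPU device nodes are not created automatically, so a script must run at boot to create them before Docker can mount them into containers. The script, gpu-service, is as follows: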
#!/bin/bash
# Load the NVIDIA kernel module; abort if it cannot be loaded.
if /sbin/modprobe nvidia; then
    # Count the number of NVIDIA controllers found.
    NVDEVS=$(lspci | grep -i NVIDIA)
    N3D=$(echo "$NVDEVS" | grep -c "3D controller")
    NVGA=$(echo "$NVDEVS" | grep -c "VGA compatible controller")
    N=$((N3D + NVGA - 1))
    # Create one node per controller (major 195, minor = index),
    # skipping nodes that already exist so that reruns do not fail.
    for i in $(seq 0 "$N"); do
        [ -e "/dev/nvidia$i" ] || mknod -m 666 "/dev/nvidia$i" c 195 "$i"
    done
    [ -e /dev/nvidiactl ] || mknod -m 666 /dev/nvidiactl c 195 255
else
    exit 1
fi
# Load the unified memory module; abort if it cannot be loaded.
if /sbin/modprobe nvidia-uvm; then
    # Find out the major device number used by the nvidia-uvm driver
    D=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
    [ -e /dev/nvidia-uvm ] || mknod -m 666 /dev/nvidia-uvm c "$D" 0
else
    exit 1
fi
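Before installing the script, it can help to preview what it would create. The following is a minimal sketch, not part of the original script: the function names count_nvidia_controllers and uvm_major are hypothetical helpers introduced here for illustration. They only read data and never call mknod.

```shell
#!/bin/bash
# Dry-run helpers (hypothetical names, for illustration only).
# They report what gpu-service would act on without creating any node.

# Count NVIDIA "3D controller" and "VGA compatible controller" entries
# in lspci-style output read from stdin.
count_nvidia_controllers() {
    grep -i nvidia | grep -c -E '3D controller|VGA compatible controller'
}

# Print the major number registered for nvidia-uvm in a
# /proc/devices-style file. The file argument defaults to /proc/devices.
uvm_major() {
    awk '$2 == "nvidia-uvm" { print $1 }' "${1:-/proc/devices}"
}
```

Assuming lspci is installed, `lspci | count_nvidia_controllers` prints the number of /dev/nvidiaN nodes the script would create, and `uvm_major` prints the major number it would use for /dev/nvidia-uvm once the nvidia-uvm module is loaded.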
Next, add a new systemd service. Create the unit file as follows:
touch /etc/systemd/system/gpu.service
chmod 664 /etc/systemd/system/gpu.service
Edit gpu.service so that it contains:
[Unit]
Description=Create NVIDIA GPU device nodes at boot
[Service]
Type=oneshot
ExecStart=/usr/sbin/gpu-service
[Install]
WantedBy=multi-user.target
Move the gpu-service script to /usr/sbin/gpu-service and make it executable:
mv gpu-service /usr/sbin/
chmod 554 /usr/sbin/gpu-service
Use systemctl to enable gpu.service so that it runs automatically at boot:
systemctl daemon-reload
systemctl enable gpu.service
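Once the service is enabled, the result can be checked after a reboot or after running `systemctl start gpu.service`. The sketch below assumes the node names used by the gpu-service script above. The helper names expected_nodes, missing_nodes and docker_device_flags are hypothetical and introduced only for illustration, and the GPU count is passed in by the caller.

```shell
#!/bin/bash
# Post-boot check helpers (hypothetical names, for illustration only).

# Print the node paths gpu-service is expected to create for a given
# GPU count, under a given directory (default /dev).
expected_nodes() {
    local count=$1 dir=${2:-/dev} i
    for ((i = 0; i < count; i++)); do
        echo "$dir/nvidia$i"
    done
    echo "$dir/nvidiactl"
    echo "$dir/nvidia-uvm"
}

# Print each expected node that does not exist.
# Returns non-zero if at least one node is missing.
missing_nodes() {
    local node status=0
    while read -r node; do
        if [ ! -e "$node" ]; then
            echo "$node"
            status=1
        fi
    done < <(expected_nodes "$@")
    return "$status"
}

# Build one --device flag per expected node, for use with docker run.
docker_device_flags() {
    local node flags=()
    while read -r node; do
        flags+=("--device=$node")
    done < <(expected_nodes "$@")
    echo "${flags[*]}"
}
```

On a machine with two GPUs, `missing_nodes 2` should print nothing if the service ran successfully, and the output of `docker_device_flags 2` can be placed on a `docker run` command line to expose the nodes to a container.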