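A quick note on an error I hit recently: the TinaFace code ran fine on my old server but failed on a new one.
First, I installed the required packages following the steps on the official site.
My environment: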
GPU driver: NVIDIA-SMI 460.73.01 Driver Version: 460.106.00 CUDA Version: 11.2
CUDA version (nvcc -V): Cuda compilation tools, release 11.1, V11.1.74
pytorch 1.8.1, torchvision 0.9.1, mmcv-full 1.4.6, mmdet 2.22.0, cudatoolkit 11.1.1
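PS: if mmcv or mmdet will not install, the cause is almost certainly a version mismatch. I ran into this too.
Looking over the versions above, there should normally be nothing wrong with them.
cudatoolkit was installed to match the CUDA version exactly, and mmdet was installed to match both the CUDA version and the PyTorch version.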
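The versions listed above can be collected in one go. The following is a minimal sketch, not part of the official install steps: the function name report_versions is made up for illustration, and each tool is checked first so the script still runs on a machine where it is missing.

```shell
# Print the versions that have to agree with each other.
# Every tool is guarded, so a missing one is reported instead of aborting.
report_versions() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
            || echo "nvidia-smi: could not query the driver"
    else
        echo "nvidia-smi: not found"
    fi

    if command -v nvcc >/dev/null 2>&1; then
        echo "nvcc on PATH: $(command -v nvcc)"
        nvcc -V | grep release
    else
        echo "nvcc on PATH: not found"
    fi

    if command -v pip >/dev/null 2>&1; then
        pip list 2>/dev/null \
            | grep -i -E '^(torch|torchvision|mmcv-full|mmdet)[[:space:]]' \
            || echo "pip: none of torch/torchvision/mmcv-full/mmdet installed"
    else
        echo "pip: not found"
    fi

    if command -v conda >/dev/null 2>&1; then
        conda list cudatoolkit 2>/dev/null || echo "conda: list failed"
    else
        echo "conda: not found"
    fi
}

report_versions
```

Run it inside the activated conda environment so that pip and conda report the packages the build will actually see.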
Naturally, something went wrong anyway, at the very last step: running pip install -v -e .
Compiling vedadet failed with:
Installing collected packages: vedadet Running setup.py develop for vedadet Running command /mnt/data/cbm/software/anaconda3/envs/lw/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/mnt/data1/lw/vedadet-main/setup.py'"'"'; __file__='"'"'/mnt/data1/lw/vedadet-main/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps running develop running egg_info writing vedadet.egg-info/PKG-INFO writing dependency_links to vedadet.egg-info/dependency_links.txt writing requirements to vedadet.egg-info/requires.txt writing top-level names to vedadet.egg-info/top_level.txt reading manifest file 'vedadet.egg-info/SOURCES.txt' adding license file 'LICENSE' writing manifest file 'vedadet.egg-info/SOURCES.txt' running build_ext building 'vedadet.ops.nms.nms_ext' extension Emitting ninja build file /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/build.ninja... Compiling objects... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/1] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o.d -DWITH_CUDA -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/TH -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/mnt/data/cbm/software/anaconda3/envs/lw/include/python3.8 -c -c /mnt/data1/lw/vedadet-main/vedadet/ops/nms/src/cuda/nms_kernel.cu -o /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=nms_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14 FAILED: /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o.d -DWITH_CUDA -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/torch/csrc/api/include 
-I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/TH -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/mnt/data/cbm/software/anaconda3/envs/lw/include/python3.8 -c -c /mnt/data1/lw/vedadet-main/vedadet/ops/nms/src/cuda/nms_kernel.cu -o /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=nms_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14 nvcc fatal : Unsupported gpu architecture 'compute_86' ninja: build stopped: subcommand failed. Traceback (most recent call last): File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1667, in _run_ninja_build subprocess.run( File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/subprocess.py", line 516, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. 
The above exception was the direct cause of the following exception: Traceback (most recent call last): File "", line 1, in <module> File "/mnt/data1/lw/vedadet-main/setup.py", line 119, in <module> setup( File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup return distutils.core.setup(**attrs) File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/core.py", line 148, in setup dist.run_commands() File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/dist.py", line 966, in run_commands self.run_command(cmd) File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run self.install_for_development() File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/command/develop.py", line 114, in install_for_development self.run_command('build_ext') File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run _build_ext.run(self) File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run _build_ext.build_ext.run(self) File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/command/build_ext.py", line 340, in run self.build_extensions() File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 708, in build_extensions build_ext.build_extensions(self) File 
"/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions _build_ext.build_ext.build_extensions(self) File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions self._build_extensions_serial() File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial self.build_extension(ext) File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension _build_ext.build_extension(self, ext) File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension objects = self.compiler.compile(sources, File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 529, in unix_wrap_ninja_compile _write_ninja_file_and_compile_objects( File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1354, in _write_ninja_file_and_compile_objects _run_ninja_build( File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build raise RuntimeError(message) from e RuntimeError: Error compiling objects for extensionERROR: Command errored out with exit status 1: /mnt/data/cbm/software/anaconda3/envs/lw/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/mnt/data1/lw/vedadet-main/setup.py'"'"'; __file__='"'"'/mnt/data1/lw/vedadet-main/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
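To sum up, the error is: nvcc fatal : Unsupported gpu architecture 'compute_86'
In other words, my GPU's compute architecture is too new for the nvcc doing the compiling, which does not recognize compute_86. My GPU is an RTX 3090, which has compute capability sm_86.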
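Whether a particular nvcc knows an architecture can be checked directly. This is a rough sketch that assumes the architecture names accepted by --gpu-architecture appear in the output of nvcc --help; the helper name nvcc_supports_arch is hypothetical. The path tested is the one the failing log above invokes.

```shell
# Succeed if the given nvcc binary mentions the architecture in its help text.
nvcc_supports_arch() {
    nvcc_bin="$1"
    arch="$2"
    "$nvcc_bin" --help 2>/dev/null | grep -w -- "$arch" >/dev/null
}

# The build log calls this binary, so this is the one to test.
NVCC=/usr/local/cuda/bin/nvcc
if [ -x "$NVCC" ]; then
    if nvcc_supports_arch "$NVCC" compute_86; then
        echo "$NVCC accepts compute_86"
    else
        echo "$NVCC does not list compute_86"
    fi
else
    echo "$NVCC not found"
fi
```

If the second message is printed, the toolkit behind that nvcc is older than the GPU, whatever nvcc -V on the PATH says.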
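Many posts online say that PyTorch does not support compute_86 yet, and a common suggestion is to lower the architecture to compute_75 or similar.
I was not going to downgrade like that: if I could not change it back, I would be much worse off. Besides, quite a few people with a 3090 report all sorts of problems at inference time after lowering the architecture.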
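Solution:
Although nvcc -V reports version 11.1, there is a separate CUDA toolkit that the build actually uses, which normally lives under /usr/local/cuda/.
Looking in /usr/local, you can see that several CUDA versions are installed side by side. The cuda entry is a symbolic link to the CUDA version used for compiling. So although nvcc -V reports 11.1, the toolkit used for compiling is CUDA 10.2, and that causes the error: CUDA 10.2 predates the RTX 30 series, so its nvcc does not know compute_86.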
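The installed toolkits and the target of the cuda link can be listed as follows. This is a small sketch assuming the usual layout of versioned directories under /usr/local; the function name list_cuda_installs is made up for illustration.

```shell
# List every cuda* entry under a prefix and resolve symbolic links,
# so it is clear which toolkit the plain "cuda" path points at.
list_cuda_installs() {
    prefix="${1:-/usr/local}"
    found=0
    for entry in "$prefix"/cuda*; do
        [ -e "$entry" ] || continue
        found=1
        if [ -L "$entry" ]; then
            echo "$entry -> $(readlink -f "$entry")"
        else
            echo "$entry"
        fi
    done
    [ "$found" -eq 1 ] || echo "no CUDA directories under $prefix"
}

list_cuda_installs
```

On a machine like the one described here, the line for /usr/local/cuda would point at the 10.2 directory.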
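Temporary workaround:
Switch the CUDA version in the conda environment down from 11.1 to 10.2, then run pip install -v -e . again.
The build now completes, but running the code produces this warning: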
UserWarning: GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the GeForce RTX 30
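The CUDA version behind this PyTorch install is too old for the 3090. Switching the CUDA version in the conda environment back up to 11.1 makes the warning go away, for the time being.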
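To see which architectures the installed PyTorch was built for, PyTorch itself can be asked. This is a hedged sketch: check_torch_arch is a made-up name, the interpreter defaults to python unless the PYTHON variable says otherwise, and torch.cuda.get_arch_list() is assumed to be present in the installed PyTorch version.

```shell
# Ask the installed PyTorch which CUDA version and GPU architectures it
# was built for, and what the first GPU reports as its capability.
check_torch_arch() {
    py="${PYTHON:-python}"
    "$py" - <<'EOF' 2>/dev/null || echo "could not query torch with $py"
import torch
print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
print("compiled architectures:", torch.cuda.get_arch_list())
if torch.cuda.is_available():
    print("device capability:", torch.cuda.get_device_capability(0))
EOF
}

check_torch_arch
```

If sm_86 is missing from the compiled architectures while the device capability is (8, 6), the warning above is expected.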
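Note: both errors are CUDA version problems, but they come from different places. The first appears when the local code has to be recompiled, so it depends on the CUDA version used for compiling. When the code runs, the version in effect is the one nvcc -V reports, and that is where the complaint about the version being too low comes from.
With the mmdetection framework, however, you usually design your own network and then have to recompile the local code, which again means running pip install -v -e .
So the problem is not actually solved.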
Permanent solution:
Upgrade the CUDA version used for compiling to 11.1.
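One way to do this without touching the system is to point the build at an 11.1 toolkit through environment variables. The sketch below assumes a CUDA 11.1 toolkit is installed at /usr/local/cuda-11.1 (an assumed path, adjust it to your machine); use_cuda_toolkit is a hypothetical helper. It relies on PyTorch's extension builder honoring the CUDA_HOME variable.

```shell
# Make one specific toolkit the one that builds will pick up.
# Fails without changing anything if the directory has no nvcc.
use_cuda_toolkit() {
    toolkit="$1"
    if [ ! -x "$toolkit/bin/nvcc" ]; then
        echo "no nvcc under $toolkit/bin" >&2
        return 1
    fi
    export CUDA_HOME="$toolkit"
    export PATH="$toolkit/bin:$PATH"
    export LD_LIBRARY_PATH="$toolkit/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Assumed install location of the 11.1 toolkit.
if use_cuda_toolkit /usr/local/cuda-11.1; then
    nvcc -V | grep release
    # Then, from the vedadet source directory:
    #   pip install -v -e .
fi
```

The alternative is to repoint the /usr/local/cuda link itself, for example with sudo ln -sfn /usr/local/cuda-11.1 /usr/local/cuda, which changes the default for every user on the machine. Either way, it may be necessary to delete the old build directory first so that nothing compiled with the previous toolkit is reused.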
Related blog posts for reference:
nvcc fatal : Unsupported gpu architecture 'compute_86'
When installing CUDA, nvcc --version and cat /usr/local/cuda/version.txt report different versions