Type 1: Package installation problems
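1. An error raised inside a package's own code, e.g. a method that does not exist
Check the version:
python -> import <package> -> <package>.__version__ -> <package>.__file__
If the path of the package that actually gets imported differs from the path pip installed it to, copy the pip-installed package over.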
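The same check can be scripted. A minimal sketch using only the standard library; `inspect_package` is a hypothetical helper, and the optional `dist_name` parameter assumes the import name and the pip distribution name may differ. It compares what `import` actually loads with what pip recorded, and flags when the imported file does not live under pip's install location.

```python
import importlib
import importlib.metadata
from pathlib import Path


def inspect_package(module_name, dist_name=None):
    """Report what `import module_name` actually loads versus what pip recorded."""
    module = importlib.import_module(module_name)
    imported_file = getattr(module, "__file__", None)
    info = {
        "imported_version": getattr(module, "__version__", None),
        "imported_file": imported_file,
        "installed_version": None,
        "installed_location": None,
        "path_mismatch": None,
    }
    try:
        dist = importlib.metadata.distribution(dist_name or module_name)
    except importlib.metadata.PackageNotFoundError:
        # Not installed by pip (e.g. a stdlib module or a manually copied folder).
        return info
    info["installed_version"] = dist.version
    # For a regular install, locate_file("") is the site-packages directory.
    location = Path(dist.locate_file("")).resolve()
    info["installed_location"] = str(location)
    if imported_file is not None:
        imported = Path(imported_file).resolve()
        # True when the imported file is NOT under pip's install location.
        info["path_mismatch"] = location not in imported.parents
    return info
```

Run it with the same interpreter that shows the error, e.g. `inspect_package("transformers")`. For a package whose import name differs from its pip name, pass both, e.g. `inspect_package("yaml", dist_name="PyYAML")`.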
Specific cases:
1. KeyError: 'llama' from transformers: install transformers 4.28.
2. deepspeed has no attribute adam_cuda: install deepspeed 0.8.3 (this requires torch 1.12.1).
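To confirm an environment actually matches these pins before launching a job, a small check like the following may help. It is a sketch: the `PINS` mapping restates the versions above, and the prefix-matching rule (so `4.28` accepts `4.28.1`, and a local suffix such as `+cu113` on torch builds is ignored) is an assumption about how strictly to compare.

```python
import importlib.metadata

# Versions noted above; a pin is treated as a prefix, so "4.28" matches 4.28.0 and 4.28.1.
PINS = {"transformers": "4.28", "deepspeed": "0.8.3", "torch": "1.12.1"}


def matches_pin(installed, pin):
    """True if `installed` starts with the components of `pin`, ignoring any +local suffix."""
    public = installed.split("+", 1)[0]
    have = public.split(".")
    want = pin.split(".")
    return have[: len(want)] == want


def check_pins(pins=PINS):
    """Return {name: (installed_version_or_None, ok)} for each pinned package."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, matches_pin(installed, pin))
    return report
```

Comparing whole dot-separated components (rather than raw string prefixes) keeps `4.280.0` from being mistaken for `4.28`.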
Type 2: No permission on a directory
sudo chmod 777 dir
sudo chmod 777 dir/* (grants full permissions to the files in the directory)
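For reference, the two commands map onto `os.chmod` roughly as below. `open_up` is a hypothetical helper name, and it still needs sufficient privileges (e.g. running under sudo) to change files it does not own. The sketch mirrors the shell meaning of `dir/*`: only the directory's direct entries change, hidden (dot) files are skipped, and subdirectories are not recursed into, which is easy to overlook.

```python
import glob
import os
import stat


def open_up(directory, mode=0o777):
    """Equivalent of `chmod 777 dir` followed by `chmod 777 dir/*`."""
    os.chmod(directory, mode)
    changed = []
    # glob's "*" behaves like the shell's: no dotfiles, no recursion.
    for path in glob.glob(os.path.join(glob.escape(directory), "*")):
        os.chmod(path, mode)
        changed.append(path)
    return changed
```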
Type 3: Wrong Python interpreter being used
sudo python and python do not run the same Python interpreter
Solution:
1. sudo cp /usr/bin/python /usr/bin/python_bak (back up the existing python)
2. sudo rm /usr/bin/python
3. sudo ln -s /opt/conda/bin/python /usr/bin/python (point /usr/bin/python at the conda interpreter)
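Before replacing the symlink, it helps to confirm which interpreter each command actually runs. A minimal sketch: save it as a script (the name `which_python.py` is hypothetical) and run it once with `python which_python.py` and once with `sudo python which_python.py`; differing `executable` or `prefix` values confirm the mismatch.

```python
import os
import shutil
import sys


def describe_interpreter():
    """Collect the facts that typically differ between `python` and `sudo python`."""
    return {
        "executable": sys.executable,              # the binary actually running
        "prefix": sys.prefix,                      # its installation / environment root
        "version": sys.version.split()[0],
        "python_on_path": shutil.which("python"),  # what a bare `python` resolves to
        "PATH": os.environ.get("PATH", ""),        # sudo often resets PATH (secure_path)
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```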
Type 4: Half-precision differences between CPU and GPU
import torch

def normalize(x, axis=-1):
    # L2-normalize x along `axis`; the 1e-12 term guards against division by zero
    x = 1. * x / (torch.norm(x, 2, axis, keepdim=True).expand_as(x) + 1e-12)
    return x
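embed_f is currently on the GPU and in half precision (float16).
normalize(embed_f).cpu() and normalize(embed_f.cpu()) give different results: the latter still runs the half-precision computation, but on the CPU, which changes the precision of the result.
Solution: use normalize(embed_f.cpu().float()).
Tip: all three results differ slightly from one another; only normalize(embed_f.cpu()) differs noticeably.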
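Why casting to float32 first helps can be illustrated without a GPU. The sketch below is a rough, standard-library-only model, not PyTorch's actual kernels (which may accumulate reductions at higher precision internally): it runs the same L2 normalization while rounding every intermediate to float16 or to float32 via `struct`'s `'e'` and `'f'` formats, and compares both against a double-precision reference. The input vector is made up for illustration.

```python
import math
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round a Python float to the nearest IEEE float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def l2_normalize(values, rnd, eps=1e-12):
    """Same formula as normalize() above, rounding each intermediate with `rnd`."""
    sq_sum = 0.0
    for v in values:
        sq_sum = rnd(sq_sum + rnd(v * v))
    denom = rnd(rnd(math.sqrt(sq_sum)) + rnd(eps))
    return [rnd(v / denom) for v in values]


def max_abs_error(result, reference):
    return max(abs(a - b) for a, b in zip(result, reference))


# Hypothetical embedding: 1000 entries that are already float16 values (like embed_f).
embed = [to_half(0.1)] * 1000
reference = l2_normalize(embed, lambda x: x)  # double precision throughout
half_err = max_abs_error(l2_normalize(embed, to_half), reference)
single_err = max_abs_error(l2_normalize(embed, to_single), reference)
print(f"float16 path error: {half_err:.2e}")
print(f"float32 path error: {single_err:.2e}")
print("eps in float16:", to_half(1e-12))  # 1e-12 underflows to 0.0 in float16
```

In this model the float16 running sum loses precision once it grows, because each small addend gets rounded to the sum's coarse float16 spacing; the float32 path stays far closer to the reference, which matches the advice to call `.float()` before normalizing.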
Reading values: numpy vs. tensor:
[str(i.item()) for i in normalize(embed_f[0]).cpu().numpy()]
result: ['0.02435302734375', '0.0052337646484375']
[str(i) for i in normalize(embed_f[0]).cpu().numpy()]
result: ['0.02435', '0.005234']
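The difference is in how the numbers are printed, not in the stored values: `.item()` turns each numpy float16 into a Python float (a double), whose `str` shows the exact binary value, whereas `str()` of a numpy float16 prints the shortest decimal that reads back as the same float16. The sketch below approximates that second rule with the standard library (`struct`'s `'e'` format) so it runs without numpy; `shortest_half_str` is a hypothetical helper, not numpy's implementation.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def shortest_half_str(value):
    """Shortest %g-style decimal that rounds back to the same float16 as `value`."""
    target = to_half(value)
    for digits in range(1, 18):
        text = f"{target:.{digits}g}"
        if to_half(float(text)) == target:
            return text
    return repr(target)


stored = [0.02435302734375, 0.0052337646484375]  # exact float16 values from the notes
print([str(v) for v in stored])                # like the .item() output
print([shortest_half_str(v) for v in stored])  # like str() on numpy float16
```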
CUDA error:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Solution: put CUDA_VISIBLE_DEVICES=3 in front of the python command.
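The same fix can be applied when launching from Python, for example from a wrapper script. A minimal sketch; `gpu_env` and `run_on_gpu` are hypothetical helpers. It pins the child process to GPU 3 via `CUDA_VISIBLE_DEVICES` and can optionally set `CUDA_LAUNCH_BLOCKING=1`, which the error message itself suggests so that kernel errors are reported synchronously at the call that caused them. Both variables have to be in the environment before the child process initializes CUDA, which is why they are passed at launch time.

```python
import os
import subprocess
import sys


def gpu_env(visible_devices="3", launch_blocking=False):
    """Copy of the current environment with the CUDA variables set."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = visible_devices  # same as the shell prefix
    if launch_blocking:
        env["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous kernel launches for debugging
    return env


def run_on_gpu(script, *args, visible_devices="3", launch_blocking=False):
    """Run `script` with the current interpreter and the adjusted environment."""
    cmd = [sys.executable, script, *args]
    env = gpu_env(visible_devices, launch_blocking)
    return subprocess.run(cmd, env=env, check=True)
```

Usage would look like `run_on_gpu("train.py", launch_blocking=True)`, where `train.py` stands in for the real script.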
For LLaMA-1, adding the following snippet also fixes it:
if tokenizer.pad_token is None:
    DEFAULT_PAD_TOKEN = "[PAD]"
    DEFAULT_EOS_TOKEN = "</s>"
    DEFAULT_BOS_TOKEN = "<s>"
    DEFAULT_UNK_TOKEN = "<unk>"
    tokenizer.add_special_tokens({
        "eos_token": DEFAULT_EOS_TOKEN,
        "bos_token": DEFAULT_BOS_TOKEN,
        "unk_token": DEFAULT_UNK_TOKEN,
        "pad_token": DEFAULT_PAD_TOKEN,
    })
    tokenizer.add_eos_token = False
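A device-side assert like the one above is commonly caused by an out-of-range index on the GPU, for example a token id that is not smaller than the model's embedding size. Since this snippet adds a new `[PAD]` token to the tokenizer, the model's input embeddings generally need to grow to match, e.g. via `model.resize_token_embeddings(len(tokenizer))` in transformers. Checking the ids on the CPU first turns the asynchronous CUDA error into a readable one. A minimal sketch with hypothetical helpers, assuming a batch is a plain list of token-id lists:

```python
def find_out_of_range_ids(batch, vocab_size):
    """Return (row, position, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for row, ids in enumerate(batch):
        for pos, token_id in enumerate(ids):
            if not 0 <= token_id < vocab_size:
                bad.append((row, pos, token_id))
    return bad


def check_batch(batch, vocab_size):
    """Raise a readable error on the CPU before the ids ever reach a CUDA kernel."""
    bad = find_out_of_range_ids(batch, vocab_size)
    if bad:
        row, pos, token_id = bad[0]
        raise ValueError(
            f"{len(bad)} token id(s) out of range for vocab_size={vocab_size}; "
            f"first at row {row}, position {pos}: {token_id}"
        )
```

For example, LLaMA-1 has a 32000-token vocabulary, so after adding `[PAD]` the pad id is 32000; calling `check_batch(input_ids.tolist(), model.get_input_embeddings().num_embeddings)` before the forward pass would flag it if the embeddings were not resized. Apply it to input ids only, not to labels, where -100 is a common ignore value.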