PYTORCH_CUDA_ALLOC_CONF max_split_size_mb | A fix for Shell (Linux) environments

References

[1] Solving PyTorch "CUDA: Out Of Memory" errors caused by GPU memory fragmentation by setting max_split_size_mb in PYTORCH_CUDA_ALLOC_CONF
https://blog.csdn.net/MirageTanker/article/details/127998036
[2] Notes on shell environment variables
https://blog.csdn.net/JOJOY_tester/article/details/90738717

Steps to fix it

The error message looks like this:

RuntimeError: CUDA out of memory. Tried to allocate 6.18 GiB (GPU 0; 24.00 GiB total capacity; 11.39 GiB already allocated; 3.43 GiB free; 17.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

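Compute reserved - allocated = 17.62 - 11.39 = 6.23 > 6.18. In other words, PyTorch is holding more cached but unallocated memory than the request needed, yet the allocation still failed, which points to fragmentation. (Never mind for now where this check comes from; reference [1] explains it in more detail.)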
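The comparison above can be reproduced in the shell. This is a minimal sketch, assuming the three numbers are copied by hand from the error message; the variable names are illustrative and not part of PyTorch.

```shell
# Values copied by hand from the error message above (all in GiB).
requested=6.18
allocated=11.39
reserved=17.62

# Memory PyTorch has cached but is not using = reserved - allocated.
gap=$(awk -v r="$reserved" -v a="$allocated" 'BEGIN { printf "%.2f", r - a }')
echo "reserved - allocated = ${gap} GiB"

# If the gap is larger than the failed request, fragmentation is a plausible cause.
if awk -v g="$gap" -v q="$requested" 'BEGIN { exit !(g > q) }'; then
    verdict="fragmentation likely"
else
    verdict="fragmentation unlikely"
fi
echo "$verdict"
```

awk is used because the shell's built-in arithmetic only handles integers.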

Check the environment variable that configures PyTorch's CUDA caching allocator

echo $PYTORCH_CUDA_ALLOC_CONF
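If the variable has never been set, the command above just prints an empty line. A small variation, sketched here with an illustrative placeholder string, makes the unset case explicit:

```shell
# Print the current allocator configuration, or a placeholder when it is unset or empty.
current="${PYTORCH_CUDA_ALLOC_CONF:-<not set>}"
echo "PYTORCH_CUDA_ALLOC_CONF=${current}"
```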

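Set the environment variable. (This is where the 6.18 figure comes in: it is the 6.18 GiB that PyTorch tried to allocate when the error occurred.)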

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:6110

(6110 is a size in MB. Pick the largest convenient value that is still below 6.18 GB; a simple rule of thumb is about 6.1 × 1000 MB.)
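The choice of value can be sanity-checked before exporting it. The sketch below assumes the allocator counts max_split_size_mb in units of 1024 × 1024 bytes, so that 1 GiB corresponds to 1024 of them; the variable names are illustrative.

```shell
# Size of the failed allocation, copied from the error message (GiB).
requested_gib=6.18

# Convert to MiB (1 GiB = 1024 MiB) and round down: this is the upper bound.
limit_mib=$(awk -v g="$requested_gib" 'BEGIN { printf "%d", g * 1024 }')
echo "upper bound: ${limit_mib} MiB"

# The value used in this article; export it only if it stays below the bound.
split_mb=6110
if [ "$split_mb" -lt "$limit_mib" ]; then
    export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:${split_mb}"
fi
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Note that export only affects the current shell and the processes started from it afterwards, so the training script has to be launched from the same session. To keep the setting across sessions, the export line can be added to a startup file such as ~/.bashrc.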

With that, the problem is solved. Hooray!

