HuggingFace | 边际效应 (Marginal Effect) - Yang Wenbo's personal blog

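After DeepSeek-V3/R1 took off, many people wanted to try running the "full-strength" model locally. But the full model's weights add up to more than 600 GB, and the weights alone are split into 163 files.

Once you lose patience with the download speed of the official HuggingFace site and get the weight files through some other method or channel, how do you confirm that those files are complete and uncorrupted?

Here is a very simple method that needs only 2 lines of code.

Environment

Prerequisite 1: you have cloned the model's git repository without the weight files. Taking DeepSeek-R1 as an example, the command below clones the repository into the DeepSeek-R1 directory while skipping the LFS downloads, so the weights are left as small pointer files: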

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-R1

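Prerequisite 2: you have already downloaded the weight files by some means. Put them inside the cloned git repository directory; for DeepSeek-R1, that means moving the 163 *.safetensors files into the DeepSeek-R1 directory.

You can also leave the weight files where they are. In that case, move the checksum file into the directory that holds the weight files before running the second command.

Line 1 of code

Get the sha256 checksums of all the official weight files and save them as a standard checksum file. This command must be run inside the git repository directory: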

git lfs ls-files -l | awk '{print $1"  "$3}' > large_files.sha256

The file written by this command looks like this:

c2388e6b127ce6664e35c5e2529c3ce4bfc99f4f7fb6fa48e92b29ed5e4922af  model-00001-of-000163.safetensors
5f450c75da7eb897b74a092eee65df8bb115fce81cccd2bbaeb220bd97197875  model-00002-of-000163.safetensors
...
913177d9e0dfb228769e0a13a386c34b919dcbb32a430ce230979f53bf7ae5bc  model-00163-of-000163.safetensors
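
One caveat about the awk one-liner: it prints only the third whitespace-separated field, so a path that contains spaces would be truncated. The following is a minimal Python sketch of the same conversion that keeps the whole path. It assumes every line of `git lfs ls-files -l` has the form `<oid> <marker> <path>`, where the marker is `*` or `-`; the function name is made up for illustration and is not part of git-lfs.

```python
def lfs_listing_to_checksums(listing: str) -> str:
    """Convert `git lfs ls-files -l` output into sha256sum-style lines.

    Assumed input format, one file per line: "<oid> <marker> <path>".
    """
    lines = []
    for row in listing.splitlines():
        if not row.strip():
            continue
        # Split at most twice so a path containing spaces stays in one piece.
        oid, _marker, path = row.split(None, 2)
        # sha256sum expects "<hash><two spaces><path>".
        lines.append(f"{oid}  {path}")
    return "\n".join(lines) + ("\n" if lines else "")
```

To use it, pass in the captured output of `git lfs ls-files -l` and write the returned string to large_files.sha256.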

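Line 2 of code

Check the integrity of the local files against the official checksums. This check runs very slowly, because it has to compute the sha256sum of every file and then compare it with the checksum file.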

sha256sum -c large_files.sha256

The output of this command looks like this:

model-00001-of-000163.safetensors: OK
model-00002-of-000163.safetensors: FAILED
...
model-00163-of-000163.safetensors: OK

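If every line says OK, congratulations: none of the weight files is corrupted. If a line says FAILED, that file did not pass the integrity check and you need to download it again.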
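
If `sha256sum` is not available on your system, the same check can be approximated with Python's standard library. This is only a sketch under a few assumptions: it understands just the `<hash><two spaces><name>` format produced by the command above, it resolves names relative to the checksum file's directory rather than the current directory, and the function names and the extra MISSING status are my own, not something `sha256sum -c` prints in this form.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB shards never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Return {name: 'OK' | 'FAILED' | 'MISSING'} for every listed file."""
    base = os.path.dirname(os.path.abspath(checksum_file))
    results = {}
    with open(checksum_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # "<hash>  <name>": split on the first double space only.
            expected, name = line.split("  ", 1)
            path = os.path.join(base, name)
            if not os.path.isfile(path):
                results[name] = "MISSING"
            elif sha256_of(path) == expected.lower():
                results[name] = "OK"
            else:
                results[name] = "FAILED"
    return results
```

Calling `verify_checksums("large_files.sha256")` returns a dictionary you can filter for anything that is not OK. It is just as slow as the shell command, since every byte still has to be hashed.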

This method works for every file tracked by LFS, not only *.safetensors files. For example, quantized *.gguf weight files can be verified in the same way.

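In my earlier post "DeepSeek V3 模型各子模块参数量精算" (a precise count of the parameters in each DeepSeek V3 submodule), the activated parameter count I calculated did not match the description in the official README_WEIGHT.md. A reader later told me that the official activated parameter figures had been updated. I checked the commit history, and the change is as follows:

The V3 model's activated parameter count was changed from 36.7B to 36.6B, and the note that it includes 0.9B of Embedding was removed, so it now basically matches my calculation exactly. The MTP activated parameter count was changed from 2.4B to 1.5B, also dropping the 0.9B Embedding, which still leaves a 0.1B difference from my calculation.

Anyway, totals like these only indicate the rough scale of the computation, and a small discrepancy does not change the qualitative conclusions. What is really useful is knowing, when you split weight matrices for TP, EP and so on, what shape each matrix has, how many pieces it is split into, and roughly how large each piece is.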
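
As a small worked example of that kind of question, the sketch below computes the shape and size of one shard when a matrix is split evenly along one dimension. The input shape [129280, 7168] is the model.embed_tokens.weight shape that appears in the script output later in this post; the 8-way split along dimension 0, the bytes-per-parameter table, and the function name are assumptions chosen purely for illustration, not a statement about how any framework actually shards this tensor.

```python
from math import prod

# Assumed storage sizes per parameter, for illustration only.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def shard_plan(shape, parts, dim=0, dtype="bf16"):
    """Shape, parameter count and byte size of one shard of an even split."""
    if shape[dim] % parts != 0:
        raise ValueError(
            f"dimension {dim} of size {shape[dim]} is not divisible by {parts}"
        )
    shard_shape = list(shape)
    shard_shape[dim] //= parts
    params = prod(shard_shape)
    return shard_shape, params, params * BYTES_PER_PARAM[dtype]


shape, params, nbytes = shard_plan([129280, 7168], parts=8)
print(shape, params, nbytes)  # [16160, 7168] 115834880 231669760
```

So under these assumptions each of the 8 shards holds about 0.116B parameters, roughly 232 MB in bf16.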

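To analyze the detailed parameters of a very large model like DeepSeek V3, I wrote a small script that extracts the weight shapes from safetensors files and aggregates parameter counts at different levels of the key hierarchy: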

https://github.com/solrex/solrex/blob/master/snippets/show_safetensors.py

#!/usr/bin/env python3
import os
import argparse
import torch

from safetensors import safe_open

def print_tensor_tsv(model_dir, depth):
    '''Print tensor info in .safetensors into tsv format'''
    TENSOR_CLASS = {
        'weight': 'weight',
        'e_score_correction_bias': 'weight',
        'weight_scale_inv': 'scale'
    }
    print('SafetensorsFile\tTensorKey\tTensorParams\tTensorType\tTensorShape')
    safetensor_files = sorted([f for f in os.listdir(model_dir) if f.endswith('.safetensors')])
    summary = {}
    for filename in safetensor_files:
        file_path = os.path.join(model_dir, filename)
        with safe_open(file_path, framework='pt') as f:
            for key in f.keys():
                tensor = f.get_tensor(key)
                print(f'{filename}\t{key}\t{tensor.numel()}\t{tensor.dtype}\t{tensor.shape}')
                lst = key.split('.')
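                # The last component is the tensor kind (.weight, .weight_scale_inv, ...);
                # fall back to the raw suffix for kinds not listed in TENSOR_CLASS
                tclass = TENSOR_CLASS.get(lst[-1], lst[-1])
                # Cap prefix length at `depth` components (depth <= 0 means no cap)
                dep = min(len(lst), depth+1) if depth > 0 else len(lst)
                # Add this tensor's parameter count to every prefix, from '' up to the cap
                for prefix in ['.'.join(lst[:i]) for i in range(0, dep)]:
                    summary[f'{tclass}[{prefix}]'] = summary.get(f'{tclass}[{prefix}]', 0) + tensor.numel()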
    for key in sorted(summary):
        print(f'Summary\t{key}\t{summary[key]}\t\t')

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Print tensor shape and dtype of .safetensors file')
    parser.add_argument('model_dir', nargs='?', default='.', help='Model directory (default: $PWD)')
    parser.add_argument('--summary_depth', '-d', type=int, default=3, help='Summary depth of weights')
    args = parser.parse_args()
    print_tensor_tsv(args.model_dir, args.summary_depth)

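Run ./show_safetensors.py in the root directory of a HuggingFace model to get the shape of every weight in the current model, plus aggregated parameter counts for the first 3 levels of the key hierarchy. The "-d" option changes the maximum aggregation depth. The output is in TSV format and can be imported into a spreadsheet for further calculation.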
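
The script calls get_tensor() on every key, which reads each tensor's data just to learn its shape. If only shapes and dtypes are needed, they can also be read from the JSON header at the start of each file. The sketch below relies on the published safetensors layout (an 8-byte little-endian header length, then a JSON header mapping each key to its dtype, shape and data_offsets, with an optional `__metadata__` entry). It is not part of show_safetensors.py, the function name is hypothetical, and the dtype strings it returns are safetensors names such as BF16 rather than torch.bfloat16.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor key: (dtype, shape)} by parsing only the JSON header."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" holds free-form strings, not a tensor entry.
    header.pop("__metadata__", None)
    return {key: (info["dtype"], info["shape"]) for key, info in header.items()}
```

For example, `read_safetensors_header("model-00001-of-000163.safetensors")` would list every key in that shard without touching the tensor data, which matters when the files total hundreds of GB.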

Here is an example of using show_safetensors.py to analyze DeepSeek V3's parameter count:

$ ./show_safetensors.py -d 2
SafetensorsFile	TensorKey	TensorParams	TensorType	TensorShape
model-00001-of-000163.safetensors	model.embed_tokens.weight	926679040	torch.bfloat16	torch.Size([129280, 7168])
model-00001-of-000163.safetensors	model.layers.0.input_layernorm.weight	7168	torch.bfloat16	torch.Size([7168])
...
model-00163-of-000163.safetensors	model.layers.61.shared_head.head.weight	926679040	torch.bfloat16	torch.Size([129280, 7168])
model-00163-of-000163.safetensors	model.layers.61.shared_head.norm.weight	7168	torch.bfloat16	torch.Size([7168])
Summary	scale[]	41540496
Summary	scale[model.layers]	41540496
Summary	scale[model]	41540496
Summary	weight[]	684489845504
Summary	weight[lm_head]	926679040
Summary	weight[model.embed_tokens]	926679040
Summary	weight[model.layers]	682636480256
Summary	weight[model.norm]	7168
Summary	weight[model]	683563166464

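Rows whose first column is a file name (such as model-00001-of-000163.safetensors) describe the individual weights in that file and include their shape. Rows whose first column is Summary are aggregates: each tensor key, for example "model.layers.0.input_layernorm.weight", is split on "." into prefixes, and the parameter counts are summed per prefix. These rows carry no shape information.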
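
To make the prefix aggregation concrete, here is a stripped-down sketch of the same logic applied to three keys. It mirrors the depth handling in the script but drops the weight/scale distinction for brevity; the helper names are mine, and the parameter counts are taken from the output above.

```python
def prefixes(key, depth):
    """Prefixes of a tensor key that receive its parameter count."""
    parts = key.split(".")
    # As in the script: the final component (the tensor kind) is never part
    # of a prefix, and depth > 0 caps the prefix at `depth` components.
    limit = min(len(parts), depth + 1) if depth > 0 else len(parts)
    return [".".join(parts[:i]) for i in range(limit)]


def aggregate(counts, depth):
    """Sum parameter counts per prefix for a {key: numel} mapping."""
    summary = {}
    for key, numel in counts.items():
        for prefix in prefixes(key, depth):
            summary[prefix] = summary.get(prefix, 0) + numel
    return summary


counts = {
    "model.embed_tokens.weight": 926679040,
    "model.layers.0.input_layernorm.weight": 7168,
    "lm_head.weight": 926679040,
}
for prefix, total in sorted(aggregate(counts, depth=2).items()):
    print(f"weight[{prefix}]\t{total}")
```

With depth=2, the layernorm key contributes to the prefixes "" (the grand total), "model" and "model.layers", which is why the Summary rows above stop at model.layers instead of listing each layer separately.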

Tags: HuggingFace

Verifying the download integrity of large-model weight files (such as DeepSeek-R1) with 2 lines of code


DeepSeek officially corrected the description of V3's activated parameter count