DeepSeek 官方修正了 V3 的激活参数量说明 | 边际效应

在之前的博客《DeepSeek V3 模型各子模块参数量精算》中，我计算的模型激活参数量跟官方 README_WEIGHT.md 中的说明对不上。之后有读者跟我说，官方更新了激活参数量的数字。我查了一下 commit history，具体修改如下：

可以看到，V3 模型激活参数量从 36.7 改成了 36.6，并且去掉了包含 0.9B Embedding 的说明，那基本上跟我的计算完全对上了。MTP 激活参数量从 2.4B 改成了 1.5B，也去掉了 0.9B 的 Embedding，跟我的计算还是有 0.1B 的差异。

Anyway，这种总量统计只是为了揭示计算的大约规模，有点差异也不影响定性结论。真正有用的是你在拆分 TP、EP 等权重矩阵时，矩阵的形状是多大，要拆多少份，每份大概多大。

为了分析像 DeepSeek V3 这样的超大模型具体参数，我写了一个小脚本，可以将 safetensors 文件里面的权重 Shape 提取出来，并且可以按不同的层级做参数量的聚合计算：

https://github.com/solrex/solrex/blob/master/snippets/show_safetensors.py

#!/usr/bin/env python3
import os
import argparse
import torch

from safetensors import safe_open

def print_tensor_tsv(model_dir, depth):
    '''Print tensor info in .safetensors into tsv format'''
    TENSOR_CLASS = {
        'weight': 'weight',
        'e_score_correction_bias': 'weight',
        'weight_scale_inv': 'scale'
    }
    print('SafetensorsFile\tTensorKey\tTensorParams\tTensorType\tTensorShape')
    safetensor_files = sorted([f for f in os.listdir(model_dir) if f.endswith('.safetensors')])
    summary = {}
    for filename in safetensor_files:
        file_path = os.path.join(model_dir, filename)
        with safe_open(file_path, framework='pt') as f:
            for key in f.keys():
                tensor = f.get_tensor(key)
                print(f'{filename}\t{key}\t{tensor.numel()}\t{tensor.dtype}\t{tensor.shape}')
                lst = key.split('.')
                # Get suffix: .weight or .weight_scale_inv
                tclass = TENSOR_CLASS[lst[-1]]
                # Limit prefix to dep
                dep = min(len(lst), depth+1) if depth > 0 else len(lst)
                # Get summary of prefixes
                for prefix in ['.'.join(lst[:i]) for i in range(0, dep)]:
                    summary[f'{tclass}[{prefix}]'] = summary.get(f'{tclass}[{prefix}]', 0) + tensor.numel()
    for key in sorted(summary):
        print(f'Summary\t{key}\t{summary[key]}\t\t')

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Print tensor shape and dtype of .safetensors file')
    parser.add_argument('model_dir', nargs='?', default='.', help='Model directory (default: $PWD)')
    parser.add_argument('--summary_depth', '-d', type=int, default=3, help='Summary depth of weights')
    args = parser.parse_args()
    print_tensor_tsv(args.model_dir, args.summary_depth)

在 HuggingFace 模型根目录下执行 ./show_safetensors.py ，即可获得当前模型的所有权重 Shape 和前 3 层的聚合权重规模。可以通过 “-d” 参数调整最大聚合的层级。输出的文件是 tsv 格式的，可以导入表格进行再计算。

以下是使用 show_safetensors.py 分析 DeepSeek V3 参数量的示例：

$ ./show_safetensors.py -d 2
SafetensorsFile	TensorKey	TensorParams	TensorType	TensorShape
model-00001-of-000163.safetensors	model.embed_tokens.weight	926679040	torch.bfloat16	torch.Size([129280, 7168])
model-00001-of-000163.safetensors	model.layers.0.input_layernorm.weight	7168	torch.bfloat16	torch.Size([7168])
...
model-00163-of-000163.safetensors	model.layers.61.shared_head.head.weight	926679040	torch.bfloat16	torch.Size([129280, 7168])
model-00163-of-000163.safetensors	model.layers.61.shared_head.norm.weight	7168	torch.bfloat16	torch.Size([7168])
Summary	scale[]	41540496
Summary	scale[model.layers]	41540496
Summary	scale[model]	41540496
Summary	weight[]	684489845504
Summary	weight[lm_head]	926679040
Summary	weight[model.embed_tokens]	926679040
Summary	weight[model.layers]	682636480256
Summary	weight[model.norm]	7168
Summary	weight[model]	683563166464

可以看到第一列为文件名（像 model-00001-of-000163.safetensors）的行是该文件中的具体权重信息，包含 Shape 信息；第一列为 Summary 的行，是根据模型的 tensor key 名字结构，例如 “model.layers.0.input_layernorm.weight”，按照 “.” 切成前缀，按照前缀聚合模型参数量的结果，不包含 Shape 信息。

相关阅读

发表回复 取消回复

发表回复取消回复