Caffe Neural Network Configuration - All in one network | 边际效应

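When configuring neural networks in Caffe, many people write two configuration files: train_val.prototxt, used for training, and deploy.prototxt, used for prediction. The two files usually differ only in their input and output layers. train_val.prototxt contains the train/test input data and labels. For efficiency, both train and test run in batches, and the outputs are usually accuracy/loss. deploy.prototxt contains only the test input, typically one sample at a time and without a label. Its output is not accuracy/loss but the prediction itself (the top-N classes or the predicted probabilities). You can think of deploy.prototxt as the configuration of the network you deploy online: when a user request arrives, the network runs forward and returns the prediction to the user.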

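There is nothing wrong with this, and many open-source examples do it this way. In practice, though, it becomes tedious when you are tuning the model frequently, because every change to a hidden layer has to be made in both .prototxt files. In Keras each layer is a single line of code. A Caffe configuration file is different: it is a Protobuf text message in which each layer spans many lines. The file does not fit on one screen, so you end up carefully diffing the two files.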

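There is a better way: use Caffe's proto definitions to build an all in one network. The idea is to make full use of the NetStateRule message (defined in caffe.proto as shown below) and combine phase with stage/not_stage to filter layers depending on the context.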

message NetStateRule {
  // Set phase to require the NetState have a particular phase (TRAIN or TEST)
  // to meet this rule.
  optional Phase phase = 1;

  // Set the minimum and/or maximum levels in which the layer should be used.
  // Leave undefined to meet the rule regardless of level.
  optional int32 min_level = 2;
  optional int32 max_level = 3;

  // Customizable sets of stages to include or exclude.
  // The net must have ALL of the specified stages and NONE of the specified
  // "not_stage"s to meet the rule.
  // (Use multiple NetStateRules to specify conjunctions of stages.)
  repeated string stage = 4;
  repeated string not_stage = 5;
}
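To make the matching rule concrete, here is a minimal Python sketch of how a NetState is checked against a single NetStateRule. It follows the comments in the message above, which is also how I understand Net::StateMeetsRule in Caffe's net.cpp to behave. The NetState/NetStateRule dataclasses are hypothetical stand-ins for the protobuf messages, not pycaffe classes, and an unset optional field is modeled as None.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical Python mirrors of caffe.proto's NetState / NetStateRule (illustration only).
@dataclass
class NetState:
    phase: str = "TEST"          # caffe.proto defaults NetState.phase to TEST
    level: int = 0
    stage: List[str] = field(default_factory=list)


@dataclass
class NetStateRule:
    phase: Optional[str] = None      # None means "field not set"
    min_level: Optional[int] = None
    max_level: Optional[int] = None
    stage: List[str] = field(default_factory=list)
    not_stage: List[str] = field(default_factory=list)


def state_meets_rule(state: NetState, rule: NetStateRule) -> bool:
    """Return True if `state` satisfies every field that is set in `rule`."""
    if rule.phase is not None and rule.phase != state.phase:
        return False
    if rule.min_level is not None and state.level < rule.min_level:
        return False
    if rule.max_level is not None and state.level > rule.max_level:
        return False
    # The state must have ALL of the rule's stages...
    if not all(s in state.stage for s in rule.stage):
        return False
    # ...and NONE of the rule's not_stages.
    if any(s in state.stage for s in rule.not_stage):
        return False
    return True


if __name__ == "__main__":
    validate = NetState(phase="TEST")
    predict = NetState(phase="TEST", stage=["predict"])
    rule = NetStateRule(phase="TEST", not_stage=["predict"])
    print(state_meets_rule(validate, rule))  # True
    print(state_meets_rule(predict, rule))   # False
```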

Take Caffe's examples/mnist/lenet_train_test.prototxt as an example. How do we turn it into an all in one prototxt? Its two input layers look like this:

name: "LeNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}

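First, note that the conflict we need to resolve lies within the TEST phase: validation during training and prediction at deployment need different inputs and outputs. The TRAIN phase can be left alone. To resolve the conflict, we add more conditions to the NetStateRule. Both min_level/max_level and stage/not_stage can do the job, but I prefer stage because words are more intuitive than numbers. So in the original train_val.prototxt I add a not_stage rule to the existing TEST data layer and add a second TEST input layer, using stage to tell the two use cases apart, as shown below: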

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
    not_stage: "predict"    # filter out this layer when predicting
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
# add the input layer used for deployment
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 1 dim: 28 dim: 28 } }
  include {
    phase: TEST
    stage: "predict"    # include this layer when predicting
  }
}

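During caffe.bin train, solver.prototxt supplies no stage, so the TEST net contains only the Data input layer with batch_size 100 (the TRAIN net keeps its own Data layer with batch_size 64). At prediction time you set stage='predict' (how to do this is described below), and the TEST net's input becomes the Input layer with shape 1x1x28x28, i.e. a single image.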
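The sketch below simulates which input layer survives in each of the three situations just described. It is a hypothetical, simplified re-implementation in plain Python that models only phase and stage/not_stage, with rules written as dicts. It mirrors the include/exclude handling I understand Caffe's net.cpp to use and does not call Caffe. The layer names are descriptive labels, not the names in the config.

```python
# Hypothetical, simplified mirror of Caffe's include/exclude filtering (phase + stages only).

def meets(state, rule):
    """True if `state` ({'phase', 'stages'}) satisfies every field set in `rule`."""
    if "phase" in rule and rule["phase"] != state["phase"]:
        return False
    if not all(s in state["stages"] for s in rule.get("stage", [])):
        return False
    return not any(s in state["stages"] for s in rule.get("not_stage", []))


def layer_included(layer, state):
    """With include rules, ANY matching rule keeps the layer; otherwise the
    layer is kept unless ANY exclude rule matches (Caffe forbids using both)."""
    if layer.get("include"):
        return any(meets(state, r) for r in layer["include"])
    return not any(meets(state, r) for r in layer.get("exclude", []))


# The three input layers of the all in one LeNet config; only their rules matter here.
INPUT_LAYERS = [
    {"name": "mnist (train, batch 64)", "include": [{"phase": "TRAIN"}]},
    {"name": "mnist (test, batch 100)",
     "include": [{"phase": "TEST", "not_stage": ["predict"]}]},
    {"name": "data (Input, 1x1x28x28)",
     "include": [{"phase": "TEST", "stage": ["predict"]}]},
]


def active_inputs(phase, stages=()):
    state = {"phase": phase, "stages": list(stages)}
    return [l["name"] for l in INPUT_LAYERS if layer_included(l, state)]


if __name__ == "__main__":
    print(active_inputs("TRAIN"))              # ['mnist (train, batch 64)']
    print(active_inputs("TEST"))               # ['mnist (test, batch 100)']
    print(active_inputs("TEST", ["predict"]))  # ['data (Input, 1x1x28x28)']
```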

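The same applies to the output layers. Add not_stage: "predict" to the accuracy layer's include rule, and add an exclude rule with stage: "predict" to the loss layer. Prediction then no longer needs labels and no longer computes the loss: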

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
    not_stage: "predict"  # filter out the accuracy layer when predicting
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
  exclude {           # note: exclude, not include
    phase: TEST
    stage: "predict"  # filter out the loss layer when predicting
  }
}
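Why use exclude for the loss layer but include for the accuracy layer? The loss has to stay in the TRAIN net, while in the original LeNet config the accuracy layer exists only in TEST. The sketch below lists which output layers are kept in each state. Like the input-layer sketch, it is a simplified hypothetical re-implementation of Caffe's filtering, not a Caffe API. `loss_if_include` is a made-up variant that shows what an include rule would do to the loss layer.

```python
# Hypothetical, simplified mirror of Caffe's include/exclude filtering (phase + stages only).

def meets(state, rule):
    if "phase" in rule and rule["phase"] != state["phase"]:
        return False
    if not all(s in state["stages"] for s in rule.get("stage", [])):
        return False
    return not any(s in state["stages"] for s in rule.get("not_stage", []))


def layer_included(layer, state):
    if layer.get("include"):
        return any(meets(state, r) for r in layer["include"])
    return not any(meets(state, r) for r in layer.get("exclude", []))


OUTPUT_LAYERS = {
    "accuracy": {"include": [{"phase": "TEST", "not_stage": ["predict"]}]},
    "loss": {"exclude": [{"phase": "TEST", "stage": ["predict"]}]},
    # Made-up variant: the loss layer written with an include rule instead.
    "loss_if_include": {"include": [{"phase": "TEST", "not_stage": ["predict"]}]},
}

STATES = {
    "TRAIN": {"phase": "TRAIN", "stages": []},
    "TEST": {"phase": "TEST", "stages": []},
    "TEST+predict": {"phase": "TEST", "stages": ["predict"]},
}


def kept_layers(state_name):
    state = STATES[state_name]
    return [name for name, layer in OUTPUT_LAYERS.items() if layer_included(layer, state)]


if __name__ == "__main__":
    for name in STATES:
        print(name, kept_layers(name))
```

In TRAIN only `loss` survives. The include-based variant would drop out of the TRAIN net and leave training without a loss, which is why the exclude form is the right choice for the loss layer.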

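This gives you a single all in one network configuration, lenet_train_val_deploy.prototxt, that you can use for both training and prediction. Changing a hidden layer no longer means copying edits back and forth between files. NetStateRule supports many combinations, and other parameter combinations can also produce an all in one setup. The configuration above has one advantage, though: the original solver.prototxt needs no changes at all. The default path is non-predict, and predict has to be requested explicitly.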

So how do you pass stage='predict' explicitly? On the caffe.bin command line you can use:

$ caffe.bin test --stage="predict" --model="train_val_deploy.prototxt" \
--weights="iter_N.caffemodel"

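For command-line testing, of course, the predict input layer would have to be a type that reads data by itself (such as a Data layer). It cannot be Input, because the caffe tool has no way to feed data into an Input layer. With an Input layer, you load the data yourself from Python or C++. Initializing a net with stage="predict" in Python and C++ looks like this: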

Python:
net = caffe.Net("train_val_deploy.prototxt", caffe.TEST, stages=['predict'],
                weights="iter_N.caffemodel")
C++:
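#include <caffe/caffe.hpp>
caffe::vector<caffe::string> stages;
stages.push_back("predict");
caffe::Net<float>* net = new caffe::Net<float>("train_val_deploy.prototxt", caffe::TEST, 0, &stages);  // Net is a class template
net->CopyTrainedLayersFrom("iter_N.caffemodel");  // load the trained weights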
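Once the net is built around an Input layer, your own code has to feed it data and read the result. Below is a minimal NumPy sketch of that pre- and post-processing, resting on two assumptions. First, the Input layer does not apply transform_param, so the `scale: 0.00390625` that the Data layers apply during training is repeated by hand. Second, with accuracy and loss filtered out, the leftover top of the LeNet config, `ip2`, becomes the network output and holds raw scores rather than probabilities, so softmax and top-N are computed here. The helper names are hypothetical, and the pycaffe calls appear only in comments.

```python
import numpy as np

SCALE = 0.00390625  # same value as transform_param.scale in the Data layers (1/256)


def to_input_blob(image: np.ndarray) -> np.ndarray:
    """Turn one 28x28 uint8 grayscale image into a (1, 1, 28, 28) float32 blob,
    applying by hand the scaling the Data layers used during training."""
    if image.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {image.shape}")
    return (image.astype(np.float32) * SCALE).reshape(1, 1, 28, 28)


def top_n(scores: np.ndarray, n: int = 3):
    """Softmax over raw ip2 scores, then the n most likely (class, probability) pairs."""
    z = scores - scores.max()          # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    order = np.argsort(probs)[::-1][:n]
    return [(int(c), float(probs[c])) for c in order]


# With pycaffe and the predict-stage net (not executed here):
#   net.blobs["data"].data[...] = to_input_blob(img)
#   scores = net.forward()["ip2"][0]
#   print(top_n(scores))

if __name__ == "__main__":
    blob = to_input_blob(np.full((28, 28), 255, dtype=np.uint8))
    print(blob.shape, blob.max())  # (1, 1, 28, 28) 0.99609375
    print(top_n(np.array([0.1, 2.0, -1.0, 0.5]), n=2))
```

If you would rather have the net return probabilities directly, another option is a Softmax layer restricted with `include { phase: TEST stage: "predict" }`.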
