VGG16 architecture diagram
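In place of the diagram, the layer layout can be written down as a configuration list and used to count parameters. This is a minimal sketch. It assumes configuration "D" from the paper: thirteen 3×3 convolutions with padding 1, five 2×2 max-pools, and three fully-connected layers. It also assumes a 224×224 RGB input and 1000 ImageNet classes. The names `VGG16_CFG` and `vgg16_param_count` are illustrative, not from any library.

```python
# Illustrative VGG16 (configuration "D") layout: integers are conv output
# channels, "M" marks a 2x2 max-pool with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_count(in_channels=3, input_size=224, num_classes=1000):
    """Count weights + biases, assuming 3x3 convs with padding 1 (size-preserving)."""
    total = 0
    channels = in_channels
    spatial = input_size
    for v in VGG16_CFG:
        if v == "M":
            spatial //= 2                        # each max-pool halves height and width
        else:
            total += 3 * 3 * channels * v + v    # 3x3 kernel weights + one bias per filter
            channels = v
    # Fully-connected head: flattened 512x7x7 map -> 4096 -> 4096 -> num_classes
    fc_dims = [channels * spatial * spatial, 4096, 4096, num_classes]
    for d_in, d_out in zip(fc_dims[:-1], fc_dims[1:]):
        total += d_in * d_out + d_out
    return total


print(vgg16_param_count())  # 138357544
```

About 124M of the roughly 138M parameters sit in the three fully-connected layers. The thirteen convolutional layers hold only about 14.7M.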

### Network Details

Advantages of stacking 3×3 convolutional layers instead of using a single 7×7 layer:

First, we incorporate three non-linear rectification layers instead of a single one, which makes the decision function more discriminative. Second, we decrease the number of parameters: assuming that both the input and the output of a three-layer 3 × 3 convolution stack have C channels, the stack is parametrised by $3(3^2 C^2) = 27C^2$ weights; at the same time, a single 7 × 7 conv. layer would require $7^2 C^2 = 49C^2$ parameters, i.e. 81% more. This can be seen as imposing a regularisation on the 7 × 7 conv. filters, forcing them to have a decomposition through the 3 × 3 filters (with non-linearity injected in between).
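The parameter comparison above can be checked numerically. The sketch below also confirms that three stride-1 3×3 convolutions see the same 7×7 input window as one 7×7 convolution. Biases are ignored, as in the quoted formula. The channel count `C = 256` and the helper names are arbitrary choices for illustration.

```python
def conv_stack_params(kernel, channels, depth):
    """Weights (no biases) of `depth` stacked kernel x kernel convs mapping C -> C channels."""
    return depth * kernel * kernel * channels * channels


def stacked_receptive_field(kernel, depth):
    """Receptive field of `depth` stacked stride-1 convs with the given kernel size."""
    return 1 + depth * (kernel - 1)


C = 256                                   # arbitrary example channel count
three_3x3 = conv_stack_params(3, C, 3)    # 27 * C^2
one_7x7 = conv_stack_params(7, C, 1)      # 49 * C^2
print(three_3x3 // C**2, one_7x7 // C**2)       # 27 49
print(f"{one_7x7 / three_3x3 - 1:.0%} more")    # 81% more
print(stacked_receptive_field(3, 3))            # 7
```

The ratio 49/27 ≈ 1.81 does not depend on C, so the 81% saving holds for any channel width.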

### Network Training and Testing

#### Network Training

VGG is trained by optimising the multinomial logistic regression objective with mini-batch gradient descent with momentum. The batch size is set to 256 and the momentum to 0.9.
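The objective and the optimiser can be sketched with NumPy on a toy linear softmax classifier. This is not the VGG network, and it is not the authors' code. The "multinomial logistic regression objective" is the softmax cross-entropy averaged over the mini-batch. The update uses the classic momentum formulation. The learning rate 0.01, the toy data shapes, and the function names are assumptions of this sketch. The batch size of 256 and the momentum of 0.9 come from the text.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean multinomial logistic (softmax cross-entropy) loss and its gradient w.r.t. logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    n = logits.shape[0]
    rows = np.arange(n)
    loss = -np.log(probs[rows, labels]).mean()
    grad = probs.copy()
    grad[rows, labels] -= 1.0            # d(loss)/d(logits) = (softmax - one_hot) / n
    return loss, grad / n


def momentum_step(w, v, grad, lr, momentum=0.9):
    """Classic momentum update: v <- momentum * v - lr * grad, then w <- w + v."""
    v = momentum * v - lr * grad
    return w + v, v


# Toy usage: a linear softmax classifier trained on one random mini-batch of 256 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))           # 256 samples, 20 toy features
y = rng.integers(0, 5, size=256)         # 5 toy classes
W = np.zeros((20, 5))
V = np.zeros_like(W)
losses = []
for _ in range(50):
    loss, dlogits = softmax_cross_entropy(X @ W, y)
    losses.append(loss)
    W, V = momentum_step(W, V, X.T @ dlogits, lr=0.01)  # lr=0.01 is an assumption
```

PyTorch's `torch.optim.SGD` places the learning rate differently: it updates v ← μv + g, then w ← w − lr·v. This is equivalent to the form above as long as the learning rate stays constant.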

The training was regularised by weight decay (the L2 penalty multiplier set to $5 \cdot 10^{-4}$) and dropout regularisation for the first two fully-connected layers (dropout ratio set to 0.5).
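Both regularisers can be sketched in a few lines. Weight decay with multiplier λ adds λ·w to the gradient, which corresponds to a penalty of (λ/2)·‖w‖². The dropout shown here is "inverted" dropout: surviving activations are scaled by 1/(1 − p) during training, so nothing needs to change at test time. This is one common formulation, and the text does not say which variant was used. The helper names are illustrative.

```python
import numpy as np

WEIGHT_DECAY = 5e-4   # L2 penalty multiplier from the text
DROPOUT_P = 0.5       # dropout ratio for the first two fully-connected layers


def l2_regularised_grad(grad, w, weight_decay=WEIGHT_DECAY):
    """Gradient of loss + (weight_decay / 2) * ||w||^2 with respect to w."""
    return grad + weight_decay * w


def dropout(x, p=DROPOUT_P, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x                          # identity at test time
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p       # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)


# Usage: a batch of 4 activations from a 4096-unit fully-connected layer.
h = np.ones((4, 4096))
h_train = dropout(h, rng=np.random.default_rng(0))   # entries are 0.0 or 2.0
h_test = dropout(h, training=False)                  # unchanged
```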

It is worth noting that after the paper submission we found that it is possible to initialise the weights without pre-training by using the random initialisation procedure of Glorot & Bengio (2010, "Understanding the difficulty of training deep feedforward neural networks").
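The Glorot & Bengio "normalised" initialisation draws weights from U[−a, a] with a = √(6 / (fan_in + fan_out)). This keeps the weight variance at 2 / (fan_in + fan_out). The sketch below is a NumPy version under two assumptions. First, conv kernels are laid out as (out_ch, in_ch, kh, kw). Second, the fans are in_ch·kh·kw and out_ch·kh·kw, which is the convention common frameworks use. The text does not say whether the authors used this uniform variant or a normal one. `glorot_uniform` is an illustrative name.

```python
import numpy as np


def glorot_uniform(shape, rng=None):
    """Glorot & Bengio (2010) uniform init: U[-a, a] with a = sqrt(6 / (fan_in + fan_out)).

    2-D shapes are treated as fully-connected weights (fan_out, fan_in); 4-D shapes
    as conv kernels (out_ch, in_ch, kh, kw), with fans scaled by kh * kw.
    """
    rng = np.random.default_rng() if rng is None else rng
    if len(shape) == 2:
        fan_out, fan_in = shape
    else:
        receptive = int(np.prod(shape[2:]))              # kh * kw
        fan_out, fan_in = shape[0] * receptive, shape[1] * receptive
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)


# Usage: the first VGG16 conv layer (3 -> 64 channels, 3x3 kernels).
w_conv1 = glorot_uniform((64, 3, 3, 3), rng=np.random.default_rng(0))
```

PyTorch exposes the same scheme as `torch.nn.init.xavier_uniform_`.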

#### Network Testing

