CS231n assignment2 Q3 Dropout
Published: 2019-06-13


Dropout


Completing the forward pass

def dropout_forward(x, dropout_param):
    """
    Performs the forward pass for (inverted) dropout.

    Inputs:
    - x: Input data, of any shape
    - dropout_param: A dictionary with the following keys:
      - p: Dropout parameter. We keep each neuron output with probability p.
      - mode: 'test' or 'train'. If the mode is train, then perform dropout;
        if the mode is test, then just return the input.
      - seed: Seed for the random number generator. Passing seed makes this
        function deterministic, which is needed for gradient checking but not
        in real networks.

    Outputs:
    - out: Array of the same shape as x.
    - cache: tuple (dropout_param, mask). In training mode, mask is the dropout
      mask that was used to multiply the input; in test mode, mask is None.

    NOTE: Please implement **inverted** dropout, not the vanilla version of dropout.
    See http://cs231n.github.io/neural-networks-2/#reg for more details.

    NOTE 2: Keep in mind that p is the probability of **keep** a neuron
    output; this might be contrary to some sources, where it is referred to
    as the probability of dropping a neuron output.
    """
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])

    mask = None
    out = None

    if mode == 'train':
        #######################################################################
        # TODO: Implement training phase forward pass for inverted dropout.   #
        # Store the dropout mask in the mask variable.                        #
        #######################################################################
        keep_prob = 1 - p
        mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
        # np.random.rand(*x.shape) draws a uniform [0, 1) sample with the same
        # shape as the input x, i.e. the post-activation scores. Comparing it
        # against the keep probability keep_prob gives a random boolean array
        # that serves as the dropout mask.
        # With vanilla dropout, training computes out = mask * x, which lowers
        # the expected value of the activations by a factor of keep_prob, so at
        # test time the outputs would have to be scaled by keep_prob to keep
        # the distributions consistent. With the inverted dropout trick used
        # here, we instead divide by keep_prob already at training time, so at
        # test time nothing needs to be done: the data simply passes through
        # the dropout layer unchanged.
        # (Note: this implementation treats p as the drop probability, with
        # keep_prob = 1 - p, which matches the zeroed fractions reported below,
        # even though the docstring above describes p as the keep probability.)
        out = mask * x
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == 'test':
        #######################################################################
        # TODO: Implement the test phase forward pass for inverted dropout.   #
        #######################################################################
        out = x
        #######################################################################
        #                            END OF YOUR CODE                         #
        #######################################################################

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)

    return out, cache

Running tests with p = 0.25

Mean of input: 10.000207878477502
Mean of train-time output: 9.998198947788465
Mean of test-time output: 10.000207878477502
Fraction of train-time output set to zero: 0.250168
Fraction of test-time output set to zero: 0.0

Running tests with p = 0.4

Mean of input: 10.000207878477502
Mean of train-time output: 9.976910758765856
Mean of test-time output: 10.000207878477502
Fraction of train-time output set to zero: 0.401368
Fraction of test-time output set to zero: 0.0

Running tests with p = 0.7

Mean of input: 10.000207878477502
Mean of train-time output: 9.98254739313744
Mean of test-time output: 10.000207878477502
Fraction of train-time output set to zero: 0.700496
Fraction of test-time output set to zero: 0.0
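
In each run the train-time and test-time means stay close to the mean of the input, and the fraction of train-time outputs set to zero is roughly p, which is exactly what inverted dropout should give. Below is a minimal standalone sketch of the same property in plain NumPy, independent of the assignment code, with p again used as the drop probability; the toy input and the printed quantities are just for illustration.

import numpy as np

np.random.seed(0)
x = np.random.randn(500, 500) + 10        # toy "activations" with mean roughly 10

p = 0.25                                  # drop probability, as in the code above
keep_prob = 1 - p
mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob

train_out = mask * x                      # inverted dropout at train time
test_out = x                              # test time: identity

print('mean of input:          ', x.mean())
print('mean of train-time out: ', train_out.mean())        # roughly the input mean
print('mean of test-time out:  ', test_out.mean())
print('fraction zeroed (train):', (train_out == 0).mean()) # roughly p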

Completing the backward pass

def dropout_backward(dout, cache):
    """
    Perform the backward pass for (inverted) dropout.

    Inputs:
    - dout: Upstream derivatives, of any shape
    - cache: (dropout_param, mask) from dropout_forward.
    """
    dropout_param, mask = cache
    mode = dropout_param['mode']

    dx = None
    if mode == 'train':
        #######################################################################
        # TODO: Implement training phase backward pass for inverted dropout   #
        #######################################################################
        dx = mask * dout
        # Back-propagation applies the same mask, so the gradients of the
        # dropped units are zeroed out (and the kept ones are rescaled by
        # 1 / keep_prob, since the mask already contains that factor).
        #######################################################################
        #                          END OF YOUR CODE                           #
        #######################################################################
    elif mode == 'test':
        dx = dout
    return dx

dx relative error: 5.445612718272284e-11
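
The relative error above comes from the notebook's numeric gradient check. Below is a minimal sketch of the same kind of check using only NumPy, assuming the dropout_forward and dropout_backward defined above are in scope; the rel_error helper here is an illustrative definition, not part of the assignment code. Fixing a seed in dropout_param makes the mask identical across the repeated forward passes, which is what makes the finite-difference comparison valid.

import numpy as np

def rel_error(a, b):
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)
dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}

# Analytic gradient from the backward pass above
out, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)

# Numeric gradient of sum(dout * dropout_forward(x)) via central differences
h = 1e-5
dx_num = np.zeros_like(x)
it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    ix = it.multi_index
    old = x[ix]
    x[ix] = old + h
    pos = dropout_forward(x, dropout_param)[0]
    x[ix] = old - h
    neg = dropout_forward(x, dropout_param)[0]
    x[ix] = old
    dx_num[ix] = np.sum(dout * (pos - neg)) / (2 * h)
    it.iternext()

print('dx relative error:', rel_error(dx, dx_num))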

A fully connected network with dropout:

class FullyConnectedNet(object):
    """
    A fully-connected neural network with an arbitrary number of hidden layers,
    ReLU nonlinearities, and a softmax loss function. This will also implement
    dropout and batch/layer normalization as options. For a network with L layers,
    the architecture will be

    {affine - [batch/layer norm] - relu - [dropout]} x (L - 1) - affine - softmax

    where batch/layer normalization and dropout are optional, and the {...} block is
    repeated L - 1 times.

    Similar to the TwoLayerNet above, learnable parameters are stored in the
    self.params dictionary and will be learned using the Solver class.
    """

    def __init__(self, hidden_dims, input_dim=3*32*32, num_classes=10,
                 dropout=1, normalization=None, reg=0.0,
                 weight_scale=1e-2, dtype=np.float32, seed=None):
        """
        Initialize a new FullyConnectedNet.

        Inputs:
        - hidden_dims: A list of integers giving the size of each hidden layer.
        - input_dim: An integer giving the size of the input.
        - num_classes: An integer giving the number of classes to classify.
        - dropout: Scalar between 0 and 1 giving dropout strength. If dropout=1 then
          the network should not use dropout at all.
        - normalization: What type of normalization the network should use. Valid values
          are "batchnorm", "layernorm", or None for no normalization (the default).
        - reg: Scalar giving L2 regularization strength.
        - weight_scale: Scalar giving the standard deviation for random
          initialization of the weights.
        - dtype: A numpy datatype object; all computations will be performed using
          this datatype. float32 is faster but less accurate, so you should use
          float64 for numeric gradient checking.
        - seed: If not None, then pass this random seed to the dropout layers. This
          will make the dropout layers deterministic so we can gradient check the
          model. No seed by default; if one is given, it is passed on to the dropout layers.
        """
        self.normalization = normalization
        self.use_dropout = dropout != 1
        self.reg = reg
        self.num_layers = 1 + len(hidden_dims)
        self.dtype = dtype
        self.params = {}

        ############################################################################
        # TODO: Initialize the parameters of the network, storing all values in    #
        # the self.params dictionary. Store weights and biases for the first layer #
        # in W1 and b1; for the second layer use W2 and b2, etc. Weights should be #
        # initialized from a normal distribution centered at 0 with standard       #
        # deviation equal to weight_scale. Biases should be initialized to zero.   #
        #                                                                          #
        # When using batch normalization, store scale and shift parameters for the #
        # first layer in gamma1 and beta1; for the second layer use gamma2 and     #
        # beta2, etc. Scale parameters should be initialized to ones and shift     #
        # parameters should be initialized to zeros.                               #
        ############################################################################
        # Initialize the parameters of all hidden layers
        in_dim = input_dim  # D
        for i, h_dim in enumerate(hidden_dims):  # (0, H1), (1, H2), ...
            self.params['W%d' % (i+1,)] = weight_scale * np.random.randn(in_dim, h_dim)
            self.params['b%d' % (i+1,)] = np.zeros((h_dim,))
            if self.normalization == 'batchnorm':
                self.params['gamma%d' % (i+1,)] = np.ones((h_dim,))  # initialized to 1
                self.params['beta%d' % (i+1,)] = np.zeros((h_dim,))  # initialized to 0
            in_dim = h_dim  # this layer's output width becomes the next layer's input width

        # Initialize the parameters of the output layer
        self.params['W%d' % (self.num_layers,)] = weight_scale * np.random.randn(in_dim, num_classes)
        self.params['b%d' % (self.num_layers,)] = np.zeros((num_classes,))
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        # When dropout is enabled, the same dropout parameter dictionary
        # self.dropout_param is passed to every layer, so that each layer knows
        # both the dropout probability p and the current mode (train/test) of the
        # network.
        self.dropout_param = {}  # dropout parameter dictionary
        if self.use_dropout:
            self.dropout_param = {'mode': 'train', 'p': dropout}
            if seed is not None:
                self.dropout_param['seed'] = seed

        # When batch normalization is enabled, we keep a list of BN parameter
        # dictionaries self.bn_params to track the running mean and standard
        # deviation of each layer. self.bn_params[0] holds the parameters of the
        # first BN layer in the forward pass, self.bn_params[1] those of the
        # second BN layer, and so on.
        self.bn_params = []  # BN parameter dictionaries
        if self.normalization == 'batchnorm':
            self.bn_params = [{'mode': 'train'} for i in range(self.num_layers - 1)]
        if self.normalization == 'layernorm':
            self.bn_params = [{} for i in range(self.num_layers - 1)]

        # Cast all parameters to the correct datatype
        for k, v in self.params.items():
            self.params[k] = v.astype(dtype)

    def loss(self, X, y=None):
        """
        Compute loss and gradient for the fully-connected net.

        Input / output: Same as TwoLayerNet above.
        """
        X = X.astype(self.dtype)
        mode = 'test' if y is None else 'train'

        # Set train/test mode for batchnorm params and dropout param since they
        # behave differently during training and testing.
        if self.use_dropout:
            self.dropout_param['mode'] = mode
        if self.normalization == 'batchnorm':
            for bn_param in self.bn_params:
                bn_param['mode'] = mode

        scores = None
        ############################################################################
        # TODO: Implement the forward pass for the fully-connected net, computing  #
        # the class scores for X and storing them in the scores variable.          #
        #                                                                          #
        # When using dropout, you'll need to pass self.dropout_param to each       #
        # dropout forward pass.                                                    #
        #                                                                          #
        # When using batch normalization, you'll need to pass self.bn_params[0] to #
        # the forward pass for the first batch normalization layer, pass           #
        # self.bn_params[1] to the forward pass for the second batch normalization #
        # layer, etc.                                                              #
        ############################################################################
        fc_mix_cache = {}  # cache of every hidden layer's forward pass
        if self.use_dropout:  # if dropout is enabled, initialize its own cache
            dp_cache = {}

        # Loop over the hidden layers, passing the data `out` forward and saving
        # each layer's cache.
        out = X
        for i in range(self.num_layers - 1):  # loop over the hidden layers
            w, b = self.params['W%d' % (i+1,)], self.params['b%d' % (i+1,)]
            if self.normalization == 'batchnorm':
                gamma = self.params['gamma%d' % (i+1,)]
                beta = self.params['beta%d' % (i+1,)]
                out, fc_mix_cache[i] = affine_bn_relu_forward(out, w, b, gamma, beta, self.bn_params[i])
            else:
                out, fc_mix_cache[i] = affine_relu_forward(out, w, b)
            if self.use_dropout:
                out, dp_cache[i] = dropout_forward(out, self.dropout_param)

        # The final output layer
        w = self.params['W%d' % (self.num_layers,)]
        b = self.params['b%d' % (self.num_layers,)]
        out, out_cache = affine_forward(out, w, b)
        scores = out
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        # If test mode return early
        if mode == 'test':
            return scores

        loss, grads = 0.0, {}
        ############################################################################
        # TODO: Implement the backward pass for the fully-connected net. Store the #
        # loss in the loss variable and gradients in the grads dictionary. Compute #
        # data loss using softmax, and make sure that grads[k] holds the gradients #
        # for self.params[k]. Don't forget to add L2 regularization!               #
        #                                                                          #
        # When using batch/layer normalization, you don't need to regularize the   #
        # scale and shift parameters.                                              #
        #                                                                          #
        # NOTE: To ensure that your implementation matches ours and you pass the   #
        # automated tests, make sure that your L2 regularization includes a factor #
        # of 0.5 to simplify the expression for the gradient.                      #
        ############################################################################
        loss, dout = softmax_loss(scores, y)
        loss += 0.5 * self.reg * np.sum(self.params['W%d' % (self.num_layers,)] ** 2)

        # Back-propagate through the output layer and store its gradients in grads:
        dout, dw, db = affine_backward(dout, out_cache)
        grads['W%d' % (self.num_layers,)] = dw + self.reg * self.params['W%d' % (self.num_layers,)]
        grads['b%d' % (self.num_layers,)] = db

        # Back-propagate through each hidden layer, updating grads and accumulating
        # the per-layer regularization terms of the loss along the way.
        for i in range(self.num_layers - 1):
            ri = self.num_layers - 2 - i  # index of the (ri+1)-th hidden layer, counted from the back
            loss += 0.5 * self.reg * np.sum(self.params['W%d' % (ri+1,)] ** 2)  # add this layer's regularization term
            if self.use_dropout:
                dout = dropout_backward(dout, dp_cache[ri])
            if self.normalization == 'batchnorm':
                dout, dw, db, dgamma, dbeta = affine_bn_relu_backward(dout, fc_mix_cache[ri])
                grads['gamma%d' % (ri+1,)] = dgamma
                grads['beta%d' % (ri+1,)] = dbeta
            else:
                dout, dw, db = affine_relu_backward(dout, fc_mix_cache[ri])
            grads['W%d' % (ri+1,)] = dw + self.reg * self.params['W%d' % (ri+1,)]
            grads['b%d' % (ri+1,)] = db
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        return loss, grads
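
The gradient checks below were run on a tiny random dataset. Here is a rough sketch of how such a check can be set up, assuming FullyConnectedNet as defined above (together with the layer helpers it calls) is importable; the sizes and hyperparameters mirror the checks below but are otherwise only illustrative.

import numpy as np

np.random.seed(231)
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))

for dropout in [1, 0.75, 0.5]:
    print('Running check with dropout =', dropout)
    model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                              weight_scale=5e-2, dtype=np.float64,
                              dropout=dropout, seed=123)
    loss, grads = model.loss(X, y)
    print('Initial loss:', loss)
    # grads['W1'], grads['b1'], ... can then be compared against numeric
    # gradients of model.loss, e.g. with a finite-difference helper like the
    # rel_error check sketched earlier.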

Running check with dropout = 1

Initial loss: 2.3004790897684924
W1 relative error: 1.48e-07
W2 relative error: 2.21e-05
W3 relative error: 3.53e-07
b1 relative error: 5.38e-09
b2 relative error: 2.09e-09
b3 relative error: 5.80e-11

Running check with dropout = 0.75

Initial loss: 2.2924325088330475
W1 relative error: 2.74e-08
W2 relative error: 2.98e-09
W3 relative error: 4.29e-09
b1 relative error: 7.78e-10
b2 relative error: 3.36e-10
b3 relative error: 1.65e-10

Running check with dropout = 0.5

Initial loss: 2.3042759220785896
W1 relative error: 3.11e-07
W2 relative error: 1.84e-08
W3 relative error: 5.35e-08
b1 relative error: 5.37e-09
b2 relative error: 2.99e-09
b3 relative error: 1.13e-10

Dropout can be viewed as a form of regularization
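
The logs below compare two identical two-layer networks trained on a small subset of 500 training images, one with dropout disabled (dropout = 1) and one with dropout = 0.25. A minimal sketch of how such a comparison might be set up with the assignment's Solver class follows; it assumes FullyConnectedNet, Solver, and the CIFAR-10 data dictionary `data` from the notebook are available, and the hyperparameters are chosen to match the 25 epochs and 125 iterations seen in the logs.

import numpy as np

np.random.seed(231)
num_train = 500
small_data = {
    'X_train': data['X_train'][:num_train],
    'y_train': data['y_train'][:num_train],
    'X_val': data['X_val'],
    'y_val': data['y_val'],
}

solvers = {}
for dropout in [1, 0.25]:  # no dropout vs. dropout
    model = FullyConnectedNet([500], dropout=dropout)
    solver = Solver(model, small_data,
                    num_epochs=25, batch_size=100,
                    update_rule='adam',
                    optim_config={'learning_rate': 5e-4},
                    verbose=True, print_every=100)
    solver.train()
    solvers[dropout] = solver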

dropout = 1

(Iteration 1 / 125) loss: 7.856643
(Epoch 0 / 25) train acc: 0.260000; val_acc: 0.184000
(Epoch 1 / 25) train acc: 0.404000; val_acc: 0.259000
(Epoch 2 / 25) train acc: 0.468000; val_acc: 0.248000
(Epoch 3 / 25) train acc: 0.526000; val_acc: 0.247000
(Epoch 4 / 25) train acc: 0.646000; val_acc: 0.273000
(Epoch 5 / 25) train acc: 0.686000; val_acc: 0.257000
(Epoch 6 / 25) train acc: 0.690000; val_acc: 0.260000
(Epoch 7 / 25) train acc: 0.758000; val_acc: 0.255000
(Epoch 8 / 25) train acc: 0.832000; val_acc: 0.264000
(Epoch 9 / 25) train acc: 0.856000; val_acc: 0.268000
(Epoch 10 / 25) train acc: 0.914000; val_acc: 0.289000
(Epoch 11 / 25) train acc: 0.922000; val_acc: 0.293000
(Epoch 12 / 25) train acc: 0.948000; val_acc: 0.307000
(Epoch 13 / 25) train acc: 0.960000; val_acc: 0.313000
(Epoch 14 / 25) train acc: 0.972000; val_acc: 0.311000
(Epoch 15 / 25) train acc: 0.964000; val_acc: 0.309000
(Epoch 16 / 25) train acc: 0.966000; val_acc: 0.295000
(Epoch 17 / 25) train acc: 0.984000; val_acc: 0.306000
(Epoch 18 / 25) train acc: 0.988000; val_acc: 0.332000
(Epoch 19 / 25) train acc: 0.996000; val_acc: 0.318000
(Epoch 20 / 25) train acc: 0.992000; val_acc: 0.313000
(Iteration 101 / 125) loss: 0.000961
(Epoch 21 / 25) train acc: 0.996000; val_acc: 0.311000
(Epoch 22 / 25) train acc: 0.994000; val_acc: 0.304000
(Epoch 23 / 25) train acc: 0.998000; val_acc: 0.308000
(Epoch 24 / 25) train acc: 1.000000; val_acc: 0.316000
(Epoch 25 / 25) train acc: 0.998000; val_acc: 0.320000
dropout = 0.25
(Iteration 1 / 125) loss: 11.299055
(Epoch 0 / 25) train acc: 0.234000; val_acc: 0.187000
(Epoch 1 / 25) train acc: 0.382000; val_acc: 0.228000
(Epoch 2 / 25) train acc: 0.490000; val_acc: 0.247000
(Epoch 3 / 25) train acc: 0.534000; val_acc: 0.228000
(Epoch 4 / 25) train acc: 0.648000; val_acc: 0.298000
(Epoch 5 / 25) train acc: 0.676000; val_acc: 0.316000
(Epoch 6 / 25) train acc: 0.752000; val_acc: 0.285000
(Epoch 7 / 25) train acc: 0.774000; val_acc: 0.252000
(Epoch 8 / 25) train acc: 0.818000; val_acc: 0.288000
(Epoch 9 / 25) train acc: 0.844000; val_acc: 0.326000
(Epoch 10 / 25) train acc: 0.864000; val_acc: 0.311000
(Epoch 11 / 25) train acc: 0.920000; val_acc: 0.293000
(Epoch 12 / 25) train acc: 0.922000; val_acc: 0.282000
(Epoch 13 / 25) train acc: 0.960000; val_acc: 0.303000
(Epoch 14 / 25) train acc: 0.966000; val_acc: 0.290000
(Epoch 15 / 25) train acc: 0.948000; val_acc: 0.277000
(Epoch 16 / 25) train acc: 0.970000; val_acc: 0.324000
(Epoch 17 / 25) train acc: 0.950000; val_acc: 0.295000
(Epoch 18 / 25) train acc: 0.970000; val_acc: 0.316000
(Epoch 19 / 25) train acc: 0.972000; val_acc: 0.296000
(Epoch 20 / 25) train acc: 0.990000; val_acc: 0.293000
(Iteration 101 / 125) loss: 0.556808
(Epoch 21 / 25) train acc: 0.990000; val_acc: 0.303000
(Epoch 22 / 25) train acc: 0.990000; val_acc: 0.306000
(Epoch 23 / 25) train acc: 0.992000; val_acc: 0.301000
(Epoch 24 / 25) train acc: 0.994000; val_acc: 0.303000
(Epoch 25 / 25) train acc: 0.998000; val_acc: 0.289000

[Figure: training and validation accuracy curves for the networks with and without dropout]
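
A sketch of how a figure like this can be reproduced from the two solvers in the sketch above, assuming matplotlib is available, `solvers` maps each dropout value to its trained solver, and the solver records train_acc_history and val_acc_history as the assignment's Solver class does:

import matplotlib.pyplot as plt

plt.subplot(2, 1, 1)
for dropout, solver in solvers.items():
    plt.plot(solver.train_acc_history, 'o', label='%.2f dropout' % dropout)
plt.title('Train accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')

plt.subplot(2, 1, 2)
for dropout, solver in solvers.items():
    plt.plot(solver.val_acc_history, 'o', label='%.2f dropout' % dropout)
plt.title('Val accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(ncol=2, loc='lower right')

plt.gcf().set_size_inches(15, 15)
plt.show()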

Honestly, can this plot really tell us much? The training accuracies are nearly identical, and the validation accuracies are also about the same.

Reposted from: https://www.cnblogs.com/bernieloveslife/p/10190741.html
