What is the difference among stochastic, batch and mini-batch learning styles?

by Joka   Last Updated March 11, 2018 16:19 PM

As far as I know, the three styles work as follows:

stochastic: The error is calculated for each individual sample s, the gradient for s is computed from it, and the weights of the network are updated immediately using that gradient. An epoch is one complete pass over the training set, so if we have N samples, each epoch performs N weight updates.
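The stochastic loop described above might be sketched like this for a toy one-weight linear model (all names here, `X`, `y`, `w`, `lr`, are my own illustrative choices, not anything from a specific framework):

```python
import numpy as np

# Toy setup: fit y = w * x by minimizing 0.5 * (w*x - y)^2 per sample.
rng = np.random.default_rng(0)
X = rng.normal(size=20)
y = 3.0 * X            # true weight is 3.0
w = 0.0
lr = 0.1

# Stochastic style: one weight update per sample -> N updates per epoch.
for epoch in range(5):
    for x_s, y_s in zip(X, y):
        err = w * x_s - y_s      # error for sample s
        grad = err * x_s         # gradient of 0.5*err^2 w.r.t. w
        w -= lr * grad           # update the weight immediately
```

Note that the weight changes *inside* the inner loop, so later samples in the same epoch already see the updated weight.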

batch: The algorithm calculates the error of each sample s, and the gradients of all samples are accumulated. At the end of the epoch, we take the average of this accumulated gradient and update the weights of the network. So, we have only one weight update per epoch.

mini-batch: The algorithm divides the training set into subsets called mini-batches. For each mini-batch m, the algorithm calculates the error of each sample s and accumulates the gradients of the samples in m. Once every sample of mini-batch m has been processed, we take the average of the accumulated gradient for m and update the weights of the network. So, if we divide the training set into X mini-batches, each epoch produces X updates of the weights, one per mini-batch.
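And the mini-batch variant sits between the two: for the same toy model, with 20 samples and a batch size of 5 (both numbers chosen arbitrarily for illustration), we get 4 updates per epoch:

```python
import numpy as np

# Same toy setup: fit y = w * x with squared error.
rng = np.random.default_rng(0)
X = rng.normal(size=20)
y = 3.0 * X
w = 0.0
lr, bsize = 0.2, 5

# Mini-batch style: average the gradient over each mini-batch m,
# then update once per mini-batch (len(X)/bsize updates per epoch).
for epoch in range(30):
    for start in range(0, len(X), bsize):
        xb = X[start:start + bsize]
        yb = y[start:start + bsize]
        grad = np.mean((w * xb - yb) * xb)  # average gradient over m
        w -= lr * grad                       # one update per mini-batch
```

In practice the samples would usually also be shuffled at the start of each epoch; that is omitted here to keep the sketch short.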

Is this correct?

In the case of batch or mini-batch back-propagation we really use the "average gradient"? Or, instead, we only use the sum of the gradients that were calculated?
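One thing I can illustrate about this question: summing and averaging differ only by a constant factor of 1/N, so the two conventions produce identical steps if the learning rate is rescaled accordingly (the per-sample gradient values below are made up for the demonstration):

```python
import numpy as np

# Hypothetical per-sample gradients for one (mini-)batch.
grads = np.array([0.2, -0.5, 0.1, 0.4])
N = len(grads)
lr_avg = 0.1

step_avg = lr_avg * grads.mean()        # update using the average gradient
step_sum = (lr_avg / N) * grads.sum()   # same update: sum, lr divided by N
assert np.isclose(step_avg, step_sum)
```

So the choice mainly affects how the learning rate interacts with the batch size, not what updates are reachable.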

If this is not correct, what is the actual sequence of operations (error calculation, gradient calculation, update of weights) for each style of back-propagation?


