Neural Network - Why Use Derivatives?

by user3491493   Last Updated January 21, 2018 03:20 AM

Good Day

I am trying to get an understanding of neural networks. I have gone through a few web sites and came to know the following:

1) One of the main objectives of a neural network is to "predict" based on data.
2) To predict:
   a. Train the network with known data.
   b. Adjust the weights based on the difference between the "target output" and the "calculated output".
   c. To do that, we use derivatives and partial derivatives (the chain rule, etc.).

I can understand the overall concept of a neural network:
a) I can also understand that a "derivative" is nothing but the rate of change of one quantity with respect to another (at a given point).
b) A partial derivative is the rate of change of one quantity with respect to another, with the remaining quantities held fixed, when more than two factors appear in the equation.

The points that I cannot relate or understand clearly are:
a) Why should we use derivatives in a neural network, and how exactly do they help?
b) Why should we use an activation function, which in most cases is the sigmoid function?
c) I could not get a complete picture of how derivatives help a neural network.

Can you guys please help me understand the complete picture? If possible, please try not to use mathematical terms, so that it will be easy for me to grasp.

Thanks, Satheesh



2 Answers


As you said: "A partial derivative is the rate of change of one quantity with respect to another, with the remaining quantities held fixed."

It means that we can measure the rate of change of the output error with respect to the network weights. If we know how the error changes w.r.t. the weights, we can change those weights in a direction that decreases the error. But as @user1952009 said, this is just gradient descent. Neural networks combine it with the chain rule to update the weights of non-output layers.
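Here is a minimal sketch of that idea in Python (the numbers, learning rate, and variable names are all made up for illustration): one weight, one training example, a squared error, and repeated steps against the derivative.

```python
# Minimal gradient-descent sketch: one input, one weight, squared error.
x, target = 2.0, 10.0    # a known training example
w = 0.5                  # initial weight (an arbitrary guess)
lr = 0.05                # learning rate: how big a step to take

for step in range(20):
    y = w * x                        # calculated output
    error = 0.5 * (y - target) ** 2  # error between target and calculated output
    grad = (y - target) * x          # dE/dw via the chain rule
    w -= lr * grad                   # move w in the direction that decreases the error
    print(f"step {step:2d}: w = {w:.4f}, error = {error:.4f}")
```

Each iteration nudges $w$ toward the value that makes the calculated output match the target; the derivative supplies both the direction and the size of the nudge.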

Regarding the sigmoid activation, it has two uses: 1) to bound the neuron's output; 2) to introduce nonlinearity into the network. The latter is essential for the neural network to solve problems that simple linear/logistic regression cannot. If neurons didn't have nonlinear activation functions, you could rewrite your entire network as a single layer, which is not as useful. For instance, consider a 2-layer neural network. Its output would be $y = W_o(W_i\mathbf{x})$ ($W_i$ = input weights, $W_o$ = output weights, $\mathbf{x}$ = input), which can be rewritten as $y = (W_oW_i)\mathbf{x}$. Letting $W = W_oW_i$ leaves us with the single-layer network $y = W\mathbf{x}$.
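As a quick numerical check of that collapse (a sketch; the shapes and random values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=3)        # input vector
Wi = rng.normal(size=(4, 3))   # input-layer weights
Wo = rng.normal(size=(2, 4))   # output-layer weights

# Two stacked linear layers...
y_two_layers = Wo @ (Wi @ x)
# ...equal one linear layer with W = Wo @ Wi.
y_one_layer = (Wo @ Wi) @ x
print(np.allclose(y_two_layers, y_one_layer))  # True

# Insert a sigmoid between the layers and the equivalence breaks:
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
y_nonlinear = Wo @ sigmoid(Wi @ x)
print(np.allclose(y_nonlinear, y_one_layer))   # False
```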

rcpinto
June 22, 2016 18:06 PM

  • Partial derivatives come into play because we train neural networks with gradient descent, which involves partial derivatives in the multivariable case.
  • In the final output layer, you can apply a sigmoid transformation, or tanh, or ReLU, or nothing at all! It all depends on you (see the sketch below). That flexibility is exactly what gives neural networks their expressive power.

In fact, neural networks are nothing but a fancy, popular nonlinear estimator.
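To make the second bullet concrete, here is a small sketch (the pre-activation values are purely illustrative) applying each of those output transformations to the same vector:

```python
import numpy as np

z = np.array([-2.0, 0.0, 3.0])  # pre-activation values from the final layer

activations = {
    "sigmoid":  1.0 / (1.0 + np.exp(-z)),  # squashes to (0, 1)
    "tanh":     np.tanh(z),                # squashes to (-1, 1)
    "relu":     np.maximum(0.0, z),        # clips negatives to zero
    "identity": z,                         # "nothing at all"
}
for name, out in activations.items():
    print(f"{name:>8}: {out}")
```

In practice, the choice usually follows the task: sigmoid for probabilities, identity (nothing) for unbounded regression targets.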

Augustin Newton
January 21, 2018 02:47 AM
