In a neural network:
The value of Y (the weighted sum a neuron computes) can be anything from -inf to +inf; the neuron itself has no notion of the bounds of this value.
This is why we add "activation functions": they check the Y value produced by a neuron and decide whether outside connections should consider this neuron as "fired" or not. Or rather, let's say "activated" or not.
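As a minimal sketch (the inputs, weights, and bias below are made up for illustration), the raw value Y is just a weighted sum plus a bias, and it can land anywhere on the real line before any activation is applied:

```python
import numpy as np

def neuron_output(x, w, b):
    # Raw output Y = w . x + b: unbounded, anywhere in (-inf, +inf).
    return np.dot(w, x) + b

x = np.array([0.5, -1.2, 3.0])   # hypothetical inputs
w = np.array([0.8, 0.1, -0.4])   # hypothetical weights
print(neuron_output(x, w, b=0.2))  # some real number, before any activation
```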
Step Function
It is a binary activation: the output is 1 ("activated") if Y is above a threshold, and 0 otherwise.
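A minimal sketch of this binary behaviour (the threshold of 0 and the sample values are arbitrary choices for illustration):

```python
import numpy as np

def step(y, threshold=0.0):
    # Binary activation: 1 ("fired") if Y crosses the threshold, else 0.
    return np.where(y > threshold, 1, 0)

print(step(np.array([-2.0, 0.3, 5.0])))  # -> [0 1 1]
```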
Linear Function
If all the activation functions in the model are linear, the output of the last layer is nothing but a linear function of the input of the first layer!
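This collapse is easy to see numerically. In the sketch below (weights and input are made up, and the linear activation is taken to be the identity), stacking two linear layers gives exactly the same result as one layer whose weight matrix is the product of the two:

```python
import numpy as np

W1 = np.array([[1.0, 2.0], [0.5, -1.0]])  # hypothetical layer-1 weights
W2 = np.array([[2.0, 0.0], [1.0, 3.0]])   # hypothetical layer-2 weights
x  = np.array([0.7, -0.3])                # hypothetical input

two_layers = W2 @ (W1 @ x)        # layer by layer, linear activations
collapsed  = (W2 @ W1) @ x        # a single equivalent linear layer
print(np.allclose(two_layers, collapsed))  # True
```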
Sigmoid
It is nonlinear in nature. Combinations of this function are also nonlinear!
When X is far from 0 (towards either end of the curve), the change in Y becomes very small, i.e. the gradient is small. During backpropagation this gives rise to the problem of "vanishing gradients".
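A quick numerical sketch of this: the derivative of the sigmoid is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 at x = 0 and shrinks towards 0 as |x| grows.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

for x in [0.0, 2.0, 5.0, 10.0]:
    # Gradient shrinks from 0.25 towards ~0 as x moves away from 0,
    # which is what causes vanishing gradients during backpropagation.
    print(x, sigmoid(x), sigmoid_grad(x))
```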
Tanh
It is a scaled sigmoid function that passes through (0, 0): tanh(x) = 2*sigmoid(2x) - 1, so its output is bounded in (-1, 1).
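The identity can be checked numerically in a couple of lines (the sample points are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
# tanh is a stretched and shifted sigmoid, centered at (0, 0).
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```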
ReLu (rectified linear unit)
Because of the horizontal line in ReLu (for negative X), the gradient can go to 0. For activations in that region of ReLu, the gradient will be 0, so the weights will not get adjusted during gradient descent. That means neurons which go into that state stop responding to variations in error/input (simply because the gradient is 0, nothing changes). This is called the dying ReLu problem.
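A small sketch of that flat region (sample values are made up): the gradient is 1 on the positive side and exactly 0 on the negative side, so a neuron stuck there receives no weight updates.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 on the flat (negative) side.
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 1. 1.]  -- no updates flow through the 0 region
```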
ReLu is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. That is a good point to consider when we are designing deep neural nets.
Softmax
It turns the outputs into a probability for each class that the data could belong to.
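A minimal sketch (the logits below are made-up scores for three classes): softmax exponentiates the scores and normalises them so they sum to 1, giving a probability per class.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)        # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
probs = softmax(logits)
print(probs)        # roughly [0.659 0.242 0.099]
print(probs.sum())  # 1.0 -- one probability per class
```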