I am unable to understand when to use ReLU, Leaky ReLU, or ELU. How do they compare to other activation functions (like sigmoid and tanh), and what are their pros and cons?
ReLU vs Leaky ReLU vs ELU with pros and cons - Data Science Stack Exchange
About ELU: for all negative values, ELU follows the exponential curve $ y = \alpha ( e^x - 1 ) $. It does not saturate immediately for moderately negative inputs, but it does saturate for larger negative values. See here for more information. Hence the softplus $ y = \log ( 1 + e^x ) $ is not used instead: it saturates early for negative values and remains nonlinear for values > 0.
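A minimal sketch of the three activations being compared; the `alpha` and `slope` defaults are common choices, not values prescribed by the post above:

```python
import math

def relu(x):
    # Identity for positives, hard zero for negatives
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Small non-zero slope for negatives avoids a zero gradient
    return x if x > 0 else slope * x

def elu(x, alpha=1.0):
    # alpha*(e^x - 1) for negatives: smooth, saturating toward -alpha
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

for x in (-10.0, -1.0, 0.0, 1.0):
    print(f"x={x:6.1f}  relu={relu(x):8.4f}  "
          f"leaky={leaky_relu(x):8.4f}  elu={elu(x):8.4f}")
```

Note how `elu(-10.0)` is already close to `-alpha`, illustrating the saturation for large negative inputs mentioned above, while `leaky_relu` keeps shrinking linearly.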
ELU and SELU are typically used in the hidden layers of a Neural Network; I have personally never heard of ELU or SELU being applied to final outputs. The choice of both final activation and loss function depends on the task, and that is the only criterion to follow when building a good Neural Network.
I am trying to understand the SELU activation function, and I was wondering why deep learning practitioners keep using ReLU, with all its issues, instead of SELU, which enables a neural network to self-normalize.
Why deep learning models still use RELU instead of SELU, as their ...
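A sketch of SELU, using the fixed constants from the original self-normalizing networks paper (Klambauer et al., 2017). The demo below is an illustration of the self-normalizing property the question refers to: feeding standard-normal inputs through SELU keeps the output mean near 0 and the variance near 1.

```python
import math
import random

# Fixed constants from the SELU paper; chosen so that mean 0 / variance 1
# is a fixed point of the activation.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    # Scaled ELU: lambda*x for positives, lambda*alpha*(e^x - 1) for negatives
    return LAMBDA * x if x > 0 else LAMBDA * ALPHA * (math.exp(x) - 1.0)

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
ys = [selu(x) for x in xs]

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
print(f"mean={mean:.3f}  var={var:.3f}")  # both stay close to 0 and 1
```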
Referring to the Stanford course notes on Convolutional Neural Networks for Visual Recognition, a paragraph says: "Unfortunately, ReLU units can be fragile during training and can 'die'. For ex...
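A minimal sketch of the "dying ReLU" effect the quote describes. The weight, bias, learning rate, and upstream gradient here are hypothetical values chosen for illustration: once a unit's pre-activation is negative for every input, the ReLU gradient is zero, so gradient descent can never move the weight again.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Gradient of ReLU: 1 in the active region, 0 in the dead region
    return 1.0 if z > 0 else 0.0

# Hypothetical single-input unit knocked into the negative regime,
# e.g. by a large gradient step that left the bias very negative.
w, b = 0.5, -10.0
lr = 0.1
inputs = [0.5, 1.0, 2.0]  # every input yields w*x + b < 0

for step in range(5):
    for x in inputs:
        z = w * x + b
        # Upstream gradient assumed to be 1.0 for simplicity
        dw = relu_grad(z) * x
        w -= lr * dw
    print(f"step {step}: w={w}")  # w never changes: the unit is "dead"
```

Because `relu_grad(z)` is 0 for every input, `dw` is always 0 and `w` stays at its initial value forever, which is exactly the fragility the course notes warn about. Leaky ReLU and ELU avoid this by keeping a non-zero gradient for negative pre-activations.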