ReLU as a Switch
The ReLU (Rectified Linear Unit) activation function is one of the most commonly used activation functions in neural networks, especially in deep learning.
Definition
The ReLU function is defined as:

ReLU(x) = max(0, x)
This means:
- If the input x is greater than or equal to 0, the output is x.
- If the input x is less than 0, the output is 0.
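The definition above can be sketched directly in NumPy (the function name `relu` is my own choice):

```python
import numpy as np

def relu(x):
    # Elementwise ReLU: max(0, x) for each entry.
    return np.maximum(0, x)

# Negative inputs map to 0; non-negative inputs pass through unchanged.
out = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
```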
Graphically
It looks like a straight line with a slope of 1 for positive inputs and flat (zero) for negative inputs.
Switching Viewpoint
ReLU can also be understood from an alternative perspective.
Consider that an electrical switch behaves linearly when "on" (e.g., 1 V in gives 1 V out, 2 V in gives 2 V out) and outputs zero when "off."
From this viewpoint, ReLU acts like a switch that is "on" when x ≥ 0 and "off" otherwise. The switching decision is the predicate (x ≥ 0).
More generally (outside of ReLU), other switching decisions are possible.
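The switch view can be written out literally: first compute the gate from the sign of the input, then pass the signal through linearly when "on". A minimal sketch (function names are mine); the second function illustrates that, outside of ReLU, the gate need not come from the input itself:

```python
import numpy as np

def relu_as_switch(x):
    on = (x >= 0)                  # ReLU's switching decision
    return np.where(on, x, 0.0)    # "on": pass the signal through linearly; "off": output zero

def gated_linear(x, on):
    # Hypothetical generalization: any boolean switching decision
    # (not necessarily derived from x) gives a gated linear unit.
    return np.where(on, x, 0.0)
```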
This switching interpretation can help demystify the behavior of ReLU-based neural networks. It highlights that ReLU units are effectively enabling or disabling connections based on the sign of their input. Once the switching states (i.e., which ReLUs are active) are known, the overall computation in the network simplifies: each neuron's output becomes a linear function of the input, and the entire network behaves as a piecewise linear system.
In each linear region, standard linear algebra can be used to collapse the weighted sums along the active paths of the network, giving a single matrix (square when the input and output dimensions match) that maps the network input to its output for that region. The same reasoning applies layer by layer during the feed-forward phase.
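This collapse can be demonstrated on a toy two-layer network. A small sketch (weights and sizes are arbitrary; biases are omitted for brevity, and with biases each region's map would be affine rather than purely linear). Here the input and output are both 3-dimensional, so the collapsed matrix happens to be square:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))      # hypothetical hidden-layer weights
W2 = rng.normal(size=(3, 4))      # hypothetical output-layer weights

def net(x):
    # Two-layer ReLU network (no biases).
    return W2 @ np.maximum(0, W1 @ x)

x = rng.normal(size=3)

# Switching states of the hidden units in x's linear region:
D = np.diag((W1 @ x >= 0).astype(float))

# Collapse the active paths into a single input-to-output matrix:
M = W2 @ D @ W1
```

Within this region, `M @ x` reproduces the network's output exactly.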
During training (with SGD), all the switching states become known during the feed-forward phase. As a result, back-propagation only ever updates a (temporarily fixed) linear system.
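This point can be sketched with manual gradients for a single ReLU layer (names and the toy loss are mine): once the forward pass fixes the gates, the backward pass reuses the same fixed mask, i.e., it differentiates the linear system y = diag(mask) @ W @ x.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))       # hypothetical layer weights
x = rng.normal(size=3)

# Forward pass: the switching states (mask) become known here.
pre = W @ x
mask = (pre >= 0).astype(float)   # switching decisions, now fixed
loss = np.sum(mask * pre)         # sum of ReLU outputs as a toy loss

# Backward pass: ReLU's backward rule applies the same fixed mask,
# so we are differentiating a plain linear system.
g_pre = mask * np.ones(4)         # dloss/dpre
g_W = np.outer(g_pre, x)          # dloss/dW
```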