Information flow in neural networks
Input-manifold information must propagate forward through the network, while output-manifold information must propagate backward through the network during training and then, once encoded in the weights, forward again during recall.
Here’s how it’s typically framed in the literature:
1. Forward Information Flow (Input Manifold)

- In the forward pass, the geometry and structure of the input manifold (the set of possible inputs, often low-dimensional within a high-dimensional ambient space) need to be preserved and transformed in useful ways by each layer.
- If the network “breaks” the manifold — for example, by collapsing it onto a lower-dimensional set too early (rank collapse, saturation, dead ReLUs) — then information useful for later decision boundaries is lost irreversibly.
- In deep learning theory, this is connected to:
  - Expressivity and manifold embedding theory — ensuring that layers preserve enough variation to distinguish different classes or outputs.
  - Signal propagation theory (e.g., Poole et al. 2016; Schoenholz et al. 2017) — analyzing whether signals blow up, vanish, or preserve variance as they move forward (see the sketch after this list).
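To make the signal-propagation idea concrete, here is a minimal sketch in the spirit of those mean-field analyses (this is not code from the cited papers; the function name, widths, depths, and weight scales are all illustrative choices of mine). It pushes a random input through a random tanh MLP and records how the activation variance evolves with depth for different weight scales.

```python
import numpy as np

def forward_variance(depth=50, width=512, sigma_w=1.5, sigma_b=0.05, seed=0):
    """Propagate one random input through a random tanh MLP and record
    the per-layer activation variance (all sizes/scales are illustrative)."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((width, 1))          # a single random input vector
    variances = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal((width, 1)) * sigma_b
        h = np.tanh(W @ h + b)
        variances.append(float(np.var(h)))
    return variances

# With a small weight scale the activations shrink toward zero; with a large
# one they saturate near +/-1. Tracking this per-layer variance is the
# forward half of the signal-propagation analysis.
for sigma_w in (0.5, 1.5, 3.0):
    v = forward_variance(sigma_w=sigma_w)
    print(f"sigma_w={sigma_w}: layer-1 var {v[0]:.3f} -> layer-50 var {v[-1]:.3f}")
```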
2. Backward Information Flow (Output Manifold)

- During training, the output manifold is constructed in parameter space, layer by layer, through backpropagation.
- Gradients carry information about how each layer’s transformation should change so that the output manifold aligns with the target space.
- If gradients vanish, explode, or become decorrelated from the forward signal, the network can’t “connect” the target geometry to earlier layers.
- This is connected to:
  - Gradient flow analysis — ensuring the layer-to-layer Jacobians don’t degenerate across depth (see the sketch after this list).
  - Neural tangent kernel (NTK) perspectives — where forward and backward flows are considered jointly for stable learning.
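As a companion to the gradient-flow bullet, here is a toy sketch (PyTorch; the depth, width, objective, and init gain are arbitrary illustrative choices, and the function name is mine) that records the per-layer gradient norms of a deep tanh MLP after a single backward pass. This is the simplest way to watch gradients shrink or grow on their way back to the early layers.

```python
import torch
import torch.nn as nn

def layer_gradient_norms(depth=30, width=256, gain=1.0, seed=0):
    """Build a deep tanh MLP, run one backward pass on a toy objective,
    and return the gradient norm of each Linear layer's weight matrix."""
    torch.manual_seed(seed)
    layers = []
    for _ in range(depth):
        lin = nn.Linear(width, width)
        nn.init.xavier_normal_(lin.weight, gain=gain)  # rescale the usual init
        layers += [lin, nn.Tanh()]
    layers.append(nn.Linear(width, 1))
    net = nn.Sequential(*layers)

    x = torch.randn(64, width)
    loss = net(x).pow(2).mean()   # arbitrary scalar loss, just to get gradients
    loss.backward()

    return [m.weight.grad.norm().item() for m in net if isinstance(m, nn.Linear)]

# Compare how much gradient actually reaches the first layer as the init
# scale moves away from a well-conditioned setting.
for gain in (0.5, 1.0, 2.0):
    norms = layer_gradient_norms(gain=gain)
    print(f"gain={gain}: first-layer grad {norms[0]:.2e}, last-layer grad {norms[-1]:.2e}")
```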
3. Bidirectional Preservation

Some recent work explicitly considers information flow in both directions:

- In information bottleneck theory (Tishby et al.), too much compression too early destroys the forward signal; too little compression hurts generalization.
- In invertible networks (e.g., RevNets), the forward map of each block is exactly invertible, so forward and backward flows are symmetric and no information is lost within the block (see the sketch after this list).
- In critical initialization research, both forward activations and backward gradients are kept at “criticality” so that signals propagate reliably in both directions at once.
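To illustrate the invertibility point, below is a minimal additive-coupling block in the style of RevNets (a sketch only; the stand-in functions F and G and all sizes are hypothetical, not a reference implementation). The inputs can be reconstructed exactly from the outputs, so the block itself discards no information.

```python
import numpy as np

rng = np.random.default_rng(0)

# F and G stand in for arbitrary sub-networks inside a reversible block;
# here they are just random affine maps followed by tanh.
W_f = rng.standard_normal((8, 8))
W_g = rng.standard_normal((8, 8))
F = lambda x: np.tanh(x @ W_f)
G = lambda x: np.tanh(x @ W_g)

def rev_forward(x1, x2):
    """Additive coupling (RevNet-style) forward pass on two input halves."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2):
    """Exact inverse: the inputs are recovered from the outputs alone."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, 4, 8))   # a small batch, split into two halves
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
print(np.allclose(x1, r1), np.allclose(x2, r2))   # True True
```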
4. Why It Matters

- If forward manifold information doesn’t make it to the output, you can’t make fine-grained predictions.
- If backward manifold information (gradients) doesn’t reach the early layers, those layers can’t learn to shape the forward manifold properly.
- The healthiest networks maintain two-way information highways until the very end of training — only near the output do you intentionally collapse the manifold to decision boundaries.