Information flow in neural networks

During training, input-manifold information must propagate forward through the network while output-manifold information propagates backward as gradients; at recall (inference), information flows forward only.

Here’s how it’s typically framed in the literature:


1. Forward Information Flow (Input Manifold)

  • In the forward pass, the geometry and structure of the input manifold (the set of possible inputs, often low-dimensional in a high-dimensional space) need to be preserved and transformed in useful ways through each layer.

  • If the network “breaks” the manifold — for example, by collapsing it onto a lower-dimensional set too early (rank collapse, saturation, dead ReLUs) — then information useful for later decision boundaries is lost irreversibly.

  • In deep learning theory, this is connected to:

    • Expressivity and manifold embedding theory — ensuring that layers preserve enough variation to distinguish different classes or outputs.

    • Signal propagation theory (e.g., Poole et al. 2016, Schoenholz et al. 2017) — analyzing whether signals blow up, vanish, or preserve variance as they move forward (a small numerical sketch follows this list).
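
To make the signal-propagation point concrete, here is a minimal NumPy sketch, not taken from the papers above, that pushes a random input through a deep, randomly initialized tanh network and tracks the activation variance layer by layer; the width, depth, and weight scales are illustrative assumptions.

```python
# Minimal sketch (assumed setup): track how activation variance evolves
# through a deep, randomly initialized tanh network. Width, depth, and the
# weight scale sigma_w are illustrative choices, not values from the post.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 50

def forward_variance(sigma_w, sigma_b=0.05):
    """Return the per-layer activation variance for one random input."""
    x = rng.standard_normal(width)
    variances = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        x = np.tanh(W @ x + b)
        variances.append(x.var())
    return variances

for sigma_w in (0.5, 1.0, 2.0):  # roughly sub-critical, near-critical, super-critical
    v = forward_variance(sigma_w)
    print(f"sigma_w={sigma_w}: layer 1 var={v[0]:.3f}, layer {depth} var={v[-1]:.3f}")
```

With a small weight scale the variance decays toward zero, with a large one the units saturate, and only near a critical scale does the signal reach depth with its variation intact; this is the regime signal propagation theory characterizes.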


2. Backward Information Flow (Output Manifold)

  • During training, the network’s output manifold is shaped layer by layer as backpropagation adjusts the parameters.

  • Gradients carry information about how the current layer’s transformation should change so that the output manifold aligns with the target space.

  • If gradients vanish, explode, or become decorrelated from the forward signal, the network can’t “connect” the target geometry to earlier layers.

  • This is connected to:

    • Gradient flow analysis — ensuring the Jacobians don’t degenerate across layers (see the gradient-norm sketch after this list).

    • Neural tangent kernel (NTK) perspectives — where both forward and backward flows are jointly considered for stable learning.
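
As a concrete illustration of gradient flow, here is a minimal PyTorch sketch, again an assumed setup rather than anything from the cited work, comparing the gradient norm that reaches the first layer of a deep tanh MLP under two weight scales; the depth, width, and scalar loss are arbitrary choices.

```python
# Minimal sketch (assumed setup): compare per-layer gradient norms in a deep
# tanh MLP under a shrinking vs. a variance-preserving weight scale.
import torch
import torch.nn as nn

def gradient_norms(weight_std, depth=30, width=256):
    torch.manual_seed(0)
    layers = []
    for _ in range(depth):
        lin = nn.Linear(width, width)
        nn.init.normal_(lin.weight, std=weight_std / width ** 0.5)
        nn.init.zeros_(lin.bias)
        layers += [lin, nn.Tanh()]
    net = nn.Sequential(*layers)

    x = torch.randn(64, width)
    loss = net(x).pow(2).mean()  # arbitrary scalar loss, just to get gradients
    loss.backward()
    first, last = layers[0], layers[-2]  # first and last Linear layers
    return first.weight.grad.norm().item(), last.weight.grad.norm().item()

for std in (0.5, 1.0):
    g_first, g_last = gradient_norms(std)
    print(f"std={std}: grad norm first layer={g_first:.2e}, last layer={g_last:.2e}")
```

When the weight scale is too small, the gradient norm shrinks by orders of magnitude on its way back to the first layer relative to the last one, which is exactly the degenerate-Jacobian situation gradient flow analysis is meant to rule out.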


3. Bidirectional Preservation

  • Some recent works explicitly track information flow (often measured as mutual information) in both directions:

    • In information bottleneck theory (Tishby et al.), too much compression too early kills forward signal; too little kills generalization.

    • In invertible networks (e.g., RevNets), each block is exactly invertible, so the forward map loses no information and activations can be reconstructed, rather than stored, during the backward pass (see the coupling-block sketch after this list).

    • In critical initialization research, weights are initialized at “criticality” so that both forward activations and backward gradients propagate to depth without vanishing or exploding.
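
The RevNet point can be made explicit with an additive coupling block. The sketch below is a hypothetical toy in which fixed random maps stand in for the learned sub-networks F and G; the block is exactly invertible, so the forward pass discards no activation information.

```python
# Minimal sketch (assumed toy setup): an additive coupling block in the spirit
# of reversible networks. F and G here are fixed random maps; in a real RevNet
# they would be learned residual sub-networks.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # half-width of the block (illustrative)
W_f = rng.standard_normal((d, d)) / np.sqrt(d)
W_g = rng.standard_normal((d, d)) / np.sqrt(d)

F = lambda z: np.tanh(z @ W_f)
G = lambda z: np.tanh(z @ W_g)

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print("max reconstruction error:",
      max(np.abs(r1 - x1).max(), np.abs(r2 - x2).max()))  # ~1e-16
```

Because the inverse is exact, activations never need to be stored for the backward pass, and no input information is destroyed anywhere in the stack.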


4. Why It Matters

  • If forward manifold information doesn’t make it to the output, you can’t make fine-grained predictions.

  • If backward manifold information (gradients) doesn’t reach the early layers, those layers can’t learn to shape the forward manifold properly.

  • The healthiest networks maintain two-way information highways throughout training — only in the layers nearest the output is the manifold intentionally collapsed onto decision boundaries.


