Fast Transform Layers for Neural Networks
In a fast transform layer you replace the weight matrix with a fixed fast-transform matrix, which is applied by a fast algorithm rather than an explicit matrix multiply.
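As a minimal sketch of that substitution (NumPy throughout; the `fwht` helper is hand-rolled here, not taken from any particular library), the learned `W @ x` of a dense layer is swapped for a fixed Walsh–Hadamard transform applied by its butterfly recursion:

```python
import numpy as np

def fwht(x):
    """Walsh-Hadamard transform of a length-2**k vector in O(n log n)."""
    if len(x) == 1:
        return x
    half = len(x) // 2
    a, b = fwht(x[:half]), fwht(x[half:])
    return np.concatenate([a + b, a - b])

n = 8
x = np.random.randn(n)

# Dense layer: a learned n x n weight matrix, O(n^2) per multiply.
W = np.random.randn(n, n) / np.sqrt(n)
dense_out = np.tanh(W @ x)

# Fast-transform layer: the fixed Walsh-Hadamard matrix plays the role of W,
# but it is never materialized; the recursion applies it in O(n log n).
fast_out = np.tanh(fwht(x) / np.sqrt(n))
```

The division by √n just makes the fixed transform orthonormal, so its outputs sit on the same scale as the dense layer's.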
In both dense layers and fast-transform layers, each neuron’s activation is essentially a coefficient that scales a pattern sent into the next layer’s neurons:
- Dense layer:
  - The pattern is given by that neuron’s outgoing weight vector.
  - These patterns are learned, so the network can tailor them to the task.
  - Each neuron can develop a very specialized “projection” pattern that routes its information selectively.
- Fast-transform layer (WHT, FFT, etc.):
  - The pattern is fixed by the transform’s structure — e.g., Walsh–Hadamard patterns of ±1, sine/cosine waves for the FFT.
  - A neuron’s activation just scales one of these fixed patterns, so it “lights up” the next layer in a predetermined way.
  - Adjustability has to come from switching, permutation, or activation parameterization, not from learning the projection pattern itself (see the sketch after this list).
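That last bullet is the important one, so here is one way it can look in code. The sketch reuses the `fwht` helper from above; the specific choices (a per-channel scale, a permutation, a leaky-activation slope) are illustrative stand-ins for whatever switching or parameterization a given architecture actually uses:

```python
import numpy as np

# Reuses fwht() from the earlier sketch.
n = 8
rng = np.random.default_rng(0)

# The learnable pieces wrap the fixed transform instead of replacing it:
scale = rng.normal(size=n)       # per-channel gain/sign ("switching")
perm = rng.permutation(n)        # channel permutation between layers
slope = 0.1                      # slope parameter of a leaky activation

def fast_transform_block(x):
    y = fwht(x) / np.sqrt(n)                 # fixed +/-1 mixing patterns
    y = scale * y[perm]                      # learned routing and scaling
    return np.where(y > 0.0, y, slope * y)   # parameterized nonlinearity

out = fast_transform_block(rng.normal(size=n))
```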
Why that difference matters for width
Since fast-transform networks can’t tailor each neuron’s projection pattern through learned weights, they need:
- More channels (wider layers), so enough different fixed patterns are available to encode diverse features.
- Extra nonlinearity or switching to combine patterns in richer ways over multiple layers.
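A back-of-the-envelope parameter count (sizes picked arbitrarily) makes the trade concrete: the fixed transform contributes no learned parameters at all, so even a much wider fast-transform layer is far cheaper than the dense layer it stands in for.

```python
n = 1024
dense_params = n * n          # learned projection patterns: 1,048,576
fast_params = n               # e.g. one learnable scale per channel: 1,024
wider_fast_params = 4 * n     # even at 4x the width, still only 4,096
```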
In short:
Both systems send patterns forward; dense layers learn them, fast transforms predefine them. Wider fast-transform layers compensate for the flexibility that is lost when the patterns cannot be learned.