Sum-then-Project vs. Project-then-Sum for Neural Network Ensembles
You have:

- A set of output vectors x_1, …, x_M from an ensemble of M neural networks (or similar models), each of dimension d.
- A fixed, invertible random projection matrix R (one minimal construction is sketched below) with these properties:
  - One-to-all connectivity: every input dimension affects all output dimensions.
  - Invertible: you can recover the original vector from the projection.
  - Dense: typically behaves like a random orthogonal transform, a Hadamard-based transform, or a dense Gaussian projection.
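As a concrete reference point, here is a minimal NumPy sketch of such a projection, built as a random orthogonal matrix (one illustrative choice among the options above; the helper name is mine):

```python
import numpy as np

def random_orthogonal_projection(d, seed=0):
    """Dense, invertible d x d random projection.

    QR-decomposing a Gaussian matrix yields a random orthogonal Q:
    dense (one-to-all connectivity) and trivially invertible (Q^-1 = Q.T).
    """
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix for a uniform (Haar) draw

R = random_orthogonal_projection(8)
x = np.arange(8.0)
assert np.allclose(R.T @ (R @ x), x)  # invertible: x is recovered exactly
```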
Two strategies for combining the ensemble outputs:

1. Sum-then-project

- Compute the element-wise sum of all output vectors: s = x_1 + x_2 + … + x_M.
- Apply a single random projection to the sum: y = R s.

Effect (a code sketch follows this list):

- The random projection mixes all neuron activations across dimensions.
- However, before projection, the combination is dimension-wise pooling: each model's neuron at a given output index contributes only its equal share to the sum at that index.
- This means inter-neuron information sharing happens only after the pooling step, so neurons that share the same output index are “competing” or “cooperating” directly before being mixed.
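Concretely, a minimal sketch (variable names are illustrative; `random_orthogonal_projection` is the helper defined above):

```python
import numpy as np

rng = np.random.default_rng(1)
M, d = 5, 8                            # ensemble size, output dimension
outputs = rng.standard_normal((M, d))  # one output vector per model
R = random_orthogonal_projection(d)    # shared projection from above

# Sum-then-project: pool per dimension first, mix once afterwards.
s = outputs.sum(axis=0)                # element-wise sum across models
y_stp = R @ s                          # single projection of the pooled vector
```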
2. Project-then-sum

- Apply a unique random projection independently to each output vector: y_i = R_i x_i.
- Sum the projected vectors: y = y_1 + y_2 + … + y_M.

(Note that if every model shared the same projection R, the two strategies would coincide by linearity, since R(x_1 + … + x_M) = R x_1 + … + R x_M. The contrast here assumes a distinct R_i per model.)

Effect (sketched in code after this list):

- Each neuron’s contribution is immediately spread across all output dimensions before aggregation.
- No per-index pooling: each original output neuron influences the combined vector in a fully distributed fashion.
- Every neuron in every model has an equal opportunity to affect the final output, reducing the risk of certain dimensions being underrepresented.
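Continuing the sketch above, project-then-sum with a distinct projection per model, plus a check of the linearity point:

```python
# Project-then-sum: each model gets its own projection, applied
# before aggregation.
Rs = np.stack([random_orthogonal_projection(d, seed=i + 1) for i in range(M)])
y_pts = sum(Rs[i] @ outputs[i] for i in range(M))

# With a single SHARED projection the two strategies coincide by
# linearity: R @ (sum of x_i) == sum of (R @ x_i).
assert np.allclose(R @ outputs.sum(axis=0),
                   sum(R @ outputs[i] for i in range(M)))
```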
Further Insights

Information Mixing Timing

- In sum-then-project, information mixing happens after aggregation. This preserves some structure (dimensional correspondence) across models, but may waste representational capacity if correlations between dimensions are important.
- In project-then-sum, information mixing happens before aggregation, meaning model outputs are decorrelated earlier and combined in a richer space.
Variance and Interference

- Random projections preserve distances approximately (Johnson–Lindenstrauss lemma), but summing before projecting can cause destructive interference in certain dimensions.
- Project-then-sum tends to avoid localized destructive interference because contributions are spread over the entire vector space before addition. (A toy demonstration follows.)
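A toy example of the worst case, with two models whose outputs exactly oppose each other (the numbers are made up for illustration):

```python
x1 = np.array([1.0, -2.0, 0.5, 3.0])
x2 = -x1                                   # exact per-dimension opposition
R1 = random_orthogonal_projection(4, seed=10)
R2 = random_orthogonal_projection(4, seed=11)

print(np.linalg.norm(R1 @ (x1 + x2)))      # 0.0: sum-then-project cancels fully
print(np.linalg.norm(R1 @ x1 + R2 @ x2))   # > 0: distinct projections keep signal
```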
Interpretation in Signal Processing Terms

- Sum-then-project ≈ pooling in the original signal basis, then applying a unitary mixing transform.
- Project-then-sum ≈ mixing each signal independently, then combining in a mixed basis.
- Analogy: the first approach mixes after adding the channels; the second mixes each channel individually before combining.
Scalability & Parallelism

- Project-then-sum is embarrassingly parallel (the per-model projections are independent). A vectorized sketch follows.
- Sum-then-project has a single large projection step, which may be faster for very large ensembles.
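One way to see the cost difference, continuing the earlier sketch (`Rs`, `outputs`, and `R` as defined before):

```python
# Project-then-sum: M independent d x d multiplies, trivially
# parallel across models (or devices), then a reduction.
y_pts = np.einsum('mij,mj->i', Rs, outputs)

# Sum-then-project: one cheap O(M*d) reduction, then a single multiply.
y_stp = R @ outputs.sum(axis=0)
```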
When to Use Which

- Sum-then-project: if you want models to share features per output dimension and then jointly transform them.
- Project-then-sum: if you want maximum inter-model independence before combination, avoiding dimension-specific competition.