Sum-then-Project vs. Project-then-Sum for Neural Network Ensembles



You have:

  • A set of n output vectors from an ensemble of neural networks (or similar models),

    $y^{(1)}, y^{(2)}, \dots, y^{(n)}$, where each $y^{(i)} \in \mathbb{R}^d$
  • A fixed, invertible random projection matrix $R \in \mathbb{R}^{d \times d}$

    • Properties:

      • One-to-all connectivity: every input dimension affects all output dimensions.

      • Invertible: you can recover the original vector from the projection.

      • Dense: typically behaves like a random orthogonal transform, Hadamard-based transform, or dense Gaussian projection.
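
For concreteness, here is a minimal NumPy sketch of this setup. The sizes, the make_projection helper, and the choice of a random orthogonal matrix (dense, one-to-all, and trivially invertible) are illustrative assumptions, not something specified above.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5, 8                          # ensemble size and output width (illustrative)

    # Ensemble outputs: n vectors y^(i) in R^d, stacked as rows of an (n, d) array.
    Y = rng.normal(size=(n, d))

    def make_projection(d, rng):
        """Dense, invertible random projection: a random orthogonal matrix
        obtained from the QR decomposition of a Gaussian matrix."""
        Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
        return Q

    R = make_projection(d, rng)
    # Orthogonal => invertible, dense (one-to-all), and the inverse is just R.T.
    assert np.allclose(R @ R.T, np.eye(d))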

Two strategies for combining the ensemble outputs:


1. Sum-then-project

  • Compute the element-wise sum of all n output vectors:

    $s = \sum_{i=1}^{n} y^{(i)}$
  • Apply a single random projection to the sum:

    $z = R\,s$

Effect:

  • The random projection mixes all neuron activations across output dimensions.

  • However, before projection, the combination is purely dimension-wise pooling: output neuron j of every model contributes only to index j of the sum.

  • Inter-neuron information sharing therefore happens only after the pooling step: neurons that share the same output index “compete” or “cooperate” directly with one another before any mixing takes place.
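
Continuing the NumPy sketch above, sum-then-project is just a pooled sum followed by a single projection:

    # Sum-then-project: pool index-wise across the ensemble, then mix once.
    s = Y.sum(axis=0)                 # s = sum_i y^(i), shape (d,)
    z_sum_then_project = R @ s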


2. Project-then-sum

  • Apply a distinct random projection $R^{(i)}$ independently to each output vector (with a single shared $R$, linearity would make this identical to sum-then-project, so the distinction assumes per-model projections):

    $z^{(i)} = R^{(i)} y^{(i)}, \quad i = 1, \dots, n$

  • Sum the projected vectors:

    $z = \sum_{i=1}^{n} z^{(i)}$

Effect:

  • Each neuron’s contribution is immediately spread across all output dimensions before aggregation.

  • No per-index pooling—each original output neuron influences the combined vector in a fully distributed fashion.

  • Equal opportunity for all neurons in all models to affect the final output, reducing the risk of certain dimensions being underrepresented.
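
In the same sketch, project-then-sum gives each model its own projection before adding (the per-model matrices $R^{(i)}$ are an assumption, as noted above); the final assertion checks the linearity point that with one shared $R$ the two orderings coincide:

    # Project-then-sum: each model gets its own projection R^(i), then add.
    R_list = [make_projection(d, rng) for _ in range(n)]
    Z = np.stack([R_i @ y_i for R_i, y_i in zip(R_list, Y)])   # z^(i) = R^(i) y^(i)
    z_project_then_sum = Z.sum(axis=0)

    # With a single shared R the two orderings agree by linearity:
    # R @ sum_i y^(i) = sum_i (R @ y^(i)).
    assert np.allclose(R @ Y.sum(axis=0), (Y @ R.T).sum(axis=0))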


Further Insights

  1. Information Mixing Timing

    • In sum-then-project, information mixing happens after aggregation. This preserves some structure (dimensional correspondence) across models but may waste representational capacity if correlations between dimensions are important.

    • In project-then-sum, information mixing happens before aggregation, meaning model outputs are decorrelated earlier and combined in a richer space.

  2. Variance and Interference

    • Random projections preserve distances approximately (Johnson–Lindenstrauss lemma), but summing before projecting can cause destructive interference in certain dimensions.

    • Project-then-sum tends to avoid localized destructive interference because contributions are spread over the entire vector space before addition.
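
A toy numeric sketch of the interference point, reusing make_projection from above (the specific vectors are made up): two members that disagree in sign at the same output index cancel that coordinate when summed first, whereas per-model projections spread the disagreement across all dimensions before addition.

    # Two ensemble members that disagree in sign at output index 0.
    a = np.array([ 1.0, 2.0, 0.0, 0.0])
    b = np.array([-1.0, 2.0, 0.0, 0.0])

    Rc = make_projection(4, rng)                                # shared projection
    Ra, Rb = make_projection(4, rng), make_projection(4, rng)   # per-model projections

    z1 = Rc @ (a + b)        # sum-then-project: the +1/-1 signal at index 0 is erased before mixing
    z2 = Ra @ a + Rb @ b     # project-then-sum: contributions are spread out first, so the
                             # disagreement is generally not cancelled coordinate-by-coordinate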

  3. Interpretation in Signal Processing Terms

    • Sum-then-project ≈ pooling in the original signal basis, then applying a unitary mixing transform.

    • Project-then-sum ≈ mixing each signal independently, then combining in a mixed basis.

    • Analogy:

      • First approach: mix after adding channels.

      • Second approach: mix each channel individually before combining.

  4. Scalability & Parallelism

    • Project-then-sum is embarrassingly parallel (projections are independent).

    • Sum-then-project has a single large projection step, which may be faster for very large ensembles.
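
A rough cost sketch, reusing Y and R and assuming a shared projection for a like-for-like count: sum-then-project needs one d-by-d matrix-vector product, while project-then-sum needs n independent ones that can be batched or parallelised.

    z_a = R @ Y.sum(axis=0)          # sum-then-project: one projection, O(d^2)
    z_b = (Y @ R.T).sum(axis=0)      # project-then-sum: n independent projections, O(n d^2),
                                     # embarrassingly parallel across models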

  5. When to Use Which

    • Sum-then-project: If you want models to share features per-output-dimension and then jointly transform them.

    • Project-then-sum: If you want maximum inter-model independence before combination, avoiding dimension-specific competition.
