Sum-then-Project vs. Project-then-Sum for Neural Network Ensembles



You have:

  • A set of n output vectors from an ensemble of neural networks (or similar models),

    $y^{(1)}, y^{(2)}, \dots, y^{(n)}$, where each $y^{(i)} \in \mathbb{R}^d$
  • A fixed, invertible random projection matrix $R \in \mathbb{R}^{d \times d}$

    • Properties:

      • One-to-all connectivity: every input dimension affects all output dimensions.

      • Invertible: you can recover the original vector from the projection.

      • Dense: typically behaves like a random orthogonal transform, Hadamard-based transform, or dense Gaussian projection.
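
For concreteness, here is a minimal NumPy sketch of this setup. The sizes, the make_projection helper, and the choice of a random orthogonal matrix (dense, one-to-all, and trivially invertible) are illustrative assumptions, not something specified above.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5, 8                          # ensemble size and output width (illustrative)

    # Ensemble outputs: n vectors y^(i) in R^d, stacked as rows of an (n, d) array.
    Y = rng.normal(size=(n, d))

    def make_projection(d, rng):
        """Dense, invertible random projection: a random orthogonal matrix
        obtained from the QR decomposition of a Gaussian matrix."""
        Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
        return Q

    R = make_projection(d, rng)
    # Orthogonal => invertible, dense (one-to-all), and the inverse is just R.T.
    assert np.allclose(R @ R.T, np.eye(d))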

Two strategies for combining the ensemble outputs:


1. Sum-then-project

  • Compute the element-wise sum of all n output vectors:

    $s = \sum_{i=1}^{n} y^{(i)}$
  • Apply a single random projection to the sum:

    $z = R\,s$

Effect:

  • The random projection mixes all neuron activations across output dimensions.

  • However, before projection, the combination is purely dimension-wise pooling: output neuron j of every model contributes only to index j of the sum.

  • Inter-neuron information sharing therefore happens only after the pooling step: neurons that share the same output index “compete” or “cooperate” directly with one another before any mixing takes place.
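
Continuing the NumPy sketch above, sum-then-project is just a pooled sum followed by a single projection:

    # Sum-then-project: pool index-wise across the ensemble, then mix once.
    s = Y.sum(axis=0)                 # s = sum_i y^(i), shape (d,)
    z_sum_then_project = R @ s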


2. Project-then-sum

  • Apply a distinct random projection $R^{(i)}$ independently to each output vector (with a single shared $R$, linearity would make this identical to sum-then-project, so the distinction assumes per-model projections):

    $z^{(i)} = R^{(i)} y^{(i)}, \quad i = 1, \dots, n$

  • Sum the projected vectors:

    $z = \sum_{i=1}^{n} z^{(i)}$

Effect:

  • Each neuron’s contribution is immediately spread across all output dimensions before aggregation.

  • No per-index pooling—each original output neuron influences the combined vector in a fully distributed fashion.

  • Equal opportunity for all neurons in all models to affect the final output, reducing the risk of certain dimensions being underrepresented.
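
In the same sketch, project-then-sum gives each model its own projection before adding (the per-model matrices $R^{(i)}$ are an assumption, as noted above); the final assertion checks the linearity point that with one shared $R$ the two orderings coincide:

    # Project-then-sum: each model gets its own projection R^(i), then add.
    R_list = [make_projection(d, rng) for _ in range(n)]
    Z = np.stack([R_i @ y_i for R_i, y_i in zip(R_list, Y)])   # z^(i) = R^(i) y^(i)
    z_project_then_sum = Z.sum(axis=0)

    # With a single shared R the two orderings agree by linearity:
    # R @ sum_i y^(i) = sum_i (R @ y^(i)).
    assert np.allclose(R @ Y.sum(axis=0), (Y @ R.T).sum(axis=0))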


Further Insights

  1. Information Mixing Timing

    • In sum-then-project, information mixing happens after aggregation. This preserves some structure (dimensional correspondence) across models but may waste representational capacity if correlations between dimensions are important.

    • In project-then-sum, information mixing happens before aggregation, meaning model outputs are decorrelated earlier and combined in a richer space.

  2. Variance and Interference

    • Random projections preserve distances approximately (Johnson–Lindenstrauss lemma), but summing before projecting can cause destructive interference in certain dimensions.

    • Project-then-sum tends to avoid localized destructive interference because contributions are spread over the entire vector space before addition.
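
A toy numeric sketch of the interference point, reusing make_projection from above (the specific vectors are made up): two members that disagree in sign at the same output index cancel that coordinate when summed first, whereas per-model projections spread the disagreement across all dimensions before addition.

    # Two ensemble members that disagree in sign at output index 0.
    a = np.array([ 1.0, 2.0, 0.0, 0.0])
    b = np.array([-1.0, 2.0, 0.0, 0.0])

    Rc = make_projection(4, rng)                                # shared projection
    Ra, Rb = make_projection(4, rng), make_projection(4, rng)   # per-model projections

    z1 = Rc @ (a + b)        # sum-then-project: the +1/-1 signal at index 0 is erased before mixing
    z2 = Ra @ a + Rb @ b     # project-then-sum: contributions are spread out first, so the
                             # disagreement is generally not cancelled coordinate-by-coordinate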

  3. Interpretation in Signal Processing Terms

    • Sum-then-project ≈ pooling in the original signal basis, then applying a unitary mixing transform.

    • Project-then-sum ≈ mixing each signal independently, then combining in a mixed basis.

    • Analogy:

      • First approach: mix after adding channels.

      • Second approach: mix each channel individually before combining.

  4. Scalability & Parallelism

    • Project-then-sum is embarrassingly parallel (projections are independent).

    • Sum-then-project has a single large projection step, which may be faster for very large ensembles.
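
A rough cost sketch, reusing Y and R and assuming a shared projection for a like-for-like count: sum-then-project needs one d-by-d matrix-vector product, while project-then-sum needs n independent ones that can be batched or parallelised.

    z_a = R @ Y.sum(axis=0)          # sum-then-project: one projection, O(d^2)
    z_b = (Y @ R.T).sum(axis=0)      # project-then-sum: n independent projections, O(n d^2),
                                     # embarrassingly parallel across models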

  5. When to Use Which

    • Sum-then-project: If you want models to share features per-output-dimension and then jointly transform them.

    • Project-then-sum: If you want maximum inter-model independence before combination, avoiding dimension-specific competition.
