Inward & Outward Facing Fast Random Projections.
Fast Random Projections Using the Walsh–Hadamard Transform and Sign-Flips
Good afternoon everyone. Today, I want to walk you through a fast and elegant way to construct random projections, using the Walsh–Hadamard transform combined with simple sign-flip operations.
Background – Why Random Projections?
Random projections are a powerful tool in machine learning and signal processing. They help reduce dimensionality, spread information evenly, and preserve distances in high-dimensional space — all while avoiding the cost of storing large dense random matrices.
Traditionally, a random projection is implemented as:

y = Rx

where R is a dense random matrix (say, with i.i.d. Gaussian entries). The drawback? Computing Rx is O(n²) in both time and storage.
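To make the cost concrete, here is a rough NumPy sketch of that dense baseline (the sizes and names are just for illustration):

import numpy as np

n = 1024
rng = np.random.default_rng(0)

# A dense Gaussian random projection: n*n entries must be generated and stored,
# and applying it to a vector costs O(n^2) operations.
R = rng.standard_normal((n, n)) / np.sqrt(n)
x = rng.standard_normal(n)
y = R @ x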
We can do better.
The HD Construction
Here’s the idea:
Let H be the n × n Walsh–Hadamard matrix, scaled by 1/√n so that it is orthonormal, and let D be a diagonal matrix whose diagonal entries are random ±1 values. Then:

y = HDx
This is a fast random projection because:

- It is a random change of basis, rather than the fixed, structured change of basis of the Walsh–Hadamard matrix alone.
- Applying D is just a pattern of sign flips, which takes O(n) time.
- Applying H takes O(n log n) time via the Fast Walsh–Hadamard Transform (FWHT).
- No multiplications are needed; only additions, subtractions, and sign changes (plus the final 1/√n scaling, which can be folded in wherever convenient).
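Here is a minimal NumPy sketch of the whole HD projection, assuming n is a power of two; the helper names (fwht, hd_project) are mine for this example, not part of any library:

import numpy as np

def fwht(x):
    # Orthonormal Fast Walsh-Hadamard Transform, O(n log n); len(x) must be a power of two.
    x = x.astype(float).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: only sums...
            x[i + h:i + 2 * h] = a - b  # ...and differences
        h *= 2
    return x / np.sqrt(n)               # 1/sqrt(n) scaling makes H orthonormal

def hd_project(x, d):
    # Inward-facing fast random projection y = H D x:
    # O(n) sign flips followed by the O(n log n) transform.
    return fwht(d * x)

n = 8
rng = np.random.default_rng(0)
d = rng.choice([-1.0, 1.0], size=n)     # diagonal of D: random +/-1 signs
x = rng.standard_normal(n)
y = hd_project(x, d)
print(y)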
Orthogonality and Inverse
An important property: the rows of HD are still mutually orthogonal, so HD is an orthogonal matrix. And both H and D are self-inverse:

H² = I,  D² = I

So the inverse of HD is simply DH. Here's the derivation:

(HD)(DH) = H(DD)H = HH = I, hence (HD)⁻¹ = DH

Since H and D are also symmetric, (HD)ᵀ = DᵀHᵀ = DH: the inverse and the transpose coincide, exactly as you would expect from an orthogonal matrix.
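A quick numerical sanity check of that identity, building the normalized Walsh–Hadamard matrix explicitly with scipy.linalg.hadamard (only for verification; in practice you would never materialize H):

import numpy as np
from scipy.linalg import hadamard

n = 16
rng = np.random.default_rng(1)

H = hadamard(n) / np.sqrt(n)              # orthonormal Walsh-Hadamard matrix: H @ H = I
D = np.diag(rng.choice([-1.0, 1.0], n))   # random sign-flip matrix: D @ D = I

HD = H @ D
DH = D @ H

print(np.allclose(HD @ DH, np.eye(n)))    # True: DH inverts HD
print(np.allclose(HD.T, DH))              # True: the inverse is also the transpose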
Inward vs Outward Facing Projections
Interestingly, HD and DH behave differently:

- HD is the inward-facing projection: it spreads the data out evenly. A change in one coordinate of x is spread across all coordinates of y.
- DH is the outward-facing projection: it first applies H, which can concentrate energy into fewer coordinates, and then D just flips signs.
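A small demonstration of the two behaviours, again with explicit matrices purely for illustration:

import numpy as np
from scipy.linalg import hadamard

n = 8
rng = np.random.default_rng(2)

H = hadamard(n) / np.sqrt(n)
D = np.diag(rng.choice([-1.0, 1.0], n))

e0 = np.zeros(n)
e0[0] = 1.0                 # a change in a single coordinate of x
print(H @ D @ e0)           # inward-facing HD: every output coordinate is +/- 1/sqrt(n)

ones = np.ones(n)           # an evenly spread input
print(D @ H @ ones)         # outward-facing DH: all the energy lands in one coordinate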
Why Outward Facing is Useful for Backpropagation
Suppose you apply the outward-facing projection DH as the final step in a neural network's forward pass:

y = DHx

During backpropagation, the error vector e must be multiplied by the inverse (which, because DH is orthogonal, is also its transpose):

(DH)⁻¹e = HDe
So the error signal is automatically redistributed using an inward-facing projection — ensuring the gradient flows back in a randomized but well-spread manner. This can improve robustness and reduce overfitting tendencies.
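Here is a framework-agnostic sketch of that forward/backward pair; the function names are hypothetical, and the check at the end confirms that the backward pass really is the inward-facing HD applied to the upstream gradient:

import numpy as np
from scipy.linalg import hadamard

n = 8
rng = np.random.default_rng(3)
H = hadamard(n) / np.sqrt(n)
d = rng.choice([-1.0, 1.0], n)          # diagonal of D

def dh_forward(x):
    # Outward-facing final layer: y = D H x.
    return d * (H @ x)

def dh_backward(grad_y):
    # Gradient w.r.t. x is (DH)^T grad_y = H D grad_y: an inward-facing projection.
    return H @ (d * grad_y)

g = rng.standard_normal(n)              # upstream error signal dL/dy
DH = np.diag(d) @ H
print(np.allclose(dh_backward(g), DH.T @ g))   # True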
Multiple-Stage Projections for Sparse Data
For dense data, a single HD stage works well.

But for sparse data, one stage might not sufficiently mix the information. We can chain multiple stages, for example:

y = H D₃ H D₂ H D₁ x

Each Dᵢ is an independent random ±1 diagonal matrix. This compounding improves mixing for sparse inputs.
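A sketch of a three-stage chain (three is an arbitrary choice here), with explicit matrices so the inverse chain can be checked directly:

import numpy as np
from scipy.linalg import hadamard

n = 16
rng = np.random.default_rng(4)
H = hadamard(n) / np.sqrt(n)

# Three independent random sign-flip stages D1, D2, D3.
D1, D2, D3 = (np.diag(rng.choice([-1.0, 1.0], n)) for _ in range(3))

x = rng.standard_normal(n)
y = H @ D3 @ H @ D2 @ H @ D1 @ x        # y = H D3 H D2 H D1 x

# The inverse simply unwinds the stages in reverse order.
x_back = D1 @ H @ D2 @ H @ D3 @ H @ y
print(np.allclose(x_back, x))           # True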
Self-Inverse Random Projections
We can even design self-inverse projections:

P = DHD

The inverse is P itself, since:

(DHD)(DHD) = DH(DD)HD = D(HH)D = DD = I
This is handy when you need the forward and backward transforms to be identical — for example, in certain symmetric network architectures.
For sparse input data, one option is a palindromic chain such as:

P = D₁HD₂HD₁
which retains self-inverse properties while enhancing mixing.
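Both constructions can be checked numerically in a few lines; the sparse-data variant below is the palindromic chain D1 H D2 H D1 suggested above (my reading of the construction, so treat the exact form as an assumption):

import numpy as np
from scipy.linalg import hadamard

n = 16
rng = np.random.default_rng(5)
H = hadamard(n) / np.sqrt(n)
D1 = np.diag(rng.choice([-1.0, 1.0], n))
D2 = np.diag(rng.choice([-1.0, 1.0], n))

P = D1 @ H @ D1                          # self-inverse projection D H D
print(np.allclose(P @ P, np.eye(n)))     # True

P_sparse = D1 @ H @ D2 @ H @ D1          # palindromic variant with extra mixing
print(np.allclose(P_sparse @ P_sparse, np.eye(n)))   # True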
Summary
- HD gives a fast, orthogonal, low-storage random projection.
- HD is inward-facing: great for forward mixing.
- DH is outward-facing: great for preparing error signals in backprop.
- Multi-stage and self-inverse versions extend these ideas for sparse data.
In short, combining the Walsh–Hadamard transform with sign flips is a simple but extremely versatile trick for machine learning and signal processing.
Closing
This method gives you projections without storing large random matrices, while keeping all the nice orthogonality properties of dense random projections.
It’s a tool worth adding to your mental toolbox for efficient high-dimensional transformations.
Thank you.