Hadamard + random sign diagonal gives a tiny, very fast, structured reservoir that’s easy to analyze and implement.
Quick restatement of design
A reservoir where the recurrent map is

$$x_{t+1} = H D \, x_t + (\text{input injection}),$$

where
- $H$ is a (normalized) Walsh–Hadamard orthogonal matrix ($H H^\top = I$),
- $D$ is diagonal with random $\pm 1$ entries on the diagonal,
- you add the incoming data into the reservoir vector over time (some injection/drive).
That’s basically an orthogonal, sign-flipping structured reservoir with super-fast transforms (FWHT, $O(n \log n)$ per step).
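As a concrete reference point, here is a minimal NumPy sketch of this update. The names `fwht` and `hd_step` are mine, not from any particular library; the butterfly loop is the standard iterative FWHT recursion:

```python
import numpy as np

def fwht(x):
    """Fast Walsh–Hadamard transform, O(n log n); len(x) must be a power of 2.
    Dividing by sqrt(n) at the end makes the transform orthogonal (normalized H)."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):          # butterfly over blocks of size 2h
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

rng = np.random.default_rng(0)                # re-seedable -> reproducible reservoir
n = 1024                                      # reservoir size (power of 2)
d = rng.choice([-1.0, 1.0], size=n)           # random signs: the diagonal of D

def hd_step(x):
    """One linear reservoir step x -> H D x: sign flips, then global mixing."""
    return fwht(d * x)
```

The pure-Python loop is written for clarity; in practice you would vectorize the butterfly, or for small $n$ build the dense matrix with `scipy.linalg.hadamard`.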
What this buys you (benefits)
- Extremely cheap mixing. FWHT runs in $O(n \log n)$ time with no stored matrix, so you can make very large reservoirs cheaply.
- Norm / energy preservation (before scaling). Because $H$ and $D$ are orthogonal, the linear map $HD$ is orthogonal → it preserves the vector norm. So information isn’t immediately lost by contraction.
- Good information propagation / vanishing-gradient avoidance. Orthogonal dynamics avoid short-term vanishing of signal energy, which is desirable for long-memory tasks.
- Deterministic structure + reproducibility. You can re-seed reproducibly; there is no need to store dense random matrices.
- Reservoir as a structured filterbank. The action of $HD$ is like a global mixing / permutation + sign pattern; with a nonlinearity it creates rich features.
- Easy to combine with an ELM-style readout. Collect reservoir states (or nonlinear transforms of them) and learn a linear readout by ridge regression (see the sketch after this list).
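A minimal sketch of that pipeline, reusing `fwht`/`hd_step` from above. The gain, the tanh, and the random $\pm 1$ input weights `w_in` are my assumptions, not part of the bare design:

```python
w_in = rng.choice([-1.0, 1.0], size=n)        # random input weights (assumed)

def run_reservoir(inputs, gamma=0.99):
    """Drive the reservoir with a scalar sequence and collect the states."""
    x = np.zeros(n)
    states = []
    for u in inputs:
        x = np.tanh(gamma * hd_step(x) + w_in * u)   # mix, inject, squash
        states.append(x)
    return np.array(states)

def ridge_readout(states, targets, lam=1e-4):
    """ELM-style readout: solve (S^T S + lam*I) w = S^T y by ridge regression."""
    A = states.T @ states + lam * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)
```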
Important concerns & how to handle them
1. Stability / echo-state property
- Because $HD$ is orthogonal, its spectral radius is 1 (all eigenvalues lie on the unit circle). That means the pure linear map is marginally stable — signals do not decay. In practice this can cause persistent oscillations and make the reservoir overly sensitive or non-forgetting.
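A common remedy (the "small leak or gain near 1" mentioned further down) is to contract the map slightly and/or leak the state. A minimal sketch, reusing `hd_step` and `w_in` from the sketches above:

```python
def leaky_step(x, u, gamma=0.95, alpha=0.9):
    """gamma < 1 makes the linear part strictly contracting (spectral radius gamma);
    leaky integration with alpha further smooths the dynamics, so old inputs fade
    geometrically and the echo-state property can hold."""
    pre = gamma * hd_step(x) + w_in * u
    return (1 - alpha) * x + alpha * np.tanh(pre)
```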
2. Too little nonlinearity / richness
- Pure linear orthogonal recurrence preserves information but won’t create the nonlinear features needed for many tasks. Add an elementwise nonlinearity (tanh, relu, clipped linear) after the transform, or on the state before the readout.
- Alternatively, use a memory + nonlinearity pipeline: apply $HD$, then an elementwise nonlinearity, then maybe a diagonal nonlinearity or subsampling (a sketch follows).
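One assumed reading of the pipeline idea as code (the subsampling size `keep` is illustrative):

```python
def pipeline_step(x, u, keep=256):
    """HD mixing -> elementwise nonlinearity; return the full next state plus a
    subsampled feature slice for the readout."""
    z = np.tanh(hd_step(x) + w_in * u)
    return z, z[:keep]                        # full state, subsampled features
```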
3. Periodicity / algebraic structure
- Walsh–Hadamard is highly structured. Depending on how input injection is done and where nonlinearities are placed, you might see periodic or quasi-periodic dynamics (not necessarily bad, but worth testing).
- If you observe poor mixing, an easy fix is to alternate transforms: e.g. use $H D_1$ at step $t$ and $H D_2$ at step $t+1$ (two different diagonal sign patterns), or intersperse small permutations (see the sketch below).
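A sketch of the alternating-diagonal fix, with two independently drawn sign patterns:

```python
d1 = rng.choice([-1.0, 1.0], size=n)          # sign pattern for even steps
d2 = rng.choice([-1.0, 1.0], size=n)          # sign pattern for odd steps

def alternating_step(x, t):
    """Apply H D1 at even t and H D2 at odd t to break up algebraic structure."""
    return fwht((d1 if t % 2 == 0 else d2) * x)
```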
4. Input injection strategy matters
- Options (sketched below):
  - Additive injection: $x_{t+1} = H D \, x_t + W_{\text{in}} u_t$. Simple and common.
  - Concatenation + projection: add the input into a subset of reservoir nodes each step.
  - Periodic refresh: replace a fraction of $x$ with transformed inputs occasionally, to avoid destructive interference.
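The three options as code sketches; the parameters `k`, `period`, and `frac` are illustrative choices, and `hd_step`/`w_in` come from the sketches above:

```python
def additive_injection(x, u):
    """Option 1: add the projected input to every node."""
    return np.tanh(hd_step(x) + w_in * u)

def subset_injection(x, u, k=64):
    """Option 2: drive only the first k reservoir nodes each step."""
    z = hd_step(x)
    z[:k] += w_in[:k] * u
    return np.tanh(z)

def refresh_injection(x, u, t, period=50, frac=0.1):
    """Option 3: additive drive, plus occasionally overwriting a fraction of the
    state with projected input to avoid destructive interference."""
    z = np.tanh(hd_step(x) + w_in * u)
    if t % period == 0:
        m = int(frac * n)
        z[:m] = w_in[:m] * u
    return z
```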
Memory, capacity and dynamics
- Orthogonal (unitary) recurrence tends to preserve memory well and supports long effective memory when combined with a small leak or a gain near 1. This often yields high linear memory capacity (a rough empirical probe is sketched below).
- However, raw orthogonal linear reservoirs can be too “linear” — the effective computational power for nonlinear tasks depends on the nonlinearity and input projection.
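One way to check the linear-memory claim empirically is the classic delayed-recall probe: train a ridge readout to reconstruct $u_{t-k}$ from $x_t$ for each delay $k$ and sum the squared correlations. A sketch reusing `run_reservoir` and `ridge_readout` from above; the horizon, washout, and delay range are arbitrary choices:

```python
T, washout = 5000, 200
u = rng.uniform(-1.0, 1.0, size=T)            # i.i.d. drive signal
states = run_reservoir(u, gamma=0.99)

capacity = 0.0
for k in range(1, 200):                       # per-delay capacity MC_k
    S = states[washout:]
    y = u[washout - k : T - k]                # target: input delayed by k steps
    pred = S @ ridge_readout(S, y)
    capacity += np.corrcoef(pred, y)[0, 1] ** 2
print("approximate linear memory capacity:", capacity)
```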