Fast Walsh Hadamard Transform (out-of-place, fixed A-stride)
Fast WHT Algorithm (out-of-place, fixed A-stride)
Let . For each stage
-
Read in adjacent pairs: .
-
Write sums sequentially into .
-
Write diffs sequentially into .
-
Swap .
Equivalently, the read stride is always 1; the write pointers are two linear cursors: wSum
starting at 0 and wDiff
starting at .

Practical notes & perks
-
Cache/stream-friendly: Writing all sums contiguously and all diffs contiguously improves write combining and sequential bandwidth versus interleaving.
-
Vectorization: The inner loop is a pure stride-1 read with two monotone write cursors — perfect for SIMD and wide loads/stores.
2*memory: Uses twice the memory of the in-place algorithm which will either increase cache memory usage or slow the algorithm down for wide inputs.
Mixed strategy: Optimal algorithms for the WHT use mixed strategies to obtain maximum compute speed by combining naive, out-of-place and in-place operations.
Comments
Post a Comment