Fast Walsh Hadamard Transform (out-of-place, fixed A-stride)

 

Fast WHT Algorithm (out-of-place, fixed A-stride)

Fast Walsh Hadamard transform (out-of-place):

Let n=2mn=2^m. For each stage t=1,,m:

  • Read AA in adjacent pairs: (A[0],A[1]),(A[2],A[3]),,(A[n2],A[n1])(A[0],A[1]),(A[2],A[3]),\ldots,(A[n-2],A[n-1]).

  • Write sums sequentially into B[0],B[1],,B[n/21]B[0],B[1],\ldots,B[n/2-1].

  • Write diffs sequentially into B[n/2],B[n/2+1],,B[n1]B[n/2],B[n/2+1],\ldots,B[n-1].

  • Swap ABA\leftrightarrow B.

Equivalently, the read stride is always 1; the write pointers are two linear cursors: wSum starting at 0 and wDiff starting at n/2n/2.





Practical notes & perks

  • Cache/stream-friendly: Writing all sums contiguously and all diffs contiguously improves write combining and sequential bandwidth versus interleaving.

  • Vectorization: The inner loop is a pure stride-1 read with two monotone write cursors — perfect for SIMD and wide loads/stores.

  • 2*memory: Uses twice the memory of the in-place algorithm which will either increase cache memory usage or slow the algorithm down for wide inputs.

  • Mixed strategy: Optimal algorithms for the WHT use mixed strategies to obtain maximum compute speed by combining naive, out-of-place and in-place operations. 





Comments

Popular posts from this blog

Neon Bulb Oscillators

23 Circuits you can Build in an Hour - Free Book

Q Multiplier Circuits