Change of Basis Transforms - Machine Learning & Neural Networks




Change-of-basis transforms are interesting in machine learning and neural networks because they re-express data in a new coordinate system, and sometimes the new basis reveals patterns, structure, or computational shortcuts that were hidden in the original one.

Here’s why they matter and where the “interesting” comes from:


1. Geometry: different coordinates, same space

  • A change of basis is just a rotation/reflection (orthogonal case) or a more general invertible linear transformation.

  • It preserves all the information in the vector — nothing is lost — but the representation changes.

  • In ML, this means you can choose a basis that makes the structure of your problem easier to work with (see the small numerical sketch below).
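
A minimal NumPy sketch of this point, using an arbitrary invertible matrix B as the new basis: the coordinates change, but changing back recovers the original vector exactly.

```python
import numpy as np

# A vector expressed in the standard basis.
x = np.array([3.0, 1.0])

# Columns of B are the new basis vectors; any invertible matrix works.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Coordinates of the same vector in the new basis: solve B @ c = x.
c = np.linalg.solve(B, x)

# Changing back recovers x exactly; no information is lost.
print(c)                        # coordinates in the new basis
print(np.allclose(B @ c, x))    # True
```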


2. Sparsity and interpretability

  • Some bases make signals sparse — most coefficients are zero or small.

    • Example: images are not sparse in the pixel basis, but often sparse in a wavelet or DCT basis (illustrated with a 1-D DCT in the sketch below).

  • Sparsity is gold in ML: it helps with compression, denoising, and learning from fewer examples.
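
As a rough illustration (assuming SciPy is available, and using a hand-picked smooth 1-D signal rather than an image), a signal that needs every sample in the raw basis concentrates essentially all of its energy in a couple of DCT coefficients:

```python
import numpy as np
from scipy.fft import dct

# A smooth 1-D signal: dense in the sample ("pixel") basis.
n = 256
t = (np.arange(n) + 0.5) / n
signal = np.cos(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

# The orthonormal DCT-II is a change of basis (norm='ortho' keeps it orthogonal).
coeffs = dct(signal, norm='ortho')

# Count how many coefficients are needed to capture 99.9% of the signal energy.
energy = np.sort(coeffs ** 2)[::-1]
k = np.searchsorted(np.cumsum(energy), 0.999 * energy.sum()) + 1
print(f"{k} of {n} DCT coefficients carry 99.9% of the energy")
```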


3. Separating important from irrelevant components

  • In the right basis, important signal directions can be isolated from noise directions.

  • PCA (Principal Component Analysis) is essentially a data-driven change of basis that aligns axes with the directions of largest variance (sketched below).

  • This is useful for dimensionality reduction and avoiding overfitting.
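
A minimal sketch of PCA as a change of basis, done directly in NumPy on synthetic data: the eigenvectors of the covariance matrix become the new orthonormal axes, and projecting onto the leading ones is the dimensionality-reduction step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 500 points in 3-D that mostly vary along one hidden direction.
latent = rng.normal(size=(500, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(500, 3))

# Center the data, then eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)            # returned in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The eigenvectors form the new orthonormal basis; projecting is the change of basis.
scores = Xc @ eigvecs                              # coordinates in the PCA basis
print("fraction of variance per axis:", eigvals / eigvals.sum())
print("reduced 1-D representation shape:", scores[:, :1].shape)
```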


4. Computational efficiency

  • Some fixed transforms (FFT, FWHT, DCT) can be applied in O(N log N) instead of O(N²) — a huge speed gain.

  • In neural networks, these can be structured weight matrices:

    • Example: Replace a dense layer with HDP (Hadamard × diagonal random sign × permutation) for fast random projections (see the sketch after this list).

  • You get roughly the same expressiveness but with far fewer multiplications and parameters.
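
A minimal sketch of such a structured projection, assuming a hand-rolled fast Walsh-Hadamard transform (the standard O(N log N) butterfly): random signs, a permutation, and the Hadamard transform together act as a cheap random orthogonal change of basis, and no dense weight matrix is ever stored.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(N log N); len(x) must be a power of two."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)      # normalized so the transform is orthogonal

rng = np.random.default_rng(0)
n = 1024
perm = rng.permutation(n)                   # P: random permutation
signs = rng.choice([-1.0, 1.0], size=n)     # D: random diagonal signs

def hdp_project(x):
    # H @ D @ P @ x, applied as cheap elementwise ops plus the fast transform.
    return fwht(signs * x[perm])

x = rng.normal(size=n)
y = hdp_project(x)
print(np.isclose(np.linalg.norm(x), np.linalg.norm(y)))   # True: orthogonal, norm preserved
```

The storage here is O(N) (the sign vector and the permutation) instead of O(N²) for a dense weight matrix, which is where the savings in multiplications and parameters comes from.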


5. Random features and kernels

  • Change-of-basis transforms with randomly chosen basis vectors can approximate kernels or preserve geometry (the Johnson–Lindenstrauss lemma).

  • Random Fourier features, the Fastfood transform, and related techniques are structured changes of basis that make kernel methods scalable (a random Fourier feature sketch follows below).
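
A minimal random Fourier feature sketch for the Gaussian/RBF kernel (the Rahimi–Recht construction): random frequencies and phases define a randomized cosine basis, and dot products of the new features approximate the kernel value. The dimensions and bandwidth below are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 2000, 0.5     # input dim, number of random features, RBF bandwidth

# Random frequencies (matched to the kernel's spectrum) and random phases.
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def rff(x):
    """Map x into D random Fourier features; dot products approximate the RBF kernel."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))   # true kernel value k(x, y)
approx = rff(x) @ rff(y)                        # inner product in the new basis
print(f"exact {exact:.4f}  vs  RFF approximation {approx:.4f}")
```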


6. Information flow in networks

  • In a deep net, each layer can be seen as a change of basis + nonlinearity.

  • In the NTK (Neural Tangent Kernel) / infinite-width perspective, the initial random weights define a fixed basis in function space, and learning adjusts the coefficients.

  • Orthogonal basis changes can improve gradient flow and help avoid signal collapse (see the norm-preservation sketch below).
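
A small sketch of the norm-preservation point, with pure linear layers in NumPy: orthogonal weight matrices keep the signal's norm constant at any depth, while carelessly scaled Gaussian layers shrink it toward zero (the mirror-image failure, explosion, happens when the scale is too large).

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 256, 50
x = rng.normal(size=n)

# Orthogonal layers: each is a pure change of basis, so the norm never drifts.
h_orth = x.copy()
for _ in range(depth):
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))   # random orthogonal matrix
    h_orth = Q @ h_orth

# Poorly scaled Gaussian layers: the signal collapses exponentially with depth.
h_gauss = x.copy()
for _ in range(depth):
    W = rng.normal(scale=1.0 / n, size=(n, n))
    h_gauss = W @ h_gauss

print("input norm:          ", np.linalg.norm(x))
print("after orthogonal net:", np.linalg.norm(h_orth))    # ~ unchanged
print("after gaussian net:  ", np.linalg.norm(h_gauss))   # ~ 0 (signal collapse)
```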


7. Connections to physics and signal processing

  • In physics, choosing the right basis (Fourier, eigenmodes) can turn hard problems into separable ones.

  • In ML, the same logic applies: if your task has a symmetry or repeating structure, picking the right basis makes it easier for the network to learn (see the convolution example below).
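
The convolution theorem is a concrete version of this: a shift-invariant (convolution) operator is dense in the sample basis but diagonal in the Fourier basis, so an O(N²) computation collapses to one multiplication per coefficient. A quick NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
x = rng.normal(size=n)     # a signal
k = rng.normal(size=n)     # a circular convolution kernel

# Sample basis: circular convolution as an explicit O(N^2) double loop.
slow = np.array([sum(x[j] * k[(s - j) % n] for j in range(n)) for s in range(n)])

# Fourier basis: the same operator is diagonal, one multiplication per coefficient.
fast = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

print(np.allclose(slow, fast))   # True: same operation, separable in the right basis
```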


Core takeaway:
Change-of-basis transforms in ML are powerful because they can make patterns more visible, computations faster, features sparser, and training easier — all without losing information. In many cases, the “magic” is choosing a basis that matches the problem’s structure, whether that’s learned (PCA, learned embeddings) or fixed (Fourier, Hadamard, wavelets).
