The Convex Hull and the Manifold - neural network training and test data
1. Convex Hull of Training Data
- Definition (geometry view):
The convex hull of a set of points is the smallest convex set containing them. Imagine stretching a rubber band around your scattered data points in a 2-D feature space; the shape you get is the convex hull.
- In the context of image data:
If each image is a point in a very high-dimensional space (e.g., a 32×32 RGB image is a 3,072-D vector), the convex hull is a gigantic convex polytope containing all your training examples.
Importantly:
  - Many points inside the convex hull do not correspond to valid, natural images.
  - Convex combinations of training images (weighted averages with non-negative weights that sum to 1) often produce weird blends, not real-world photos.
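The "weird blends" point can be sketched with a toy example. The snippet below assumes NumPy and uses a made-up family of "images" (one bright bar on a dark 1-D strip) as a stand-in for real photos; `bar_image` and `blend` are illustrative names, not part of any library. Averaging two valid images gives two faint bars rather than one bar at an in-between position.

```python
import numpy as np

def bar_image(position, width=4, size=32):
    """Toy 'natural image': a dark strip with one bright bar starting at `position`."""
    img = np.zeros(size)
    img[position:position + width] = 1.0
    return img

def blend(img_a, img_b, weight=0.5):
    """Convex combination of two images; `weight` must lie in [0, 1]."""
    return weight * img_a + (1.0 - weight) * img_b

left, right = bar_image(2), bar_image(20)
mix = blend(left, right)

# Every valid toy image has exactly one full-brightness bar.
# The blend has two half-brightness bars, so it matches none of them.
valid_images = [bar_image(p) for p in range(32 - 4 + 1)]
nearest = min(np.linalg.norm(mix - v) for v in valid_images)

print("brightest pixel in blend:", mix.max())
print("distance to nearest valid image:", nearest)
```

The blend sits inside the convex hull of the two bar images by construction, yet its distance to every valid image is clearly non-zero.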
- Neural network connection:
The convex hull is relevant for linear models or models that behave nearly linearly in some regions: if the test point lies outside the convex hull of the training data, you’re extrapolating, which is risky.
But deep nets learn nonlinear functions, and they can often still handle points outside the convex hull as long as those points lie on or near the same low-dimensional manifold as the training data.
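To check whether a test point is inside the hull, recall that a point q belongs to the convex hull of x_1, ..., x_n exactly when it can be written as a weighted average of them with non-negative weights summing to 1. The sketch below estimates the Euclidean distance from a query to the hull with a Frank-Wolfe iteration, chosen here only because it needs nothing beyond NumPy (a linear-programming feasibility check is a common exact alternative). `distance_to_hull` and `n_iter` are hypothetical names for illustration.

```python
import numpy as np

def distance_to_hull(points, query, n_iter=1000):
    """Approximate distance from `query` to the convex hull of `points` (rows).

    Frank-Wolfe on f(p) = 0.5 * ||p - query||^2, where p is kept inside the hull
    by only ever moving towards one of the given points.
    """
    points = np.asarray(points, dtype=float)
    query = np.asarray(query, dtype=float)
    p = points[0].copy()                      # any training point is inside the hull
    for _ in range(n_iter):
        grad = p - query                      # gradient of f at p
        s = points[np.argmin(points @ grad)]  # vertex minimising the linearised objective
        d = s - p
        denom = d @ d
        if denom == 0.0:
            break
        # Exact line search along the segment from p to s, clipped to stay in the hull.
        gamma = np.clip(-(grad @ d) / denom, 0.0, 1.0)
        if gamma == 0.0:                      # no descent direction left: p is optimal
            break
        p = p + gamma * d
    return float(np.linalg.norm(p - query))

square = [[0, 0], [1, 0], [0, 1], [1, 1]]
d_inside = distance_to_hull(square, [0.5, 0.5])   # query inside the hull
d_outside = distance_to_hull(square, [2.0, 0.5])  # query outside the hull
print(d_inside, d_outside)
```

The returned value is an estimate that approaches the true distance from above, so in practice "inside the hull" should be read as "distance below a small tolerance" rather than exactly zero.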
2. Low-Dimensional Manifold for Natural Images
- Definition (manifold hypothesis):
Natural images, despite being high-dimensional vectors, occupy a thin, curved subset of the huge ambient space.
This manifold has much lower intrinsic dimensionality: perhaps dozens or hundreds of degrees of freedom, instead of thousands.
- Why this happens:
Physical constraints (lighting, shapes, textures, viewpoint, etc.) and semantics greatly limit the possible configurations of pixels.
Random points in pixel space are almost always noise, far off the natural image manifold.
- Training and test data location:
Both lie on or near the same manifold.
Good generalization happens because the test set samples the same manifold as the training set, even if it lies outside the convex hull of the training samples.
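The gap between ambient and intrinsic dimension can be sketched with synthetic data. The snippet below assumes NumPy and invents a toy "manifold": a closed curve with a single degree of freedom (an angle) embedded in a 100-D space. The names `sample_curve` and `basis` are illustrative. Training and test sets are drawn from the same curve, and an SVD of the centered training data shows how few directions the data actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
ambient_dim = 100

# Two orthonormal directions in the 100-D ambient space (reduced QR of a random matrix).
basis, _ = np.linalg.qr(rng.normal(size=(ambient_dim, 2)))  # shape (100, 2)

def sample_curve(n):
    """Draw n points from a circle (1 degree of freedom) embedded in 100-D."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (n, 2)
    return circle @ basis.T                                     # shape (n, 100)

train = sample_curve(500)
test = sample_curve(100)  # different samples, same underlying curve

# Count the directions with non-negligible variance in the training data.
singular_values = np.linalg.svd(train - train.mean(axis=0), compute_uv=False)
n_significant = int(np.sum(singular_values > 1e-8 * singular_values[0]))

print("ambient dimension:", ambient_dim)
print("linear directions used by the data:", n_significant)
```

A linear method such as SVD only gives an upper bound on intrinsic dimension: the curve here has one degree of freedom but bends through two linear directions. Both numbers are far below the ambient dimension, which is the point of the manifold hypothesis.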
3. Relationship Between Convex Hull & Manifold
- Key idea:
The convex hull is a crude, convex wrapper around your training points. The low-dimensional manifold is a much thinner, curved structure winding through space.
The manifold typically pokes outside the convex hull in some places, and the interior of the hull contains almost no natural images except where the manifold passes through it.
- Intuition in high dimensions:
  - In very high dimensions, the convex hull of even a large training set still occupies a vanishingly small fraction of the ambient space.
  - The manifold is even smaller: it is like a filament or sheet threading through the hull.
  - The hull contains many “interpolations” between real images that aren’t realistic at all. They are valid convex combinations, but not valid points on the manifold.
  - Neural networks don’t simply interpolate in the convex hull; they learn to follow the manifold structure.
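Both halves of this relationship can be seen in a 2-D toy setting, sketched below under the assumption that the unit circle plays the role of the manifold (NumPy only; `on_circle` and `inside_convex_polygon` are illustrative helpers). The training points are 12 points on the circle, so their convex hull is the inscribed 12-sided polygon. Unseen points on the same circle fall outside that hull, while an average of two training points falls inside the hull but off the circle.

```python
import numpy as np

def on_circle(angles):
    """Points on the unit circle (the toy 'manifold'), one row per angle."""
    angles = np.asarray(angles, dtype=float)
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def inside_convex_polygon(vertices_ccw, point, tol=1e-12):
    """True if `point` lies inside the convex polygon with counter-clockwise vertices."""
    a = vertices_ccw
    b = np.roll(vertices_ccw, -1, axis=0)  # next vertex along each edge
    edge = b - a
    to_point = point - a
    # For a counter-clockwise polygon, interior points are to the left of every edge.
    cross = edge[:, 0] * to_point[:, 1] - edge[:, 1] * to_point[:, 0]
    return bool(np.all(cross >= -tol))

n_train = 12
step = 2.0 * np.pi / n_train
train_angles = step * np.arange(n_train)
train = on_circle(train_angles)            # hull of these points = inscribed 12-gon
test = on_circle(train_angles + step / 2)  # unseen points on the same circle

test_outside_hull = [not inside_convex_polygon(train, q) for q in test]
midpoint = 0.5 * (train[0] + train[3])     # convex combination of two training points

print("test points outside the training hull:", sum(test_outside_hull), "of", len(test))
print("midpoint inside hull:", inside_convex_polygon(train, midpoint))
print("midpoint radius (1.0 would be on the circle):", np.linalg.norm(midpoint))
```

A model that has effectively learned "the data lives on this circle" has no trouble with the test points, even though every one of them counts as extrapolation in the convex-hull sense.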