Geometric Representations of High Dimensional Random Data
Abstract: This thesis introduces geometric representations relevant to the analysis of datasets of random vectors in high dimension. These representations are used to study the behavior of near-neighbor clusters in the dataset, shortest paths through the dataset, and the evolution of multivariate probability distributions over the dataset. The results in this thesis have wide applicability to machine learning problems and are illustrated for problems including spectral clustering, dimensionality reduction, activity recognition, and video indexing and retrieval.
This thesis makes three contributions. The first concerns shortest paths over random points in a Riemannian manifold. More precisely, we establish complete convergence of power-weighted shortest path lengths over random points in a compact Riemannian manifold to conformal deformation distances. These shortest path results are used to interpret and extend Coifman's anisotropic diffusion maps for clustering and dimensionality reduction.

The second contribution uses statistical manifolds to describe differences between curves evolving over a space of probability measures. A statistical manifold is a space of probability measures endowed with the Fisher-Rao metric. We propose to compare smoothly evolving probability distributions in a statistical manifold by the surface area of the region between a pair of curves, and we apply this surface area measure to activity classification for human movements.

The third contribution proposes a dimensionality reduction and cluster analysis framework based on a quantum mechanical model. This model generalizes geometric clustering methods such as k-means and Laplacian eigenmaps by relaxing the logical equivalence relation "two points are in the same cluster" to a probabilistic equivalence relation.
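The power-weighted path lengths of the first contribution can be sketched as follows. This is a minimal illustration, not the thesis's construction: the function name is hypothetical, and for simplicity it runs Dijkstra's algorithm over the complete graph on the sample points rather than a nearest-neighbor graph, with each edge weighted by its Euclidean length raised to the power p.

```python
import heapq
import math

def power_weighted_shortest_path(points, p, src, dst):
    """Shortest path length from src to dst in the complete graph on
    `points`, where edge (i, j) has weight |x_i - x_j|^p."""
    n = len(points)
    dist = [math.inf] * n
    dist[src] = 0.0
    heap = [(0.0, src)]  # Dijkstra's algorithm with a binary heap
    while heap:
        d, i = heapq.heappop(heap)
        if i == dst:
            return d
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(n):
            if j == i:
                continue
            nd = d + math.dist(points[i], points[j]) ** p
            if nd < dist[j]:
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    return dist[dst]
```

For p > 1 the weighting penalizes long edges superlinearly, so optimal paths prefer many short hops through dense regions of the sample; this density sensitivity is what connects these path lengths to conformally deformed distances.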
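The Fisher-Rao geometry underlying the second contribution has a simple closed form in the discrete case, which may help fix intuition. Under the square-root embedding p → √p, the manifold of discrete probability distributions sits on a unit sphere, and the geodesic (Fisher-Rao) distance is twice the spherical arc length, i.e. 2·arccos of the Bhattacharyya coefficient. This is a standard identity, not a result specific to the thesis; the function name is illustrative.

```python
import math

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between discrete distributions
    p and q on the same finite support: 2 * arccos(sum_i sqrt(p_i q_i))."""
    s = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    s = min(1.0, max(-1.0, s))  # guard against rounding outside [-1, 1]
    return 2.0 * math.acos(s)
```

The distance is 0 for identical distributions and attains its maximum, π, for distributions with disjoint support.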
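The relaxation in the third contribution, from the hard statement "two points are in the same cluster" to a probabilistic one, can be illustrated (not with the thesis's quantum mechanical model, but with a generic soft-assignment analogue) by replacing k-means's hard labels with Gaussian responsibilities and defining a co-membership probability. The function names and the Gaussian responsibility form are assumptions made for this sketch.

```python
import math

def co_cluster_probability(x, y, centroids, sigma=1.0):
    """Probability that x and y land in the same cluster when hard
    k-means assignment is relaxed to soft responsibilities
    r_k(pt) proportional to exp(-|pt - c_k|^2 / (2 sigma^2));
    the co-membership probability is sum_k r_k(x) * r_k(y)."""
    def responsibilities(pt):
        w = [math.exp(-math.dist(pt, c) ** 2 / (2 * sigma ** 2))
             for c in centroids]
        z = sum(w)
        return [wi / z for wi in w]
    rx, ry = responsibilities(x), responsibilities(y)
    return sum(a * b for a, b in zip(rx, ry))
```

Two points near the same centroid get a co-membership probability near 1, while points near different, well-separated centroids get a probability near 0; hard k-means is recovered in the limit sigma → 0.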