Communications and Signal Processing Seminar

Streaming PCA with Missing Data

Laura BalzanoAssistant ProfessorUniversity of Michigan Department of Electrical Engineering and Computer Science

For many modern applications in science and engineering, data are collected in a streaming fashion carrying time-varying information, and practitioners need to process them with a limited amount of memory and computational resources in a timely manner for decision making. This often is coupled with the missing data problem, such that only a small fraction of data attributes are observed. These complications impose significant, and unconventional, constraints on the problem of streaming Principal Component Analysis (PCA) and subspace tracking, which is an essential building block for many inference tasks in signal processing. In this talk I will survey a few approaches to this problem, including Brand's algorithm (incremental SVD for missing data), Oja's algorithm, the PETRELS algorithm (RLS subspace tracking), and the GROUSE algorithm (LMS subspace tracking constrained to the Grassmannian). I will discuss convergence theory for the GROUSE algorithm. Our work considers two sampling cases: where each data vector of the streaming matrix is fully sampled, or where it is undersampled by a sampling matrix At Â^^ Rm—n with m ‰ā n. For this work, we propose an adaptive stepsize scheme that depends only on the sampled data and algorithm outputs. We prove that with fully sampled exactly low-rank data, the stepsize scheme maximizes the improvement of our convergence metric at each iteration, and this method converges from any random initialization to the true subspace, despite the non-convex formulation and orthogonality constraints. In this case, however, a memory-limited incremental SVD is exact. Even so, these results provide the foundation for analyzing the case of noisy or undersampled data, in which we establish expected monotonic improvement on the defined convergence metric for each iteration, along with local convergence results. With time, I will also introduce a novel adaptive sampling framework that admits global convergence guarantees.
I am an Assistant Professor in the Electrical Engineering and Computer Science department at the University of Michigan, Ann Arbor.

My research projects are in statistical signal processing, matrix factorization, and optimization, particularly dealing with large and messy data. I have worked in the areas of online algorithms, non-convex formulations for matrix factorization, compressed sensing and matrix completion, network inference, and sensor networks. My interests are theoretical, however, my favorite mathematical problems are motivated by fascinating and important engineering problems.

Sponsored by


Faculty Host

Dave Neuhoff