Communications and Signal Processing Seminar
Clustering and Classification via Lossy Data Compression
For many problems in computer vision, image processing, and pattern recognition, we need to process and analyze massive amounts of high-dimensional mixed data such as images and gene expression data. By “mixed data,” we mean that the given data set consists of multiple heterogeneous subsets (which have different geometric or statistical characteristics) but each subset can be more easily modeled or represented than the whole data set together.
In this talk, we address two fundamental questions: how to cluster, and how to classify, such high-dimensional mixed data. We contend that both the (unsupervised) clustering and (supervised) classification problems can be cast as lossy data compression problems and solved efficiently within a unified mathematical framework. In theory, this approach offers distinct advantages over conventional methods for clustering and classification, especially in dealing with several difficult issues that often arise in practice: regularization of degenerate distributions, selection of models with different complexities, and rejection of outliers.
Our work establishes a strong connection between information theory, especially rate-distortion theory, and data clustering and classification, and it leads to extremely simple but effective algorithms. We will demonstrate the success of these algorithms on a few popular but difficult problems, including natural image segmentation, microarray data clustering, and handwritten digit and face recognition.
Yi Ma is an associate professor at the Electrical & Computer Engineering Department of the University of Illinois at Urbana-Champaign. His research interests include computer vision and systems theory. Yi Ma received two Bachelor’s degrees in Automation and Applied Mathematics from Tsinghua University (Beijing, China) in 1995, a Master of Science degree in EECS in 1997, a Master of Arts degree in Mathematics in 2000, and a PhD degree in EECS in 2000, all from the University of California at Berkeley. Yi Ma received the David Marr Best Paper Prize at the International Conference on Computer Vision 1999 and the Longuet-Higgins Best Paper Prize at the European Conference on Computer Vision 2004. He also received the CAREER Award from the National Science Foundation in 2004 and the Young Investigator Award from the Office of Naval Research in 2005. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence.