Systems Seminar - ECE
Innovation Pursuit and Spatial Random Sampling: New Unsupervised Data Clustering and Data Summarization Tools
Unsupervised learning is mainly concerned with discovering the hidden structure of unlabeled data, that is, data that comes without any categorization. The hidden, low-dimensional structures that pervade much of the data can yield representations that are more concise than the original observations. In this talk, two new unsupervised learning tools are presented. The first tool, Innovation Pursuit, is a data clustering method based on a new geometrical solution to the subspace clustering problem. Innovation Pursuit finds the subspaces by solving a set of simple linear optimization problems. Each problem searches the span of the data for a direction of innovation, a direction that is potentially orthogonal to all subspaces except the one to be identified in that step of the algorithm. Innovation Pursuit is the first provable subspace clustering method whose complexity is linear in the number of data points, and it often outperforms state-of-the-art subspace clustering algorithms, more so for subspaces with significant intersections. Innovation Pursuit can also be integrated with spectral clustering, and it yields state-of-the-art results for face clustering using subspace segmentation. In the second part of the talk, an unsupervised data summarization tool, dubbed Spatial Random Sampling (SRS), is presented. SRS addresses an important shortcoming of unsupervised column sampling approaches: their limited ability to preserve the spatial distribution of the data. SRS introduces a new data sketching idea in which the random sampling is performed in the spatial domain. The most compelling feature of SRS is that the probability of sampling from a given data cluster is proportional to the surface area the cluster occupies on the unit sphere, independent of the size of the cluster population. Although it is fully randomized, SRS is shown to provide descriptive and balanced data representations.
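The sampling property described for SRS can be illustrated with a small sketch. This is not the speaker's implementation; it assumes one plausible realization of "sampling in the spatial domain": project the data onto the unit sphere, draw a direction uniformly at random, and keep the data point most aligned with that direction. The function names (`spatial_random_sample`, `random_direction`, `make_arc_cluster`) and the toy two-cluster data are hypothetical and chosen only for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (project it onto the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def random_direction(dim, rng):
    """Draw a direction uniformly from the unit sphere (normalized Gaussian vector)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if any(g):  # retry in the (practically impossible) all-zero case
            return normalize(g)


def spatial_random_sample(points, num_draws, rng):
    """Assumed selection rule: for each random direction, pick the index of the
    data point whose unit-normalized version has the largest inner product."""
    unit = [normalize(p) for p in points]
    dim = len(unit[0])
    chosen = []
    for _ in range(num_draws):
        d = random_direction(dim, rng)
        best = max(
            range(len(unit)),
            key=lambda i: sum(a * b for a, b in zip(unit[i], d)),
        )
        chosen.append(best)
    return chosen


def make_arc_cluster(center, half_width, count, rng):
    """Toy 2-D cluster: points on the unit circle within an arc around `center`."""
    angles = [rng.uniform(center - half_width, center + half_width) for _ in range(count)]
    return [[math.cos(a), math.sin(a)] for a in angles]


def demo():
    rng = random.Random(0)
    big = make_arc_cluster(0.0, 0.05, 500, rng)      # heavily populated cluster
    small = make_arc_cluster(math.pi, 0.05, 10, rng)  # sparsely populated cluster
    points = big + small
    picks = spatial_random_sample(points, 1000, rng)
    frac_small = sum(1 for i in picks if i >= len(big)) / len(picks)
    print(f"uniform index sampling would hit the small cluster {len(small) / len(points):.1%} of the time")
    print(f"direction-based sampling hit it {frac_small:.1%} of the time")


if __name__ == "__main__":
    demo()
```

In this toy setup the two clusters sit on opposite sides of the circle, so each one is the nearest cluster for about half of all directions. The small cluster should therefore receive close to half of the draws despite holding about 2% of the points, whereas sampling indices uniformly would favor the large cluster in proportion to its population.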
Mostafa Rahmani, a PhD candidate, received the B.S. and M.S. degrees in 2010 and 2012, respectively, from Sharif University of Technology, both in electrical engineering. He started his PhD at the University of Central Florida in 2014. His research interests include provable high-dimensional matrix decomposition, anomaly detection, data summarization, and dynamic deep neural networks. He was the UCF College of Engineering and Computer Science nominee for the Order of Pegasus, and his research findings provided the preliminary results for a recently awarded NSF grant. He is the recipient of multiple awards, including the UCF Donaldson Scholarship and the Doctoral Research Innovation Scholarship.