Communications and Signal Processing Seminar
Principles of Deep Representation Learning via Neural Collapse
This event is free and open to the public.
Abstract: We provide the first global optimization landscape analysis of Neural Collapse, an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural networks during the terminal phase of training. As recently reported by Papyan et al., this phenomenon implies that (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. We study the problem based on a simplified unconstrained feature model, which isolates the topmost layers from the classifier of the neural network. In this context, we show that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions. In contrast to existing landscape analyses for deep neural networks, which are often disconnected from practice, our analysis of the simplified model not only explains what kind of features are learned in the last layer, but also shows why they can be efficiently optimized in the simplified setting, matching the empirical observations in practical deep network architectures. These findings could have profound implications for optimization, generalization, and robustness, which are of broad interest.
The talk is based upon one NeurIPS’21 spotlight paper, one ICML’22 paper, and two other works under submission.
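For readers unfamiliar with the Simplex ETF named in the abstract, the following is a minimal sketch of its standard construction, M = sqrt(K/(K-1)) (I - (1/K) 1 1^T), for K classes. It is an illustration only and is not taken from the talk or the papers' code; the function names simplex_etf and gram are hypothetical, and the feature dimension is assumed equal to K for simplicity.

```python
import math


def simplex_etf(num_classes):
    """Return the K x K matrix sqrt(K/(K-1)) * (I - (1/K) * ones) as a list of rows.

    Assumption: feature dimension equals the number of classes K (K >= 2).
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def gram(matrix):
    """Gram matrix of the columns: entry (a, b) is <column a, column b>."""
    k = len(matrix)
    return [
        [sum(matrix[i][a] * matrix[i][b] for i in range(k)) for b in range(k)]
        for a in range(k)
    ]


if __name__ == "__main__":
    etf = simplex_etf(4)
    for row in gram(etf):
        # Diagonal entries are 1 (unit-norm vertices); off-diagonal
        # entries are all -1/(K-1), i.e. the vertices are equiangular.
        print([round(x, 4) for x in row])
```

The Gram matrix has ones on the diagonal and -1/(K-1) everywhere else, which is the "equal norm, maximally and equally separated" geometry that the class means and classifiers are reported to approach.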
Bio: Qing Qu is an assistant professor in the EECS department at the University of Michigan. Prior to that, he was a Moore-Sloan data science fellow at the Center for Data Science, New York University, from 2018 to 2020. He received his Ph.D. in Electrical Engineering from Columbia University in Oct. 2018. He received his B.Eng. from Tsinghua University in Jul. 2011, and an M.Sc. from the Johns Hopkins University in Dec. 2012, both in Electrical and Computer Engineering. He interned at the U.S. Army Research Laboratory in 2012 and at Microsoft Research in 2016. His research interest lies at the intersection of the foundations of data science, machine learning, numerical optimization, and signal/image processing, with a focus on developing efficient nonconvex methods and global optimality guarantees for solving representation learning and nonlinear inverse problems in engineering and imaging sciences. He is the recipient of the Best Student Paper Award at SPARS’15 (with Ju Sun and John Wright), a Microsoft PhD Fellowship in machine learning, and the NSF CAREER Award in 2022.
The event will take place in a hybrid format. The location for in-person attendance will be room 1008 EECS. Attendance will also be available via Zoom.
Join Zoom Meeting: https://umich.zoom.us/j/91414297851
Meeting ID: 914 1429 7851
Passcode: XXXXXX (Will be sent via e-mail to attendees)
Zoom Passcode information is also available upon request to Michele Feldkamp (email@example.com).
This seminar will be recorded. The recording will be posted to the CSP Seminar website.