Electrical and Computer Engineering

Communications and Signal Processing Seminar

(Remote) A mean-field theory of lazy training in two-layer neural nets: entropic regularization and controlled McKean-Vlasov dynamics

Maxim Raginsky
Associate Professor, William L. Everitt Fellow
Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Affiliate, Center for Advanced Electronics through Machine Learning
Affiliate, Center for the Science of Information
WHERE:
Remote/Virtual

Abstract:  This talk, based on joint work with Belinda Tzen, will focus on the problem of universal approximation of functions by two-layer neural nets with random weights that are “nearly Gaussian” in the sense of Kullback-Leibler divergence. This problem is motivated by recent works on lazy training, where the weight updates generated by stochastic gradient descent do not move appreciably from the i.i.d. Gaussian initialization. We first consider the mean-field limit, where the finite population of neurons in the hidden layer is replaced by a continual ensemble, and show that our problem can be phrased as global minimization of a free-energy functional on the space of probability measures over the weights. This functional trades off the $L^2$ approximation risk against the KL divergence with respect to a centered Gaussian prior. We characterize the unique global minimizer and then construct a controlled nonlinear dynamics in the space of probability measures over weights that solves a McKean–Vlasov optimal control problem. This control problem is closely related to the Schrödinger bridge (or entropic optimal transport) problem, and its value is proportional to the minimum of the free energy. Finally, we show that SGD in the lazy training regime (which can be ensured by jointly tuning the variance of the Gaussian prior and the entropic regularization parameter) serves as a greedy approximation to the optimal McKean–Vlasov distributional dynamics and provide quantitative guarantees on the $L^2$ approximation error.
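
For orientation, the free-energy trade-off described in the abstract can be written schematically as below. This is only a sketch under assumed notation: the symbols $\varphi$, $\pi$, $\gamma$, and $\beta$, and the constants $\tfrac{1}{2}$ and $\tfrac{1}{\beta}$, are illustrative choices and are not taken from the talk, whose normalization may differ.

```latex
% Illustrative notation only; the talk's own symbols and constants may differ.
% \mu      : probability measure over the hidden-layer weights w
% \gamma   : centered Gaussian prior over the weights
% \varphi  : response of a single neuron (hypothetical symbol)
% \pi      : input distribution defining the L^2 norm (hypothetical symbol)
% \beta>0  : entropic regularization parameter (hypothetical normalization)
\[
  \hat f_\mu(x) = \int \varphi(x; w)\, \mu(\mathrm{d}w),
  \qquad
  F(\mu) = \tfrac{1}{2}\,\bigl\| f - \hat f_\mu \bigr\|_{L^2(\pi)}^2
           + \tfrac{1}{\beta}\, D(\mu \,\|\, \gamma).
\]
```

In this schematic form, a larger weight $\tfrac{1}{\beta}$ on the KL term penalizes departures from the Gaussian prior more heavily, so the minimizing measure stays "nearly Gaussian" at the cost of a larger $L^2$ approximation risk.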

Biography:  Maxim Raginsky received the B.S. and M.S. degrees in 2000 and the Ph.D. degree in 2002 from Northwestern University, all in Electrical Engineering. He has held research positions with Northwestern, the University of Illinois at Urbana-Champaign (where he was a Beckman Foundation Fellow from 2004 to 2007), and Duke University. In 2012, he returned to UIUC, where he is currently an Associate Professor and William L. Everitt Fellow with the Department of Electrical and Computer Engineering. He also holds appointments in the Coordinated Science Laboratory and in the Department of Computer Science. His research interests cover probability and stochastic processes, deterministic and stochastic control, machine learning, optimization, and information theory. Much of his recent research is motivated by fundamental questions in modeling, learning, and simulation of nonlinear dynamical systems, with applications to advanced electronics, autonomy, and artificial intelligence.

REMOTE MEETING INFORMATION:

Join Zoom Meeting
https://umich.zoom.us/j/907824926

Meeting ID: 907 824 926

One tap mobile
+13126266799,,907824926# US (Chicago)
+16468769923,,907824926# US (New York)

Dial by your location
+1 312 626 6799 US (Chicago)
+1 646 876 9923 US (New York)
+1 346 248 7799 US (Houston)
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US
+1 301 715 8592 US
Meeting ID: 907 824 926
Find your local number: https://umich.zoom.us/u/adltF4bqLf