Dissertation Defense

Toward Effective Neural Architectures and Algorithms for Generalizable Deep Learning

Mingchen Li
WHERE: 1180 Duderstadt

Passcode: 532493


The research presented in this thesis demonstrates that overparameterized models can be guided toward robust generalization through (i) regularization, (ii) bilevel optimization, and (iii) principled architecture design. Through a series of theoretical and experimental studies, we demystify how early stopping regularization prevents gradient-based training from overfitting to label noise in overparameterized models. We then investigate bilevel optimization, a framework with two levels of hierarchical objectives (upper and lower), in which optimizing the upper-level objective requires solving the lower-level problem. Applying bilevel optimization to hyperparameter optimization, our efficient algorithms for imbalanced data and neural architecture search demonstrate that optimizing a small set of hyperparameters can prevent overfitting and improve generalization. Finally, we study principled architecture design as a route to generalization. Motivated by mechanistic tasks in language modeling, such as associative recall (AR) and copying, we propose a convolution-augmented transformer (CAT) architecture that provably solves these tasks with a single layer and guarantees length generalization. Experiments show that CAT also improves language modeling performance on real datasets.
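To illustrate the bilevel structure described above, the following is a minimal sketch, assuming a hypothetical setup in which per-class loss weights (the upper-level hyperparameters, useful for imbalanced data) are tuned on a validation set while the model weights (the lower-level variables) are trained on the weighted training loss. The one-step unrolled approximation below is only illustrative and is not the algorithm developed in the thesis.

```python
# Minimal sketch of bilevel hyperparameter optimization (hypothetical setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, dim, lr_inner, lr_outer = 3, 10, 0.1, 0.01

model = nn.Linear(dim, num_classes)                          # lower-level variables w
log_weights = torch.zeros(num_classes, requires_grad=True)   # upper-level variables (per-class weights)
outer_opt = torch.optim.SGD([log_weights], lr=lr_outer)

def weighted_loss(logits, y, log_w):
    # training loss weighted per class by the current hyperparameters
    per_sample = F.cross_entropy(logits, y, reduction="none")
    return (per_sample * log_w.softmax(0)[y]).mean()

# toy imbalanced training data and a separate validation split
x_tr, y_tr = torch.randn(64, dim), torch.randint(0, num_classes, (64,))
x_va, y_va = torch.randn(32, dim), torch.randint(0, num_classes, (32,))

for step in range(100):
    # Lower level: one unrolled SGD step on the weighted training loss,
    # keeping the graph so gradients can flow back to log_weights.
    inner_loss = weighted_loss(model(x_tr), y_tr, log_weights)
    grads = torch.autograd.grad(inner_loss, model.parameters(), create_graph=True)
    fast_weights = [p - lr_inner * g for p, g in zip(model.parameters(), grads)]

    # Upper level: evaluate the unrolled model on validation data and
    # update the hyperparameters through the unrolled step.
    logits_va = F.linear(x_va, fast_weights[0], fast_weights[1])
    outer_loss = F.cross_entropy(logits_va, y_va)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()

    # Commit the inner update to the actual model parameters.
    with torch.no_grad():
        for p, fp in zip(model.parameters(), fast_weights):
            p.copy_(fp)
```

Similarly, the sketch below shows one plausible form a convolution-augmented attention layer could take, assuming a hypothetical design in which queries and keys are computed from a short causal depthwise convolution of the input, so that each token can match on local context (as in induction-head-style associative recall). The exact CAT architecture proposed in the thesis may differ in its details.

```python
# Minimal sketch of a convolution-augmented attention layer (hypothetical design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAugmentedAttention(nn.Module):
    def __init__(self, dim, kernel_size=4):
        super().__init__()
        self.kernel_size = kernel_size
        # depthwise convolution over the sequence dimension
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim)
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                                   # x: (batch, seq, dim)
        # left-pad so the convolution is causal (no look-ahead)
        x_conv = F.pad(x.transpose(1, 2), (self.kernel_size - 1, 0))
        x_conv = self.conv(x_conv).transpose(1, 2)          # (batch, seq, dim)

        # queries/keys use convolved (local-context) features; values use raw tokens
        q, k, v = self.q_proj(x_conv), self.k_proj(x_conv), self.v_proj(x)
        attn = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
        causal_mask = torch.triu(torch.ones_like(attn, dtype=torch.bool), diagonal=1)
        attn = attn.masked_fill(causal_mask, float("-inf")).softmax(dim=-1)
        return self.out_proj(attn @ v)

layer = ConvAugmentedAttention(dim=32)
tokens = torch.randn(2, 16, 32)
print(layer(tokens).shape)   # torch.Size([2, 16, 32])
```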


Chair: Professor Samet Oymak