Dissertation Defense
Toward Effective Neural Architectures and Algorithms for Generalizable Deep Learning
This event is free and open to the public.
Passcode: 532493
The research presented in this thesis demonstrates that overparameterized models can be guided toward robust generalization through (i) regularization, (ii) bilevel optimization, and (iii) principled architecture design. Through a series of theoretical and experimental studies, we demystify how early stopping regularization can prevent the optimization process from overfitting to label noise in overparameterized models. We then investigate bilevel optimization, a framework with two levels of hierarchical objectives (upper and lower), in which optimizing the upper-level objective requires solving the lower-level problem. By applying bilevel optimization to hyperparameter optimization, our efficient algorithms for imbalanced data and neural architecture search demonstrate that optimizing a minimal set of hyperparameters can prevent overfitting and improve generalization. Finally, we study principled architecture design to facilitate generalization. Motivated by mechanistic tasks in language modeling, such as associative recall (AR) and copying, we propose a convolution-augmented transformer (CAT) architecture that provably solves these tasks with a single layer and guarantees length generalization. Experiments show that CAT also improves language modeling performance on real-world datasets.
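For readers unfamiliar with the framework, the bilevel optimization referenced above is conventionally written as follows; this is a generic sketch, since the announcement does not specify the thesis's particular objectives, and $F$ and $f$ here are placeholders (e.g., a validation loss and a training loss in hyperparameter optimization):

$$
\min_{\lambda}\; F\big(\lambda, \theta^{*}(\lambda)\big)
\quad \text{subject to} \quad
\theta^{*}(\lambda) \in \arg\min_{\theta}\; f(\lambda, \theta),
$$

where the upper-level variables $\lambda$ (e.g., hyperparameters) are chosen to minimize the upper-level objective $F$, evaluated at a lower-level solution $\theta^{*}(\lambda)$ obtained by minimizing the lower-level objective $f$ over the model parameters $\theta$.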
Chair: Professor Samet Oymak