Compression and Curriculum Strategies for Efficient Learning in Deep Neural Networks
This event is free and open to the public.
The life cycle of developing deep learning solutions consists of five phases: 1) Data collection, 2) Architecture prototyping, 3) Training, 4) Analysis, and 5) Deployment. Every phase of this life cycle carries a significant cost, both human and computational. Given the increasing dominance of deep learning applications, understanding and reducing these costs while maintaining high levels of performance and robustness where possible would have a significant impact. To that end, this dissertation proposes new techniques that improve the efficiency and performance of two core phases of the life cycle: architecture prototyping and model training.
Prototyping deep neural network (DNN) architectures for edge devices is often done manually or through architecture search. However, these methods are slow to converge, taking on the order of days or more, and require large-scale resources. In this dissertation, we propose using conditional mutual information for DNN pruning, which efficiently adapts existing architectures to match hardware constraints. Using a probabilistic modeling approach to measure the amount of information passed between layers, and balancing its contribution with that of the underlying weight matrix, we obtain a hybrid formulation that effectively reduces the redundancy between layers. The results showcase an automated pruning pipeline that shortens the development and testing cycle for DNNs without compromising performance, while simultaneously reducing their memory footprint and computational load.
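As a rough illustration of the kind of quantity involved, the sketch below estimates conditional mutual information I(X; Y | Z) from discrete samples using the standard plug-in (count-based) formula, then blends it with a weight-magnitude term. The blending rule, the `alpha` parameter, and all function names are assumptions made for illustration; the abstract does not state the dissertation's actual formulation.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from paired discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    total = 0.0
    for (x, y, z), c in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ]; the 1/n factors cancel
        ratio = (c * c_z[z]) / (c_xz[(x, z)] * c_yz[(y, z)])
        total += (c / n) * math.log(ratio)
    return total


def hybrid_importance(info, weight_magnitude, alpha=0.5):
    """Hypothetical blend of an information term and a weight-magnitude term."""
    return alpha * info + (1.0 - alpha) * weight_magnitude
```

In this sketch, units with a low blended score would be the candidates for pruning; how activations are discretized and how the two terms are scaled are left open.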
Continuing the emphasis on efficiency, we apply feature-based curriculum learning to concurrently tackle an array of goals associated with training: performance, efficiency, and adversarial robustness. The difference in features obtained by adding noise to the input highlights samples that lie close to the decision boundary and strongly influence the learning process of a DNN. Our results show that identifying and removing such samples improves performance and adversarial robustness while decreasing the number of FLOPs consumed during training. By imposing multiple constraints during development and training, we enable shorter and more resource-friendly development of DNNs.
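The sample-selection idea can be sketched as follows: perturb each input with Gaussian noise, measure how far its features move, and drop the samples whose features move the most. This is a minimal sketch under stated assumptions, not the dissertation's method: `toy_features` is a hypothetical stand-in for a DNN's feature extractor, and the noise level, distance measure, and drop fraction are illustrative choices.

```python
import math
import random


def feature_shift_scores(samples, feature_fn, noise_std=0.1, seed=0):
    """Score each sample by the L2 distance between clean and noisy features."""
    rng = random.Random(seed)
    scores = []
    for x in samples:
        noisy = [v + rng.gauss(0.0, noise_std) for v in x]
        clean_f = feature_fn(x)
        noisy_f = feature_fn(noisy)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(clean_f, noisy_f)))
        scores.append(dist)
    return scores


def keep_indices(scores, drop_fraction):
    """Indices kept after dropping the highest-scoring fraction of samples."""
    n_drop = int(len(scores) * drop_fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[: len(scores) - n_drop])


def toy_features(x):
    # Hypothetical feature extractor: a steep sigmoid around the boundary x[0] + x[1] = 0
    z = 10.0 * (x[0] + x[1])
    return [1.0 / (1.0 + math.exp(-z))]


samples = [[-2.0, -2.0], [0.0, 0.01], [2.0, 2.0], [-1.5, -1.0], [0.02, -0.01]]
scores = feature_shift_scores(samples, toy_features)
kept = keep_indices(scores, drop_fraction=0.4)
```

In this toy setup the two samples nearest the boundary receive the largest scores and are the ones removed, so training would proceed on the remaining three.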
Co-Chairs: Professor Jason Corso and Assistant Professor Andrew Owens