Dissertation Defense

Design Techniques for Energy-Efficient and Scalable Machine Learning Accelerators

Teyuh Chou

Passcode: 001011

Machine learning is a key application driver of new computing hardware. High-performance machine learning hardware requires a large number of operations and high memory bandwidth, and its energy efficiency is often limited by data movement and the memory access bottleneck. This dissertation presents two energy-efficient processing-in-memory (PIM) approaches to overcome the memory access bottleneck. The first design is a PIM architecture that connects multiply-accumulate RRAM arrays with buffer RRAM arrays to extend the processing dataflow. The second design is an adaptive-range PIM that takes advantage of bit-level sparsity in deep neural networks (DNNs).

As machine learning models scale up over time, growth in model size and complexity has outpaced improvements in chips. Building large monolithic chips that keep up with model evolution and meet the computational requirements of these models is challenging. This dissertation presents a scalable chiplet-based integration approach that scales up machine learning hardware by reusing chiplets. Multiple modular chiplets are connected for data streaming and integrated into a package using 2.5D technology to build larger DNN hardware. The prototype chip efficiently accelerates the perception task for autonomous navigation by optimizing the dataflow and exploiting parallelism and process scheduling.

Chair: Professor Zhengya Zhang