Dissertation Defense

Design of Configurable and Extensible Accelerator Architectures for Machine Learning Algorithms

Chester Liu

Machine learning has attracted considerable attention in recent years because of its wide range of applications. However, machine learning algorithms are typically computation-intensive and require hardware acceleration to be usable in real time. In this work, a comprehensive comparison of different accelerator architectures for sparse coding is conducted to identify the most efficient architecture. In addition, a novel convolution computation method is proposed that supports variable kernel sizes using a fixed number of compute elements.

As the technology node continues to shrink, the design effort and manufacturing cost of a chip are becoming prohibitively high, limiting the scale of a single-chip hardware accelerator. 2.5D integration technology makes it possible to construct a scalable and extensible hardware system from chiplets. In this work, an Advanced Interface Bus (AIB) chiplet is designed and fabricated. A silicon interposer is built to demonstrate homogeneous integration of chiplets. The chiplet is also verified with an Intel Stratix 10 FPGA, demonstrating heterogeneous integration of chiplets and the interoperability of the AIB interface. A chiplet data transfer protocol, called University of Michigan AIB Interface (UMAI), is designed as an IP that provides a clean and simple interface to user applications.

Chair: Professor Zhengya Zhang