Dissertation Defense

Domain-Specific Hardware/Architecture for Emerging Applications

Zhehong Wang
3316 EECS Building


Technology scaling has driven the development of the computing industry for the past 50 years. However, once we hit the power and memory walls, Moore’s Law starts to lose its magic. Without the “free” performance gain that comes from simply scaling the technology, architecture and circuit designers must extract as much as possible from the available technology nodes, which requires significant effort to develop innovative architectures and circuits. One solution is to trade the programmability and flexibility of current microprocessors for data and control flow optimized for a specific application, which gave rise to the concept of domain-specific hardware. The concept is not brand-new, as graphics processors show, but highly computation-hungry ML applications, which have thrived in recent years, have benefited greatly from it in both performance and energy.

This thesis presents three domain-specific solutions for emerging applications: DNA sequencing, ML, and post-quantum cryptography/homomorphic encryption. Each employs different optimization schemes. The first demonstrates a seed-extension accelerator for next-generation sequencing, built in a 55 nm process with a recently proposed automata architecture. With an array of 25×25 custom-designed processing elements, it processes 2.46M reads/s, a 1581× improvement in power efficiency over a system with dual-socket Xeon E5-2597 v3 server processors. The second prototype is an RRAM- and model-compression-based DNN accelerator in a 22 nm process that features algorithm, architecture, and circuit optimizations. Its 24 Mb of RRAM holds the equivalent of 16 million 8-bit (decompressed) weights on chip, eliminating energy-consuming off-chip memory accesses. The last work proposes and implements an architecture for accelerating third-generation fully homomorphic encryption (FHE) on AWS cloud FPGAs. It also introduces a novel unbalanced private set intersection (PSI) protocol based on third-generation FHE and optimized for the proposed hardware architecture. Measurement results show that the accelerator achieves a >21× performance improvement over a software implementation for several crucial subroutines of third-generation FHE and for the proposed PSI protocol.

Chair: Professor David Blaauw