Dissertation Defense

Efficient Deep Learning Accelerator Architectures by Model Compression and Data Orchestration

Jie-Fang Zhang

Modern neural networks (NNs) are often optimized at the algorithm level to reduce model complexity or to extend their capabilities to new applications. Model compression techniques, such as data quantization, network sparsification, and tensor decomposition, can substantially reduce the size and complexity of a model. Novel operations, like edge convolution, are incorporated into NNs to expand their capabilities in new areas, such as point-cloud recognition. These algorithm-level optimizations introduce computation challenges at the hardware level, resulting in low processing efficiency on existing hardware architectures.

In my dissertation, I present three accelerator architectures that explore and overcome these computation challenges by exploiting model compression characteristics and data orchestration techniques. The first work focuses on the challenge of irregular computation in unstructured sparse NN inference. I present the SNAP architecture, which uses associative index matching and two-level partial-sum reduction to achieve high performance and efficiency. The second work addresses challenges in graph-based point-cloud network processing. I present the Point-X architecture, which extracts spatial locality from the point-cloud data to maximize intra-compute parallelism and minimize inter-compute data movement. The third work focuses on the multi-dimensional tensor contractions in tensorized NNs. I present TetriX, an architecture and workload mapping co-design that can process all types of tensorized NNs flexibly and efficiently.

These architectural techniques will enable more efficient model storage and computation, capturing the full benefits of compressed NNs and supporting specific applications of point-cloud networks.

Chair: Professor Zhengya Zhang