Exploiting Sparsity, Compression and In-Memory Compute in Designing Data-Intensive Accelerators
Enabled by technology scaling, processing parallelism has been continuously increased to meet the demands of large-scale, data-intensive computation. However, this effort is largely hindered by the von Neumann bottleneck: memory bandwidth cannot be scaled up efficiently to keep pace with processing parallelism. Reducing the data-transfer cost calls for closer integration of memory and computation, which ultimately leads to the in-memory computing approach. This thesis work presents two approaches to addressing the von Neumann bottleneck: 1) reducing the amount of data that must be moved, through sparsity and data compression; and 2) robust multi-bit in-memory compute design that extends applicability to a wider range of applications. This work shows the importance of algorithm-architecture-circuit co-design in uncovering opportunities to mitigate and remove the von Neumann bottleneck. The design techniques and approaches can apply to a wide array of applications for improving performance and efficiency.
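To give a flavor of the first approach, the sketch below (not taken from the thesis; the CSR format and the example matrix are illustrative assumptions) shows how exploiting sparsity with a compressed representation shrinks the number of words that must be moved across the memory interface:

```python
# Illustrative sketch: compressing a sparse matrix into CSR
# (compressed sparse row) form, one common way to reduce the
# data volume moved between memory and compute units.

def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero value
                col_idx.append(j)  # its column index
        row_ptr.append(len(values))  # end of this row in `values`
    return values, col_idx, row_ptr

# Hypothetical 4x4 matrix with only 3 of 16 entries nonzero.
dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [5, 0, 0, 0],
    [0, 7, 0, 0],
]
values, col_idx, row_ptr = to_csr(dense)

dense_words = sum(len(r) for r in dense)               # 16 words stored densely
csr_words = len(values) + len(col_idx) + len(row_ptr)  # 3 + 3 + 5 = 11 words
print(dense_words, csr_words)
```

At this small size the savings are modest, but as sparsity grows the CSR footprint scales with the number of nonzeros rather than the full matrix, directly cutting the traffic that the von Neumann bottleneck throttles.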