CSE researchers win best paper award at HPCA 2021

The paper introduces new hardware and software design principles to improve the performance of several important large-scale irregular workloads.

CSE PhD student Nishil Talati was the paper's lead author.

A research team based at CSE has won the best paper award at the IEEE International Symposium on High-Performance Computer Architecture (HPCA) 2021 for their paper entitled “Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design.”

The paper’s lead author is CSE PhD student Nishil Talati. Additional authors include CSE PhD students Armand Behroozi, Yichen Yang, Tarunesh Verma, and Brandon Nguyen; CSE faculty Ron Dreslinski, Trevor Mudge, Todd Austin, and Scott Mahlke; and CSE alumni Kyle May and Agreen Ahmadi.

The award was given based on unanimous agreement by five program committee members, who noted that “this work opens new doors in defining the hardware/software interface and is likely to inspire significant additional future work.”

Big data analytics forms a core component of the digital services we use every day. These services analyze usage data to improve their offerings and provide personalized experiences. For example, delivery services (e.g., Amazon, Instacart) optimize their inventory and delivery routes to fulfill our orders quickly and cheaply, and streaming platforms like Netflix employ powerful recommendation systems to keep us engaged.

Such services capture the relationships within huge amounts of raw data as “graphs,” which are then encoded and stored in sparse representations to reduce memory use. For example, social media graphs (e.g., Facebook, Twitter) store user information amounting to terabytes of data, which is difficult to store and access efficiently in a computer's fast memory (called caches), whose capacity is limited (typically megabytes). Consequently, these applications make irregular and unpredictable memory accesses over large amounts of data. This is a problem because (a) the memory where this information is stored is very slow to access and (b) modern computers are best suited to simpler encodings and workloads with regular memory access patterns, for which they can predict what data the application will need over the course of its computation and prefetch it so that the processor is not starved for data.
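To make the access pattern concrete, here is a minimal sketch that assumes a compressed sparse row (CSR) layout, one common sparse graph format; the article does not say which formats the paper targets. The array names (offsets, neighbors, values) and the tiny example graph are illustrative assumptions, not taken from the paper.

```python
# Illustrative CSR encoding of a small directed graph (assumed example).
# Edges: 0->1, 0->2, 1->2, 2->0, 2->3; vertex 3 has no outgoing edges.
offsets = [0, 2, 3, 5, 5]    # neighbors[offsets[v]:offsets[v + 1]] lists v's neighbors
neighbors = [1, 2, 2, 0, 3]  # all adjacency lists stored back to back
values = [10, 20, 30, 40]    # one data item per vertex


def neighbor_sum(v):
    """Add up the data items of every vertex that v points to."""
    total = 0
    # The loop walks `neighbors` in order, which is a regular pattern.
    for e in range(offsets[v], offsets[v + 1]):
        u = neighbors[e]
        # The index u is only known after neighbors[e] has been loaded,
        # so this access jumps around `values` in a data-dependent way.
        total += values[u]
    return total


sums = [neighbor_sum(v) for v in range(len(values))]
print(sums)
```

The lookup values[u] is the kind of access the paragraph above calls irregular: its location depends on data loaded a moment earlier, so a prefetcher that only recognizes regular strides has little to go on. On a graph holding terabytes of data, most of these lookups would miss the caches.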

The paper discusses these effects and shows that modern computers are constantly stalled on such applications due to the memory system. To alleviate this problem, the authors present Prodigy, a low-cost hardware-software co-design solution. In addition to graph-based workloads, Prodigy is shown to be effective for other irregular workloads from the machine learning and scientific computing domains. Prodigy adopts a “best of both worlds” approach, using software and hardware techniques each for what they do best. The heart of the proposal is a specialized hardware-software contract called a Data Indirection Graph (DIG) representation that efficiently encodes a program’s behavior.

Prodigy has been rigorously evaluated using 29 irregular applications from the graph analytics, machine learning, and scientific computing domains. The detailed evaluation shows that Prodigy makes these workloads run more than twice as fast, on average, at a negligible hardware cost. Prodigy also outperforms other recently published hardware- and software-based mechanisms that address the same problem.

The full paper can be viewed here.
