Faculty Candidate Seminar

Scaling Software and Hardware for Thousand-Core Systems

Daniel Sanchez, PhD Candidate, Stanford University

Scaling multicores to thousands of cores efficiently requires significant
innovation across the software-hardware stack. On one hand, to expose ample
parallelism, many applications will need to be divided into fine-grain tasks of
a few thousand instructions each, and scheduled dynamically in a manner that
addresses the three major difficulties of fine-grain parallelism: locality,
load imbalance, and excessive overheads. On the other hand, hardware resources
must scale efficiently, even as some of them are shared among thousands of
threads. In particular, the memory hierarchy is hard to scale in several ways:
conventional cache coherence techniques are prohibitively expensive beyond a
few tens of cores, and caches cannot be easily shared among multiple threads
or processes. Ideally, software should be able to configure these shared
resources to provide good overall performance and quality of service (QoS)
guarantees under all possible sharing scenarios.

In this talk, I will present several techniques to scale both software and
hardware. First, I will describe a scheduler that uses high-level information
from the programming model about parallelism, locality, and heterogeneity to
perform scheduling dynamically and at fine granularity to avoid load
imbalance. This fine-grain scheduler can use lightweight, flexible hardware
support to keep overheads small as we scale up. Second, I will present a set
of techniques that, together, enable scalable memory hierarchies that can be
shared efficiently: ZCache, a cache design that achieves high associativity
cheaply (e.g., 64-way associativity with the latency, energy and area of a
4-way cache) and is characterized by simple and accurate analytical models;
Vantage, a cache partitioning technique that leverages the analytical
guarantees of ZCache to implement scalable and efficient partitioning,
enabling hundreds of threads to share the cache in a controlled manner,
providing configurability and isolation; and SCD, which leverages ZCache to
implement scalable cache coherence with QoS guarantees.

Daniel Sanchez is a PhD candidate in the Electrical Engineering Department at
Stanford University. His research focuses on large-scale multicores,
specifically on scalable and dynamic fine-grain runtimes and schedulers,
hardware support for scheduling, scalable and efficient memory hierarchies,
and architectures with QoS guarantees. He holds an MS in Electrical
Engineering from Stanford and a BS in Telecommunication Engineering from the
Technical University of Madrid (UPM).

Sponsored by

CSE