
Embracing Explicit Communication in Work-Stealing Runtime Systems

by Andreas Prell




Institution: Universität Bayreuth
Year: 2016
Posted: 02/05/2017
Record ID: 2132348
Full text PDF: https://epub.uni-bayreuth.de/2990/


Abstract

Parallel computers are commonplace. The trend of increasing the number of processor cores highlights the importance of parallel computing: a single-threaded program uses only a fraction of a modern processor's resources and potential, and that fraction will shrink further over coming processor generations. Existing abstractions for writing parallel programs, such as threads and mutual exclusion locks, are difficult to understand, use, and reason about, making them a poor choice for mainstream parallel programming. Higher-level abstractions aim for a more favorable division of labor between programmers and compilers/runtime systems: programmers express and expose parallelism, while compilers and runtime systems manage parallel execution.

A popular and effective abstraction is the task: a piece of work, usually a function or a closure, that is safe to execute in parallel with other tasks. Scheduling decisions, including the mapping of tasks to threads, are made by the runtime system rather than imposed on the programmer. Tasks are well suited to expressing fine-grained parallelism, but whether fine-grained parallelism brings performance gains depends on the runtime system and its implementation.

State-of-the-art runtime systems employ the scheduling and load-balancing technique of work stealing, which is known to be efficient in both theory and practice. In work stealing, idle workers, called thieves, request tasks from busy workers, called victims, thereby balancing the load. Most implementations of work stealing take advantage of shared memory by letting thieves 'steal' tasks directly from the double-ended queues (deques) of their victims. Modern multiprocessors, however, feature increasingly complex architectures that make it challenging to implement work-stealing schedulers that are both efficient and flexible.
Future manycore processors may have limited support for shared memory, or may rely on message passing for scalable inter-core communication, as exemplified by Intel's SCC research processor, a recent 'cluster-on-a-chip'. This thesis aims to put work stealing based on message passing on a better, more practical foundation by developing techniques that rival the performance of concurrent deque-based implementations while remaining more flexible. Work stealing based on message passing has been studied before, notably in the context of distributed systems, where MPI still dominates. We present a work-stealing scheduler in which workers communicate with each other through channels, a lightweight message-passing abstraction that goes back to Hoare's Communicating Sequential Processes (CSP). Channels feature prominently in modern programming languages such as Go and Rust, which advocate message passing to communicate, synchronize, and share state between threads. The advantage of using channels, as opposed to low-level synchronization primitives, is that channels decouple the scheduler from processor-specific features, thereby increasing its flexibility. Large parts of this thesis are dedicated to making channel-based…

Advisors/Committee Members: Rauber, Thomas (advisor).