Simulation-Aided Performance Evaluation of Input/Output Optimizations for Distributed Systems

by Michael Kuhn

Institution: Universität Heidelberg ; Thes
Year: 0
Record ID: 1099862
Full text PDF: http://archiv.ub.uni-heidelberg.de/volltextserver/9871/


The performance of parallel cluster file systems suffers when many clients execute a large number of operations in parallel, because the I/O subsystem is easily overwhelmed by the sheer number of incoming I/O operations. This, in turn, can slow down the whole distributed system. Many optimizations exist that try to alleviate this problem. Client-side optimizations perform preprocessing to minimize the amount of work the file servers have to do. Server-side optimizations use server-internal knowledge to improve performance. The PIOsimHD framework contains components to simulate, trace and visualize applications. It is used as a testbed to implement optimizations that could later be implemented in real-life projects. The main focus of this thesis lies on comparing existing client-side optimizations with newly implemented server-side optimizations such as Server-Directed I/O, which optimizes both read and write operations on the server side. Server-Directed I/O chooses the order of I/O operations and tries to aggregate as many operations as possible to decrease the load on the I/O subsystem and improve overall performance. The Interleaved Two-Phase protocol is a modification of ROMIO's Two-Phase protocol, which only accesses contiguous file regions. HDSunshot is used to visualize and analyze some of the results and to evaluate different optimization techniques by analyzing the resulting traces. The results show that client-side optimizations do not necessarily beat server-side optimizations in terms of performance, and suggest that even simple server-side optimizations are good enough for many use cases. Integrating such optimizations into parallel cluster file systems could alleviate the need for sophisticated client-side optimizations. Due to their additional knowledge of internal workflows, server-side optimizations may be better suited to provide high performance in general.
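
The reordering and aggregation idea mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis or from PIOsimHD. The request representation as (offset, length) pairs and the function name aggregate_requests are assumptions made purely for illustration. The sketch sorts pending requests by file offset and merges those that touch or overlap, so that fewer and larger operations reach the I/O subsystem.

```python
from typing import List, Tuple

# Hypothetical request representation: (offset, length) in bytes.
Request = Tuple[int, int]


def aggregate_requests(requests: List[Request]) -> List[Request]:
    """Sort requests by file offset and merge those that touch or overlap."""
    merged: List[Request] = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # The request starts inside or directly after the previous
            # region, so extend that region instead of adding a new one.
            last_offset, last_length = merged[-1]
            new_end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Four 4 KiB requests arriving out of order from different clients.
pending = [(4096, 4096), (0, 4096), (16384, 4096), (8192, 4096)]
print(aggregate_requests(pending))  # [(0, 12288), (16384, 4096)]
```

In this example, three of the four requests collapse into one contiguous 12 KiB operation. A real server would additionally have to consider operation type, ordering constraints between reads and writes, and timing, none of which are modeled here.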