|Institution:||University of California – Irvine|
|Keywords:||Computer engineering; Data Shepherding; dynamic mapping; large scale cache design; multi-bank cache design; power efficiency; scratchpad memory|
|Full text PDF:||http://www.escholarship.org/uc/item/4bq2t8q5|
The power wall has had a drastic impact on many aspects of system design. Even though frequency scaling has been limited since Dennard scaling ended, new technologies and design trends such as 3D stacking and heterogeneous multi-core architectures still offer opportunities to integrate more logic and increase performance. However, these trends have significantly increased design complexity, especially in satisfying multiple system requirements and goals at once: power efficiency must be handled carefully while scaled resources must be managed to improve performance. Sometimes system scaling itself conflicts with system goals. Increased chip resources and core counts make fair resource sharing a more prominent issue, and the growing number of heterogeneous architectures makes programming more challenging by requiring more programming styles (e.g., stream programming, SPMD-style programming), APIs, etc. We suggest that issues arising from multiple design goals can be handled more efficiently via explicit resource addressing. We apply this idea to the design of the shared last-level cache and call the approach the Data Shepherding cache. We show how the Data Shepherding cache design solves some of the problems of integrating mechanisms that serve multiple goals, especially when scaling in capacity and performance is considered. The properties of the Data Shepherding cache are evaluated in two ways with open-source tools. CACTI from HP.com simulates cache models to obtain cache parameters such as access time and power consumption. Further analysis with the model parameters shows that the Data Shepherding cache has a 10% smaller area footprint, which leads to a 7% reduction in leakage power compared with the Static NUCA cache. We analyze performance using the gem5 simulator from gem5.org, which models system architecture. The results show a 12.6% performance improvement on selected SPEC2000 benchmarks with 2 MB banks and a 128 MB total shared cache.
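To make the evaluated configuration concrete, the following minimal sketch works out the bank count implied by the abstract (128 MB total, 2 MB per bank) and contrasts a static, address-derived bank choice with an explicitly assigned one. The 64-byte line size, the 4 KB page size, the low-order-bit interleaving, and the `ExplicitBankMap` class are all assumptions made for illustration; the abstract does not describe the actual Data Shepherding mapping mechanism, and this is not it.

```python
BANK_SIZE = 2 * 1024 * 1024      # 2 MB per bank (from the abstract)
TOTAL_SIZE = 128 * 1024 * 1024   # 128 MB shared cache (from the abstract)
LINE_SIZE = 64                   # assumed cache line size in bytes
PAGE_SIZE = 4096                 # assumed page size in bytes

NUM_BANKS = TOTAL_SIZE // BANK_SIZE  # 64 banks


def static_bank(addr: int) -> int:
    """Static-NUCA-style choice (assumed interleaving): the bank is
    fixed by low-order bits of the line address."""
    return (addr // LINE_SIZE) % NUM_BANKS


class ExplicitBankMap:
    """Hypothetical page-to-bank table: software (or the OS) names the
    bank for a page instead of letting address bits decide."""

    def __init__(self) -> None:
        self._page_to_bank: dict[int, int] = {}

    def assign(self, page: int, bank: int) -> None:
        if not 0 <= bank < NUM_BANKS:
            raise ValueError(f"bank {bank} out of range")
        self._page_to_bank[page] = bank

    def bank(self, addr: int) -> int:
        # Fall back to the static mapping for pages with no assignment.
        page = addr // PAGE_SIZE
        return self._page_to_bank.get(page, static_bank(addr))


if __name__ == "__main__":
    mapping = ExplicitBankMap()
    mapping.assign(page=1, bank=5)
    print(NUM_BANKS, static_bank(0x1040), mapping.bank(0x1040))
```

Under the static scheme, consecutive lines of one page spread across many banks; with an explicit assignment, every line of the page resolves to the single named bank, which is the kind of control that explicit resource addressing refers to.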