AbstractsComputer Science

Temperature, energy and performance: addressing embedded system challenges through fast cache simulation

by Josef Schneider




Institution: University of New South Wales
Department: Computer Science & Engineering
Year: 2015
Keywords: FPGA; CPU Cache; Simulation
Record ID: 1070949
Full text PDF: http://handle.unsw.edu.au/1959.4/54412


Abstract

Temperature, energy and performance are essential design considerations during the conception of modern digital systems. The work presented in this thesis focusses on three aspects that can be used to overcome these limitations. First an evaluation of the suitability of the dynamic application adaptation method is researched with the aim of using it to control the temperature of a Field Programmable Gate Array (FPGA) device. Despite the use of an extremely adaptive custom JPEG encoder, it was determined that application adaptation alone is ineffective in an FPGA for thermal management. Next, a study is performed which aims to assess which components are principally responsible for the rise in temperatures in FPGAs. It was found that the external memory interface is a significant heat-source in FPGA-based embedded systems, and that device temperature correlates with CPU cache miss rate. The third and main aspect covered in this dissertation is the speeding up of CPU cache simulation. Single pass cache simulation is a tool that can be employed at design time to select a cache yielding acceptable temperature, system performance and energy consumption. Three Multiple cAche Simulators in Hardware (MASH) or in Software (MASS) are proposed for three cache replacement policies: MASH{lru} for the Least Recently Used (LRU) cache algorithm, MASH{fifo} for First In First Out (FIFO) and MASS{plrut} for Pseudo Least Recently Used tree (PLRUt). The former two are novel in that they are implemented in hardware and are respectively 53x and 11.10x faster than software counterparts. The PLRUt simulator presents for the first time an optimised hash table-based algorithm yielding a speedup of 1.93x over an unoptimised solution. All cache simulators employ cache properties specific to their replacement policies to improve simulator characteristics. Additionally, it is shown that the hardware (or MASH) simulators can be implemented in-system alongside an embedded system, allowing for the direct trace extraction and cache simulation from within an FPGA. Using in-system simulation, large speedups can be achieved as trace generation and multiple cache simulation happen at the same time at high frequencies.