
Runtime Adaptive Scrubbing in Fault-Tolerant Network-on-Chips (NoC) Architectures

by Travis H. Boraten

Institution: Ohio University
Department: Electrical Engineering & Computer Science (Engineering and Technology)
Degree: MS
Year: 2014
Keywords: Computer Engineering; Electrical Engineering; RAS; NoC; Network on Chip; Fault tolerance; Error correction codes; Adaptive error correction; Adaptive encoding scheme; Fault scrubbing
Record ID: 2025315
Full text PDF: http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1397488496


As aggressive scaling continues to push Multi-Processor Systems-on-Chip (MPSoCs) to new limits, complex hardware structures and stringent area and power constraints will continue to diminish reliability. Waning reliability in integrated circuits will increase susceptibility to transient and permanent faults. There is an urgent demand for adaptive Error Correction Coding (ECC) schemes in Networks-on-Chip (NoCs) to provide fault tolerance and improve the overall resiliency of MPSoC architectures. The goal of adaptive ECC schemes should be to maximize power savings when faults are infrequent and to increase application speedup by boosting fault coverage when faults are frequent. In this thesis, I propose Runtime Adaptive Scrubbing (RAS), a novel multi-layered error correction and detection scheme with a three-mode, area-efficient configurable encoder for encoding packets on the switch-to-switch (s2s) layer, thus preventing faults from accumulating up the network stack and onto the end-to-end (e2e) layer. To handle fluctuating fault rates, I propose a dynamic methodology for improving fault localization and intelligently adapting fault coverage on demand to sustain graceful network degradation. RAS improves network resiliency, fault localization, and fault coverage compared to traditional static s2s schemes. Simulation results demonstrate that static switching RAS improves network speedup by 10% for Splash-2/PARSEC benchmarks on an 8 x 8 mesh network while reducing area overhead by 15% and incurring on average a 6.6% power penalty. Further, my dynamic ECC scheme maintains 97.88% of performance and incurs on average a 20% power penalty.
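
The idea of adapting fault coverage to the observed fault rate can be illustrated with a small sketch. This is not the controller from the thesis: the abstract does not name the three encoder modes or state how they are selected, so the mode names (`LOW`, `MEDIUM`, `HIGH`), the two thresholds, and the one-step-at-a-time policy below are all assumptions made only for illustration. The sketch shows a per-link controller that steps coverage up when faults are frequent and back down when they are rare, with a gap between the thresholds so the mode does not oscillate.

```python
from dataclasses import dataclass
from enum import IntEnum


class EccMode(IntEnum):
    """Hypothetical coverage levels, ordered from cheapest to strongest."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass
class AdaptiveEccController:
    """Illustrative per-link controller; thresholds are assumed values."""
    raise_threshold: float = 0.05  # assumed: step coverage up above this fault rate
    lower_threshold: float = 0.01  # assumed: step coverage down below this fault rate
    mode: EccMode = EccMode.LOW

    def update(self, faulty_flits: int, total_flits: int) -> EccMode:
        """Adjust the mode by at most one step based on one observation window."""
        if total_flits <= 0:
            return self.mode  # no traffic observed, keep the current mode
        rate = faulty_flits / total_flits
        if rate > self.raise_threshold and self.mode < EccMode.HIGH:
            self.mode = EccMode(self.mode + 1)
        elif rate < self.lower_threshold and self.mode > EccMode.LOW:
            self.mode = EccMode(self.mode - 1)
        return self.mode
```

Feeding the controller windows of 100 flits with 0, 8, 9, 3, 0, and 0 faulty flits moves it through LOW, MEDIUM, HIGH, HIGH, MEDIUM, LOW: coverage rises while faults are frequent, holds while the rate sits between the thresholds, and relaxes once faults subside.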