AbstractsComputer Science

Data Partitioning and Placement Mechanisms for Elastic Key-Value Stores

by Han Li




Institution: University of New South Wales
Department: Computer Science & Engineering
Year: 2014
Keywords: Elasticity; Cloud computing; Key-value store; Data partitioning; Replica placement
Record ID: 1031898
Full text PDF: http://handle.unsw.edu.au/1959.4/53543


Abstract

Key-Value Stores (KVSs) have become a standard component for many web services and applications due to their inherent scalability, availability, and reliability. Many enterprises are now adopting them for use on servers leased from Infrastructure-as-a-Service (IaaS) providers. The defining characteristic of IaaS is resource elasticity. KVSs benefit from elasticity, when they incorporate new resources on-demand as KVS nodes to deal with increasing workload, and decommission excess resources to save on operational costs. Elasticity of a KVS poses challenges in allowing efficient, dynamic node arrivals and departures. On one hand, the workload needs to be quickly balanced among the KVS nodes. However, current data partitioning and migration schemes provide low priority to populate new nodes, thereby reducing the effect of adding resources on increasing workload. On the other hand, dynamic node changes downgrade data durability at multiple node failures caused by hardware failure in IaaS Cloud, which is built from commodity components that fail as the norm at large scales; but current replica placement strategies tend to rely on static mapping of data to nodes for high durability. This thesis proposes a set of data management schemes to address these issues. Firstly, it presents a decentralised automated partitioning algorithm and a lightweight migration strategy, to improve the efficiency of node changes. Secondly, it presents the design of ElasCass, an elastic KVS that incorporates these schemes, implemented atop Apache Cassandra. Finally, it presents a replica placement algorithm with a proof that shows its correctness, to fill the gap of allowing dynamic node changes while maintaining high data durability at multiple node failures. Contributions of this thesis lie in this set of novel schemes for data partitioning, placement, and migration, which provide efficient elasticity for decentralised, shared-nothing KVSs. The evaluations of ElasCass, conducted on Amazon EC2, revealed that, the proposed schemes reduce node incorporation time and improve load- balancing, thereby increasing scalability and query performance. The other evaluation simulated thousands of KVS nodes and demonstrated that the proposed placement algorithm maintains a close to minimised data loss probability under different failure scenarios, and exhibits better scalability and elasticity than state-of-the-art placement schemes.