Hadoop job scheduling with dynamic task splitting

by Yongliang Xu

Institution: Nanyang Technological University
Year: 2015
Keywords: DRNTU::Engineering::Computer science and engineering::Information systems::Information systems applications
Posted: 02/05/2017
Record ID: 2129812
Full text PDF: http://hdl.handle.net/10356/65309


Abstract

Job scheduling affects the fairness and performance of shared Hadoop clusters. Fairness measures how evenly the resources of the cluster are shared among its users. In Hadoop, schedulers always attempt to maximize data locality, which refers to the processing of data by tasks on the nodes where the data is stored. Processing data on data-local nodes improves performance because no data has to be transferred from one node to another. However, fairness and data locality are often in conflict. During scheduling, the available nodes do not always hold the data that a user’s job requires. In such cases, a scheduler may schedule the tasks on these nodes regardless of data locality, sacrificing performance. Alternatively, it may give up the user’s slot and wait for a data-local node, sacrificing fairness. Achieving pure fairness may compromise the data locality of the tasks, which in turn negatively affects performance, and vice versa. Delay scheduling is a technique that attempts to improve data locality by waiting for a data-local node to become available, but in doing so it violates the fairness criterion. The Dynamic Task Splitting Scheduler (DTSS) is proposed to mitigate the tradeoff between fairness and data locality during job scheduling. DTSS does so by dynamically splitting a task and executing the split task immediately on a non-data-local node, which improves fairness. Analysis and experimental results show that both fairness and performance can be improved by adjusting the proportion of the task that is split. Compared with delay scheduling, DTSS improves the makespan of different users in a cluster by 2% to 11% under conditions in which data-local nodes are difficult to obtain. Lastly, experiments show that DTSS is not a suitable scheduler under conditions where jobs can obtain data-local nodes easily.
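
The contrast between the two policies named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: `Task`, `delay_schedule`, `split_schedule`, `max_skips` and `split_fraction` are hypothetical names introduced here, and what happens to the unsplit remainder (it is simply returned as still pending) is an assumption, since the abstract does not specify it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: an input size and the nodes holding that input."""
    task_id: str
    input_mb: float
    local_nodes: frozenset


def delay_schedule(task: Task, free_node: str, skipped: int,
                   max_skips: int) -> Tuple[bool, int]:
    """Delay-scheduling style decision.

    Launch only if the free node is data-local, or if the task has already
    been skipped max_skips times. Returns (launch_now, new_skip_count).
    """
    if free_node in task.local_nodes or skipped >= max_skips:
        return True, 0
    # Give up the slot and wait: locality is preserved, fairness is not.
    return False, skipped + 1


def split_schedule(task: Task, free_node: str,
                   split_fraction: float) -> Tuple[Task, Optional[Task]]:
    """Task-splitting style decision.

    On a data-local node the whole task runs. Otherwise a fraction of the
    input is carved off and run immediately on the non-local node, and the
    remainder is returned as still pending (an assumption of this sketch).
    Returns (part_to_run_now, pending_remainder_or_None).
    """
    if not 0.0 < split_fraction < 1.0:
        raise ValueError("split_fraction must be strictly between 0 and 1")
    if free_node in task.local_nodes:
        return task, None
    run_now = Task(task.task_id + "-a", task.input_mb * split_fraction,
                   task.local_nodes)
    pending = Task(task.task_id + "-b", task.input_mb * (1.0 - split_fraction),
                   task.local_nodes)
    return run_now, pending


if __name__ == "__main__":
    t = Task("t1", 128.0, frozenset({"n1", "n2"}))
    print(delay_schedule(t, "n3", skipped=0, max_skips=3))
    print(split_schedule(t, "n3", split_fraction=0.25))
```

In this sketch, `split_fraction` plays the role of the "proportion of the task that is split": a larger value uses more of the non-local slot immediately, while a smaller value leaves more of the input for a data-local node.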
