High Utility Itemsets Identification in Big Data

by Ashish Tamrakar

Institution:	University of Nevada Las Vegas
Year:	2017
Keywords:	Data mining; itemset mining; parallel computing; Spark; Computer Sciences
Posted:	02/01/2018
Record ID:	2151495
Full text PDF:	http://digitalscholarship.unlv.edu/thesesdissertations/3044

Abstract

High utility itemset mining is an important data mining problem that considers profit factors in addition to item quantities in a transactional database. It helps find the most valuable products or items, which are difficult to track using frequent itemset mining alone: an item with a high profit value might be rare in the transactional database despite its importance. Many existing algorithms generate comparatively large candidate sets while finding high utility itemsets, so the major focus here is to reduce computational time significantly by introducing pruning strategies. Another aspect of high utility itemset mining is processing large datasets; very few algorithms can mine high utility itemsets from a large dataset in a parallel (distributed) system. This thesis proposes two methods: 1) high utility itemset mining using a pruning-strategies approach (HUI-PR) and 2) Parallel EFIM (EFIM-Par). In Method I, the proposed algorithm constructs the candidate sets in the form of a tree structure and traverses the itemsets with high transaction-weighted utility (HTWUIs). It uses pruning strategies that avoid visiting unnecessary nodes, which reduces both the search space and the computational time, and it significantly shrinks the transaction database generated at each node. In Method II, a distributed approach divides the search space among worker nodes, each of which computes high utility itemsets; these partial results are then aggregated to produce the final result. The experimental results for both methods show that they significantly improve the execution time for computing high utility itemsets.

Advisors/Committee Members: Justin Zhan, Laxmi Gewali, Fatma Nasoz, Ge Lin Kan.
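
The notions of utility and transaction-weighted utility (TWU) used in the abstract can be illustrated with a small sketch. This is not the thesis's HUI-PR or EFIM-Par algorithm; it is a plain level-by-level enumeration that only shows why TWU-based pruning is safe. The toy transactions, the profit table, and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "d": 1},
    {"b": 1, "c": 3, "d": 1},
]
# Hypothetical unit profit per item.
profit = {"a": 5, "b": 2, "c": 1, "d": 10}


def transaction_utility(tx):
    """Total utility of one transaction: sum of quantity * unit profit."""
    return sum(qty * profit[item] for item, qty in tx.items())


def utility(itemset):
    """Utility of an itemset, summed over the transactions that contain all of it."""
    return sum(
        sum(tx[item] * profit[item] for item in itemset)
        for tx in transactions
        if all(item in tx for item in itemset)
    )


def twu(itemset):
    """Transaction-weighted utility: total utility of the transactions containing the itemset."""
    return sum(
        transaction_utility(tx)
        for tx in transactions
        if all(item in tx for item in itemset)
    )


def high_utility_itemsets(min_util):
    """Enumerate itemsets whose utility is at least min_util.

    TWU is an upper bound on the utility of an itemset and of all its
    supersets, so any candidate with TWU below min_util can be skipped.
    """
    # Keep only items whose own TWU reaches the threshold.
    items = sorted(i for i in profit if twu((i,)) >= min_util)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            if twu(candidate) < min_util:
                continue  # pruned: cannot be a high utility itemset
            u = utility(candidate)
            if u >= min_util:
                result[candidate] = u
    return result


if __name__ == "__main__":
    print(high_utility_itemsets(20))
```

In this toy data, item `a` has a TWU of 30 but a utility of only 15, so it passes the TWU filter without being a high utility itemset itself. Its superset `("a", "d")` does reach the threshold, which is why pruning is done on TWU and not on utility directly.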