Pattern mining for large distributed dataset: A parallel approach (PMLDD)
, M. Kumar
Published in Korean Society for Internet Information
2018
Volume: 12
Issue: 11
Pages: 5287 - 5303
Abstract
Handling the vast amounts of data found in large transactional datasets is an obvious challenge for conventional data mining algorithms. To address this challenge, our paper proposes a parallel approach that decomposes the mining problem into sub-problems in order to find frequent patterns in these datasets. The proposed approach, Pattern Mining for Large Distributed Dataset (PMLDD), ensures minimal dependencies as well as minimal communication among sub-problems. It establishes a linear aggregation of the intermediate results so that it can be adapted to large-scale programming models such as MapReduce. In this context, an algorithmic structure for the MapReduce programming model is presented. PMLDD guarantees efficient load balancing among the sub-problems through a specific selection criterion. Further, it reduces the number of iterations over the dataset required for mining frequent patterns compared to existing approaches. Finally, we believe that our approach is scalable enough to handle larger datasets, and the performance evaluation and result analysis support these claims. © 2018 KSII.
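As a rough illustration of the linear aggregation idea mentioned in the abstract, the sketch below counts itemsets independently in each data partition (map) and then combines the partial counts by plain addition (reduce). This is a generic MapReduce-style frequent itemset count, not the PMLDD algorithm itself; the function names, the `max_size` cap, and the sample data are all assumptions made for illustration, and the abstract's load-balancing selection criterion is not modeled.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_partition(transactions, max_size=2):
    """Map step (hypothetical): count itemsets of up to max_size items in one partition."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so itemset keys match across partitions
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def reduce_counts(left, right):
    """Reduce step: partial counts combine by simple addition, with no cross-partition state."""
    return left + right


def frequent_patterns(partitions, min_support, max_size=2):
    """Run the map step per partition, sum the results, and keep itemsets meeting min_support."""
    partial = [map_partition(p, max_size) for p in partitions]
    total = reduce(reduce_counts, partial, Counter())
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Illustrative data: two partitions of a small transactional dataset.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"], ["a", "b", "c"]],
]
result = frequent_patterns(partitions, min_support=3)
```

Because each partition is processed on its own and the partial results are only ever added together, the map calls could run on separate workers with a single pass over the data. Enumerating every itemset up to `max_size` is only practical for small sizes; it is used here to keep the sketch short.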
About the journal
Journal: KSII Transactions on Internet and Information Systems
Publisher: Korean Society for Internet Information
ISSN: 1976-7277