Regular expressions in big data analytics
R. Chowdhury, V. Mishra, H. Jain
Published in Institute of Electrical and Electronics Engineers Inc.
2018
Volume: 2018-January
Pages: 1-10
Abstract
Text analytics systems, such as IBM's SystemT software, rely on regular expressions (regexes) and dictionaries to transform unstructured data into a structured form. Unlike network intrusion detection systems, text analytics systems compute and report precisely where the specific and sensitive information starts and ends in a text document. Advanced regex matching functions, such as start offset reporting, capturing groups, and leftmost match computation, are therefore heavily used in text analytics systems. The paper also describes a novel regex matching architecture that supports such functions in a resource-efficient way. The resource efficiency is achieved by 1) eliminating state replication, 2) avoiding expensive offset comparison operations in leftmost match computation, and 3) minimizing the number of offset registers. Experiments on regex sets from the text analytics and network intrusion detection domains, using an Altera Stratix IV FPGA, show that the proposed architecture achieves a more than threefold reduction in the logic resources used and a more than 1.25-fold increase in clock frequency with respect to a recently proposed architecture that supports the same features. The paper gives a general overview of the role of regular expressions in big data analytics. © 2017 IEEE.
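The three matching functions named in the abstract (start offset reporting, capturing groups, and leftmost match computation) can be illustrated in software. The following is a minimal sketch using Python's standard `re` module, not the hardware architecture the paper describes. The pattern, the sample text, and the helper name `extract_spans` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern: two capturing groups separated by a hyphen.
PHONE = re.compile(r"(\d{3})-(\d{4})")


def extract_spans(text):
    """Report where each match and each capturing group starts and ends."""
    results = []
    for m in PHONE.finditer(text):
        results.append({
            "match": (m.start(), m.end()),   # start and end offset of the whole match
            "groups": [m.span(1), m.span(2)],  # offsets of each capturing group
            "values": m.groups(),            # text captured by each group
        })
    return results


# Hypothetical input document.
spans = extract_spans("Call 555-1234 or 555-9876.")

# Leftmost match: the match with the earliest start offset is reported,
# even though the alternative "b+" is listed first in the pattern.
leftmost = re.search(r"b+|ab", "xabb")
leftmost_span = leftmost.span()
```

Here `spans` holds one entry per non-overlapping match, each with the offsets a text analytics system would need in order to annotate the document, and `leftmost_span` shows that the search settles on the earliest possible start position. Note that `re` is a backtracking engine, so this only mirrors the reported behaviour, not the resource trade-offs of an FPGA implementation.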