PUBLICATIONS

A New Algorithm for Finding Frequent Items in Streams of Data

Authors

Klemen Simonic,

Janez Brank,

Marko Grobelnik,

Publication date

Publisher

Total citations

Cited by

Description

Given a stream of queries posed to an Internet search engine, how should we detect the popular queries? Or, which are the frequent transactions in an enormous collection across all branches of a supermarket chain? Or, how quickly can we declare a query or transaction as being frequent? These kinds of data usually arrive at enormous rates, hundreds of gigabytes per day or even more, but we are given only a small fraction of resources compared to the total quantity of data. Are we able to process the data in real time and provide up-to-minute analysis and statistics on current trends? We present a new algorithm for finding frequent items in a stream of data that handles the above mentioned problems in a better way than other known algorithms. We evaluated our algorithm on several large real-world data streams and made several experiments in order to optimize the algorithm. We applied our algorithm to the task of detecting frequent n-grams (sequences of n adjacent words) in streams of text. This way, we are able to detect popular phrases on the Internet “as it happens”.

Publication

PUBLICATIONS

A New Algorithm for Finding Frequent Items in Streams of Data

OptimalAI