Authors
Klemen Simonic,
Janez Brank,
Marko Grobelnik,
Publication date
Publisher
Total citations
Cited by
Description
Given a stream of queries posed to an Internet search engine, how should we detect the popular queries? Or, which are the frequent transactions in an enormous collection across all branches of a supermarket chain? Or, how quickly can we declare a query or transaction as being frequent? These kinds of data usually arrive at enormous rates, hundreds of gigabytes per day or even more, but we are given only a small fraction of resources compared to the total quantity of data. Are we able to process the data in real time and provide up-to-minute analysis and statistics on current trends? We present a new algorithm for finding frequent items in a stream of data that handles the above mentioned problems in a better way than other known algorithms. We evaluated our algorithm on several large real-world data streams and made several experiments in order to optimize the algorithm. We applied our algorithm to the task of detecting frequent n-grams (sequences of n adjacent words) in streams of text. This way, we are able to detect popular phrases on the Internet “as it happens”.