Title : Breaking News Detection from the Web Documents through Text Mining and Seasonality


Authors : Syed Tanveer Jishan, Nuruddin Monsur, Hafiz Abdur Rahman

Abstract : In recent years, news distribution through the internet has increased significantly, and so has our dependency on online news sources. Because vast numbers of web documents from different news websites are readily available, it is possible to extract information from them for various applications. One such application is breaking news detection through analysis of the text and properties of these web documents. In this paper, we present an approach that detects breaking news from web documents by extracting keywords with Brill’s tagger and HTML tag attributes. Once the keywords are extracted, the seasonality of each keyword is calculated as the ratio of the linear weighted moving averages (LWMA) at each point of the time series. We validated our approach and evaluated its performance metrics on two online newspapers.
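The abstract does not say which two LWMAs form the seasonality ratio. The sketch below is only illustrative and assumes that the ratio compares a short-window LWMA with a long-window LWMA of a keyword's per-period frequency. Under that assumption, a value well above 1 would indicate a recent burst in the keyword. The function names `lwma` and `lwma_ratio` and the window sizes are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def lwma(values: List[float], window: int) -> List[Optional[float]]:
    """Linear weighted moving average. Inside each window the oldest point
    gets weight 1 and the most recent point gets weight `window`."""
    if window < 1:
        raise ValueError("window must be >= 1")
    denom = window * (window + 1) / 2  # sum of weights 1..window
    out: List[Optional[float]] = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)  # not enough history for a full window yet
            continue
        chunk = values[t - window + 1 : t + 1]
        weighted = sum(w * v for w, v in zip(range(1, window + 1), chunk))
        out.append(weighted / denom)
    return out


def lwma_ratio(values: List[float], short: int, long: int) -> List[Optional[float]]:
    """Hypothetical seasonality score: short-window LWMA divided by
    long-window LWMA at each time point (an assumption, not the paper's
    stated definition). Undefined points are returned as None."""
    s, l = lwma(values, short), lwma(values, long)
    return [a / b if a is not None and b else None for a, b in zip(s, l)]


# Hypothetical daily counts of one keyword, with a jump on the last day.
counts = [2, 2, 2, 2, 10]
print(lwma_ratio(counts, short=2, long=4))
```

For a flat series the ratio stays at 1.0. On the last day the short window reacts to the jump faster than the long window does, so the ratio rises above 1. This is the behaviour a burst detector would threshold on.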


Journal : International Journal of Knowledge and Web Intelligence (IJKWI), Inderscience Publishers, Geneva, Switzerland
Volume : 5
Year : 2016
Issue : 3
Pages : 190-207