Conference papers

Update Frequency and Background Corpus Selection in Dynamic TF-IDF Models for First Story Detection

Fei Wang, Technological University Dublin
Robert J. Ross, Technological University Dublin
John D. Kelleher, Technological University Dublin

Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences

Publication Details

2019 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019) October 11-13, 2019, Hanoi, Vietnam

Abstract

First Story Detection (FSD) requires a system to detect the very first story that mentions an event in a stream of stories. Nearest neighbour-based models that use traditional term vector document representations such as TF-IDF currently achieve the state of the art in FSD. Because FSD is an online task, a dynamic term vector model that is incrementally updated during the detection process is usually adopted instead of a static model. However, very little research has investigated the selection of hyper-parameters and background corpora for a dynamic model. In this paper, we analyse how a dynamic term vector model works for FSD, and investigate the impact of different update frequencies and background corpora on FSD performance. Our results show that dynamic models with high update frequencies outperform static models and dynamic models with low update frequencies, and that the FSD performance of dynamic models does not always increase with higher update frequencies, but instead reaches a steady state once the update frequency passes some threshold. In addition, we demonstrate that different background corpora have very limited influence on the FSD performance of dynamic models with high update frequencies.
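To make the idea of an incrementally updated term vector model concrete, here is a minimal Python sketch of nearest neighbour FSD over a dynamic TF-IDF model. It is not the authors' implementation. The class `DynamicTfidf`, the function `detect_first_stories`, the parameter names `update_every` and `threshold`, the smoothed IDF formula, and the toy data are all assumptions made for illustration. In particular, the sketch expresses update frequency as "refresh IDF after every `update_every` incoming stories" and keeps stored vectors as computed at arrival time, which may differ from the paper's setup.

```python
import math
from collections import Counter


class DynamicTfidf:
    """TF-IDF model whose IDF snapshot is refreshed every `update_every` stories.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, background_docs, update_every):
        self.update_every = update_every
        self.doc_freq = Counter()  # term -> number of documents containing it
        self.n_docs = 0
        self.pending = []          # stories seen since the last refresh
        for tokens in background_docs:
            self._count(tokens)
        self._refresh()

    def _count(self, tokens):
        self.doc_freq.update(set(tokens))
        self.n_docs += 1

    def _refresh(self):
        # Smoothed IDF (an assumed variant); frozen until the next refresh.
        self.idf = {
            term: math.log((1 + self.n_docs) / (1 + df)) + 1.0
            for term, df in self.doc_freq.items()
        }
        self.default_idf = math.log(1 + self.n_docs) + 1.0  # unseen terms

    def observe(self, tokens):
        """Queue a story; fold queued stories into the IDF when the batch is full."""
        self.pending.append(tokens)
        if len(self.pending) >= self.update_every:
            for doc in self.pending:
                self._count(doc)
            self.pending.clear()
            self._refresh()

    def vector(self, tokens):
        """Return an L2-normalised sparse TF-IDF vector as a dict."""
        weights = {
            term: count * self.idf.get(term, self.default_idf)
            for term, count in Counter(tokens).items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        return {t: w / norm for t, w in weights.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def detect_first_stories(stream, model, threshold):
    """Flag a story as a first story when its nearest neighbour is too dissimilar."""
    seen, flags = [], []
    for tokens in stream:
        vec = model.vector(tokens)
        nearest = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(1.0 - nearest > threshold)
        seen.append(vec)
        model.observe(tokens)
    return flags


# Toy data, invented for this example.
background = [["city", "news"]]
stream = [
    ["earthquake", "hits", "city"],
    ["earthquake", "hits", "city"],
    ["election", "results", "announced"],
]
model = DynamicTfidf(background, update_every=1)
print(detect_first_stories(stream, model, threshold=0.5))  # [True, False, True]
```

Setting `update_every` larger than the stream length leaves the IDF fixed at its background-corpus values, which corresponds to a static model. Setting it to 1 gives the highest update frequency this sketch supports.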

DOI

https://doi.org/10.21427/44xz-6a62

Recommended Citation

Wang F. et al. (2019) Update Frequency and Background Corpus Selection in Dynamic TF-IDF Models for First Story Detection, 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), October 11-13, 2019, Hanoi, Vietnam. doi:10.21427/44xz-6a62

Funder

ADAPT Research Centre

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons
