Document Type
Conference Paper
Rights
Available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence
Disciplines
Computer Sciences
Abstract
This paper examines the effectiveness of a range of pre-trained language representations for determining the informativeness and information type of social media content in the event of natural or man-made disasters. Within the context of disaster tweet analysis, we aim to analyse tweets accurately while minimising both false positives and false negatives in the automated information analysis. The investigation is performed across a number of well-known disaster-related Twitter datasets. Models built from pre-trained Word2Vec, GloVe, ELMo and BERT embeddings are used for performance evaluation. Given the relative ubiquity of BERT as a standout language representation in recent times, BERT was expected to dominate the results. However, the results are more diverse, with the classical Word2Vec and GloVe embeddings both displaying strong results. As part of the analysis, we discuss some challenges related to automated Twitter analysis, including the fine-tuning of language models to disaster-related scenarios.
DOI
https://doi.org/10.1145/3341161.3343680
Recommended Citation
Jain, P., Ross, R., Schoen-Phelan, B. (2019). Estimating distributed representation performance in disaster-related social media classification. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Publication Details
2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining