Conference papers

A Comparison of Classical Versus Deep Learning Techniques for Abusive Content Detection on Social Media Sites

Hao Chen, Dublin Institute of Technology
Susan McKeever, Technological University Dublin
Sarah Jane Delany, Technological University Dublin

Document Type

Article

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE

Publication Details

International Conference on Social Informatics

Part of the Lecture Notes in Computer Science book series (LNCS, vol. 11185)

Abstract

The automated detection of abusive content on social media websites faces a variety of challenges, including imbalanced training sets, the identification of an appropriate feature representation and the selection of optimal classifiers. Classifiers such as support vector machines (SVMs), combined with bag-of-words or n-gram feature representations, have dominated text classification for decades. With the recent emergence of deep learning and word embeddings, an increasing number of researchers have started to focus on deep neural networks. In this paper, our aim is to explore cutting-edge techniques in automated abusive content detection. We use two deep learning approaches, convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and apply them to 9 public datasets derived from various social media websites. Firstly, we show that word embeddings pre-trained on the same data source as the subsequent classification task improve the prediction accuracy of deep learning models. Secondly, we investigate the impact of different levels of training set imbalance on classifier types. We find that although deep learning models can outperform the traditional SVM classifier when the associated training dataset is seriously imbalanced, the performance of the SVM classifier can be dramatically improved through the use of oversampling, surpassing the deep learning models. Our work can inform researchers in selecting appropriate text classification strategies in the detection of abusive content, including scenarios where the training datasets suffer from class imbalance.
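
The abstract does not say which oversampling method was used, so the following is only a minimal sketch of one common variant, random oversampling, in which minority-class training examples are duplicated at random until every class matches the size of the largest class. The function name `random_oversample`, the toy texts, and the labels are hypothetical and are not taken from the paper. Resampling of this kind is normally applied to the training split only, before feature extraction and classifier training.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class has as many examples as the largest class.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    rng = random.Random(seed)

    # Group the examples by class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Made-up toy data: label 1 = abusive (minority), label 0 = not abusive.
texts = ["nice photo", "see you soon", "great game", "thanks a lot", "you are an idiot"]
labels = [0, 0, 0, 0, 1]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

After resampling, both classes contribute the same number of training examples, which is the condition under which the abstract reports the SVM improving; the balanced lists could then be passed to any bag-of-words or n-gram feature extractor.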

DOI

https://doi.org/10.1007/978-3-030-01129-1_8

Recommended Citation

Chen H., McKeever S., Delany S.J. (2018) A comparison of classical versus deep learning techniques for abusive content detection on social media sites. In: Staab S., Koltsova O., Ignatov D. (eds) Social Informatics: SocInfo 2018. Springer. Lecture Notes in Computer Science, vol 11185.

Included in

Computer Sciences Commons
