Conference papers

Detecting Hacker Threats: Performance of Word and Sentence Embedding Models in Identifying Hacker Communications

Susan McKeever, Technological University Dublin
Brian Keegan, Technological University Dublin
Andrei Queiroz, Technological University Dublin

Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences

Publication Details

AICS 2019 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science

Proceedings for the 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, NUI Galway, Galway, Ireland, December 5-6th, 2019.

Abstract

Cyber security is striving to find new forms of protection against hacker attacks. An emerging approach is the investigation of security-related messages exchanged on deep/dark web and even surface web channels. This approach can be supported by supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations and dimension reduction approaches in terms of their accuracy in detecting software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost the detection accuracy of our models. In addition, we examine how feature reduction techniques such as Document Frequency Reduction, Chi-square and Singular Value Decomposition (SVD) can be used to reduce the number of features of the model without impacting the detection performance. We conclude that: (1) a Support Vector Machine (SVM) algorithm used with a traditional Bag of Words representation achieved the highest accuracies; (2) increasing the minority class with the Random Oversampling technique improves the detection performance of the model by 5% on average; and (3) the number of features of the model can be reduced by up to 10% without affecting the detection performance. We have also provided the labelled dataset used in this work for further research. These findings can be used to support Cyber Security Threat Intelligence (CTI) with respect to the use of text mining techniques for detecting security-related communication.
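As an illustration of the Random Oversampling step named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The function names `random_oversample` and `bag_of_words`, the toy messages, and the label convention (1 = vulnerability-related, 0 = other) are all assumptions made for this example; the paper's datasets and tooling are not shown here.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        # Original samples belonging to this class.
        pool = [t for t, y in zip(texts, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


def bag_of_words(text):
    """Very simple Bag of Words: lowercase, whitespace-split term counts."""
    return Counter(text.lower().split())


# Toy, made-up messages: one vulnerability-related (1), four unrelated (0).
texts = [
    "new exploit for the login bug",
    "selling old laptop",
    "anyone watching the match",
    "best pizza in town",
    "how to learn guitar",
]
labels = [1, 0, 0, 0, 0]

balanced_texts, balanced_labels = random_oversample(texts, labels)
features = [bag_of_words(t) for t in balanced_texts]
```

After the call, both classes contain four samples. In practice, oversampling is normally applied to the training split only, so that duplicated messages do not leak into the evaluation data.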

Recommended Citation

Queiroz, A., McKeever, S. & Keegan, B. (2019) Detecting Hacker Threats: Performance of Word and Sentence Embedding Models in Identifying Hacker Communications, 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science AICS 2019, Proceedings urn:nbn:de:0074-2563-0 Vol-2563, Pages 116-127

Included in

Data Science Commons
