A Comparison on the Classification of Short-text Documents Using Latent Dirichlet Allocation and Formal Concept Analysis

Noel Rogers, Technological University Dublin
Luca Longo, Technological University Dublin

Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE, Computer Sciences

Publication Details

CEUR Workshop Proceedings

Volume 2086, 2017, Pages 50-62. 25th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2017; Dublin, Ireland; 7 December 2017 through 8 December 2017

Abstract

With the increasing amount of textual data being collected online, automated text classification techniques are becoming increasingly important. However, much of this data is in the form of short text with just a handful of terms per document (e.g. text messages, tweets or Facebook posts). Such data is generally too sparse and noisy to obtain satisfactory classification. Two techniques that aim to alleviate this problem are Latent Dirichlet Allocation (LDA) and Formal Concept Analysis (FCA). Both have been shown to improve the performance of short-text classification by reducing the sparsity of the input data. The relative performance of classifiers enhanced using each technique has not been directly compared; to address this, this work presents an experiment that compares them using supervised models. The results show that FCA leads to a much higher degree of correlation among terms than LDA and initially gives lower classification accuracy. However, once a subset of features is selected for training, the FCA models can outperform those trained on LDA-expanded data. © 2017 CEUR-WS. All rights reserved.

DOI

https://doi.org/10.21427/D79N6T

Recommended Citation

Rogers, N. & Longo, L. (2017). A Comparison on the Classification of Short-text Documents Using Latent Dirichlet Allocation and Formal Concept Analysis. 25th Irish Conference on Artificial Intelligence and Cognitive Science, AICS 2017; Dublin, Ireland; 7 December - 8 December 2017, Volume 2086, Pages 50-62.

Included in

Computational Engineering Commons