TY - JOUR
T1 - A comparison on the classification of short-text documents using Latent Dirichlet Allocation and Formal Concept Analysis
AU - Rogers, Noel
AU - Longo, Luca
PY - 2017
Y1 - 2017
N2 - With the increasing amounts of textual data being collected online, automated text classification techniques are becoming increasingly important. However, a lot of this data is in the form of short-text with just a handful of terms per document (e.g. Text messages, tweets or Facebook posts). This data is generally too sparse and noisy to obtain satisfactory classification. Two techniques which aim to alleviate this problem are Latent Dirichlet Allocation (LDA) and Formal Concept Analysis (FCA). Both techniques have been shown to improve the performance of short-text classification by reducing the sparsity of the input data. The relative performance of classifiers that have been enhanced using each technique has not been directly compared so, to address this issue, this work presents an experiment to compare them, using supervised models. It has shown that FCA leads to a much higher degree of correlation among terms than LDA and initially gives lower classification accuracy. However, once a subset of features is selected for training, the FCA models can outperform those trained on LDA expanded data.
AB - With the increasing amounts of textual data being collected online, automated text classification techniques are becoming increasingly important. However, a lot of this data is in the form of short-text with just a handful of terms per document (e.g. Text messages, tweets or Facebook posts). This data is generally too sparse and noisy to obtain satisfactory classification. Two techniques which aim to alleviate this problem are Latent Dirichlet Allocation (LDA) and Formal Concept Analysis (FCA). Both techniques have been shown to improve the performance of short-text classification by reducing the sparsity of the input data. The relative performance of classifiers that have been enhanced using each technique has not been directly compared so, to address this issue, this work presents an experiment to compare them, using supervised models. It has shown that FCA leads to a much higher degree of correlation among terms than LDA and initially gives lower classification accuracy. However, once a subset of features is selected for training, the FCA models can outperform those trained on LDA expanded data.
UR - https://researchprofiles.tudublin.ie/en/publications/e77af2ca-0f13-4d72-a80c-71ba47aa2534
U2 - 10.21427/d79n6t
DO - 10.21427/d79n6t
M3 - Article
SN - 1613-0073
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
ER -