Dissertations

Feature Augmentation for Improved Topic Modeling of Youtube Lecture Videos using Latent Dirichlet Allocation

Nakul Srikumar, Technological University Dublin

Document Type

Dissertation

Rights

This item is available under a Creative Commons License for non-commercial use only

Disciplines

Computer Sciences

Publication Details

A dissertation submitted in partial fufilment of the requirements of Dublin Institute of Technology for the degree of M.Sc. in Computing (Data Analytics) 2021

Abstract

Application of Topic Models in text mining of educational data and more specifically, the text data obtained from lecture videos, is an area of research which is largely unexplored yet holds great potential. This work seeks to find empirical evidence for an improvement in Topic Modeling by pre- extracting bigram tokens and adding them as additional features in the Latent Dirichlet Allocation (LDA) algorithm, a widely-recognized topic modeling technique. The dataset considered for analysis is a collection of transcripts of video lectures on Machine Learning scraped from YouTube. Using the cosine similarity distance measure as a metric, the experiment showed a statistically significant improvement in topic model performance against the baseline topic model which did not use extra features, thus confirming the hypothesis. By introducing explainable features before modeling and using deep learning based text representation only at the post-modeling evaluation stage, the overall model interpretability is retained. This empowers educators and researchers alike to not only benefit from the LDA model in their own fields but also to play a substantial role in eorts to improve model performance. It also sets the direction for future work which could use the feature augmented topic model as the input to other more common text mining tasks like document categorization and information retrieval.

DOI

https://doi.org/10.21427/4ey6-qg08

Recommended Citation

Srikumar, N. (2021). Feature augmentation for improved topic modeling YouTube lecture videos using latent dirichlet allocation. Dissertation. Dublin: Technological University Dublin. doi:10.21427/4ey6-qg08s

Download

Included in

Computer Engineering Commons, Computer Sciences Commons

COinS

Dissertations

Feature Augmentation for Improved Topic Modeling of Youtube Lecture Videos using Latent Dirichlet Allocation

Document Type

Rights

Disciplines

Publication Details

Abstract

DOI

Recommended Citation

Included in

Search

Browse

Author Corner

Links

Dissertations

Feature Augmentation for Improved Topic Modeling of Youtube Lecture Videos using Latent Dirichlet Allocation

Authors

Document Type

Rights

Disciplines

Publication Details

Abstract

DOI

Recommended Citation

Included in

Share

Search

Browse

Author Corner

Links