The proliferation of textual data in the form of online news articles and social media feeds has driven developments in text analytics in recent years. Some of the challenges of natural language processing, understanding, and generation have been successfully addressed, resulting in applications such as AI personal assistants and bots. Word embeddings are one example of these successful solutions: unsupervised, data-driven algorithms are used to learn concepts and relationships between words. This paper describes word embedding algorithms and discusses how bias in the training data can be captured, reproduced, and even amplified by the algorithms.

Creative Commons Attribution-Noncommercial 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 License.