Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences, Information Science, Linguistics

Publication Details

Paper was published in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 4893–4902. Marseille, 11–16 May 2020. Publisher: European Language Resources Association (ELRA).

doi:10.21427/qvgt-zn56

Abstract

This is a resource description paper that describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of pseudo-corpora. We find that different combinations of the walk’s hyperparameters result in pseudo-corpora with varying statistical properties. We have published a total of 81 pseudo-corpora that we have used in our previous research. Because these do not exhaust all possible combinations of hyperparameters, we have also published a codebase that allows the generation of additional WordNet taxonomic pseudo-corpora as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
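The abstract does not spell out the walk procedure or name its hyperparameters, so the following is only a minimal Python sketch of the general idea under stated assumptions. It uses a hypothetical hand-written toy taxonomy (`TOY_TAXONOMY`) in place of WordNet, and two hypothetical hyperparameters, `max_len` (maximum walk length) and `p_up` (probability of stepping to a hypernym rather than a hyponym), plus a `min_len` filter. All function names are illustrative; this is not the authors' published codebase.

```python
import random
from collections import Counter

# Hypothetical toy taxonomy standing in for WordNet: each node maps to its
# hypernyms (parents). A real implementation would read synsets and their
# hypernym links from WordNet instead.
TOY_TAXONOMY = {
    "entity": [],
    "animal": ["entity"],
    "dog": ["animal"],
    "cat": ["animal"],
    "puppy": ["dog"],
    "artifact": ["entity"],
    "vehicle": ["artifact"],
    "car": ["vehicle"],
}


def build_neighbours(taxonomy):
    """Return (hypernyms, hyponyms) adjacency dicts for the taxonomy."""
    hypernyms = {node: list(parents) for node, parents in taxonomy.items()}
    hyponyms = {node: [] for node in taxonomy}
    for node, parents in taxonomy.items():
        for parent in parents:
            hyponyms[parent].append(node)
    return hypernyms, hyponyms


def random_walk(start, hypernyms, hyponyms, rng, max_len, p_up):
    """One walk from `start`. Each step moves up to a random hypernym with
    probability p_up, otherwise down to a random hyponym. The walk stops at
    max_len tokens or when the chosen direction has no neighbours."""
    walk = [start]
    node = start
    while len(walk) < max_len:
        candidates = hypernyms[node] if rng.random() < p_up else hyponyms[node]
        if not candidates:
            break
        node = rng.choice(candidates)
        walk.append(node)
    return walk


def generate_pseudo_corpus(taxonomy, n_sentences, min_len=2, max_len=5,
                           p_up=0.5, seed=0):
    """Generate n_sentences walks ("pseudo-sentences"), keeping only walks
    with at least min_len tokens. Seeded for reproducibility."""
    if not 1 <= min_len <= max_len:
        raise ValueError("require 1 <= min_len <= max_len")
    rng = random.Random(seed)
    hypernyms, hyponyms = build_neighbours(taxonomy)
    nodes = sorted(taxonomy)
    corpus = []
    while len(corpus) < n_sentences:
        walk = random_walk(rng.choice(nodes), hypernyms, hyponyms, rng,
                           max_len, p_up)
        if len(walk) >= min_len:
            corpus.append(walk)
    return corpus


def corpus_stats(corpus):
    """Simple statistics for comparing corpora built with different settings."""
    tokens = [tok for sentence in corpus for tok in sentence]
    counts = Counter(tokens)
    return {
        "sentences": len(corpus),
        "tokens": len(tokens),
        "types": len(counts),
        "type_token_ratio": len(counts) / len(tokens),
    }


if __name__ == "__main__":
    corpus = generate_pseudo_corpus(TOY_TAXONOMY, n_sentences=100,
                                    max_len=4, p_up=0.7, seed=42)
    print(" ".join(corpus[0]))
    print(corpus_stats(corpus))
```

Changing `p_up`, `max_len` or `min_len` and comparing the output of `corpus_stats` gives a small-scale illustration of how walk hyperparameters can shift the statistical properties of a generated pseudo-corpus. Each resulting pseudo-sentence could then be fed to a standard word-embedding trainer in place of natural-language text.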

DOI

https://doi.org/10.21427/qvgt-zn56

Funder

ADAPT Centre for Digital Content Technology