Document Type
Conference Paper
Rights
Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
Disciplines
Computer Sciences, Information Science, Linguistics
Abstract
This resource description paper describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of different pseudo-corpora. We find that different combinations of the walk’s hyperparameters result in varying statistical properties of the generated pseudo-corpora. We have published a total of 81 pseudo-corpora that we used in our previous research. Because these do not exhaust all possible combinations of hyperparameters, we have also published a codebase that allows the generation of additional WordNet taxonomic pseudo-corpora as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
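To illustrate the general idea of a taxonomic random walk, the following is a minimal sketch and not the published codebase. It assumes a small hand-written toy taxonomy in place of WordNet, and the names `TOY_HYPERNYMS`, `build_neighbours`, `random_walk_sentence`, and `generate_pseudo_corpus`, as well as the hyperparameters `direction`, `min_len`, and `stop_prob`, are hypothetical choices made for illustration; the abstract does not state which hyperparameters the actual implementation exposes.

```python
import random

# Toy hypernym taxonomy (child -> parent). A hypothetical stand-in for WordNet.
TOY_HYPERNYMS = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "bird": "animal",
}


def build_neighbours(hypernyms, direction="both"):
    """Map each node to the nodes reachable in one step for the given direction."""
    if direction not in ("up", "down", "both"):
        raise ValueError("direction must be 'up', 'down' or 'both'")
    nodes = sorted(set(hypernyms) | set(hypernyms.values()))
    neighbours = {node: [] for node in nodes}
    for child, parent in sorted(hypernyms.items()):
        if direction in ("up", "both"):
            neighbours[child].append(parent)
        if direction in ("down", "both"):
            neighbours[parent].append(child)
    return neighbours


def random_walk_sentence(neighbours, rng, stop_prob=0.15, max_len=20):
    """Walk from a random start node; every visited node becomes one token."""
    node = rng.choice(list(neighbours))
    sentence = [node]
    while (len(sentence) < max_len and neighbours[node]
           and rng.random() >= stop_prob):
        node = rng.choice(neighbours[node])
        sentence.append(node)
    return sentence


def generate_pseudo_corpus(hypernyms, n_sentences, direction="both",
                           min_len=2, stop_prob=0.15, seed=0):
    """Collect random-walk sentences that have at least min_len tokens."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    neighbours = build_neighbours(hypernyms, direction)
    corpus = []
    # Bound the number of attempts so an unreachable min_len cannot loop forever.
    for _ in range(n_sentences * 100):
        if len(corpus) == n_sentences:
            break
        sentence = random_walk_sentence(neighbours, rng, stop_prob)
        if len(sentence) >= min_len:
            corpus.append(sentence)
    if len(corpus) < n_sentences:
        raise RuntimeError("could not generate enough sentences")
    return corpus


if __name__ == "__main__":
    for tokens in generate_pseudo_corpus(TOY_HYPERNYMS, 5, direction="up"):
        print(" ".join(tokens))
```

Each generated sentence is a sequence of taxonomically adjacent node names, so a corpus of such sentences could in principle be passed to a standard word embedding trainer. Varying the assumed `direction`, `min_len`, and `stop_prob` values changes sentence length and vocabulary coverage, which is the kind of effect on corpus statistics the abstract refers to.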
DOI
https://doi.org/10.21427/qvgt-zn56
Recommended Citation
Klubička, F., Maldonado, A., Mahalunkar, A. & Kelleher, J.D. (2020). English WordNet Random Walk Pseudo-Corpora. Proceedings of the 12th Language Resources and Evaluation Conference. ELRA, Marseille, France, pp. 4893–4902.
Funder
ADAPT Centre for Digital Content Technology
Publication Details
Paper was published in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 4893–4902. Marseille, 11–16 May 2020. Publisher: European Language Resources Association (ELRA).
doi:10.21427/qvgt-zn56