Document Type
Conference Paper
Rights
This item is available under a Creative Commons License for non-commercial use only
Disciplines
Computer Sciences, Information Science, Linguistics
Abstract
This resource description paper describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of pseudo-corpora. We find that different combinations of the walk’s hyperparameters yield pseudo-corpora with different statistical properties. We have published a total of 81 pseudo-corpora that we have used in our previous research; because these do not exhaust all possible hyperparameter combinations, we have also published a codebase that allows additional WordNet taxonomic pseudo-corpora to be generated as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.
DOI
https://doi.org/10.21427/qvgt-zn56
Recommended Citation
Klubička, F., Maldonado, A., Mahalunkar, A. & Kelleher, J.D. (2020). English WordNet Random Walk Pseudo-Corpora. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), pp. 4893–4902. Marseille, France: ELRA.
Publication Details
Paper was published in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 4893–4902. Marseille, 11–16 May 2020. Publisher: European Language Resources Association (ELRA).
doi:10.21427/qvgt-zn56