Doctoral

Probing with Noise: Unpicking the Warp and Weft of Taxonomic and Thematic Meaning Representations in Static and Contextual Embeddings

Filip Klubička, Technological University DublinFollow

Document Type

Theses, Ph.D

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE

Publication Details

Thesis submitted for the degree of Doctor of Philosophy, Technological University Dublin, October 2022.

Abstract

The semantic relatedness of words has two key dimensions: it can be based on taxonomic information or thematic, co-occurrence-based information. These are captured by different language resources—taxonomies and natural corpora—from which we can build different computational meaning representations that are able to reflect these relationships. Vector representations are arguably the most popular meaning representations in NLP, encoding information in a shared multidimensional semantic space and allowing for distances between points to reflect relatedness between items that populate the space. Improving our understanding of how different types of linguistic information are encoded in vector space can provide valuable insights to the field of model interpretability and can further our understanding of different encoder architectures.

Alongside vector dimensions, we argue that information can be encoded in more implicit ways and hypothesise that it is possible for the vector magnitude—the norm—to also carry linguistic information. We develop a method to test this hypothesis and provide a systematic exploration of the role of the vector norm in encoding the different axes of semantic relatedness across a variety of vector representations, including taxonomic, thematic, static and contextual embeddings.

The method is an extension of the standard probing framework and allows for relative intrinsic interpretations of probing results. It relies on introducing targeted noise that ablates information encoded in embeddings and is grounded by solid baselines and confidence intervals. We call the method probing with noise and test the method at both the word and sentence level, on a host of established linguistic probing tasks, as well as two new semantic probing tasks: hypernymy and idiomatic usage detection.

Our experiments show that the method is able to provide geometric insights into embeddings and can demonstrate whether the norm encodes the linguistic information being probed for. This confirms the existence of separate information containers in English word2vec, GloVe and BERT embeddings. The experiments and complementary analyses show that different encoders encode different kinds of linguistic information in the norm: taxonomic vectors store hypernym-hyponym information in the norm, while non-taxonomic vectors do not. Meanwhile, non-taxonomic GloVe embeddings encode syntactic and sentence length information in the vector norm, while the contextual BERT encodes contextual incongruity.

Our method can thus reveal where in the embeddings certain information is contained. Furthermore, it can be supplemented by an array of post-hoc analyses that reveal how information is encoded as well, thus offering valuable structural and geometric insights into the different types of embeddings.

DOI

https://doi.org/10.21427/t56y-zp55

Recommended Citation

Klubička, F. (2022). Probing with Noise: Unpicking the Warp and Weft of Taxonomic and Thematic Meaning Representations in Static and Contextual Embeddings. Technological University Dublin. DOI: 10.21427/T56Y-ZP55

Funder

Science Foundation Ireland

Download

Contact the Author

Included in

Computer Sciences Commons

COinS

Doctoral

Probing with Noise: Unpicking the Warp and Weft of Taxonomic and Thematic Meaning Representations in Static and Contextual Embeddings

Document Type

Rights

Disciplines

Publication Details

Abstract

DOI

Recommended Citation

Funder

Included in

Search

Browse

Author Corner

Doctoral

Probing with Noise: Unpicking the Warp and Weft of Taxonomic and Thematic Meaning Representations in Static and Contextual Embeddings

Authors

Document Type

Rights

Disciplines

Publication Details

Abstract

DOI

Recommended Citation

Funder

Included in

Share

Search

Browse

Author Corner