Author ORCID Identifier

https://orcid.org/0000-0001-6462-3248

Document Type

Conference Paper

Disciplines

2.2 ELECTRICAL, ELECTRONIC, INFORMATION ENGINEERING

Publication Details

https://link.springer.com/chapter/10.1007/978-3-031-26438-2_6

https://doi.org/10.1007/978-3-031-26438-2_6

Sardina, J., Sardina, C., Kelleher, J.D., O’Sullivan, D. (2023). Analysis of Attention Mechanisms in Box-Embedding Systems. In: Longo, L., O’Reilly, R. (eds) Artificial Intelligence and Cognitive Science. AICS 2022. Communications in Computer and Information Science, vol 1662. Springer, Cham.

Abstract

Large-scale Knowledge Graphs (KGs) have recently gained considerable research attention for their ability to model the inter- and intra- relationships of data. However, the huge scale of KGs has necessitated the use of querying methods to facilitate human use. Question Answering (QA) systems have shown much promise in breaking down this human-machine barrier. A recent QA model that achieved state-of-the-art performance, Query2box, modelled queries on a KG using box embeddings with an attention mechanism backend to compute the intersections of boxes for query resolution. In this paper, we introduce a new model, Query2Geom, which replaces the Query2box attention mechanism with a novel, exact geometric calculation. Our findings show that Query2Geom generally matches the performance of Query2box while having many fewer parameters. Our analysis of the two models leads us to formally describe the interaction between knowledge graph data and box embeddings with the concepts of semantic-geometric alignment and mismatch. We create the Attention Deviation Metric as a measure of how well the geometry of box embeddings captures the semantics of a knowledge graph, and apply it to explain the difference in performance between Query2box and Query2Geom. We conclude that Query2box’s attention mechanism operates using “latent intersections” that attend to the semantic properties in embeddings not expressed in box geometry, acting as a limit on model interpretability. Finally, we generalise our results and propose that semantic-geometric mismatch is a more general property of attention mechanisms, and provide future directions on how to formally model the interaction between attention and latent semantics.

DOI

https://doi.org/10.1007/978-3-031-26438-2_6

Funder

Science Foundation Ireland D-REAL CRT under Grant Agreement No. 18/CRT6225 at the ADAPT SFI Research Centre at Trinity College Dublin, together with sponsorship of Sonas Innovation Ireland. The ADAPT SFI Centre for Digital Content Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant # 13/RC/2106_P2.

Creative Commons License

Creative Commons Attribution-Share Alike 4.0 International License
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.


Share

COinS