Author ORCID Identifier

0009-0001-3790-9045

Document Type

Theses, Ph.D

Disciplines

Statistics, 1.2 COMPUTER AND INFORMATION SCIENCE, *hearing, visual and other physical, Specific languages, Linguistics

Publication Details

Submitted in partial fulfillment of the requirements for the degree of PhD at Technological University Dublin, Ireland, June 2023.

doi:10.21427/1kjv-ee18

Abstract

In recent years, the use of virtual assistants and voice user interfaces has become a latent part of modern living. Unseen to the user are the various artificial intelligence and natural language processing technologies, the vast datasets, and the linguistic insights that underpin such tools. The technologies supporting them have chiefly targeted widely used spoken languages, leaving sign language users at a disadvantage. One important reason why sign languages are unsupported by such tools is a requirement of the underpinning technologies for a comprehensive description of the language. Sign language processing technologies endeavour to bridge this technology inequality.

Recent approaches to sign language processing have shifted to the domain of machine learning. The principal challenge facing this method is the comparatively small sign language corpora available for training machine learning models. Such corpora are typically 10,000 times smaller than their spoken language equivalents. This study produces a statistical model which may be used in future hybrid learning approaches for sign language processing tasks. In doing so, this research explores the emerging patterns of non-manual articulation concerning grammatical classes in Irish Sign Language (ISL). Specifically, this study focuses on head movement, body movement, eyebrows, eyegaze, eye aperture, and cheek movement, in relation to the grammatical classes listed in the Auslan corpus annotation guidelines.

The experimental method applied here is a novel implementation of an association rules mining approach to a sign language dataset. This method is transferable to other corpus-based analyses of sign languages. The study analyses the articulation of various non-manual features across grammatical classes. The dataset, a subset of the Signs of Ireland (SOI) corpus, contains Non-Manual Feature (NMF) annotations and has been further annotated, as part of this study, to include grammatical class data across 2,989 signs. The dataset is then refactored and refined according to the knowledge discovery in data process before it is subjected to association rules mining.
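As an illustration of the general technique only, the following minimal sketch mines single-item association rules from a toy set of annotated signs and reports support, confidence, and lift for each rule. It does not reproduce the thesis pipeline: the function name `mine_pair_rules`, the thresholds, the item labels, and the `TOY_SIGNS` data are all hypothetical and chosen purely for demonstration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each sign is a set of annotation labels.
# These are NOT drawn from the SOI corpus.
TOY_SIGNS = [
    {"class=depicting_verb", "nmf=eyegaze", "nmf=cheeks"},
    {"class=depicting_verb", "nmf=eyegaze"},
    {"class=depicting_verb", "nmf=eyegaze", "nmf=eyebrows"},
    {"class=noun"},
    {"class=noun", "nmf=eyebrows"},
    {"class=noun"},
    {"class=verb", "nmf=eyegaze"},
    {"class=verb"},
]


def mine_pair_rules(transactions, min_support=0.2, min_confidence=0.6):
    """Return rules lhs -> rhs between single items.

    support    = P(lhs and rhs)
    confidence = P(rhs | lhs)
    lift       = confidence / P(rhs)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        # Sorted items give each unordered pair one canonical key.
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence < min_confidence:
                continue
            lift = confidence / (item_counts[rhs] / n)
            rules.append({
                "lhs": lhs,
                "rhs": rhs,
                "support": support,
                "confidence": confidence,
                "lift": lift,
            })
    # Strongest lift first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r["lift"], r["lhs"], r["rhs"]))


if __name__ == "__main__":
    for rule in mine_pair_rules(TOY_SIGNS):
        print(rule)
```

A lift above 1 indicates that the two labels co-occur more often than they would if they were independent. In practice, rules with multi-item antecedents are usually mined with an algorithm such as Apriori or FP-Growth rather than by enumerating pairs as done here.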

Results from the exploratory analysis and a lexical frequency analysis provide new statistical insights into the distribution of grammatical classes and of NMFs in ISL. Meanwhile, an association rules analysis identifies patterns between grammatical classes and various non-manual articulations. One such discovery is the strong correlation between various NMFs and depicting verbs. Indeed, this study reports that the more lexicalised a sign is, the less likely it is to use NMFs. This study also reports on patterns discovered between non-manual articulators and, finally, on patterns discovered for constructed actions.

This research makes three novel contributions to the fields of sign language linguistics and sign language processing. First, it contributes to the understanding of ISL at the lexical level through new statistical insights. Second, it provides a transferable and novel application of the association rules mining method to sign language corpus data. Third, it produces two assets: (1) a statistical model applicable to future machine learning approaches, and (2) supplementary annotations to the SOI corpus.

DOI

https://doi.org/10.21427/1kjv-ee18

Funder

Technological University Dublin

Creative Commons License

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.
