Articles

Enhancing Zero‑Shot Action Recognition in Videos by Combining GANs with Text and Images

Kaiqiang Huang, Technological University Dublin, Ireland
Luis Miralles-Pechuán, Technological University Dublin
Susan McKeever, Technological University Dublin

Document Type

Article

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE

Publication Details

https://link.springer.com/article/10.1007/s42979-023-01803-3

Huang, K., Miralles-Pechuán, L. & McKeever, S. Enhancing Zero-Shot Action Recognition in Videos by Combining GANs with Text and Images. SN COMPUT. SCI. 4, 375 (2023).

https://doi.org/10.1007/s42979-023-01803-3

Abstract

Zero-shot action recognition (ZSAR) tackles the problem of recognising actions that the model has not seen during training. Various techniques have been used to achieve ZSAR in the field of human action recognition (HAR) in videos, and techniques based on generative adversarial networks (GANs) are the most promising in terms of performance. GANs are trained to generate representations of unseen videos conditioned on information related to the unseen classes, such as class-label embeddings. In this paper, we present an approach that combines information from two different GANs, each of which generates a visual representation of the unseen classes. Our dual-GAN approach leverages two separate knowledge sources related to the unseen classes: class-label texts and images related to the class label obtained from Google Images. The visual embeddings of the unseen classes generated by the two GANs are merged and used to train a classifier in a supervised fashion for ZSAR. Our methodology is based on the idea that using more, and richer, knowledge sources to generate representations of unseen classes will lead to higher downstream accuracy when classifying unseen classes. The experimental results show that our dual-GAN approach outperforms state-of-the-art methods on two benchmark HAR datasets: HMDB51 and UCF101. Additionally, we present a comprehensive discussion and analysis of the experimental results for both datasets to understand the nuances of each approach at the class level. Finally, we examine the impact of the number of visual embeddings generated by the two GANs on the accuracy of the models.
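The pipeline outlined in the abstract can be sketched as follows. There are two conditional generators, one conditioned on class-label text embeddings and one on embeddings of Google Images results. Their synthetic visual features for the unseen classes are merged and used to train a supervised classifier. This is a minimal illustration under stated assumptions, not the authors' implementation:

* The generators are untrained random linear maps that stand in for trained GAN generators.
* The class names and embeddings are made up.
* "Merging" is assumed to mean pooling the two sets of synthetic features into one training set.
* A nearest-centroid classifier is swapped in for whatever classifier the paper actually trains.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Sample = Tuple[Vector, str]

NOISE_DIM, TEXT_DIM, IMAGE_DIM, FEAT_DIM = 4, 6, 8, 16  # hypothetical sizes
N_PER_CLASS = 50  # synthetic visual embeddings per class, per generator


def make_generator(cond_dim: int, seed: int) -> Callable[[Vector, Vector], Vector]:
    """Hypothetical stand-in for a trained conditional GAN generator G(z, c).

    It is only a fixed random linear map of [z; c] followed by tanh, so the
    sketch runs without a deep-learning library; no adversarial training here.
    """
    rng = random.Random(seed)
    in_dim = NOISE_DIM + cond_dim
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
               for _ in range(FEAT_DIM)]

    def generate(z: Vector, c: Vector) -> Vector:
        x = z + c  # concatenate noise vector and class-condition embedding
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    return generate


def synthesize(generator, conditions: Dict[str, Vector], rng: random.Random) -> List[Sample]:
    """Draw N_PER_CLASS (feature, label) pairs per class from one generator."""
    samples: List[Sample] = []
    for label, cond in conditions.items():
        for _ in range(N_PER_CLASS):
            z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
            samples.append((generator(z, cond), label))
    return samples


def fit_centroids(samples: List[Sample]) -> Dict[str, Vector]:
    """Nearest-centroid classifier (stand-in): store the mean feature per class."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for feat, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, v in enumerate(feat):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, Vector], feat: Vector) -> str:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(centroids[lbl], feat)))


rng = random.Random(0)
unseen = ["fencing", "kayaking", "juggling"]  # hypothetical unseen classes
# Hypothetical per-class knowledge: one text embedding and one image embedding.
text_emb = {c: [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)] for c in unseen}
image_emb = {c: [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)] for c in unseen}

text_gan = make_generator(TEXT_DIM, seed=1)
image_gan = make_generator(IMAGE_DIM, seed=2)

# "Merge" = pool both synthetic sets into one supervised training set (assumption).
merged = synthesize(text_gan, text_emb, rng) + synthesize(image_gan, image_emb, rng)
centroids = fit_centroids(merged)
print(predict(centroids, centroids["juggling"]))  # -> juggling
```

In a real system, both generators would first be trained adversarially on the seen classes so that their outputs resemble real video features. At test time, the classifier would be applied to features extracted from real unseen-class videos. `N_PER_CLASS` corresponds to the number of generated visual embeddings, the quantity whose effect on accuracy the paper examines.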

Recommended Citation

Huang, Kaiqiang; Miralles-Pechuán, Luis; and McKeever, Susan, "Enhancing Zero‑Shot Action Recognition in Videos by Combining GANs with Text and Images" (2023). Articles. 160.
https://arrow.tudublin.ie/ittsciart/160

Funder

This project is funded under the Fiosraigh Scholarship of Technological University Dublin. Open Access funding provided by the IReL Consortium.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Included in

Computer Engineering Commons
