Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
Zero-Shot Action Recognition (ZSAR) aims to recognise action classes in videos that were never seen during model training. Some approaches achieve this by using generative adversarial networks (GANs) to generate visual features for unseen classes from the semantic information of their class labels. Because visual features for the unseen classes then become available, the problem is converted into standard supervised learning, which alleviates the lack of labelled samples for those classes. In addition, objects appearing in action instances can be used to enrich the semantics of action classes and therefore increase the accuracy of ZSAR. In this paper, we consider using objects related to an action label in addition to the label itself. For example, the objects ‘horse’ and ‘saddle’ are highly related to the action ‘Horse Riding’ and bring additional semantic meaning. We aim to improve the GAN-based framework by incorporating object-based semantic information related to the class label using three approaches: replacing the class labels with objects, appending objects to the class, and averaging objects with the class. We then evaluate performance on a subset of the popular UCF101 dataset. Our experimental results show that the approach is valid: when appropriate objects are included in the action classes, the baseline is improved by 4.93%.
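The three ways of combining object semantics with the class label can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the class label and each object are represented as fixed-length embedding vectors, that several objects are pooled by taking their mean, and that "appending" means concatenation. The function names and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np


def replace_with_objects(class_vec, object_vecs):
    """Replace: ignore the class vector and use the mean object vector (assumed pooling)."""
    return np.mean(object_vecs, axis=0)


def append_objects(class_vec, object_vecs):
    """Append: concatenate the class vector with the mean object vector (assumed reading)."""
    return np.concatenate([class_vec, np.mean(object_vecs, axis=0)])


def average_with_objects(class_vec, object_vecs):
    """Average: take the mean over the class vector and all object vectors."""
    return np.mean(np.vstack([class_vec, object_vecs]), axis=0)


# Toy embeddings (hypothetical values, not real word vectors).
horse_riding = np.array([1.0, 0.0, 0.0])   # class label 'Horse Riding'
objects = np.array([
    [0.0, 1.0, 0.0],                       # object 'horse'
    [0.0, 0.0, 1.0],                       # object 'saddle'
])

replaced = replace_with_objects(horse_riding, objects)
appended = append_objects(horse_riding, objects)
averaged = average_with_objects(horse_riding, objects)
```

Under these assumptions, replacing and averaging keep the dimensionality of the semantic vector, while appending doubles it, so any generator conditioned on that vector would need a correspondingly wider input.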
Huang, K.; Miralles-Pechuán, L. and Mckeever, S. (2021). Zero-Shot Action Recognition with Knowledge Enhanced Generative Adversarial Networks. In Proceedings of the 13th International Joint Conference on Computational Intelligence - NCTA, ISBN 978-989-758-534-0; ISSN 2184-2825, pages 254-264. DOI: 10.5220/0010717000003063
Technological University Dublin