Author ORCID Identifier

0000-0002-1894-3360

Document Type

Conference Paper

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE

Publication Details

https://link.springer.com/chapter/10.1007/978-3-031-58181-6_50

International Conference on Computer Vision and Image Processing (CVIP 2023), Springer Nature Switzerland

doi:10.1007/978-3-031-58181-6_50

Abstract

Action understanding involves the recognition and detection of specific actions within videos. This crucial computer vision task has gained significant attention due to its multitude of applications across various domains. Current action detection models, inspired by 2D object detection methods, employ two-stage architectures: the first stage extracts actor-centric video sub-clips, i.e. tubelets of individuals, and the second stage classifies these tubelets using action recognition networks. The majority of these recognition models utilize a frame-level pre-trained 3D Convolutional Neural Network (3D CNN) to extract spatio-temporal features from a given tubelet. This, however, results in suboptimal spatio-temporal feature representations for action recognition, primarily because the actor typically occupies a relatively small area of the frame. This work proposes the use of actor-centric tubelets instead of frames to learn spatio-temporal feature representations for action recognition. We present an empirical study of actor-centric tubelet and frame-level action recognition models and propose a baseline for actor-centric action recognition. We evaluate the proposed method on the state-of-the-art C3D, I3D, and SlowFast 3D CNN architectures using the NTURGBD dataset. Our results demonstrate that the actor-centric feature extractor consistently outperforms both frame-level models and large pre-trained, fine-tuned models. The source code for tubelet generation is available at https://github.com/anilkunchalaece/ntu_tubelet_parser.
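
For illustration, the sketch below shows the actor-centric tubelet idea described in the abstract: cropping one actor's bounding box from each frame and stacking the crops into a sub-clip that a 3D CNN (C3D, I3D, SlowFast) would consume in place of full frames. The function name, the assumption that per-frame boxes are already available (e.g. from a person detector or dataset annotations), and the nearest-neighbour resize are illustrative choices only, not the authors' implementation; their actual tubelet generation code is at the GitHub link above.

```python
# Minimal sketch of actor-centric tubelet extraction (illustrative only).
import numpy as np

def extract_tubelet(frames, boxes, out_size=(112, 112)):
    """Crop one actor's box from each frame and stack the crops.

    frames: list of HxWx3 uint8 arrays (one per video frame)
    boxes:  list of (x1, y1, x2, y2) ints, one box per frame,
            tracking the same actor through the clip
    Returns a (T, out_h, out_w, 3) array: the actor-centric tubelet.
    """
    out_h, out_w = out_size
    crops = []
    for frame, (x1, y1, x2, y2) in zip(frames, boxes):
        h, w = frame.shape[:2]
        # Clamp the box to the frame so degenerate detections don't crash.
        x1, x2 = max(0, x1), min(w, x2)
        y1, y2 = max(0, y1), min(h, y2)
        if x2 <= x1 or y2 <= y1:
            continue  # skip frames with an empty/invalid box
        crop = frame[y1:y2, x1:x2]
        # Nearest-neighbour resize via index sampling (avoids extra deps).
        ys = np.linspace(0, crop.shape[0] - 1, out_h).astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, out_w).astype(int)
        crops.append(crop[ys][:, xs])
    # The stacked sub-clip is the input to the action recognition network.
    return np.stack(crops)
```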

DOI

https://doi.org/10.1007/978-3-031-58181-6_50

Funder

Science Foundation Ireland

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

