Temporal Cross-Layer Correlation Mining for Action Recognition

Publication Type:
Journal Article
Citation:
IEEE Transactions on Multimedia, 2022, 24, pp. 668-676
Issue Date:
2022-01-01
File:
Temporal_Cross-Layer_Correlation_Mining_for_Action_Recognition.pdf (Published version, Adobe PDF, 1.81 MB)
Neighboring frames are more strongly correlated than frames separated by larger temporal distances. In this paper, we explore the temporal correlations among neighboring frames and exploit cross-layer multi-scale features for action recognition. First, we present a Temporal Cross-Layer Correlation (TCLC) framework for temporal correlation learning. This unified framework uncovers both local and global structures in video data, enabling better exploration of temporal context and assisting cross-layer spatio-temporal feature learning. Second, we propose a novel cross-layer attention module and a center-guided attention mechanism to integrate features with contextual knowledge from multiple scales. Our method learns cross-layer features in two stages. The first stage uses the cross-layer attention module to determine the importance weight of each convolutional layer. The second stage uses the center-guided attention mechanism to aggregate the local features of each layer into a final video representation, leveraging global centers to extract semantic knowledge shared among videos. We evaluate TCLC on three action recognition datasets: UCF-101, HMDB-51, and Kinetics. The experimental results demonstrate the superiority of the proposed temporal correlation mining method.
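The two-stage process described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: dot-product scoring, softmax normalization, mean pooling, the `query` vector, and all function names are hypothetical and are not taken from the paper. Features from every layer are assumed to be already projected to a common dimension.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def mean(vectors: List[Vector]) -> Vector:
    return weighted_sum([1.0 / len(vectors)] * len(vectors), vectors)


def cross_layer_weights(layers: List[List[Vector]], query: Vector) -> List[float]:
    """Stage 1 (assumed form): one importance weight per convolutional layer.

    Each layer is summarized by the mean of its local features and scored
    against a hypothetical learned query vector.
    """
    return softmax([dot(query, mean(feats)) for feats in layers])


def center_guided_pool(feats: List[Vector], centers: List[Vector]) -> Vector:
    """Stage 2 (assumed form): each global center attends over the local
    features of one layer; the per-center summaries are then averaged."""
    summaries = [
        weighted_sum(softmax([dot(c, f) for f in feats]), feats) for c in centers
    ]
    return mean(summaries)


def video_representation(
    layers: List[List[Vector]], centers: List[Vector], query: Vector
) -> Vector:
    """Combine the per-layer pooled features using the stage-1 layer weights."""
    weights = cross_layer_weights(layers, query)
    pooled = [center_guided_pool(feats, centers) for feats in layers]
    return weighted_sum(weights, pooled)


# Toy input: two layers with 2-D local features and two global centers.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]],
]
centers = [[1.0, 0.0], [0.0, 1.0]]
query = [0.0, 0.0]  # a zero query yields equal layer weights
print(video_representation(layers, centers, query))
```

In this sketch the global centers are shared across all videos and layers, which is one plausible reading of "shared semantic knowledge among videos"; in a trained model the centers and the query would be learned parameters rather than fixed values.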