Field | Value | Language
dc.contributor.author | Wang, X |
dc.contributor.author | Zhu, L |
dc.contributor.author | Wu, F |
dc.contributor.author | Yang, Y https://orcid.org/0000-0002-0512-880X |
dc.date.accessioned | 2024-03-15T04:14:56Z |
dc.date.available | 2024-03-15T04:14:56Z |
dc.date.issued | 2023-05 |
dc.identifier.citation | ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19, (3) |
dc.identifier.issn | 1551-6857 |
dc.identifier.issn | 1551-6865 |
dc.identifier.uri | http://hdl.handle.net/10453/176774 |
dc.description.abstract | <jats:p>It is crucial to sample a small portion of relevant frames for efficient video classification. Existing methods mainly develop hand-designed sampling strategies or learn sequential selection policies. However, two challenges remain. First, hand-designed sampling strategies are intrinsically non-adaptive to different video backbones. Second, sequential frame selection policies ignore temporal relations among all video frames, and the sequential selection process hinders the application of these video samplers in speed-critical systems. In this article, we propose a differentiable parallel video sampling network (PSN) to tackle these challenges. First, we optimize the video sampler with a differentiable surrogate loss, allowing the sampler to be learned dynamically in cooperation with the video classification model. Our sampler considers the feedback from all frames jointly, eliminating the learning difficulties of sequential decision making. The learning process is fully gradient-based, so the sampler is learned efficiently. Our video sampler can assess a set of frames swiftly and determine the importance of each frame in parallel. Second, we propose to model the inter-relation among contextual frames, which encourages the sampler to select frames based on a comprehensive inspection of the entire video. We observe that a simple context relation mining instantiation significantly improves classification performance. Experimental results on three standard video recognition benchmarks demonstrate the efficacy and efficiency of our framework.</jats:p> |
dc.language | English |
dc.publisher | ASSOC COMPUTING MACHINERY |
dc.relation.ispartof | ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS |
dc.relation.isbasedon | 10.1145/3569584 |
dc.rights | info:eu-repo/semantics/closedAccess |
dc.subject | 0803 Computer Software, 0805 Distributed Computing, 0806 Information Systems |
dc.subject.classification | Artificial Intelligence & Image Processing |
dc.subject.classification | 4603 Computer vision and multimedia computation |
dc.subject.classification | 4606 Distributed computing and systems software |
dc.subject.classification | 4607 Graphics, augmented reality and games |
dc.title | A Differentiable Parallel Sampler for Efficient Video Classification |
dc.type | Journal Article |
utslib.citation.volume | 19 |
utslib.for | 0803 Computer Software |
utslib.for | 0805 Distributed Computing |
utslib.for | 0806 Information Systems |
pubs.organisational-group | University of Technology Sydney |
pubs.organisational-group | University of Technology Sydney/Faculty of Engineering and Information Technology |
utslib.copyright.status | closed_access |
dc.date.updated | 2024-03-15T04:14:50Z |
pubs.issue | 3 |
pubs.publication-status | Published |
pubs.volume | 19 |
utslib.citation.issue | 3 |