img

Notice détaillée

LPR

learning point-level temporal action localization through re-training

Article Ecrit par: Fang, Zhenying ; Fan, Jianping ; Yu, Jun ;

Résumé: Point-level temporal action localization (PTAL) aims to locate action instances in untrimmed videos with only one timestamp annotation for each action instance. Existing methods adopt the localization-by-classification paradigm to locate action boundaries in the temporal class activation map (TCAM) by thresholding, also known as TCAM-based method. However, TCAM-based methods are limited by the gap between classification and localization tasks, since TCAM is generated by a classification network. To address this issue, we propose a re-training framework for the PTAL task, also known as LPR. This framework consists of two stages: pseudo-label generation and re-training. In the pseudo-label generation stage, we propose a feature embedding module based on a transformer encoder to capture global context features and optimize pseudo-labels' quality by leveraging point-level annotations. In the re-training stage, LPR uses the above pseudo-labels as supervision to locate action instances with a temporal action localization network rather than generating TCAMs. Furthermore, to alleviate the effects of label noise in the pseudo-labels, we propose a joint learning classification module (JLCM) in the re-training stage. This module contains two classification sub-modules that simultaneously predict action categories and are guided by a jointly determined clean set for network training. The proposed framework achieves state-of-the-art localization performance on both the THUMOS'14 and BEOID datasets.


Langue: Anglais