Natural language description of human activities from video images based on concept hierarchy of actions
Article written by: Kojima, A.; Tamura, T.; Fukunaga, K.
Abstract: We propose a method for describing human activities from video images based on concept hierarchies of actions. A major difficulty in transforming video images into textual descriptions is bridging the semantic gap between them, which is also known as the inverse Hollywood problem. In general, the concepts of human events or actions can be classified by semantic primitives. By associating these concepts with the semantic features extracted from video images, appropriate syntactic components such as verbs and objects are determined and then translated into natural language sentences. We also demonstrate the performance of the proposed method through several experiments.
Language:
English