Movie description

Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Christopher J. Pal, Hugo Larochelle, Aaron Courville and Bernt Schiele

Journal article (2017)

Open access document in PolyPublie and at the official publisher
Open access to the full text of this document
Official publisher's version
Terms of use: Creative Commons: Attribution (CC BY)

Abstract

Audio description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. We introduce the Large Scale Movie Description Challenge (LSMDC) which contains a parallel corpus of 128,118 sentences aligned to video clips from 200 movies (around 150 h of video in total). The goal of the challenge is to automatically generate descriptions for the movie clips. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in the challenges organized in the context of two workshops at ICCV 2015 and ECCV 2016.

Keywords

Movie description; Video description; Video captioning; Video understanding; Movie description dataset; Movie description challenge; Long short-term memory network; Audio description; LSMDC

Subject(s): 2700 Information technology > 2700 Information technology
2700 Information technology > 2708 Image processing and video processing
Department: Department of Computer Engineering and Software Engineering
Funders: Max Planck Society, FITweltweit-Program of the German Academic Exchange Service (DAAD)
PolyPublie URL: https://publications.polymtl.ca/3529/
Journal title: International Journal of Computer Vision (vol. 123, no. 1)
Publisher: Springer
DOI: 10.1007/s11263-016-0987-1
Official URL: https://doi.org/10.1007/s11263-016-0987-1
Deposit date: 06 Dec. 2018 12:14
Last modified: 28 Sept. 2024 01:01
Cite in APA 7: Rohrbach, A., Torabi, A., Rohrbach, M., Tandon, N., Pal, C. J., Larochelle, H., Courville, A., & Schiele, B. (2017). Movie description. International Journal of Computer Vision, 123(1), 94-120. https://doi.org/10.1007/s11263-016-0987-1
