Continuous conditional video synthesis by neural processes

Journal article (2025)

Open-access document in PolyPublie and from the official publisher


Open access to the full text of this document
Publisher's official version
Terms of use: Creative Commons: Attribution-NonCommercial (CC BY-NC)


Abstract

Different conditional video synthesis tasks, such as frame interpolation and future frame prediction, are typically addressed individually by task-specific models, despite their shared underlying characteristics. Additionally, most conditional video synthesis models are limited to generating discrete frames at specific integer time steps. This paper presents a unified model that tackles both challenges simultaneously. We demonstrate that conditional video synthesis can be formulated as a neural process, where input spatio-temporal coordinates are mapped to target pixel values by conditioning on context spatio-temporal coordinates and pixel values. Our approach leverages a Transformer-based, non-autoregressive conditional video synthesis model that takes the implicit neural representation of coordinates and context pixel features as input. Our task-specific models outperform previous methods for future frame prediction and frame interpolation across multiple datasets. Importantly, our model enables temporally continuous video synthesis at arbitrarily high frame rates, outperforming the previous state of the art.
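The neural-process formulation described in the abstract can be illustrated with a toy, untrained sketch. It is not the authors' released code, and several choices are assumptions made only for illustration: (x, y, t) coordinates are normalized to [0, 1] and encoded with sinusoidal Fourier features as the implicit neural representation; context tokens are the encoded context coordinates concatenated with their RGB values; a single randomly initialized cross-attention layer stands in for the Transformer; and a linear readout maps each attended query to RGB. The names `fourier_features`, `predict_pixels`, and `init_params` are hypothetical. Because time is a real-valued input, a query at t = 0.5 (between two context frames) is handled exactly like one at an integer step, and all target pixels are predicted in one parallel pass rather than autoregressively.

```python
import numpy as np

def fourier_features(coords, num_freqs=4):
    """Hypothetical coordinate encoding: sinusoids of (x, y, t) in [0, 1].

    coords: array of shape (N, 3). Returns shape (N, 3 * 2 * num_freqs).
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi                          # (F,)
    angles = coords[:, :, None] * freqs                                   # (N, 3, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)     # (N, 3, 2F)
    return feats.reshape(coords.shape[0], -1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predict_pixels(ctx_coords, ctx_pixels, tgt_coords, params):
    """One cross-attention step: every target query attends to all context tokens."""
    q = fourier_features(tgt_coords) @ params["Wq"]                       # (T, D)
    ctx_tokens = np.concatenate([fourier_features(ctx_coords), ctx_pixels], axis=1)
    k = ctx_tokens @ params["Wk"]                                         # (C, D)
    v = ctx_tokens @ params["Wv"]                                         # (C, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)                # (T, C)
    return (attn @ v) @ params["Wout"]                                    # (T, 3) RGB

def init_params(num_freqs=4, d_model=32, seed=0):
    """Random (untrained) weights; a real model would learn these."""
    rng = np.random.default_rng(seed)
    d_coord = 3 * 2 * num_freqs
    d_ctx = d_coord + 3                                                   # encoded coords + RGB
    scale = 0.1
    return {
        "Wq": rng.normal(0, scale, (d_coord, d_model)),
        "Wk": rng.normal(0, scale, (d_ctx, d_model)),
        "Wv": rng.normal(0, scale, (d_ctx, d_model)),
        "Wout": rng.normal(0, scale, (d_model, 3)),
    }

# Toy context: two 4x4 RGB frames observed at t = 0 and t = 1.
H = W = 4
ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)                         # (16, 2)
ctx_coords = np.concatenate([
    np.hstack([grid, np.zeros((H * W, 1))]),
    np.hstack([grid, np.ones((H * W, 1))]),
])                                                                        # (32, 3)
rng = np.random.default_rng(1)
ctx_pixels = rng.random((2 * H * W, 3))                                  # (32, 3)

# Query a whole frame at the non-integer time t = 0.5 (interpolation).
tgt_coords = np.hstack([grid, np.full((H * W, 1), 0.5)])                 # (16, 3)
params = init_params()
pred = predict_pixels(ctx_coords, ctx_pixels, tgt_coords, params)
print(pred.shape)  # (16, 3)
```

Under the same assumptions, future frame prediction differs only in the query times: setting t > 1 in `tgt_coords` asks for frames beyond the last context frame, so one conditioning mechanism covers both tasks.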


Supplementary material:	Code
Department:	Department of Computer Engineering and Software Engineering
Research center:	LITIV - Laboratoire d'interprétation et de traitement d'images et vidéo (Image and Video Interpretation and Processing Laboratory)
Funding agencies:	NSERC, FRQ-NT
Grant number:	RGPIN-2020-04633
PolyPublie URL:	https://publications.polymtl.ca/66039/
Journal title:	Computer Vision and Image Understanding (vol. 259)
Publisher:	Elsevier BV
DOI:	10.1016/j.cviu.2025.104387
Official URL:	https://doi.org/10.1016/j.cviu.2025.104387
Deposit date:	June 10, 2025 09:42
Last modified:	Feb. 11, 2026 04:15

Cite in APA 7:	Ye, X., & Bilodeau, G.-A. (2025). Continuous conditional video synthesis by neural processes. Computer Vision and Image Understanding, 259, 104387 (11 pages). https://doi.org/10.1016/j.cviu.2025.104387
