Conference publications: communications, papers, posters, etc. (conferenceObject). 2023

Video Memorability Prediction From Jointly-learnt Semantic and Visual Features

Archivo Digital UPM
  • Martín Fernández, Iván
  • Kleinlein, Ricardo
  • Luna Jiménez, Cristina
  • Gil Martín, Manuel
  • Fernández Martínez, Fernando
The memorability of a video is defined as an intrinsic property of its visual features that dictates the fraction of people who recall having watched it on a second viewing within a memory game. Still, identifying the key features that predict memorability remains an open problem. We address this challenge by fine-tuning text and image encoders with a cross-modal strategy known as Contrastive Language-Image Pre-training (CLIP). The resulting video-level representations capture semantic and topic-descriptive information from both modalities, which enhances the predictive power of our algorithms. In the text domain, our proposal achieves a significantly higher Spearman Rank Correlation Coefficient (SRCC) than a default pre-trained text encoder (0.575 ± 0.007 versus 0.538 ± 0.007) on the Memento10K dataset. A similar, though less pronounced, trend appears in the visual domain. We believe these findings signal the potential benefits that cross-modal predictive systems can gain from being fine-tuned to the specific problem of media memorability.
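
The SRCC figures quoted in the abstract measure how well the ordering of predicted memorability scores agrees with the ordering of the ground-truth scores. The sketch below is only an illustration of that metric in plain Python, not the authors' evaluation code: the function names and the toy scores are assumptions made for this example, and ties are handled by average ranks, which is one common convention.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rank_correlation(predicted, observed):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("inputs must have the same length, at least 2")
    rank_p = average_ranks(predicted)
    rank_o = average_ranks(observed)
    n = len(rank_p)
    mean_p = sum(rank_p) / n
    mean_o = sum(rank_o) / n
    cov = sum((a - mean_p) * (b - mean_o) for a, b in zip(rank_p, rank_o))
    var_p = sum((a - mean_p) ** 2 for a in rank_p)
    var_o = sum((b - mean_o) ** 2 for b in rank_o)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_o)


# Hypothetical scores for four videos (illustrative values, not from the paper).
predicted_scores = [0.90, 0.70, 0.80, 0.60]
observed_scores = [0.95, 0.60, 0.85, 0.70]
srcc = spearman_rank_correlation(predicted_scores, observed_scores)
```

Because the coefficient depends only on ranks, a model can score well without matching the absolute memorability values, as long as it orders the videos correctly.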
