Conference publications: papers, talks, posters, etc. (conferenceObject). 2023

Video Memorability Prediction From Jointly-learnt Semantic and Visual Features

Archivo Digital UPM

oai:oa.upm.es:76826

Martín Fernández, Iván
Kleinlein, Ricardo
Luna Jiménez, Cristina
Gil Martín, Manuel
Fernández Martínez, Fernando

The memorability of a video is defined as an intrinsic property of its visual features that dictates the fraction of people who recall having watched it on a second viewing within a memory game. Still, which features are key to predicting memorability remains unclear. This challenge is addressed here by fine-tuning text and image encoders using a cross-modal strategy known as Contrastive Language-Image Pre-training (CLIP). The resulting video-level representations capture semantic and topic-descriptive information from both modalities, which enhances the predictive power of our algorithms. In the text domain, our proposal achieves a significantly greater Spearman Rank Correlation Coefficient (SRCC) than a default pre-trained text encoder (0.575 ± 0.007 versus 0.538 ± 0.007) over the Memento10K dataset. A similar, though less pronounced, trend can be observed in the visual domain. We believe these findings signal the potential benefits that cross-modal predictive systems can gain from being fine-tuned to the specific problem of media memorability.
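
The SRCC reported in the abstract measures how well the ordering of predicted memorability scores matches the ordering of the ground-truth scores. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code: the function names `average_ranks` and `spearman_rank_correlation` are hypothetical, and the scores in the usage example are made up for illustration rather than taken from Memento10K.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rank_correlation(
    predicted: Sequence[float], observed: Sequence[float]
) -> float:
    """Spearman's coefficient: the Pearson correlation of the two rank lists."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    rank_p = average_ranks(predicted)
    rank_o = average_ranks(observed)
    n = len(rank_p)
    mean_p = sum(rank_p) / n
    mean_o = sum(rank_o) / n
    cov = sum((a - mean_p) * (b - mean_o) for a, b in zip(rank_p, rank_o))
    var_p = sum((a - mean_p) ** 2 for a in rank_p)
    var_o = sum((b - mean_o) ** 2 for b in rank_o)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (var_p * var_o) ** 0.5


if __name__ == "__main__":
    # Illustrative scores only (hypothetical, not from the paper).
    predicted_scores = [0.1, 0.4, 0.35, 0.8]
    observed_scores = [0.2, 0.5, 0.6, 0.9]
    print(spearman_rank_correlation(predicted_scores, observed_scores))
```

Because the coefficient depends only on ranks, a model can score well without matching the absolute memorability values, as long as it orders the videos correctly.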

Project: EC/HE/101071191

DOI: https://oa.upm.es/76826/

HANDLE: https://oa.upm.es/76826/

View at: https://oa.upm.es/76826/
