Farshad Jafari

Multimodal Film Music Dataset

2025

Dataset & Website

Developer

This project is my Master’s thesis in Music Technology at the Georgia Institute of Technology. It grew out of a gap in the film-music literature: there are very few large, publicly available datasets that connect film music with visuals, dialogue, and narrative context. It also reflects my interest in film and dramatic literature and how music shapes on-screen storytelling.

We collected 1,800 feature films spanning 5 genres, 12 countries, and the three most recent decades, with significant help from cinephile collaborators and archivists. Music cues were detected in the raw film audio using pretrained audio classifiers, then separated from dialogue and Foley/FX with deep source-separation models. For each film we extracted musical, textual, and visual features at a 2-second time resolution, including:

  • Musical attributes: instrumentation, mood, genre, RMS and loudness variability, tempo, and related features.
  • Textual attributes: dialogue sentiment/emotion, lexical density, and associated statistics.
  • Visual attributes: shot boundaries and cut locations, dominant frame colors, and detected visual objects with their approximate dimensions.
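As a rough illustration of the windowed audio analysis, the sketch below computes a few of the musical attributes over 2-second windows with librosa; the file path, the specific feature calls, and the assumption that the music stem has already been separated from dialogue and FX are illustrative, not the dataset's exact configuration.

```python
import numpy as np
import librosa

WINDOW_S = 2.0  # analysis window matching the dataset's 2-second resolution

def extract_music_features(music_stem_path: str, sr: int = 22050) -> list[dict]:
    """Compute per-window musical features from an already-separated music stem."""
    y, sr = librosa.load(music_stem_path, sr=sr, mono=True)
    # Tempo is estimated over the whole cue; 2 s is too short for reliable beat tracking.
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    hop = int(WINDOW_S * sr)
    rows = []
    for start in range(0, max(len(y) - hop + 1, 1), hop):
        window = y[start:start + hop]
        rms = librosa.feature.rms(y=window)[0]
        rows.append({
            "start_s": start / sr,
            "end_s": (start + len(window)) / sr,
            "rms_mean": float(rms.mean()),
            "rms_var": float(rms.var()),  # loudness variability within the window
            "loudness_db": float(librosa.amplitude_to_db(rms, ref=1.0).mean()),
            "tempo_bpm": float(np.atleast_1d(tempo)[0]),
        })
    return rows
```

In the full pipeline, tags such as instrumentation, mood, and genre would come from pretrained classifiers rather than from signal statistics like these.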

To manage this at scale, I developed a web application that orchestrates the extraction pipeline and stores the resulting features in a PostgreSQL relational database. The system supports querying features, versioning different extraction methods, running jobs in parallel, and monitoring them through an event-based architecture built on Celery and RabbitMQ.
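To give a flavor of the orchestration layer, the sketch below shows how a single extraction job could be expressed as a Celery task backed by RabbitMQ; the task name, broker URL, and the run_extractor/store_rows helpers are placeholders rather than the project's actual API.

```python
from celery import Celery

# Broker URL is a local RabbitMQ default, purely for illustration.
app = Celery("film_features", broker="amqp://guest:guest@localhost:5672//")

def run_extractor(film_id: int, extractor_version: str) -> list[dict]:
    """Placeholder for one versioned extraction method (audio, text, or visual)."""
    return []

def store_rows(film_id: int, extractor_version: str, rows: list[dict]) -> None:
    """Placeholder for writing timestamped feature rows to PostgreSQL."""
    pass

@app.task(bind=True, acks_late=True, max_retries=3)
def extract_features_for_film(self, film_id: int, extractor_version: str):
    """One job per (film, extractor version), retried on transient failures."""
    try:
        rows = run_extractor(film_id, extractor_version)
        store_rows(film_id, extractor_version, rows)
    except Exception as exc:
        raise self.retry(exc=exc, countdown=30)

# Fan out across the corpus; workers consume jobs in parallel and emit events
# that a monitor can subscribe to.
# for film_id in film_ids:
#     extract_features_for_film.delay(film_id, extractor_version="v2")
```

Dispatching one task per film and extraction-method version keeps runs parallel and lets newer extraction methods be stored and compared alongside older ones.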

On top of this, I built a public-facing interface with Django (HTML, CSS, and JavaScript) that lets users explore films and their features, filter by metadata, and export results. The landing page visualizes the learned music embeddings in a 3D UMAP projection, revealing clusters of cues along dimensions such as instrumentation, mood, and genre.
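The 3D landing-page view comes down to a dimensionality-reduction and export step along these lines; the embedding source, UMAP hyperparameters, and JSON format here are assumptions rather than the site's actual configuration.

```python
import json
import numpy as np
import umap  # umap-learn

def project_cue_embeddings(embeddings: np.ndarray, out_path: str = "umap_3d.json"):
    """Reduce (n_cues, d) music embeddings to 3D points for the front-end plot."""
    reducer = umap.UMAP(n_components=3, n_neighbors=15, min_dist=0.1,
                        metric="cosine", random_state=42)
    coords = reducer.fit_transform(embeddings)  # shape (n_cues, 3)
    points = [{"x": float(x), "y": float(y), "z": float(z)} for x, y, z in coords]
    with open(out_path, "w") as f:
        json.dump(points, f)
    return coords
```

The exported points could then be rendered client-side and colored by tags such as instrumentation, mood, or genre to make the clusters visible.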

My role was to design and implement the full stack of the data pipeline and website, under the supervision of Dr. Claire Arthur in the Computational and Cognitive Musicology Lab.

We are currently preparing a paper based on this dataset. The public interface and GitHub repository will be released after the review process. In the meantime, you can see a demo of the website in the video above.