C4DM Seminar: Zhaokai Wang: From Frames to Beats: Progress and Challenges in Video-to-Music Generation
QMUL, School of Electronic Engineering and Computer Science
Centre for Digital Music Seminar Series
Seminar by: Zhaokai Wang
Date/time: Friday, 23rd Jan 2026, 2 pm
Location: GC222, Graduate Centre, Mile End Campus, Queen Mary University of London
Title: From Frames to Beats: Progress and Challenges in Video-to-Music Generation
Abstract: Video-to-music generation aims to create original music that is semantically, rhythmically, and emotionally aligned with the content of a given video, addressing a critical need in media creation, entertainment, and content production. This talk provides an overview of the video-to-music generation field, including a comprehensive list of models, datasets and evaluation metrics. We will share our team's research journey in this domain, from early symbolic music generation with handcrafted rule-based systems to recent MLLM-driven audio synthesis approaches. We will also discuss current challenges and the impact on the music industry, along with potential future directions for advancing video-to-music generation toward more practical, creative, and human-centric applications.
Bio: Zhaokai Wang is currently a Ph.D. student at Shanghai Jiao Tong University and Shanghai AI Laboratory, supervised by Prof. Jifeng Dai. He is also a visiting student at UCL, supervised by Prof. Jun Wang. He obtained his bachelor’s degree from Beihang University. His research interests include multimodal large language models and music generation. He has published 10+ papers in TPAMI, NeurIPS, CVPR, ISMIR, etc., and has won the Best Paper Award at ACM MM 2021 and Best Paper Runner-Up at NeurIPS 2025. https://scholar.google.com/citations?hl=zh-CN&user=W0zVf-oAAAAJ&view_op=list_works&sortby=pubdate
