C4DM Seminar: You (Neil) Zhang: From Neural Fields to Perception-Informed Learning: Scalable and Perceptually Grounded HRTF Personalization
QMUL, School of Electronic Engineering and Computer Science
Centre for Digital Music Seminar Series
Seminar by: You (Neil) Zhang
Date/time: Tuesday, 17th Feb 2026, 2 pm
Location: online only
Title: From Neural Fields to Perception-Informed Learning: Scalable and Perceptually Grounded HRTF Personalization
Abstract: Head-related transfer functions (HRTFs) are fundamental to spatial audio rendering and immersive listening experiences, yet scalable personalization remains challenging. Learning-based approaches must address three key obstacles: the high dimensionality of HRTFs, heterogeneous measurement protocols across databases, and the mismatch between spectral reconstruction objectives and human perception. This talk presents a unified learning framework that tackles these challenges from three complementary perspectives—modeling, data, and perception. First, I introduce HRTF Field, an implicit neural representation that models HRTFs as continuous functions over space, enabling interpolation at arbitrary directions and unified learning across datasets with different spatial sampling schemes. Second, I analyze systematic cross-database measurement biases and propose normalization strategies that harmonize datasets to support robust cross-dataset training and improved generalization. Third, I show that minimizing spectral distortion alone yields representations weakly correlated with perceptual similarity, and present perception-informed objectives based on auditory metrics and metric multidimensional scaling to align latent spaces with coloration, externalization, and localization cues. Together, these methods establish practical foundations for perceptually grounded HRTF modeling and personalization, and highlight new opportunities at the intersection of machine learning, acoustics, and auditory perception.
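Illustration (not part of the abstract): the first contribution described above treats the HRTF as a continuous function of source direction learned by a neural network. The short sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the speaker's published HRTF Field architecture; all class and parameter names are invented for the example. A small MLP conditioned on a unit direction vector and a per-subject latent code predicts a log-magnitude spectrum, so the model can be queried at arbitrary directions regardless of each dataset's measurement grid.

# Illustrative sketch only: an implicit neural field mapping a source
# direction and a per-subject latent code to an HRTF log-magnitude spectrum.
import torch
import torch.nn as nn

class HRTFNeuralField(nn.Module):  # hypothetical names throughout
    def __init__(self, latent_dim=64, hidden_dim=256, num_freq_bins=128):
        super().__init__()
        # Input: 3-D unit direction vector (from azimuth/elevation) + subject latent.
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_freq_bins),  # log-magnitude per frequency bin
        )

    def forward(self, azimuth, elevation, subject_latent):
        # Encode direction as a point on the unit sphere (continuous over space).
        direction = torch.stack(
            [
                torch.cos(elevation) * torch.cos(azimuth),
                torch.cos(elevation) * torch.sin(azimuth),
                torch.sin(elevation),
            ],
            dim=-1,
        )
        return self.net(torch.cat([direction, subject_latent], dim=-1))

# Usage: query the field at any direction, e.g. 30 degrees azimuth, 0 elevation.
model = HRTFNeuralField()
az = torch.deg2rad(torch.tensor([30.0]))
el = torch.deg2rad(torch.tensor([0.0]))
z = torch.zeros(1, 64)  # placeholder per-subject latent code
predicted_log_mag = model(az, el, z)  # shape: (1, 128)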
Bio: You (Neil) Zhang is a PhD candidate in Electrical and Computer Engineering at the University of Rochester, advised by Prof. Zhiyao Duan in the Audio Information Research (AIR) Lab, and a Senior Researcher in Spatial Audio and Multimodal AI at Dolby Laboratories. His work explores human-centric audio intelligence, spanning spatial audio and HRTF personalization, audio deepfake detection, and audio-visual learning. He is a recipient of the ICASSP Rising Star award, the National Institute of Justice Graduate Research Fellowship, the IEEE Signal Processing Society Scholarship, and the WASPAA Best Student Paper Award.
Google Scholar: https://scholar.google.com/citations?user=nYtHcRAAAAAJ&hl=en
