For external participants: Please join our mailing list to receive announcements of future C4DM seminars.
Date and Time Friday, 25th Nov 2016, at 3:00pm
Place Room 3.25, Electronic Engineering building, Queen Mary University of London, Mile End Road, London E1 4NS. Information on how to access the school can be found here.
Speaker Tian Cheng
Title Exploiting Piano Acoustics in Automatic Transcription
Abstract In this talk we exploit piano acoustics to automatically transcribe piano recordings into a symbolic representation: the pitch and timing of each detected note. The talk consists of two main parts. First, we investigate the decay of individual piano partials, building on the theoretical analysis of coupled piano strings to model the decay patterns of piano notes in real-world recordings. In the second part, we propose an attack/decay model that takes into account the time-varying timbre and decaying energy of piano sounds. The system divides a piano note into a percussive attack stage and a harmonic decay stage, modelling the two parts separately with two sets of templates and amplitude envelopes; the two stages are coupled by the note activations. We approximate the decay envelope with an exponentially decaying function. We demonstrate the utility of the proposed system in piano music transcription. Results show that explicitly modelling piano acoustical features, especially temporal features, improves transcription performance.
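To make the two-stage model above concrete, here is a minimal numerical sketch of an attack/decay note model for one note in a magnitude spectrogram. The templates, decay rate, onset, and activation values below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Toy attack/decay model for a single note: the attack and decay stages use
# separate templates and envelopes but share one note activation.
# All parameter values here are illustrative assumptions.

n_bins, n_frames = 64, 100
rng = np.random.default_rng(0)

w_attack = rng.random(n_bins)          # percussive attack template (broadband)
w_decay = np.zeros(n_bins)             # harmonic decay template (sparse partials)
w_decay[[4, 8, 12, 16]] = 1.0

onset, activation = 10, 0.8            # note onset frame and note activation
alpha = 0.15                           # exponential decay rate per frame (assumed)

t = np.arange(n_frames)
attack_env = np.where(t == onset, 1.0, 0.0)                   # short percussive burst
decay_env = np.where(t >= onset, np.exp(-alpha * (t - onset)), 0.0)

# The shared activation couples the two stages into one model spectrogram.
V = activation * (np.outer(w_attack, attack_env) + np.outer(w_decay, decay_env))
print(V.shape)  # (64, 100): modelled magnitude spectrogram for a single note
```

In a transcription setting one would fit such templates, envelopes, and activations to a real recording and read note pitches and onsets off the estimated activations; the sketch only shows the generative form of the model.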
Bio Tian Cheng recently completed a Ph.D. at the Centre for Digital Music at Queen Mary University of London, supervised by Simon Dixon and Matthias Mauch. Her Ph.D. research topic is automatic transcription of piano music using acoustical cues. In 2012, she received a Master's degree from Huazhong University of Science and Technology, China.
Speaker Siddharth Sigtia
Title Neural Networks for Analysing Music and Environmental Audio
Abstract We consider the analysis of music and environmental audio recordings with neural networks. Recently, neural networks have been shown to be an effective family of models for speech recognition, computer vision, natural language processing and a number of other statistical modelling problems. The composite layer-wise structure of neural networks allows for flexible model design, where prior knowledge about the domain of application can be used to inform the design and architecture of the neural network models. Additionally, it has been shown that when trained on large quantities of data, neural networks can be directly applied to low-level features to learn mappings to high-level concepts like phonemes in speech and object classes in computer vision. In this work we investigate whether neural network models can be usefully applied to processing music and environmental audio.
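As a rough illustration of the general idea (a generic feed-forward sketch, not the specific architectures from the thesis), the following maps one low-level audio feature frame, such as a log-spectrogram frame, to high-level class probabilities. All layer sizes and weights are illustrative assumptions:

```python
import numpy as np

# Minimal sketch: a feed-forward network mapping low-level audio features
# to high-level class posteriors. Weights are random here; in practice they
# are learned from large quantities of labelled data.

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n_features, n_hidden, n_classes = 128, 256, 10   # e.g. spectrogram bins -> event classes

W1 = rng.standard_normal((n_features, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_classes)) * 0.01
b2 = np.zeros(n_classes)

frame = rng.standard_normal(n_features)          # one low-level feature frame
posterior = softmax(relu(frame @ W1 + b1) @ W2 + b2)
print(posterior.argmax(), posterior.sum())       # predicted class; probabilities sum to 1
```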
Bio Siddharth Sigtia is currently a researcher on the Siri Speech team at Apple, where he investigates neural networks for acoustic modelling in speech recognition. He completed his PhD in Electronic Engineering at the Centre for Digital Music (C4DM) at Queen Mary University of London, where he was supervised by Simon Dixon. Previously, he received a Master's degree in Physics and a Bachelor's degree in Electronics Engineering from BITS, Pilani, India.