For external participants: Please join our mailing list to receive announcements of future C4DM seminars: http://c4dm.eecs.qmul.ac.uk/seminars.html
All welcome, no pre-booking required.
Date and Time Wednesday, 22nd Nov 2017, at 4:00pm
Place Room B.R. 3.01, Bancroft Road Teaching Rooms, Queen Mary University of London, Mile End Road, London E1 4NS. Information on how to access the school can be found at http://www.qmul.ac.uk/about/howtofindus/mileend/.
Speaker Athanasios Lykartsis
Title Using MIR methods for speech rhythm analysis: Results, challenges and perspectives
Abstract The talk will focus on existing results on describing speech rhythm by adapting rhythm description methods from MIR to tasks such as automatic language and/or speaker identification. Methods for extracting rhythm features, such as the beat histogram, rhythm patterns and rhythmic similarity calculation schemes, have been applied with some success to music analysis tasks (e.g. genre classification). In prior work, a more sophisticated novelty function was developed for extracting beat-histogram-based rhythm features in both speech- and music-related tasks, with promising results: using those features, drum-based genre classification (for music) and language identification (for Indo-European languages and higher-quality signals) showed improved accuracy on a few datasets, with features related to spectral flux and, for speech, the fundamental frequency proving very informative. However, there is still room for improvement relative to other state-of-the-art approaches, especially for unbalanced datasets, lower-quality signals and speaker identification. The latest research has focused either on the i-vector approach or on the popular deep learning paradigm for feature learning. We will discuss possibilities for benefiting from music cognition approaches, as well as further research goals, including onset detection for speech using deep learning with RNNs, direct pattern extraction for rhythmic similarity calculation, and defining or extracting a rhythmic vocabulary for feature design.
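For readers unfamiliar with the beat-histogram features mentioned in the abstract, the following is a minimal sketch of the underlying idea (in Python with NumPy/SciPy; an illustrative assumption for this announcement, not the speaker's implementation): a spectral-flux novelty curve is computed from the magnitude spectrogram, and its autocorrelation over a range of lags yields a histogram of periodicity (tempo) strengths.

```python
# Minimal sketch: spectral-flux novelty curve and beat histogram.
# Illustrative only; parameter choices are assumptions, not the speaker's method.
import numpy as np
from scipy.signal import stft

def beat_histogram(x, sr, n_fft=1024, hop=512):
    """Return (bpm, strength): periodicity strengths of signal x over tempo."""
    # Magnitude spectrogram (frequency bins x frames).
    _, _, Z = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    S = np.abs(Z)
    # Spectral flux: half-wave-rectified frame-to-frame magnitude increase,
    # summed over frequency; this is the novelty curve.
    flux = np.maximum(np.diff(S, axis=1), 0.0).sum(axis=0)
    flux = flux / (flux.max() + 1e-12)
    # Autocorrelation of the novelty curve; peaks mark rhythmic periodicities.
    ac = np.correlate(flux, flux, mode="full")[len(flux) - 1:]
    # Convert lag (in frames) to tempo in BPM; keep a plausible tempo range.
    frame_rate = sr / hop
    lags = np.arange(1, len(ac))
    bpm = 60.0 * frame_rate / lags
    keep = (bpm >= 30) & (bpm <= 300)
    return bpm[keep], ac[1:][keep]  # the "beat histogram"
```

Statistics of such a histogram (peak positions, peak ratios, spread) can then serve as rhythm features for classifiers, whether the input is music or, as in the work presented here, speech.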
Bio Athanasios Lykartsis was born in Thessaloniki, Greece, and received his Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki in 2009, specializing in signal processing, pattern recognition and machine learning. He went on to receive a Master's degree in Audio Technology from the Technische Universität Berlin in 2014, where he specialized further in audio signal processing, in particular MIR tasks and speech technology. He is currently a research associate and PhD candidate at the Audio Communication Group of TU Berlin, where he holds research and teaching responsibilities. His special interests include rhythm feature design (for speech and music), rhythm-based genre classification and language identification, and machine learning for audio data analysis.