Kosta, Katerina, Oscar F. Bandtlow, Elaine Chew (2018). MazurkaBL: Score-aligned loudness, beat, and expressive markings data for 2000 Chopin Mazurka recordings. In Proceedings of the 4th International Conference on Technologies for Music Notation and Representation (TENOR’18), 24-26 May 2018, Montreal, Canada.
Large-scale analysis of expressive performance, with a focus on how performers respond to score markings, has been limited by the lack of large datasets of recordings that combine accurate beat and loudness information with score markings. To bridge this gap, we created the MazurkaBL dataset, a collection of score-beat positions and loudness values, together with the corresponding score dynamic and tempo markings, for 2000 recordings of forty-four Chopin Mazurkas. MazurkaBL is the largest annotated expressive performance dataset to date. This paper describes how the dataset was created and the variations found in it. For each Mazurka, the recordings were first aligned to the score and to one another, so that meticulously created manual beat annotations for one reference recording could be transferred to all other recordings. We propose a multi-recording alignment heuristic that optimises the choice of reference recording for the best average alignment results. Loudness values in sones are extracted and analysed; we also provide the score positions of dynamic and tempo markings. The result is a rich repository of score-aligned loudness, beat, and expressive marking data for studying expressive variations. Finally, we discuss recent and potential applications of MazurkaBL and future directions for database development.
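The abstract does not specify the alignment algorithm or the exact reference-selection criterion. As a minimal sketch only, assuming plain dynamic time warping (DTW) over generic per-frame feature vectors, and reading "best average alignment" as the lowest mean path-normalised DTW cost to all other recordings, the reference choice and the beat-transfer step might look like the following. The function names `dtw_path`, `choose_reference` and `transfer_beats` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def dtw_path(x, y):
    """Basic DTW between feature sequences x (n, d) and y (m, d).

    Returns the total alignment cost and the warping path as a list of
    (i, j) frame-index pairs. Sketch only: real systems typically use
    chroma features and constrained or multiscale DTW.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distance between every frame of x and every frame of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Backtrack from the end, always moving to the cheapest valid predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((s for s in steps if s[0] >= 1 and s[1] >= 1),
                   key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n, m], path


def choose_reference(features):
    """Hypothetical heuristic: pick the recording whose mean path-normalised
    DTW cost to all other recordings is lowest."""
    k = len(features)
    costs = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            c, p = dtw_path(features[a], features[b])
            # Normalise by path length so pairs of different durations compare fairly.
            costs[a, b] = costs[b, a] = c / len(p)
    avg = costs.sum(axis=1) / (k - 1)
    return int(np.argmin(avg)), avg


def transfer_beats(ref_beat_frames, path):
    """Map beat frame indices in the reference onto the target recording.

    When one reference frame aligns with several target frames, the mean
    target frame index is used.
    """
    path = np.asarray(path)
    return [float(path[path[:, 0] == b, 1].mean()) for b in ref_beat_frames]
```

For example, aligning a sequence with a copy in which every frame is repeated twice maps reference frame 3 to target frames 6 and 7, so the transferred beat lands at 6.5. Averaging over all pairwise costs favours a "central" recording whose tempo and timing are closest to the rest of the collection, which is one plausible reading of the heuristic described above.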