A deep-learning system for isolating individual instruments from orchestral recordings.

One of the objectives of the REPERTORIUM project is to achieve sound source separation in classical music recordings — that is, isolating each instrument from an orchestral performance.
To reach this goal, the team is developing novel deep-learning models and dedicated datasets for training and evaluation.

A central part of this effort is the creation of both real and synthetic datasets of orchestral music, which enable the training of robust machine-learning models that separate and enhance individual instrument tracks from complex audio mixtures.

Real Recordings Dataset

For REPERTORIUM, Mozart’s Symphony No. 40 in G minor and Tchaikovsky’s Romeo and Juliet were recorded at The Spheres Recording Studio, not in the traditional orchestral setup but with each instrument recorded individually. This unique dataset offers around one hour of authentic studio performances, providing clean, isolated ground-truth references for source separation research.

To obtain more data to train our deep-learning models, we have also developed a new synthetic dataset built from a vast MIDI database, Spitfire Audio’s BBC Symphony Orchestra Professional sample library, and an automated procedure we created to generate diverse and representative conditions during the synthesis process.

The resulting dataset, called SynthSOD [1], contains more than 47 hours of classical music with separation references for most of the instruments found in a symphony orchestra.
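
The full generation pipeline is detailed in [1]. As a rough, hypothetical illustration of the kind of per-piece randomization such a procedure can apply before synthesis, the Python sketch below perturbs the tempo and dynamics of a MIDI file with the pretty_midi library; the function name and parameter ranges are illustrative assumptions, not the actual SynthSOD settings.

    import random
    import pretty_midi

    def randomize_performance(midi_path, out_path, seed=None):
        """Hypothetical sketch: perturb tempo and dynamics of a MIDI file
        to diversify synthesized training data (illustrative ranges only)."""
        rng = random.Random(seed)
        pm = pretty_midi.PrettyMIDI(midi_path)

        # Global tempo change: stretch or compress all note timings.
        tempo_scale = rng.uniform(0.9, 1.1)
        for inst in pm.instruments:
            # Per-instrument dynamics offset, clipped to the MIDI velocity range.
            vel_offset = rng.randint(-15, 15)
            for note in inst.notes:
                note.start *= tempo_scale
                note.end *= tempo_scale
                note.velocity = max(1, min(127, note.velocity + vel_offset))

        pm.write(out_path)

    # Each randomized variant is then rendered with the orchestral sample library:
    # for i in range(5):
    #     randomize_performance("symphony.mid", f"variant_{i}.mid", seed=i)
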
Despite the large size of SynthSOD, we found that models trained on it did not obtain good results on real recordings. To address this problem, we came up with a novel idea: predicting the separation masks from the score information alone. The idea is simple: the model predicts the frequencies at which the sound of each instrument will be present, and we then use this information to filter the audio and obtain the separated tracks.
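
As a minimal sketch of the filtering step, assume the score-informed model (omitted here) has already predicted one time-frequency mask per instrument; each separated track is then obtained by multiplying the mixture spectrogram by its mask and resynthesizing. The sketch uses librosa and soundfile, and the mask shapes and output file names are assumptions for illustration.

    import librosa
    import soundfile as sf

    def apply_score_masks(mix_path, masks, n_fft=2048, hop=512):
        """Apply score-predicted time-frequency masks to a mixture.

        masks: dict mapping instrument name -> array of shape
               (1 + n_fft // 2, n_frames) with values in [0, 1],
               produced by the score-informed model (not shown).
        """
        y, sr = librosa.load(mix_path, sr=None, mono=True)
        mix_stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)

        for name, mask in masks.items():
            # Soft-mask the mixture spectrogram and resynthesize the track.
            n = min(mask.shape[1], mix_stft.shape[1])
            src_stft = mix_stft[:, :n] * mask[:, :n]
            src = librosa.istft(src_stft, hop_length=hop)
            sf.write(f"{name}.flac", src, sr)

In practice, the masks of all instruments are typically normalized so they sum to one in every time-frequency bin, which turns the filter into a Wiener-style soft mask.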

Since the model does not take the audio as input, models trained on synthetic audio can obtain good results when applied to real recordings [2]. Following this approach, we have obtained some of the best separation results for small ensembles to date:
Original audio (WAV)
Separated violin (FLAC)
Separated viola (FLAC)
Separated cello (FLAC)
Apart from the main stereo microphones that capture most of the sound in the room, orchestras are usually also recorded with additional microphones placed close to each instrument. During the production phase, audio engineers can use these close-mic signals to raise the level of instruments that are too weak in the main microphones.

However, these close-microphone signals do not contain only the sound of the nearest instrument; they also pick up a large amount of interference (also known as bleeding) from the other instruments.

At REPERTORIUM, we are currently developing new deep-learning models to remove this bleeding, easing the work of audio engineers producing classical music recordings.
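
While those models are under development, the problem itself can be illustrated with a simple classical baseline (not the REPERTORIUM approach): estimate, by least squares, how much of every other close microphone leaks into the target channel and subtract that estimate. The sketch below assumes time-aligned signals and frequency-independent bleed gains, both strong simplifications.

    import numpy as np

    def reduce_bleed(target, others):
        """Illustrative linear interference reduction for a close mic.

        target: 1-D array, the close-mic signal to clean.
        others: 2-D array (n_mics, n_samples), the other instruments'
                close mics, assumed time-aligned with the target.

        Finds gains g minimizing ||target - others.T @ g||^2 and
        subtracts the estimated bleed. Real bleed paths involve delays
        and filtering, so this removes interference only partially and
        may also cancel part of the wanted signal.
        """
        A = others.T                          # (n_samples, n_mics)
        g, *_ = np.linalg.lstsq(A, target, rcond=None)
        return target - A @ g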

Audio Engineering

Improved mixing and mastering workflows through automatic instrument deblending and noise reduction.

Music Information Retrieval

Precise analysis of orchestration, performance dynamics, and timbral structure.

Dataset Contribution

Public release of both real and synthetic datasets will provide essential benchmarks for future research in source separation.

[1] J. Garcia-Martinez, D. Diaz-Guerra, A. Politis, T. Virtanen, J. J. Carabias-Orti and P. Vera-Candeas, "SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation," IEEE Open Journal of Signal Processing, vol. 6, pp. 129-137, 2025.

[2] E. Tunturi, D. Diaz-Guerra, A. Politis and T. Virtanen, "Score-informed Music Source Separation: Improving Synthetic-to-real Generalization in Classical Music," 33rd European Signal Processing Conference (EUSIPCO), Palermo, Italy, 2025.

Contact us to request a demo