A spatial-audio tool for reconstructing and streaming 3D concert-hall sound fields with six-degrees-of-freedom navigation.
The Image and Sound Processing Lab (ISPL) at Politecnico di Milano contributed to REPERTORIUM through two connected tasks related to classical music concerts:
1) sound-field reconstruction for six-degrees-of-freedom (6DoF) navigation in concert halls
2) spatial-audio streaming of classical concerts. We focused primarily on 6DoF reconstruction.
We explored multiple acquisition strategies and advanced both parametric and data-driven approaches to model an acoustic scene from a small set of sparse microphones.
Parametric methods describe the field with compact signal models (e.g., source positions and direct/reverberant components), which enable efficient, low-latency navigation.
Data-driven methods instead rely on neural networks trained to infer spatial cues and room acoustics from multi-microphone recordings.
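To make the parametric idea concrete, the sketch below renders the direct-path component of a source at an arbitrary virtual listener position from its estimated position: a 1/r spherical-spreading gain and a propagation delay. The function name and constants are illustrative, not the project's actual API; real systems would also use fractional delays and a separate diffuse-field renderer.

```python
import numpy as np

FS = 48_000          # sample rate (Hz), assumed
C = 343.0            # speed of sound (m/s)

def render_direct(dry, src_pos, listener_pos):
    """Render the direct-path component of a parametric sound-field
    model at a virtual listener position (hypothetical sketch):
    integer-sample propagation delay plus 1/r distance attenuation."""
    r = np.linalg.norm(np.asarray(src_pos) - np.asarray(listener_pos))
    delay = int(round(FS * r / C))        # propagation delay in samples
    out = np.zeros(len(dry) + delay)
    out[delay:] = dry / max(r, 1e-3)      # spherical spreading ~ 1/r
    return out
```

Because the model is so compact (a source position and a dry signal), moving the listener only changes `r`, which is what makes low-latency 6DoF navigation feasible.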
After reconstruction, users can freely move and interact with the virtual concert scene. As the delivery layer, our streaming pipeline includes a lightweight renderer exposing prescribed listening positions.
It supports (i) real-time spatial-audio streaming of live classical concerts, where remote listeners can select a seat in the hall and experience the performance as if present, and (ii) on-demand experiences, where users navigate reconstructed areas of the venue and listen while moving through the space.
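A minimal way to expose prescribed listening positions is to snap a requested seat to the nearest catalogued point. The seat names and coordinates below are invented for illustration; the actual renderer's interface is not shown here.

```python
import numpy as np

# Hypothetical seat catalogue: name -> listening position (metres).
SEATS = {
    "front_row": np.array([0.0, 2.0, 1.2]),
    "balcony":   np.array([0.0, 18.0, 6.0]),
    "podium":    np.array([0.0, 0.5, 1.7]),
}

def nearest_seat(requested_pos):
    """Snap a requested position to the nearest prescribed listening
    point exposed by the renderer (sketch, not the project's API)."""
    names = list(SEATS)
    dists = [np.linalg.norm(SEATS[n] - requested_pos) for n in names]
    return names[int(np.argmin(dists))]
```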
Parametric Sound-Field Reconstruction
Development of a parametric sound-field reconstruction method that explicitly incorporates early reflections into the model, whereas previous state-of-the-art approaches accounted only for the direct sound component and the diffuse field. This extension enables a more realistic and immersive auditory experience for the listener.
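Early reflections are classically modelled with image sources: each wall mirrors the source, and each image contributes a delayed, attenuated copy of the direct sound. The shoebox-room sketch below (first-order images only, uniform wall reflection coefficient `beta`) illustrates the idea; it is a textbook simplification, not the method developed in the project.

```python
import numpy as np

C, FS = 343.0, 48_000  # speed of sound (m/s), sample rate (Hz)

def first_order_images(src, room):
    """First-order image sources of a shoebox room [Lx, Ly, Lz]:
    one mirror image per wall, six in total."""
    images = []
    for axis in range(3):
        lo = src.copy(); lo[axis] = -src[axis]                 # wall at 0
        hi = src.copy(); hi[axis] = 2 * room[axis] - src[axis] # wall at L
        images += [lo, hi]
    return images

def early_taps(src, mic, room, beta=0.8):
    """(delay in samples, amplitude) for the direct path plus the six
    first-order reflections; beta is an assumed reflection coefficient."""
    taps = []
    for i, pos in enumerate([src] + first_order_images(src, room)):
        r = np.linalg.norm(pos - mic)
        g = (1.0 if i == 0 else beta) / max(r, 1e-3)
        taps.append((int(round(FS * r / C)), g))
    return taps
```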
Federated Learning for Sound-field Modeling
Development of a federated learning–based sound-field reconstruction technique that leverages advanced AI methods to efficiently reconstruct acoustic scenes from a limited number of measurements acquired with higher-order microphones.
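The core aggregation step of federated learning can be sketched in a few lines: each node trains on its own measurements and only model parameters are shared, combined by dataset-size-weighted averaging (FedAvg). This is a generic sketch of the aggregation rule, not the project's actual training setup.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Federated averaging (FedAvg): combine per-client model parameter
    lists, weighting each client by its local dataset size. Generic
    sketch; the project's models and data splits are not shown here."""
    total = sum(client_sizes)
    return [
        sum(n / total * w[k] for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]
```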
Physics-informed and Diffusion-based Modeling
Exploration of innovative sound-field reconstruction techniques, including diffusion models and physics-informed neural networks.
Spatial Upsampling of Microphone Arrays
Development of a physics-informed approach for upsampling spherical microphone array recordings, enabling the generation of high-quality measurements from low-resolution data.
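As a linear baseline for what spatial upsampling means, the sketch below fits a low-order real spherical-harmonic expansion to a sparse set of capsule directions by least squares and re-evaluates it on a denser grid. This is a stand-in to illustrate the problem setting, assuming an order-1 field; the physics-informed approach itself is not reproduced here.

```python
import numpy as np

def real_sh_order1(dirs):
    """Real spherical-harmonic basis up to order 1 (4 terms) evaluated
    at unit direction vectors dirs of shape (N, 3)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 / np.sqrt(np.pi)            # Y_0^0
    c1 = np.sqrt(3.0 / (4.0 * np.pi))    # order-1 terms
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=1)

def sh_upsample(sparse_dirs, sparse_p, dense_dirs):
    """Least-squares SH fit on sparse array pressures, re-evaluated on
    a dense grid -- a linear stand-in for the learned upsampler."""
    coef, *_ = np.linalg.lstsq(real_sh_order1(sparse_dirs), sparse_p,
                               rcond=None)
    return real_sh_order1(dense_dirs) @ coef
```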
VR Testing Environment
Development of an immersive testing framework that leverages virtual reality headsets to enable users to evaluate spatial-audio algorithms within a fully interactive and immersive environment.
Immersive Streaming Pipeline
Development of a lightweight immersive streaming pipeline for delivering spatial-audio concerts directly to remote users.
Acoustic Measurements
Acquisition of the acoustic responses of various performance spaces, including concert halls, classrooms, and theaters.
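A standard way to measure such responses is the exponential sine sweep: play a sweep, record it in the room, and deconvolve to recover the impulse response. The sketch below uses assumed parameters (48 kHz, 20 Hz to 20 kHz, 2 s) and plain spectral division; practical measurement chains add windowing and harmonic-distortion handling.

```python
import numpy as np

FS = 48_000  # assumed sample rate (Hz)

def ess(f1=20.0, f2=20_000.0, dur=2.0):
    """Exponential sine sweep (Farina method), a common excitation for
    room impulse response measurement."""
    t = np.arange(int(FS * dur)) / FS
    R = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * dur / R * (np.exp(t * R / dur) - 1.0))

def deconvolve(recorded, sweep):
    """Recover the impulse response by spectral division of the
    recorded signal by the sweep (small regulariser for stability)."""
    n = len(recorded) + len(sweep) - 1
    H = np.fft.rfft(recorded, n) / (np.fft.rfft(sweep, n) + 1e-12)
    return np.fft.irfft(H, n)
```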
Immersive Streaming of Live and Recorded Concerts
Audiences can experience concerts in full spatial audio, both in real time and offline. Listeners select their preferred position — front row, balcony, or conductor’s podium — and enjoy the concert as if physically present. This system supports:
- Real-time streaming with spatial coherence and a strong sense of presence.
- On-demand exploration of reconstructed venues.
Such technology enhances audience accessibility, enables new distribution formats for cultural institutions, and helps preserve performances in immersive form.
Intelligent Sound-Field Reconstruction for Virtual and Augmented Reality
The developed models allow accurate 3D acoustic environments to be recreated from a few microphone inputs. Applications include:
- VR/AR experiences with authentic spatial sound matching user position and orientation.
- Architectural and acoustic design simulations.
- Educational and training environments requiring realistic sound perception.
By reducing the number of required microphones, this approach makes high-quality immersive audio feasible and scalable across multiple domains.
Publications
F. Miotello, F. Terminiello, M. Pezzoli, A. Bernardini, F. Antonacci and A. Sarti, 2024 18th International Workshop on Acoustic Signal Enhancement (IWAENC), Aalborg, Denmark, 2024, pp. 215-219, doi: 10.1109/IWAENC61483.2024.10694489.
F. Miotello et al., 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Seoul, Korea, Republic of, 2024, pp. 795-799, doi: 10.1109/ICASSPW62465.2024.10626753.