Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization


NeurIPS 2025 Submission 19229

Abstract: We present a unified framework for automatic multitrack music arrangement that enables a single pre-trained symbolic music model to handle diverse arrangement scenarios, including reinterpretation, simplification, and additive generation. At its core is a segment-level reconstruction objective operating on token-level disentangled content and style, allowing for flexible any-to-any instrumentation transformations at inference time. To support track-wise modeling, we introduce REMI-z, a structured tokenization scheme for multitrack symbolic music. By preserving track-wise continuity while reducing sequence length and complexity, it enhances modeling efficiency and effectiveness for both arrangement tasks and unconditional generation. Our method outperforms task-specific state-of-the-art models on representative tasks in different arrangement scenarios (band arrangement, piano reduction, and drum arrangement) in both objective metrics and perceptual evaluations. Taken together, our framework demonstrates strong generality and suggests broader applicability in symbolic music-to-music transformation.


Band Arrangement

This task involves arranging an existing piece of music for arbitrary combinations of instruments. The model must understand the properties and typical playing styles of each instrument to allocate or generate new notes appropriately. We adopt Transformer-VAE from (Zhao, 2024) as the major baseline, the strongest previously reported model for multitrack arrangement without assumptions on track type or number. It combines Transformer-based long-term and inter-track modeling with a VQ-VAE generation module. Additionally, the Rule-Based method distributes notes evenly by pitch across instruments. We also include results from an ablation variant of our model in which the generative pre-training phase is removed (w/o PT).
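The Rule-Based baseline is described only in one sentence, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes notes are `(onset, pitch, duration)` tuples, that the instrument list is ordered from lowest to highest register, and that "evenly" means near-equal note counts per instrument; the function name `distribute_by_pitch` and the note format are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical note format: (onset, MIDI pitch, duration), all integers.
Note = Tuple[int, int, int]


def distribute_by_pitch(notes: List[Note], instruments: List[str]) -> Dict[str, List[Note]]:
    """Split notes into pitch-ordered groups of near-equal size, one per instrument.

    Assumes `instruments` is ordered from lowest to highest register and
    contains unique names.
    """
    if not instruments:
        raise ValueError("need at least one instrument")

    # Order notes from lowest to highest pitch (ties broken by onset).
    ordered = sorted(notes, key=lambda n: (n[1], n[0]))

    # Each instrument gets `base` notes; the first `extra` get one more.
    base, extra = divmod(len(ordered), len(instruments))

    assignment: Dict[str, List[Note]] = {}
    start = 0
    for i, name in enumerate(instruments):
        size = base + (1 if i < extra else 0)
        # Restore time order within each instrument's part.
        assignment[name] = sorted(ordered[start:start + size])
        start += size
    return assignment


segment = [(0, 60, 4), (0, 64, 4), (0, 67, 4), (4, 36, 4), (4, 48, 4)]
parts = distribute_by_pitch(segment, ["bass", "piano"])
```

A split like this ignores idiomatic playing styles and instrument ranges entirely, which is the gap the learned models are meant to close.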

Demo No. | Source Music | Ours | Transformer-VAE | w/o PT | Rule-Based
#1 (Jazz Band)
#2 (Jazz Band)
#3 (Rock Band)
#4 (Rock Band)
#5 (String Trio)
#6 (String Trio)


Piano Reduction

This task focuses on simplifying multi-instrumental musical pieces into solo piano accompaniments that ensure pianistic playability while preserving the original musical essence, i.e., harmonies and textures. The major baseline is UNet from (Terao, 2023), the most recent work in this area. Additionally, Rule-F is the flattened multitrack where the piano plays all notes, and Rule-O is the original piano track.
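The two rule-based references above can be illustrated with a short sketch. This is an assumption-laden illustration rather than the authors' code: notes are taken to be `(onset, pitch, duration)` tuples, tracks are keyed by a hypothetical instrument name, exact duplicate notes are merged when flattening, and any special handling of drum tracks is left out because the text does not specify it.

```python
from typing import Dict, List, Tuple

# Hypothetical note format: (onset, MIDI pitch, duration), all integers.
Note = Tuple[int, int, int]


def rule_f(tracks: Dict[str, List[Note]]) -> List[Note]:
    """Rule-F: flatten every track into a single piano part.

    Exact duplicates across tracks are merged (an assumption made here).
    """
    merged = {note for notes in tracks.values() for note in notes}
    return sorted(merged)


def rule_o(tracks: Dict[str, List[Note]], piano_name: str = "piano") -> List[Note]:
    """Rule-O: keep only the original piano track, if the piece has one."""
    return sorted(tracks.get(piano_name, []))


piece = {
    "piano": [(0, 60, 4)],
    "violin": [(0, 72, 2), (0, 60, 4)],
    "cello": [(2, 48, 2)],
}
flattened = rule_f(piece)
original = rule_o(piece)
```

Rule-F keeps all of the musical content but makes no attempt at playability, while Rule-O is playable but discards whatever the other instruments contributed.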

The original melody is played with a different instrument and mixed with the generated piano accompaniments.

No. | Source Music | Ours | UNet | w/o PT | Rule-F | Rule-O
#1
#2
#3
#4
#5


Drum Arrangement

This task involves creating a drum track for songs that lack one. The model needs to recognize the groove of the music and reinforce it with the drum set. It must also handle transitions between musical phrases to drive the music forward and make it more engaging, which in turn requires a better understanding of musical structure. We use Composer's Assistant 2 (CA v2) from (Malandro, 2024) as the major baseline, a strong track-infilling model capable of handling multitrack inputs and generating drum outputs.

No. | Source Music | Ours | Ground Truth | CA v2 | w/o PT
#1
#2
#3
#4
#5


Long-Term Arrangement

Our model also supports long-term arrangement. Below, we present full-song arrangement results for three tasks: band arrangement, piano reduction, and drum arrangement.

Band Arrangement

Piano Reduction

Drum Arrangement

