Abstract: Automatic music arrangement streamlines the creation of musical variants for composers and arrangers, reducing reliance on extensive music expertise. However, existing methods suffer from inefficient tokenization, underutilization of pre-trained music language models (LMs), and suboptimal fidelity and coherence in generated arrangements. This paper introduces an efficient multitrack music tokenizer for unconditional and conditional symbolic music generation, along with a unified sequence-to-sequence reconstruction fine-tuning objective for pre-trained music LMs that balances task-specific needs with coherence constraints. Our approach achieves state-of-the-art results on band arrangement, piano reduction, and drum arrangement, surpassing task-specific models in both objective metrics and perceptual quality. Additionally, we demonstrate that generative pretraining significantly contributes to the performance across these arrangement tasks, especially when handling long segments with complex alignment.
Band Arrangement
This task involves arranging an existing piece of music for an arbitrary combination of instruments. The model must understand the properties and typical playing styles of each instrument to allocate or generate new notes appropriately. We adopt Transformer-VAE (Zhao, 2024) as the main baseline; it is the strongest previously reported model for multitrack arrangement that makes no assumptions about track type or number. It combines Transformer-based long-term and inter-track modeling with a VQ-VAE generation module. Additionally, the Rule-Based method is a heuristic baseline that distributes notes evenly by pitch across instruments. We also include results from an ablation variant of our model in which the generative pre-training phase is removed (w/o PT).
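The Rule-Based baseline is described only as distributing notes evenly by pitch across instruments, so the following is a minimal sketch of one plausible reading, not the implementation used for the demos. It assumes notes are `(onset, pitch, duration)` tuples and that "evenly" means contiguous pitch bands of near-equal note count, lowest band first; the `Note` alias and the `distribute_by_pitch` function are hypothetical names introduced for illustration.

```python
from typing import List, Tuple

# Hypothetical note representation: (onset in beats, MIDI pitch, duration in beats).
Note = Tuple[float, int, float]


def distribute_by_pitch(notes: List[Note], n_tracks: int) -> List[List[Note]]:
    """Split notes into n_tracks pitch bands of near-equal size.

    Assumption: band 0 receives the lowest pitches and the last band the
    highest. The source does not specify how ties or remainders are handled;
    here the lower bands absorb any remainder.
    """
    if n_tracks < 1:
        raise ValueError("n_tracks must be at least 1")

    # Order by pitch first, then onset, so bands are contiguous in pitch.
    ordered = sorted(notes, key=lambda n: (n[1], n[0]))
    base, extra = divmod(len(ordered), n_tracks)

    tracks: List[List[Note]] = []
    start = 0
    for i in range(n_tracks):
        size = base + (1 if i < extra else 0)
        band = ordered[start:start + size]
        tracks.append(sorted(band))  # restore time order within the track
        start += size
    return tracks


# Usage: six notes split across three instruments (e.g. bass, guitar, lead).
example = [
    (0.0, 60, 1.0), (0.0, 64, 1.0), (0.0, 67, 1.0),
    (1.0, 48, 1.0), (1.0, 72, 1.0), (2.0, 55, 1.0),
]
bands = distribute_by_pitch(example, 3)
```

A split like this ignores instrument range and idiom entirely, which is why it serves as a lower-bound reference rather than a competitive system.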
Demo No. | Source Music | Ours | Transformer-VAE | w/o PT | Rule-Based |
---|---|---|---|---|---|
#1 (Jazz Band) | | | | | |
#2 (Jazz Band) | | | | | |
#3 (Rock Band) | | | | | |
#4 (Rock Band) | | | | | |
#5 (String Trio) | | | | | |
#6 (String Trio) | | | | | |
Piano Reduction
This task focuses on simplifying multi-instrumental pieces into solo piano arrangements that are playable on the piano while preserving the musical essence of the original, i.e., its harmonies and textures. The main baseline is UNet (Terao, 2023), the most recent work in this area. Additionally, Rule-F is the flattened multitrack, in which the piano plays all notes, and Rule-O is the original piano track.
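To make the Rule-F baseline concrete, the sketch below flattens a multitrack piece into a single piano track. It is an illustration under stated assumptions rather than the code used for the demos: notes are assumed to be `(onset, pitch, duration)` tuples, the function name `flatten_to_piano` is hypothetical, and the optional merging of unison notes (same onset and pitch in several tracks) is an added assumption that the source does not describe.

```python
from typing import Dict, List, Tuple

# Hypothetical note representation: (onset in beats, MIDI pitch, duration in beats).
Note = Tuple[float, int, float]


def flatten_to_piano(
    tracks: Dict[str, List[Note]], merge_unisons: bool = True
) -> List[Note]:
    """Collapse every track into one time-ordered piano track.

    With merge_unisons=True (an assumption, not stated in the source), notes
    sharing the same onset and pitch are merged and the longest duration kept,
    since a piano cannot strike one key twice at the same instant.
    """
    all_notes = [note for notes in tracks.values() for note in notes]
    if not merge_unisons:
        return sorted(all_notes)

    longest: Dict[Tuple[float, int], float] = {}
    for onset, pitch, duration in all_notes:
        key = (onset, pitch)
        longest[key] = max(duration, longest.get(key, 0.0))
    return sorted((onset, pitch, dur) for (onset, pitch), dur in longest.items())


# Usage: violin and flute double the same pitch at beat 0.
piece = {
    "violin": [(0.0, 72, 1.0), (1.0, 74, 1.0)],
    "cello": [(0.0, 48, 2.0)],
    "flute": [(0.0, 72, 2.0)],
}
piano = flatten_to_piano(piece)
```

Because nothing is removed beyond exact unisons, the result keeps every pitch of the source but makes no attempt at playability, which is the gap a piano reduction model is meant to close.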
No. | Source Music | Ours | UNet | w/o PT | Rule-F | Rule-O |
---|---|---|---|---|---|---|
#1 | | | | | | |
#2 | | | | | | |
#3 | | | | | | |
#4 | | | | | | |
#5 | | | | | | |
Drum Arrangement
This task involves creating a drum track for songs that lack one. The model needs to recognize the groove of the music and reinforce it with the drum set. It must also handle transitions between musical phrases to drive the music forward and keep it engaging, which in turn requires an understanding of musical structure. The main baseline is Composer's Assistant 2 (CA v2) (Malandro, 2024), a strong track-infilling model capable of handling multitrack inputs and generating drum outputs.
No. | Source Music | Ours | Ground Truth | CA v2 | w/o PT |
---|---|---|---|---|---|
#1 | | | | | |
#2 | | | | | |
#3 | | | | | |
#4 | | | | | |
#5 | | | | | |