YuE (乐) Project Description

What is the project about?

YuE is a series of open-source foundation models for music generation, specifically designed for transforming lyrics into complete songs (lyrics2song). It generates full songs, including both vocal and accompaniment tracks, spanning multiple minutes.

What problem does it solve?

YuE addresses the challenge of generating high-quality, full-length songs with both vocals and accompaniment from textual input (lyrics and genre tags). It simplifies the music creation process, making it accessible to a wider audience. It also allows for style transfer and voice cloning via in-context learning.

What are the features of the project?

Lyrics-to-Song Generation: Transforms lyrics and genre tags into complete songs.
Full Song Generation: Creates songs lasting several minutes, not just short clips.
Vocal and Accompaniment Tracks: Generates both vocal and instrumental parts.
Multi-Genre/Language/Vocal Technique Support: Models diverse musical styles, languages (English, Mandarin, Cantonese, Japanese, Korean, etc.), and vocal techniques.
In-Context Learning (ICL): Allows users to provide a reference song (single or dual-track) to influence the style of the generated music (voice cloning, music style transfer).
Dual-Track ICL: Uses separate vocal and instrumental tracks for improved ICL performance.
Chain-of-Thought (CoT) Mode: Standard generation mode without a reference song.
Customizable Generation: Allows control over the number of song sections, repetition penalty, and batch size.
Open Source (Apache 2.0 License): Freely available for use and modification; creative use and attribution are encouraged.
Prompt Engineering Guide: Provides detailed instructions for crafting effective prompts.
Gradio Interface Support: Several community projects provide a Gradio-based GUI.
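
The generation modes and controls listed above can be pictured as a small request object. The following is a minimal Python sketch, not YuE's actual API: GenerationRequest, select_mode, and every field name and default value are hypothetical, chosen only to illustrate how the reference song supplied (none, single-track, or dual-track) would determine the generation mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    """Hypothetical container for the inputs and controls described above."""
    lyrics: str
    genre_tags: str
    num_sections: int = 2            # number of song sections (illustrative default)
    repetition_penalty: float = 1.1  # illustrative default, not a recommended value
    batch_size: int = 1
    reference_mix: Optional[str] = None           # path to a single-track reference song
    reference_vocal: Optional[str] = None         # path to a vocal-only reference track
    reference_instrumental: Optional[str] = None  # path to an instrumental-only reference track


def select_mode(req: GenerationRequest) -> str:
    """Return 'cot', 'icl', or 'dual_track_icl' depending on the references supplied."""
    has_vocal = req.reference_vocal is not None
    has_instrumental = req.reference_instrumental is not None
    if has_vocal != has_instrumental:
        raise ValueError("dual-track ICL needs both a vocal and an instrumental reference")
    if has_vocal and has_instrumental:
        return "dual_track_icl"
    if req.reference_mix is not None:
        return "icl"
    return "cot"  # no reference song: standard generation


if __name__ == "__main__":
    req = GenerationRequest(
        lyrics="City lights are fading\nI am heading home",
        genre_tags="pop female vocal uplifting",
    )
    print(select_mode(req))  # prints: cot
```

In this sketch the dual-track reference takes precedence over a single-track one when both are given; that ordering is a design assumption made for the example, not documented project behavior.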

What are the technologies used in the project?

Python: Primary programming language.
PyTorch: Deep learning framework.
Hugging Face Transformers: Library for working with transformer models.
FlashAttention 2: For reduced VRAM usage and faster generation.
xcodec_mini: Custom audio tokenizer (codec) that converts audio to and from discrete tokens.
Conda: Package and environment management.
Git LFS: For managing large files (model weights).
Docker: Supported by community projects.
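
Before running a stack like the one listed above, it can help to confirm that the Python packages are importable. The snippet below is a minimal sketch using only the standard library: check_environment and missing_required are hypothetical helper names, and treating FlashAttention 2 as optional is an assumption made for illustration. The import names torch, transformers, and flash_attn are the usual ones for these packages.

```python
from importlib.util import find_spec

# Display label -> import name for the Python packages listed above.
REQUIRED = {"PyTorch": "torch", "Hugging Face Transformers": "transformers"}
# Assumption for this sketch: FlashAttention 2 is treated as an optional speed-up.
OPTIONAL = {"FlashAttention 2": "flash_attn"}


def check_environment(required=REQUIRED, optional=OPTIONAL):
    """Map each package label to True if its module can be located, else False."""
    status = {}
    for label, module in {**required, **optional}.items():
        status[label] = find_spec(module) is not None
    return status


def missing_required(status, required=REQUIRED):
    """List the labels of required packages that were not found."""
    return [label for label in required if not status.get(label, False)]


if __name__ == "__main__":
    status = check_environment()
    for label, found in status.items():
        print(f"{label}: {'found' if found else 'missing'}")
    if missing_required(status):
        print("Install the missing required packages before generating.")
```

The check only locates the modules without importing them, so it stays fast and does not initialize any GPU state.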

What are the benefits of the project?

Democratizes Music Creation: Makes music generation accessible to non-musicians.
Enables Creative Exploration: Allows artists to experiment with different styles and genres.
Facilitates Content Creation: Provides a tool for generating original music for various applications.
Open Source and Collaborative: Encourages community contributions and improvements.
Commercial Use Allowed: Users can monetize their creations under the Apache 2.0 license; attribution is encouraged.
High-Quality Output: Generates musically coherent songs with both vocals and accompaniment over several minutes.

What are the use cases of the project?

Songwriting and Composition: Generating original songs from lyrics.
Music Production: Creating backing tracks or full arrangements.
Content Creation: Generating music for videos, games, podcasts, and other media.
Music Education: Exploring different musical styles and arrangements.
AI Research: Studying music generation and representation learning.
Style Transfer: Adapting existing songs to new styles.
Voice Cloning: Generating new songs in the style of a specific vocalist.
Accessibility: Enabling people with limited musical training to create music.