
Axolotl: Streamlined AI Model Post-Training

What is the project about?

Axolotl is a tool designed to simplify and accelerate the post-training process for various AI models. Post-training encompasses techniques like fine-tuning, parameter-efficient tuning (LoRA, QLoRA), supervised fine-tuning (SFT), instruction tuning, and alignment. It provides a user-friendly interface, primarily through YAML configuration files, to manage the entire workflow.

What problem does it solve?

Post-training large language models (LLMs) and other AI models can be complex, requiring significant configuration and infrastructure setup. Axolotl addresses this by:

  • Simplifying Configuration: Uses YAML files to define training parameters, datasets, and model architectures, making the process more accessible and reproducible (a minimal configuration sketch follows this list).
  • Streamlining Workflow: Handles dataset preprocessing, training/fine-tuning, inference, and evaluation in a unified framework.
  • Supporting Diverse Models and Techniques: Offers compatibility with a wide range of Hugging Face models and various post-training methods.
  • Optimizing Performance: Integrates with performance-enhancing technologies like Flash Attention and xformers, and supports multi-GPU and multi-node training.
  • Reducing Boilerplate: Automates many of the repetitive tasks involved in model training.
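
As an illustration of the YAML-driven workflow, a minimal configuration might look roughly like the sketch below. The field names (base_model, datasets, micro_batch_size, and so on) follow the style of Axolotl's published example configs, but the exact schema and defaults should be taken from the project's documentation rather than from this sketch.

```yaml
# Minimal illustrative config (field names modeled on Axolotl's example configs;
# check the official docs/examples for the authoritative schema).
base_model: NousResearch/Llama-2-7b-hf   # any supported Hugging Face model id

datasets:
  - path: mhenrichsen/alpaca_2k_test     # a small instruction dataset on the Hub
    type: alpaca                         # prompt template used to format the data

output_dir: ./outputs/my-run

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 0.0002
optimizer: adamw_torch
lr_scheduler: cosine
bf16: auto
```

Training is then typically launched by pointing the Axolotl CLI (or accelerate launch) at this file; the same file can generally be reused for the preprocessing, inference, and evaluation steps of the workflow.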

What are the features of the project?

  • Broad Model Support: Compatible with numerous Hugging Face models, including LLaMA, Mistral, Mixtral, Pythia, Falcon, and more.
  • Multiple Training Methods: Supports full fine-tuning, LoRA, QLoRA, ReLoRA, and GPTQ (an adapter configuration sketch follows this list).
  • YAML-Based Configuration: Defines training setups using easy-to-understand YAML files. CLI overrides are also supported.
  • Flexible Dataset Handling: Loads various dataset formats, allows custom formats, and supports pre-tokenized datasets.
  • Performance Optimizations:
    • Integration with xformers and Flash Attention.
    • Support for the Liger kernel.
    • RoPE scaling and multipacking.
  • Distributed Training: Supports single-GPU, multi-GPU (FSDP, DeepSpeed), and multi-node training.
  • Docker Integration: Provides Docker support for easy local and cloud deployment.
  • Experiment Tracking: Integrates with Weights & Biases (wandb), MLflow, and Comet for logging results and checkpoints.
  • Multipacking: Efficiently packs multiple short sequences into a single training example.
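
To make the feature list above concrete, the snippet below sketches how adapter-based training and the performance options might be enabled in the same YAML file. The keys mirror common Axolotl example configs (adapter, lora_r, sample_packing, flash_attention) and are illustrative rather than authoritative.

```yaml
# Illustrative QLoRA + performance settings (keys modeled on Axolotl examples;
# verify names and valid values against the current documentation).
load_in_4bit: true          # 4-bit base weights for QLoRA
adapter: qlora              # or "lora" for standard LoRA
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true    # apply the adapter to all linear layers

sequence_len: 4096
sample_packing: true        # multipacking: pack short samples into one sequence
pad_to_sequence_len: true
flash_attention: true       # use Flash Attention kernels where available
```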

What are the technologies used in the project?

  • Python 3.11
  • PyTorch (≥ 2.4.1)
  • Hugging Face Transformers: For model definitions and training utilities.
  • Flash Attention: For optimized attention mechanisms.
  • xformers: For memory-efficient transformer components.
  • DeepSpeed / FSDP: For distributed training.
  • YAML: For configuration files.
  • Docker: For containerization.
  • Weights & Biases (wandb), MLflow, Comet: For experiment tracking (see the sketch after this list).
  • Liger Kernel (optional)
  • NVIDIA or AMD GPU
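
For the distributed-training and experiment-tracking pieces listed above, configuration is again declarative. The sketch below shows the general shape (a DeepSpeed JSON referenced from the YAML, plus a Weights & Biases project); the concrete file paths and key names are assumptions based on Axolotl's example configs.

```yaml
# Illustrative distributed training + logging settings (paths and keys are
# examples only; Axolotl ships reference DeepSpeed configs in its repository).
deepspeed: deepspeed_configs/zero2.json   # hand off sharding/offload to DeepSpeed
# Alternatively, FSDP can be selected via the fsdp/fsdp_config keys instead.

wandb_project: my-finetune-runs           # enable Weights & Biases logging
wandb_name: llama2-qlora-test             # run name shown in the wandb UI
```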

What are the benefits of the project?

  • Accessibility: Makes advanced post-training techniques accessible to a wider range of users.
  • Reproducibility: YAML configurations ensure consistent and reproducible training runs.
  • Efficiency: Performance optimizations and distributed training support accelerate the training process.
  • Flexibility: Supports a wide variety of models, datasets, and training methods.
  • Scalability: Can be deployed on various hardware setups, from single GPUs to large clusters.
  • Faster Development: Streamlines the workflow, allowing for quicker iteration and experimentation.

What are the use cases of the project?

  • Fine-tuning LLMs for specific tasks: Adapting pre-trained language models to perform well on tasks like text summarization, question answering, code generation, or chatbot interactions.
  • Instruction Tuning: Training models to follow specific instructions.
  • Alignment: Aligning model outputs with human preferences or desired behaviors.
  • Research and Development: Providing a flexible platform for experimenting with new post-training techniques.
  • Creating Specialized Models: Developing models tailored to specific domains or industries.
  • Parameter-Efficient Fine-Tuning: Adapting models with limited compute resources.
  • Any task requiring post-training of a supported model.