LIMO: Less Is More for Reasoning

What is the project about?

LIMO is a research project that challenges the conventional approach to training large language models (LLMs) for mathematical reasoning. It demonstrates that state-of-the-art reasoning performance can be achieved with only a few hundred carefully curated training examples (817 in the released dataset), rather than the massive corpora typically used, provided the data is of high quality.

What problem does it solve?

The project addresses the problem of inefficient scaling in LLM training for mathematical reasoning. Traditional approaches often rely on massive datasets, which can be expensive and time-consuming to create and process. LIMO shows that a smaller, strategically selected dataset can outperform much larger ones, reducing the computational resources and time required for training. It also addresses the issue of generalization in mathematical reasoning, showing strong performance across a variety of problem types.

What are the features of the project?

  • State-of-the-art (SOTA) performance: Achieves top results on multiple mathematical reasoning benchmarks with a remarkably small training set.
  • Strong generalization: Demonstrates excellent performance across diverse mathematical problem types, not just those seen during training.
  • Comprehensive evaluation: Thoroughly tested on 10 different benchmarks, providing a robust measure of its capabilities.
  • Publicly available resources: Releases the high-quality training dataset, the trained model, and evaluation tools to the research community.
  • Efficient training: Uses the LLaMA-Factory framework for streamlined and efficient training.
  • Multiple inference options: Compatible with popular frameworks such as Hugging Face Transformers, vLLM, and TensorRT-LLM.
  • Rule-based and model-based evaluation: Provides scripts for evaluating LLMs on mathematical reasoning tasks, using rule-based checking for numerical answers and model-based judging for more complex responses (see the sketch after this list).
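
The rule-based path can be illustrated with a short, self-contained sketch. The function names, the assumption that final answers appear in a \boxed{...} expression, and the numeric tolerance are illustrative choices, not LIMO's actual evaluation scripts:

  import re


  def extract_final_answer(response: str) -> str | None:
      """Return the last \\boxed{...} value in a response, or None if absent."""
      matches = re.findall(r"\\boxed\{([^{}]+)\}", response)
      return matches[-1].strip() if matches else None


  def is_correct(response: str, reference: str, tol: float = 1e-6) -> bool:
      """Compare the extracted answer to the reference, numerically when possible."""
      predicted = extract_final_answer(response)
      if predicted is None:
          return False
      try:
          return abs(float(predicted) - float(reference)) <= tol
      except ValueError:
          # Fall back to a whitespace-insensitive string comparison for symbolic answers.
          return predicted.replace(" ", "") == reference.replace(" ", "")


  print(is_correct("The sum is \\boxed{42}.", "42"))  # True

Model-based evaluation would instead prompt the judge model (Qwen2.5-32B-Instruct in this project) to grade free-form responses that a simple rule cannot parse.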

What are the technologies used in the project?

  • Base Model: Qwen2.5-32B-Instruct (a large language model).
  • Training Framework: LLaMA-Factory.
  • Inference Frameworks: Hugging Face Transformers, vLLM, TensorRT-LLM (see the example after this list).
  • Programming Language: Python.
  • Hosting: Hugging Face (for models and datasets).
  • Evaluation Judge Model: Qwen2.5-32B-Instruct.
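
As a quick illustration of the Transformers path, the sketch below loads the model and generates a step-by-step answer. The model id "GAIR/LIMO" and the use of the tokenizer's chat template are assumptions about the Hugging Face release; substitute the actual repository name if it differs:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "GAIR/LIMO"  # assumed Hugging Face model id for the released LIMO model
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id, torch_dtype="auto", device_map="auto"
  )

  # Build a chat-formatted prompt and generate a reasoning trace plus answer.
  messages = [{"role": "user", "content": "What is 15% of 240? Reason step by step."}]
  input_ids = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)

  output_ids = model.generate(input_ids, max_new_tokens=512)
  print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

Note that the 32B model requires substantial GPU memory; vLLM or TensorRT-LLM follow the same prompt format and are the usual choices for higher-throughput serving.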

What are the benefits of the project?

  • Reduced training costs: Requires significantly less data and computational resources compared to traditional methods.
  • Faster training times: The smaller dataset leads to quicker training cycles.
  • Improved generalization: The focus on high-quality data leads to better performance on unseen problems.
  • Open-source contribution: Provides valuable resources (dataset, model, code) to the research community, fostering further advancements.
  • Resource efficiency: Demonstrates that high performance doesn't always require massive datasets, promoting more sustainable AI development.

What are the use cases of the project?

  • Mathematical reasoning research: Provides a strong baseline and valuable resources for further research in this area.
  • Development of mathematical reasoning systems: Can be used as a foundation for building applications that require mathematical problem-solving capabilities.
  • Educational tools: Potentially applicable in educational settings to assist with learning and problem-solving in mathematics.
  • Automated theorem proving: Could contribute to advancements in automated theorem proving and formal verification.
  • Scientific discovery: May aid in scientific research by automating complex mathematical calculations and reasoning tasks.
  • Improving RL scaling: The dataset has shown potential for enhancing the scaling of reinforcement learning approaches to reasoning.