Project Description: torchchat

What is the project about?

torchchat is a codebase for running large language models (LLMs) across a wide range of platforms and environments. Its focus is making local LLM execution seamless, whether the target is a Python script, a native C/C++ application, or a mobile device.

What problem does it solve?

It removes the complexity of deploying and running LLMs in different environments (Python, C/C++, iOS, Android), bridging the gap between research in PyTorch and production on servers and mobile devices. It also makes it practical to run LLMs on resource-constrained devices.

What are the features of the project?

  • Cross-Platform Execution: Runs LLMs in Python, C/C++ applications, iOS, and Android.
  • Multiple Execution Modes: Supports eager execution, compilation (AOT Inductor), and mobile-optimized execution (ExecuTorch).
  • Model Support: Compatible with a wide range of popular LLMs, including Llama 3, Llama 2, Mistral, CodeLlama, and more. Includes multimodal support.
  • Data Type Flexibility: Handles float32, float16, and bfloat16 data types.
  • Quantization: Offers multiple quantization schemes that reduce model size and improve inference performance (see the quantization sketch after this list).
  • Command-Line Interface: Provides a CLI for interacting with models (chat, generate), managing models, and exporting for different environments.
  • Web Interface: Includes a simple browser-based chat interface.
  • REST API: Exposes a REST API following the OpenAI API specification for model interaction (a client sketch follows this list).
  • Evaluation: Integrates with the lm-eval library for model accuracy evaluation.
  • Mobile-Friendly Models: Supports models small and quantized enough to run well on-device.
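
Because the REST API follows OpenAI's specification, any OpenAI-style client can talk to it. The sketch below uses Python's requests library; the host, port, endpoint path, and model name are assumptions to adjust for your own server.

```python
# Minimal sketch of calling torchchat's OpenAI-style REST API.
# Assumptions: the server is running locally on port 5000 and a model
# named "llama3.1" is loaded -- adjust all three to your setup.
import requests

response = requests.post(
    "http://localhost:5000/v1/chat/completions",  # assumed endpoint path
    json={
        "model": "llama3.1",  # hypothetical model identifier
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about compilers."},
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
response.raise_for_status()
# Standard OpenAI chat-completions response shape.
print(response.json()["choices"][0]["message"]["content"])
```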
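
torchchat selects its quantization schemes through configuration rather than user code, so the sketch below is not torchchat's API; it is a generic PyTorch illustration of the underlying idea, applying dynamic int8 quantization to the Linear layers that hold most of an LLM's weights.

```python
# Generic illustration of weight quantization (not torchchat's API):
# dynamic int8 quantization replaces fp32 Linear weights with int8,
# roughly quartering their memory footprint.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

quantized = torch.ao.quantization.quantize_dynamic(
    model,
    {nn.Linear},        # quantize only the Linear layers
    dtype=torch.qint8,  # int8 weights, fp32 activations
)

x = torch.randn(1, 4096)
print(quantized(x).shape)  # same interface as before, smaller weights
```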

What are the technologies used in the project?

  • PyTorch: The core framework for model definition and execution.
  • AOT Inductor (AOTI): PyTorch's ahead-of-time compilation tool for faster inference (a compilation sketch follows this list).
  • ExecuTorch: PyTorch's framework for mobile and embedded device deployment.
  • Hugging Face Transformers: Used for accessing and managing pre-trained models (a download sketch follows this list).
  • C/C++: For native application integration and runners.
  • Python: Primary scripting language.
  • Streamlit: For the browser-based chat interface.
  • lm-evaluation-harness (lm_eval): For model evaluation.
  • Android SDK, NDK, Java, Kotlin: For Android deployment.
  • Xcode, Swift: For iOS deployment.
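
As a small illustration of the compiled execution mode, the sketch below wraps an ordinary PyTorch module with torch.compile. This is the just-in-time counterpart of the ahead-of-time AOTI path listed above; it is shown only to demonstrate the Inductor-backed compile-for-inference pattern, not torchchat's actual export flow.

```python
# Illustration of PyTorch compilation (JIT via torch.compile; torchchat's
# AOTI export is the ahead-of-time analogue using the same Inductor backend).
import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).eval()
compiled = torch.compile(model)  # eager module in, optimized module out

x = torch.randn(1, 16, 512)
with torch.no_grad():
    out = compiled(x)  # first call triggers compilation; later calls are fast
print(out.shape)
```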
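
Model weights are fetched from the Hugging Face Hub. As an illustration, the sketch below downloads a checkpoint with the huggingface_hub package that Transformers builds on; the repo id is only an example, and Llama-family repos are gated, so authentication may be required.

```python
# Sketch of pulling model weights from the Hugging Face Hub.
# The repo id below is an example; Llama-family repos are gated and
# require accepting the license and authenticating with a token.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",  # example, gated repo
    # token="hf_...",  # set this if the repo is gated
)
print("checkpoint downloaded to", local_dir)
```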

What are the benefits of the project?

  • Simplified Deployment: Makes it easier to deploy and run LLMs in various environments.
  • Performance Optimization: Offers options for compilation and quantization to improve inference speed and reduce resource usage.
  • Cross-Platform Compatibility: Enables running the same model across different platforms.
  • Extensibility: Designed with a modular architecture for easy extension and customization.
  • Native PyTorch Integration: Leverages the power and flexibility of PyTorch.
  • Mobile Deployment: Enables running LLMs on mobile devices.

What are the use cases of the project?

  • Chatbots: Building interactive chatbots for various applications.
  • Text Generation: Generating text for creative writing, content creation, and more.
  • Code Generation: Assisting with code development (especially with models like CodeLlama).
  • Mobile Applications: Integrating LLM capabilities into mobile apps.
  • Embedded Systems: Running LLMs on resource-constrained devices.
  • Research and Development: Prototyping and experimenting with LLMs.
  • Server-Side Applications: Deploying LLMs as part of backend services.
  • Model Evaluation: Benchmarking and evaluating the performance of different LLMs.