Guidance
What is the project about?
Guidance is a programming paradigm and Python library for controlling and steering large language models (LLMs) more effectively and efficiently than traditional prompting or fine-tuning. It allows for structured output generation, constrained generation, and seamless interleaving of control flow and generation.
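For orientation, here is a minimal sketch of the core idea, assuming the current `guidance` API (`models`, `gen`, `select`); the model name and prompt are illustrative, and any local Transformers model would do.

```python
from guidance import models, gen, select

# Load a local model (illustrative choice; any Transformers model works).
lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")

# Plain strings, constraints, and open-ended generation compose with `+`.
lm = lm + "Do you want a joke or a poem? A " + select(["joke", "poem"], name="choice")
lm = lm + ", here it is: " + gen(name="text", stop="\n", max_tokens=50)

# Captured values are indexed by the `name` given to each constraint.
print(lm["choice"], lm["text"])
```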
What problem does it solve?
- Lack of Control: Traditional prompting gives limited control over the structure and content of LLM output.
- Inefficiency: Standard prompting and chaining can be slow and expensive, requiring multiple LLM calls and intermediate parsing.
- Tokenization Issues: Standard tokenization can lead to unexpected behavior and biases, especially at prompt boundaries.
- Complex Tool Integration: Integrating tools (like calculators or search engines) with LLMs traditionally requires complex parsing and handling of intermediate outputs.
- Unstructured Output: Plain prompting offers no guarantee that output conforms to a required format (e.g., valid JSON).
What are the features of the project?
- Pure Python Syntax: Write generation logic using familiar Python constructs (conditionals, loops) with added LLM-specific functionality.
- Constrained Generation: Force the model to generate output that adheres to specific constraints (see the first sketch after this list):
- Selection: Choose from a predefined set of options.
- Regular Expressions: Match generated text against regular expressions.
- Context-Free Grammars: Define complex output structures using CFGs.
- Pre-built Components: Use built-in functions such as `substring` and `json`.
- Stateful Control + Generation: Create functions that combine control flow (if/else, loops) with generation, eliminating the need for external parsers. The whole function executes as a single stateful LLM call, improving speed (a sketch follows this list).
- Tool Use: Easily integrate external tools (like calculators) by defining trigger grammars and tool functions. The model automatically stops generation, calls the tool, and resumes (a hedged sketch follows this list).
- Token Healing: Automatically handles token boundary issues, allowing users to work with text instead of worrying about tokenization artifacts.
- Rich Templating: Use f-string-like syntax for easy template creation.
- Chat Abstraction: Provides a clean interface for interacting with chat models via the `system()`, `user()`, and `assistant()` context managers, handling special tokens automatically.
- Reusable Components: Create and reuse custom guidance functions.
- Streaming Support: Supports streaming output, integrated with Jupyter notebooks.
- Multi-modal Support: Works with images, demonstrated with Gemini.
- Backend Compatibility: Supports various backends, including Transformers, llama.cpp, AzureAI, VertexAI, and OpenAI.
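The constraint primitives can be sketched as follows, assuming the `guidance` API above; the model name and prompts are illustrative.

```python
from guidance import models, gen, select, substring

lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")  # illustrative

# Selection: the model must pick exactly one of the listed options.
lm = lm + "The sentiment is " + select(["positive", "negative", "neutral"], name="sentiment")

# Regular expressions: the generated text must match the pattern.
lm = lm + "\nConfidence (0-100): " + gen(name="confidence", regex=r"\d{1,3}")

# substring: the output must be a literal span of the given source text.
lm = lm + "\nSupporting quote: " + substring("The product arrived late but works well.")
```

Stateful control plus generation lives inside a single decorated function; the ticket-triage logic below is invented for illustration.

```python
import guidance
from guidance import models, gen, select

@guidance
def triage(lm, ticket_text):
    # Ordinary Python control flow interleaves with generation.
    lm += f"Ticket: {ticket_text}\nCategory: "
    lm += select(["bug", "feature", "question"], name="category")
    if lm["category"] == "bug":
        # Branch on what the model just generated; no external parser needed.
        lm += "\nSeverity: "
        lm += select(["low", "medium", "high"], name="severity")
    lm += "\nSummary: "
    lm += gen(name="summary", stop="\n", max_tokens=40)
    return lm

lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")  # illustrative
lm += triage("App crashes when I click save.")
```

Tool use follows the same pattern: each tool is itself a guidance function and, assuming `gen` accepts a `tools=` list as in recent releases, the model pauses when it emits a matching call, the function runs, and generation resumes.

```python
import guidance
from guidance import models, gen

@guidance
def add(lm, input1, input2):
    # Invoked automatically when the model writes e.g. "add(20, 42)".
    lm += f" = {int(input1) + int(input2)}"
    return lm

lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")  # illustrative
lm = lm + "1 + 2 = add(1, 2) = 3\n20 + 42 = " + gen(max_tokens=20, tools=[add])
```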
What are the technologies used in the project?
- Python: The primary programming language.
- Language Models: Supports various LLMs through different backends (a portability sketch follows this list):
- Transformers: (Hugging Face Transformers library)
- llama.cpp: For local execution of Llama-family and other GGUF-format models.
- OpenAI: (GPT-3.5, etc.)
- VertexAI: (Google's Vertex AI platform, including PaLM 2 and Gemini)
- AzureAI: (Microsoft Azure AI)
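Because the same guidance program runs against different backends, switching models is mostly a one-line change. A minimal sketch, assuming the `models` namespace and the chat role context managers; the model names and file path are placeholders.

```python
from guidance import models, gen, system, user, assistant

# Pick a backend; the rest of the program stays the same.
lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")  # local Hugging Face model
# lm = models.LlamaCpp("/path/to/model.gguf")                 # local llama.cpp model
# lm = models.OpenAI("gpt-3.5-turbo")                         # remote OpenAI model

# Chat models use role context managers; special tokens are handled automatically.
with system():
    lm += "You are a concise assistant."
with user():
    lm += "Name one benefit of constrained generation."
with assistant():
    lm += gen(name="answer", max_tokens=60)

print(lm["answer"])
```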
What are the benefits of the project?
- Increased Control: Precise control over LLM output structure and content.
- Improved Efficiency: Faster and more cost-effective than traditional prompt chaining, because prompt text and generation are batched into a single stateful LLM call rather than many separate calls.
- Simplified Development: Easier to write and maintain complex LLM interactions.
- Reduced Errors: Token healing and constrained generation minimize unexpected behavior.
- Seamless Tool Integration: Simplified tool use without complex parsing.
- Higher Quality Output: Constrained generation and structured output reduce malformed or off-format responses.
- Portability: Write once, run on multiple backends.
What are the use cases of the project?
- Chatbots: Building chatbots with complex conversation flows and tool integration.
- Structured Data Extraction: Extracting information from text into structured formats such as JSON (see the sketch after this list).
- Code Generation: Generating code that adheres to specific syntax and constraints.
- Content Creation: Creating content with specific formatting and style requirements.
- Reasoning Tasks: Implementing reasoning frameworks like ReAct.
- Question Answering: Building question-answering systems with controlled responses.
- Any task requiring precise control over LLM output.
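As a concrete illustration of structured extraction, constraints can be embedded directly in an f-string template so the output is well-formed JSON by construction. A sketch assuming the API above; the schema, field names, and example text are invented.

```python
import guidance
from guidance import models, gen

@guidance
def extract_person(lm, text):
    # Each gen() fills one field; the surrounding JSON skeleton is fixed text.
    lm += f"""\
Extract the person mentioned below as JSON.
Text: {text}
{{
    "name": "{gen('name', stop='"')}",
    "age": {gen('age', regex='[0-9]+', stop=',')},
    "occupation": "{gen('occupation', stop='"')}"
}}"""
    return lm

lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")  # illustrative
lm += extract_person("Ada Lovelace, age 36, was a mathematician.")
print(lm["name"], lm["age"], lm["occupation"])
```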
