Guidance

What is the project about?

Guidance is a programming paradigm and Python library for controlling and steering large language models (LLMs) more effectively and efficiently than traditional prompting or fine-tuning. It enables constrained, structured output generation and seamless interleaving of Python control flow with generation.

What problem does it solve?

  • Lack of Control: Traditional prompting gives limited control over the structure and content of LLM output.
  • Inefficiency: Standard prompting and chaining can be slow and expensive, requiring multiple LLM calls and intermediate parsing.
  • Tokenization Issues: Standard tokenization can lead to unexpected behavior and biases, especially at prompt boundaries.
  • Complex Tool Integration: Integrating tools (like calculators or search engines) with LLMs traditionally requires complex parsing and handling of intermediate outputs.
  • Unstructured Output: Free-form LLM output is hard to parse reliably into structured formats downstream.

What are the features of the project?

  • Pure Python Syntax: Write generation logic using familiar Python constructs (conditionals, loops) with added LLM-specific functionality.
  • Constrained Generation: Force the model to generate output that adheres to specific constraints (see the first sketch after this list):
    • Selection: Choose from a predefined set of options.
    • Regular Expressions: Match generated text against regular expressions.
    • Context-Free Grammars: Define complex output structures using CFGs.
    • Pre-built components: Use built-in functions such as substring and json.
  • Stateful Control + Generation: Create functions that combine control flow (if/else, loops) with generation, eliminating the need for external parsers; the whole function executes as a single LLM call, which improves speed (see the stateful sketch below).
  • Tool Use: Easily integrate external tools (like calculators) by defining trigger grammars and tool functions; the model automatically stops generation, calls the tool, and resumes (a tool-use sketch follows this list).
  • Token Healing: Automatically handles token boundary issues, allowing users to work with text instead of worrying about tokenization artifacts.
  • Rich Templating: Use f-string-like syntax for easy template creation.
  • Chat Abstraction: Provides a clean interface for interacting with chat models, handling special role tokens automatically (see the chat sketch below).
  • Reusable Components: Create and reuse custom guidance functions.
  • Streaming Support: Supports streaming output, integrated with Jupyter notebooks.
  • Multi-modal Support: Works with images, demonstrated with Gemini.
  • Backend Compatibility: Supports various backends, including Transformers, llama.cpp, AzureAI, VertexAI, and OpenAI.
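
A minimal sketch of the constrained-generation features, assuming a local Transformers backend; the model name and prompt text below are illustrative:

```python
from guidance import models, gen, select

# Load a local model; any Transformers-compatible checkpoint works here.
lm = models.Transformers("gpt2")

# Selection: the model must pick one of the predefined options.
lm += "Do you want a joke or a poem? A " + select(["joke", "poem"], name="kind")

# Regular expression: the generated span must match the pattern.
lm += "\nRate it from 1 to 10: " + gen(name="score", regex=r"\d+", max_tokens=4)

print(lm["kind"], lm["score"])  # captured values are read back by name
```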
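And a sketch of stateful control + generation using the @guidance decorator; the function below is a hypothetical example, not part of the library:

```python
import guidance
from guidance import gen

@guidance
def yes_no_qa(lm, question):
    # Ordinary Python control flow interleaves with generation, and the
    # captured "answer" is readable immediately -- no external parsing.
    lm += f"Q: {question}\nA (yes or no): " + gen(name="answer", stop="\n")
    if "yes" in lm["answer"].lower():
        lm += "Why? " + gen(name="why", stop="\n")
    return lm

# Guidance functions compose by appending, e.g.:
# lm += yes_no_qa("Is water wet?")
```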
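Tool use pairs a trigger grammar with a Python callback. The calc names below are hypothetical and the exact Tool wiring may differ by library version; treat this as a sketch of the pattern, not the definitive API:

```python
import guidance
from guidance import Tool, capture, gen

@guidance(stateless=True)
def calc_call(lm):
    # Trigger grammar: fires when the model writes "calc(<expression>)".
    return lm + "calc(" + capture(gen(regex=r"[0-9+\-*/ ().]+"), "expr") + ")"

@guidance
def calc(lm):
    # Callback: evaluate the captured expression and splice the result back
    # into the generation. (eval is illustrative; avoid it on untrusted input.)
    lm += f" = {eval(lm['expr'])}"
    return lm

calc_tool = Tool(calc_call(), calc)
# lm += "Compute 2 + 3 * 4 using calc(): " + gen(max_tokens=30, tools=[calc_tool])
```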
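The chat abstraction wraps role tags in context managers; a sketch assuming an OpenAI chat backend, with the model name illustrative:

```python
from guidance import models, gen, system, user, assistant

chat = models.OpenAI("gpt-3.5-turbo")  # reads the API key from the environment

with system():
    chat += "You are a concise assistant."
with user():
    chat += "Name one planet."
with assistant():
    chat += gen(name="reply", max_tokens=16)
```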

What are the technologies used in the project?

  • Python: The primary programming language.
  • Language Models: Supports various LLMs through different backends (a loading sketch follows this list):
    • Transformers: (Hugging Face Transformers library)
    • llama.cpp: For local execution of Llama models.
    • OpenAI: (GPT-3.5, etc.)
    • VertexAI: (Google's Vertex AI platform, including PaLM 2 and Gemini)
    • AzureAI: (Microsoft Azure AI)
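
A sketch of the backend portability: the same guidance program runs unchanged, and only the model constructor differs. The model names and file path below are illustrative:

```python
from guidance import models

transformers_lm = models.Transformers("gpt2")         # Hugging Face Transformers
llamacpp_lm = models.LlamaCpp("path/to/model.gguf")   # local llama.cpp model
openai_lm = models.OpenAI("gpt-3.5-turbo")            # OpenAI API
```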

What are the benefits of the project?

  • Increased Control: Precise control over LLM output structure and content.
  • Improved Efficiency: Faster and more cost-effective than traditional prompting, because template text is processed in batches and an entire program executes as a single LLM call instead of many round trips.
  • Simplified Development: Easier to write and maintain complex LLM interactions.
  • Reduced Errors: Token healing and constrained generation minimize unexpected behavior.
  • Seamless Tool Integration: Simplified tool use without complex parsing.
  • Higher Quality Output: Constrained generation and structured output yield more reliable, well-formed responses.
  • Portability: Write once, run on multiple backends.

What are the use cases of the project?

  • Chatbots: Building chatbots with complex conversation flows and tool integration.
  • Structured Data Extraction: Extracting information from text into structured formats such as JSON (see the extraction sketch after this list).
  • Code Generation: Generating code that adheres to specific syntax and constraints.
  • Content Creation: Creating content with specific formatting and style requirements.
  • Reasoning Tasks: Implementing reasoning frameworks like ReAct.
  • Question Answering: Building question-answering systems with controlled responses.
  • Any task requiring precise control over LLM output.
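
As an example of structured extraction, a sketch using the built-in json component, assuming it accepts a JSON-schema dict; the schema, prompt, and model choice are illustrative:

```python
from guidance import models, json as gen_json

# JSON schema for the record we want to extract (illustrative).
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

lm = models.Transformers("gpt2")  # illustrative model choice
lm += "Extract the person from: 'Alice is 30 years old.'\nJSON: "
lm += gen_json(name="person", schema=schema)  # output conforms to the schema
print(lm["person"])
```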