Project Description: code2prompt
What is the project about?
code2prompt is a command-line tool (CLI) and Python library designed to convert a codebase into a single, well-formatted prompt suitable for large language models (LLMs) like GPT or Claude. It automates the process of gathering and structuring code from a project into a format that can be easily fed into an LLM for various code-related tasks.
What problem does it solve?
The project addresses the challenge of efficiently providing a large codebase as context to LLMs. Manually copy-pasting multiple source files into a prompt is tedious, error-prone, and often exceeds the context window limits of many LLMs. code2prompt solves this by:
- Automating Context Gathering: It automatically traverses a project's directory structure, collecting source code files.
- Formatting for LLMs: It formats the code into a readable Markdown prompt, including a source tree representation.
- Token Management: It calculates the token count of the generated prompt, helping users stay within the context window limits of their chosen LLM.
- Customization: It allows users to customize the prompt generation process using Handlebars templates, enabling a wide range of use cases.
- Git Integration: It can include Git diff output (staged files or branch comparisons) in the prompt, making it useful for generating commit messages or pull request descriptions.
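The first two steps above (gathering and formatting) can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the tool's actual implementation, which is written in Rust and honors real .gitignore rules; the skip list here is a crude stand-in.

```python
# Minimal sketch of what code2prompt automates (illustrative only):
# walk a project, collect source files, and emit one Markdown prompt
# containing a source tree plus each file in a fenced code block.
from pathlib import Path

# Crude stand-in for .gitignore handling; the real tool parses ignore rules.
SKIP_DIRS = {".git", "target", "node_modules"}

def gather_files(root: Path) -> list[Path]:
    """Collect every file under root, skipping ignored directories."""
    files = []
    for path in sorted(root.rglob("*")):
        rel_parts = path.relative_to(root).parts
        if any(part in SKIP_DIRS for part in rel_parts):
            continue
        if path.is_file():
            files.append(path)
    return files

def build_prompt(root: Path) -> str:
    """Render a single Markdown prompt: source tree, then file contents."""
    files = gather_files(root)
    tree = "\n".join(str(p.relative_to(root)) for p in files)
    sections = []
    for p in files:
        code = p.read_text(encoding="utf-8", errors="replace")
        sections.append(f"## {p.relative_to(root)}\n```\n{code}\n```")
    return f"# Source tree\n{tree}\n\n" + "\n\n".join(sections)
```

The real tool layers token counting, templating, and Git integration on top of this basic traverse-and-format loop.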
What are the features of the project?
- Prompt Generation: Creates LLM prompts from codebases.
- Customizable Templates: Uses Handlebars templates for flexible prompt generation (with several pre-built templates provided).
- .gitignore Respect: Respects .gitignore rules by default (can be disabled).
- File Filtering: Allows filtering files using glob patterns (include/exclude).
- Hidden File Control: Option to include or exclude hidden files.
- Token Counting: Calculates and displays the token count of the generated prompt, supporting several OpenAI tokenizers (cl100k_base, p50k_base, p50k_edit, r50k_base, o200k_base).
- Git Diff Integration: Optionally includes Git diff output (staged files or branch comparisons).
- Clipboard Copy: Automatically copies the generated prompt to the clipboard.
- Output File: Saves the generated prompt to an output file.
- Line Numbers: Option to add line numbers to source code blocks.
- JSON Output: Option to print output as JSON.
- User-Defined Variables: Supports user-defined variables in templates for dynamic prompt generation.
- Python SDK: Provides a Python API for programmatic use.
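Because prompt generation is template-driven, the output shape is fully under the user's control. A custom Handlebars template might look like the sketch below; the variable names (`project_name`, `source_tree`, `files` with `path`/`code`) are illustrative placeholders, so check the tool's documentation for the variables it actually exposes.

```handlebars
Project: {{project_name}}  {{!-- a user-defined variable --}}

Source tree:
{{source_tree}}

{{#each files}}
File: {{this.path}}
{{this.code}}

{{/each}}
```

User-defined variables such as `project_name` above are what make the same codebase reusable across different prompt styles (documentation, review, refactoring) without editing the tool itself.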
What are the technologies used in the project?
- Rust: The core CLI tool is written in Rust.
- Cargo: Rust's package manager and build system.
- Handlebars: Templating engine for customizing prompt generation.
- tiktoken-rs: Rust library for tokenization (counting tokens in the prompt).
- Python: Python bindings are provided for integration with Python applications.
- Git: Used for optional diff output.
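To make the tokenization step concrete without depending on tiktoken-rs itself, here is a deliberately rough Python approximation. The real tool uses actual BPE tokenizers; the ~4-characters-per-token ratio below is only a common rule of thumb for English text and code, not what code2prompt computes.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token.

    This is a heuristic stand-in only; code2prompt uses real BPE
    tokenizers (via tiktoken-rs) for exact counts per encoding.
    """
    return max(1, len(text) // 4)
```

An estimate like this is enough to decide early whether a prompt is wildly over a model's context limit, before paying for an exact count.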
What are the benefits of the project?
- Efficiency: Saves time and effort compared to manually creating LLM prompts for code.
- Accuracy: Reduces errors in prompt creation.
- Scalability: Scales to large codebases that would be impractical to assemble into a prompt by hand.
- Flexibility: Customizable templates allow for various use cases.
- Context Window Awareness: Token counting helps users manage LLM context limits.
- Integration: Easy integration with LLMs and other tools (via CLI and Python SDK).
- Reproducibility: Consistent prompt generation.
What are the use cases of the project?
- Code Documentation: Automatically generate documentation for the codebase.
- Code Analysis: Analyze code for bugs, security vulnerabilities, or performance issues.
- Code Generation: Generate new code based on the existing codebase.
- Code Refactoring: Improve code quality and readability.
- Code Translation: Rewrite code in another programming language.
- Git Commit Message Generation: Create commit messages based on staged changes.
- Pull Request Description Generation: Generate pull request descriptions by comparing branches.
- Chatbots/Assistants: Provide context to code-focused chatbots or AI assistants.
- Educational Purposes: Help understand and learn a new codebase.
- Onboarding: Quickly get up to speed with a new project.
