What is the project about?
Data Formulator is an AI-powered application that helps users transform data and create visualizations. It leverages large language models (LLMs) to assist in the data transformation process, making it easier to create complex visualizations.
What problem does it solve?
It simplifies the process of creating data visualizations, especially when the desired visualization requires data transformations or computations that are not readily available in the original dataset. It reduces the need for manual data wrangling and coding. It allows the user to create visualizations beyond the initial dataset.
What are the features of the project?
- Combines UI interactions (drag-and-drop) with natural language (NL) inputs.
- Allows users to specify visual encodings using data fields, including those that need to be computed.
- Automatically generates code (using LLMs) to perform data transformations.
- Supports iterative visualization creation, allowing users to refine their charts based on previous results.
- Tracks exploration history in a "Data Threads" panel.
- Supports multiple LLM providers: OpenAI, Azure, Ollama, and Anthropic (via LiteLLM).
- Experimental feature: load an image or messy text for AI to parse and clean.
- Data visualization challenges.
What are the technologies used in the project?
- Large Language Models (LLMs): OpenAI (e.g., GPT-4o), Azure, Ollama, Anthropic.
- LiteLLM.
- Python (with PIP package management).
- GitHub Codespaces (for cloud-based development).
What are the benefits of the project?
- Faster and easier data visualization creation.
- Reduced need for manual data manipulation and coding.
- Enables exploration of data beyond its initial structure.
- More intuitive interaction through a combination of UI and NL.
- Iterative refinement of visualizations.
- Support for various LLM backends.
What are the use cases of the project?
- Data analysis and exploration.
- Creating custom data visualizations that require non-trivial data transformations.
- Rapid prototyping of visualizations.
- Educational tool for learning about data visualization and LLMs.
- Situations where users have a concept for a visualization but lack the exact data fields.
