STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking
What is the project about?
STORM is an LLM system that automatically writes Wikipedia-like articles from scratch, complete with citations, by conducting Internet-based research. Co-STORM extends this by enabling human-AI collaboration in the knowledge curation process.
What problem does it solve?
STORM addresses the challenge of automatically generating comprehensive, well-sourced articles on a given topic. It automates the research process, which is otherwise slow and labor-intensive. Co-STORM addresses the need for human oversight and alignment in the information-gathering and synthesis process.
What are the features of the project?
- Automated Research: Conducts Internet-based research to gather relevant information and references.
- Outline Generation: Creates a structured outline to organize the collected knowledge.
- Article Writing: Generates a full-length article with citations, based on the outline and references.
- Perspective-Guided Question Asking: Discovers different perspectives on a topic to guide the question-asking process.
- Simulated Conversation: Simulates a conversation between a writer and a topic expert to improve understanding and ask follow-up questions.
- Collaborative Discourse (Co-STORM): Supports human-AI collaboration through a turn-based discourse protocol.
- Mind Map (Co-STORM): Maintains a dynamically updated mind map to visualize the conceptual space.
- Modular Design: Implemented in a modular way using dspy, allowing for customization.
- API Support: Provides an API for integrating various language models and retrieval/search engines (Litellm, YouRM, BingSearch, VectorRM, etc.).
- Human-in-the-Loop (Co-STORM): Lets users take part in the discourse and steer the knowledge curation process.
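The research, outline, and writing stages listed above can be pictured with a small sketch. This is not STORM's actual code or API: every name here (`Note`, `research`, `build_outline`, `write_article`, and the `ask`/`search` callables) is hypothetical, and the LLM and search engine are replaced by plain functions passed in by the caller. It only illustrates how perspective-guided questions, retrieved sources, an outline, and a cited article could fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins: a real system would call an LLM and a search engine.
AskFn = Callable[[str, str, List[str]], str]  # (topic, perspective, history) -> question
SearchFn = Callable[[str], List[str]]         # question -> list of source identifiers


@dataclass
class Note:
    perspective: str
    question: str
    sources: List[str]


def research(topic: str, perspectives: List[str], ask: AskFn,
             search: SearchFn, turns: int = 2) -> List[Note]:
    """Ask a few questions per perspective and record the sources found."""
    notes: List[Note] = []
    for perspective in perspectives:
        history: List[str] = []  # earlier questions, so follow-ups can differ
        for _ in range(turns):
            question = ask(topic, perspective, history)
            notes.append(Note(perspective, question, search(question)))
            history.append(question)
    return notes


def build_outline(notes: List[Note]) -> Dict[str, List[Note]]:
    """Group notes into one section per perspective (a deliberately simple outline)."""
    outline: Dict[str, List[Note]] = {}
    for note in notes:
        outline.setdefault(note.perspective, []).append(note)
    return outline


def write_article(topic: str, outline: Dict[str, List[Note]]) -> str:
    """Render sections with numbered citations and a reference list."""
    refs: List[str] = []
    lines = [topic]
    for section, notes in outline.items():
        lines.append(section)
        for note in notes:
            for src in note.sources:
                if src not in refs:
                    refs.append(src)
            cites = "".join(f"[{refs.index(s) + 1}]" for s in note.sources)
            lines.append(f"{note.question} {cites}")
    lines.append("References")
    lines.extend(f"[{i}] {src}" for i, src in enumerate(refs, start=1))
    return "\n".join(lines)
```

In this sketch each stage takes only the previous stage's output, so any one of them could be swapped out independently, which is the kind of customization the modular design is meant to allow.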
What are the technologies used in the project?
- Large Language Models (LLMs): Uses LLMs (configurable, with examples using GPT-3.5 and GPT-4o) for question asking, answer synthesis, outline generation, and article writing.
- Retrieval Modules (RMs): Integrates with various search engines and vector databases (YouRM, BingSearch, VectorRM, SerperRM, BraveRM, SearXNG, DuckDuckGoSearchRM, TavilySearchRM, GoogleSearch, AzureAISearch).
- Python: The primary programming language.
- dspy: A framework for programming with foundation models.
- litellm: For integrating various language and embedding models.
- Hugging Face Datasets: Used for providing datasets (FreshWiki, WildSeek).
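To illustrate what interchangeable retrieval modules could look like, here is a minimal sketch. It does not use the project's real classes or the dspy API: the `Retriever` protocol, `KeywordRetriever`, and `gather_references` are hypothetical names, and the in-memory keyword matcher merely stands in for a search engine or vector database. The point is that the calling code depends only on a small interface, so one backend can replace another.

```python
from typing import Dict, List, Protocol


class Retriever(Protocol):
    """Hypothetical interface: anything with this method can serve as a backend."""

    def retrieve(self, query: str, k: int = 3) -> List[Dict[str, str]]: ...


class KeywordRetriever:
    """Toy in-memory backend that ranks documents by word overlap with the query."""

    def __init__(self, docs: List[Dict[str, str]]) -> None:
        self.docs = docs

    def retrieve(self, query: str, k: int = 3) -> List[Dict[str, str]]:
        terms = set(query.lower().split())
        scored = []
        for doc in self.docs:
            overlap = len(terms & set(doc["text"].lower().split()))
            if overlap:
                scored.append((overlap, doc))
        # Sort by score only; the sort is stable, so ties keep document order.
        scored.sort(key=lambda pair: -pair[0])
        return [doc for _, doc in scored[:k]]


def gather_references(questions: List[str], rm: Retriever,
                      k: int = 3) -> List[Dict[str, str]]:
    """Collect de-duplicated references for a list of questions."""
    seen = set()
    refs: List[Dict[str, str]] = []
    for question in questions:
        for doc in rm.retrieve(question, k):
            if doc["url"] not in seen:
                seen.add(doc["url"])
                refs.append(doc)
    return refs
```

Because `gather_references` only calls `retrieve`, a class wrapping a web search API could be passed in place of `KeywordRetriever` without changing the caller.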
What are the benefits of the project?
- Automation: Automates the tedious process of researching and writing articles.
- Comprehensive Coverage: Gathers information from multiple perspectives, leading to more thorough articles.
- Verifiability: Includes citations to support the generated content.
- Customizability: Allows users to customize different components of the pipeline.
- Collaboration (Co-STORM): Enables human users to guide and refine the knowledge curation process.
- Efficiency: Reduces the time and effort required for knowledge exploration and article creation.
What are the use cases of the project?
- Assisting Wikipedia editors: Helps in the pre-writing stage by providing a starting point for article creation.
- Knowledge exploration: Facilitates in-depth research on a wide range of topics.
- Content generation: Automates the creation of informative articles for various purposes.
- Educational tool: Can be used for learning and understanding complex topics.
- Information seeking: Supports complex information-seeking tasks.
- Collaborative knowledge curation: Humans and AI work together to build a shared understanding.
