STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking

What is the project about?

STORM is an LLM system that automatically writes Wikipedia-like articles from scratch, complete with citations, by conducting Internet-based research. Co-STORM extends this by enabling human-AI collaboration in the knowledge curation process.

What problem does it solve?

STORM addresses the challenge of generating comprehensive, well-sourced articles on a given topic by automating the research stage, which is time-consuming to do by hand. Co-STORM adds the human oversight that a fully automated pipeline lacks, letting users steer what information is gathered and how it is synthesized.

What are the features of the project?

Automated Research: Conducts Internet-based research to gather relevant information and references.
Outline Generation: Creates a structured outline to organize the collected knowledge.
Article Writing: Generates a full-length article with citations, based on the outline and references.
Perspective-Guided Question Asking: Discovers different perspectives on a topic to guide the question-asking process.
Simulated Conversation: Simulates a conversation between a writer and a topic expert whose answers are grounded in Internet sources, so the system can update its understanding of the topic and ask follow-up questions.
Collaborative Discourse (Co-STORM): Supports human-AI collaboration through a turn-based discourse protocol.
Mind Map (Co-STORM): Maintains a dynamically updated mind map to visualize the conceptual space.
Modular Design: Built from modular components using dspy, so individual pipeline stages can be customized or replaced.
API Support: Provides interfaces for plugging in language models (via litellm) and retrieval/search engines (YouRM, BingSearch, VectorRM, etc.).
Human in the Loop (Co-STORM): Lets users either observe the discourse or take a turn themselves to steer its focus.
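
The research features above chain together roughly as follows: perspectives drive question asking, each question is answered from a retrieved source, and the collected notes feed an outline and a cited article. This is a minimal, self-contained sketch using only the standard library. It does not call the STORM package, and every name in it (Source, fake_search, ask_questions, research, make_outline, write_article) is hypothetical; the canned strings stand in for LLM and search-engine calls.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    snippet: str


def fake_search(question: str) -> Source:
    # Hypothetical stand-in for a retrieval module (a real one queries a search engine).
    slug = question.lower().rstrip("?").replace(" ", "-")
    return Source(url=f"https://example.com/{slug}", snippet=f"Finding for '{question}'")


def ask_questions(topic: str, perspective: str, turns: int = 2) -> list:
    # Hypothetical stand-in for the LLM "writer" asking from one perspective.
    return [f"{perspective} question {i + 1} on {topic}?" for i in range(turns)]


def research(topic: str, perspectives: list) -> dict:
    """Simulated conversation: per perspective, ask questions and ground each answer."""
    notes = {}
    for perspective in perspectives:
        notes[perspective] = [(q, fake_search(q)) for q in ask_questions(topic, perspective)]
    return notes


def make_outline(notes: dict) -> list:
    # Stand-in for outline generation: one section heading per perspective.
    return [f"{perspective} view" for perspective in notes]


def write_article(topic: str, notes: dict) -> str:
    # Write one section per outline heading; every sentence carries a numbered citation.
    lines, references = [topic], []
    for heading, perspective in zip(make_outline(notes), notes):
        lines.append(heading)
        for _question, source in notes[perspective]:
            references.append(source.url)
            lines.append(f"{source.snippet} [{len(references)}]")
    lines.append("References")
    lines += [f"[{n}] {url}" for n, url in enumerate(references, start=1)]
    return "\n".join(lines)


notes = research("Solar sails", ["Historian", "Engineer"])
article = write_article("Solar sails", notes)
print(article)
```

In the real system each stand-in would be an LLM or retrieval call; the sketch only shows how the stages pass data to one another.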

What are the technologies used in the project?

Large Language Models (LLMs): Uses LLMs (configurable, with examples using GPT-3.5 and GPT-4o) for question asking, answer synthesis, outline generation, and article writing.
Retrieval Modules (RMs): Integrates with various search engines and vector databases (YouRM, BingSearch, VectorRM, SerperRM, BraveRM, SearXNG, DuckDuckGoSearchRM, TavilySearchRM, GoogleSearch, AzureAISearch).
Python: The primary programming language.
dspy: A framework for programming with foundation models.
litellm: For integrating various language and embedding models.
Hugging Face Datasets: Hosts the project's datasets (FreshWiki, WildSeek).
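
Because the retrieval modules listed above are interchangeable, the pipeline only needs them to share a common shape. The sketch below illustrates that plug-in idea with a typing.Protocol. The Retriever protocol, KeywordRetriever, and gather are hypothetical names chosen for illustration; they are not the interfaces defined by STORM, dspy, or litellm.

```python
from typing import Protocol


class Retriever(Protocol):
    # Hypothetical interface: anything with this method can serve as a retriever.
    def retrieve(self, query: str, k: int) -> list: ...


class KeywordRetriever:
    """Toy in-memory retriever that ranks documents by words shared with the query."""

    def __init__(self, documents: dict):
        self.documents = documents  # maps url -> text

    def retrieve(self, query: str, k: int) -> list:
        words = set(query.lower().split())
        scored = []
        for url, text in self.documents.items():
            overlap = len(words & set(text.lower().split()))
            if overlap:
                scored.append((overlap, url, text))
        # Highest overlap first; ties broken by url for a stable order.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [{"url": url, "snippet": text} for _, url, text in scored[:k]]


def gather(queries: list, retriever: Retriever, k: int = 1) -> dict:
    # The caller never needs to know which concrete retriever it was given.
    return {query: retriever.retrieve(query, k) for query in queries}


corpus = {
    "https://example.com/a": "solar sails use radiation pressure",
    "https://example.com/b": "ion engines use electric propulsion",
}
results = gather(["how do solar sails work", "electric propulsion"], KeywordRetriever(corpus))
print(results)
```

Swapping in a different backend then means writing another class with the same retrieve method, without touching gather.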

What are the benefits of the project?

Automation: Automates the tedious process of researching and writing articles.
Comprehensive Coverage: Gathers information from multiple perspectives, leading to more thorough articles.
Verifiability: Includes citations to support the generated content.
Customizability: Allows users to customize different components of the pipeline.
Collaboration (Co-STORM): Enables human users to guide and refine the knowledge curation process.
Efficiency: Reduces the time and effort required for knowledge exploration and article creation.

What are the use cases of the project?

Assisting Wikipedia editors: Helps in the pre-writing stage by providing a starting point for article creation.
Knowledge exploration: Facilitates in-depth research on a wide range of topics.
Content generation: Automates the creation of informative articles for various purposes.
Educational tool: Helps learners build an overview of a complex topic, with sources to follow up on.
Information seeking: Supports complex information-seeking tasks.
Collaborative knowledge curation: Humans and AI work together to build a shared understanding.
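
Collaborative curation rests on two Co-STORM ideas from the feature list: turn-based discourse and a mind map that grows as the conversation proceeds. The following is a minimal sketch of how those could fit together, assuming a simple tree of concepts. MindMapNode, run_discourse, and the speaker labels are hypothetical and do not reflect Co-STORM's actual classes or turn policy.

```python
from dataclasses import dataclass, field


@dataclass
class MindMapNode:
    name: str
    notes: list = field(default_factory=list)
    children: dict = field(default_factory=dict)

    def insert(self, path, note):
        """Walk or create the nodes along path, then file the note at the end."""
        node = self
        for name in path:
            node = node.children.setdefault(name, MindMapNode(name))
        node.notes.append(note)

    def render(self, depth=0):
        # One line per concept, indented by depth, with its note count.
        lines = ["  " * depth + f"{self.name} ({len(self.notes)})"]
        for child in self.children.values():
            lines.extend(child.render(depth + 1))
        return lines


def run_discourse(topic, turns):
    """Each turn is (speaker, concept path, utterance); turns are processed in order."""
    mind_map = MindMapNode(topic)
    for speaker, path, utterance in turns:
        mind_map.insert(path, f"{speaker}: {utterance}")
    return mind_map


turns = [
    ("expert", ["Propulsion"], "Sails are pushed by radiation pressure."),
    ("user", ["Propulsion", "Materials"], "Which films are light enough?"),
    ("moderator", ["Missions"], "Which missions have flown a sail?"),
]
mind_map = run_discourse("Solar sails", turns)
print("\n".join(mind_map.render()))
```

The human turn is handled exactly like any other, which is the point of a turn-based protocol: a user utterance can open a new branch of the map at any time.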