STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking

What is the project about?

STORM is an LLM system that automatically writes Wikipedia-like articles from scratch, complete with citations, by conducting Internet-based research. Co-STORM extends this by enabling human-AI collaboration in the knowledge curation process.

What problem does it solve?

STORM addresses the challenge of automatically generating comprehensive, well-sourced articles on a given topic. It automates the research stage, which is often the most time-consuming part of writing. Co-STORM addresses the need for human oversight and alignment during information gathering and synthesis.

What are the features of the project?

  • Automated Research: Conducts Internet-based research to gather relevant information and references.
  • Outline Generation: Creates a structured outline to organize the collected knowledge.
  • Article Writing: Generates a full-length article with citations, based on the outline and references.
  • Perspective-Guided Question Asking: Discovers different perspectives on a topic to guide the question-asking process.
  • Simulated Conversation: Simulates a conversation between a writer and a topic expert, so the writer can refine its understanding of the topic and ask follow-up questions.
  • Collaborative Discourse (Co-STORM): Supports human-AI collaboration through a turn-based discourse protocol.
  • Mind Map (Co-STORM): Maintains a dynamically updated mind map to visualize the conceptual space.
  • Modular Design: Built from modular components on top of dspy, so individual stages can be customized or replaced.
  • API Support: Provides an API for plugging in different language models (via litellm) and retrieval/search engines (YouRM, BingSearch, VectorRM, etc.).
  • Human in the Loop (Co-STORM): Lets users take part in the discourse and steer the knowledge curation process.
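
The research-to-article flow described in the features above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not STORM's actual implementation or API: the names `simulate_conversations`, `build_outline`, `write_article`, and the `fake_*` stubs are hypothetical, the language model and retriever are replaced by plain callables, and the outline is simply one section per perspective.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins, not STORM's API: a "language model" maps a prompt
# to text, and a "retriever" maps a query to a list of source URLs.
LM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


@dataclass
class Turn:
    perspective: str
    question: str
    answer: str
    sources: List[str]


def simulate_conversations(topic: str, perspectives: List[str], ask: LM,
                           answer: LM, retrieve: Retriever,
                           max_turns: int = 2) -> List[Turn]:
    """Run one short writer/expert conversation per perspective."""
    turns: List[Turn] = []
    for perspective in perspectives:
        for turn_no in range(1, max_turns + 1):
            question = ask(f"{topic} / {perspective} / turn {turn_no}")
            sources = retrieve(question)  # ground the answer in sources
            reply = answer(f"{question} using {len(sources)} source(s)")
            turns.append(Turn(perspective, question, reply, sources))
    return turns


def build_outline(turns: List[Turn]) -> Dict[str, List[Turn]]:
    """Group turns into one section per perspective, in first-seen order."""
    outline: Dict[str, List[Turn]] = {}
    for turn in turns:
        outline.setdefault(turn.perspective, []).append(turn)
    return outline


def write_article(topic: str, outline: Dict[str, List[Turn]]) -> str:
    """Render the outline as text with numbered, de-duplicated citations."""
    references: List[str] = []
    lines = [topic]
    for section, turns in outline.items():
        lines.append(section)
        for turn in turns:
            markers = []
            for url in turn.sources:
                if url not in references:
                    references.append(url)
                markers.append(f"[{references.index(url) + 1}]")
            lines.append(f"{turn.answer} {''.join(markers)}")
    lines.append("References")
    lines.extend(f"[{i}] {url}" for i, url in enumerate(references, 1))
    return "\n".join(lines)


# Deterministic stubs so the sketch runs without any model or network access.
def fake_ask(prompt: str) -> str:
    return f"Q({prompt})"


def fake_answer(prompt: str) -> str:
    return f"A({prompt})"


def fake_retrieve(query: str) -> List[str]:
    return [f"https://example.com/source-{len(query)}"]


if __name__ == "__main__":
    turns = simulate_conversations("Solar power", ["history", "economics"],
                                   fake_ask, fake_answer, fake_retrieve)
    print(write_article("Solar power", build_outline(turns)))
```

In a real run, the stubs would be replaced by calls to a configured language model and retrieval module. The point of the sketch is only the ordering of stages: perspectives drive questions, answers are grounded in retrieved sources, and citations are carried through to the final text.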

What are the technologies used in the project?

  • Large Language Models (LLMs): Uses LLMs (configurable, with examples using GPT-3.5 and GPT-4o) for question asking, answer synthesis, outline generation, and article writing.
  • Retrieval Modules (RMs): Integrates with various search engines and vector databases (YouRM, BingSearch, VectorRM, SerperRM, BraveRM, SearXNG, DuckDuckGoSearchRM, TavilySearchRM, GoogleSearch, AzureAISearch).
  • Python: The primary programming language.
  • dspy: A framework for programming with foundation models.
  • litellm: For integrating various language and embedding models.
  • Hugging Face Datasets: Hosts the project's datasets (FreshWiki, WildSeek).
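
To illustrate why interchangeable retrieval modules are useful, here is a small sketch of a pluggable retriever interface. Everything in it is an assumption made for illustration: the `RetrievalModule` protocol, its `search(query, k)` signature, and the `InMemoryRM` and `ExcludeDomainsRM` classes are hypothetical and do not reproduce the interface of YouRM, BingSearch, VectorRM, or any other module in the project.

```python
from typing import Dict, List, Protocol


class RetrievalModule(Protocol):
    """Hypothetical interface: any object with a matching `search` qualifies."""

    def search(self, query: str, k: int) -> List[Dict[str, str]]: ...


class InMemoryRM:
    """Toy keyword retriever standing in for a search engine or vector store."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs  # maps URL -> document text

    def search(self, query: str, k: int) -> List[Dict[str, str]]:
        terms = set(query.lower().split())
        scored = []
        for url, text in self.docs.items():
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                scored.append((overlap, url, text))
        # Highest word overlap first; ties broken by URL for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [{"url": url, "snippet": text} for _, url, text in scored[:k]]


class ExcludeDomainsRM:
    """Wraps any retriever and drops results whose URL contains a blocked string."""

    def __init__(self, inner: RetrievalModule, blocked: List[str]) -> None:
        self.inner = inner
        self.blocked = blocked

    def search(self, query: str, k: int) -> List[Dict[str, str]]:
        results = self.inner.search(query, k)
        return [r for r in results
                if not any(b in r["url"] for b in self.blocked)]


def gather_references(rm: RetrievalModule, questions: List[str],
                      k: int = 2) -> List[str]:
    """Collect de-duplicated source URLs for a list of questions."""
    seen: List[str] = []
    for question in questions:
        for result in rm.search(question, k):
            if result["url"] not in seen:
                seen.append(result["url"])
    return seen
```

Because `gather_references` depends only on the `search` method, swapping one retriever for another, or wrapping one to filter its results, requires no change to the calling code.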

What are the benefits of the project?

  • Automation: Automates the tedious process of researching and writing articles.
  • Comprehensive Coverage: Gathers information from multiple perspectives, leading to more thorough articles.
  • Verifiability: Includes citations to support the generated content.
  • Customizability: Allows users to customize different components of the pipeline.
  • Collaboration (Co-STORM): Enables human users to guide and refine the knowledge curation process.
  • Efficiency: Reduces the time and effort required for knowledge exploration and article creation.

What are the use cases of the project?

  • Assisting Wikipedia editors: Helps in the pre-writing stage by providing a starting point for article creation.
  • Knowledge exploration: Facilitates in-depth research on a wide range of topics.
  • Content generation: Automates the creation of informative articles for various purposes.
  • Educational tool: Can be used for learning and understanding complex topics.
  • Information seeking: Supports complex information-seeking tasks.
  • Collaborative knowledge curation: Lets humans and AI work together to build a shared understanding of a topic.