Site RAG Project Description

What is the project about?

Site RAG is a Chrome extension that allows users to ask questions about websites using a Retrieval-Augmented Generation (RAG) approach.

What problem does it solve?

It enables users to quickly get answers to questions based on the content of a website, either a single page or an entire site, without manually searching through the information. It also allows for persistent indexing of websites for repeated querying.

What are the features of the project?

  • One-off queries on the current page.
  • Indexing of the current page and persisting documents in a vector store for RAG.
  • Indexing of an entire site and persisting documents in a vector store for RAG.
  • The extension itself runs entirely within the browser, with secrets kept in browser storage.
  • Optional connection to a locally running Ollama instance for local LLM inference.
  • "Multi query mode" generates multiple queries for a more comprehensive search.
  • Support for follow-up questions, maintaining context from previous interactions.
  • "Context stuff mode" includes the entire content of the current page in the system prompt.
  • Support for multiple LLM providers.
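As a rough illustration of how "Multi query mode" might combine its results, the sketch below runs several query variants through a retriever and merges the hits, dropping duplicates. The names Doc, Retriever, mergeResults, and multiQueryRetrieve are hypothetical and not taken from the Site RAG source; how the extension actually generates and merges queries is an assumption here.

```typescript
// Hypothetical types for illustration only; not from the Site RAG codebase.
interface Doc {
  id: string;
  text: string;
}

// Assumed shape of a retriever: takes one query, returns matching documents.
type Retriever = (query: string) => Promise<Doc[]>;

// Merge per-query result lists, keeping the first occurrence of each document id.
function mergeResults(perQuery: Doc[][]): Doc[] {
  const seen = new Set<string>();
  const merged: Doc[] = [];
  for (const docs of perQuery) {
    for (const doc of docs) {
      if (!seen.has(doc.id)) {
        seen.add(doc.id);
        merged.push(doc);
      }
    }
  }
  return merged;
}

// Run every query variant through the retriever in parallel, then merge.
async function multiQueryRetrieve(
  queries: string[],
  retrieve: Retriever,
): Promise<Doc[]> {
  const perQuery = await Promise.all(queries.map((q) => retrieve(q)));
  return mergeResults(perQuery);
}
```

In this sketch the query variants would come from an LLM call that rephrases the user's question; that step is left out because its prompt and provider are specific to the extension.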

What are the technologies used in the project?

  • Language: Likely TypeScript (based on file extensions like .ts), built with yarn (yarn install, yarn build).
  • Framework/Libraries: LangChain (implied by the reference to langchain.chat_models_universal.initChatModel), plus other dependencies managed by yarn.
  • Database: Supabase (PostgreSQL with pgvector extension) for vector storage.
  • LLM Providers: Anthropic, OpenAI, Google GenAI, Together AI, and potentially others (extensible).
  • Web Scraping: FireCrawl API.
  • Chrome Extension API: For building the browser extension.
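To make the vector-storage idea concrete, here is a minimal in-memory sketch of similarity search over embedded chunks. In the real project this ranking would happen inside Supabase via pgvector rather than in application code; the StoredChunk type, the function names, and the choice of cosine similarity are assumptions made for illustration.

```typescript
// Hypothetical record shape; the actual Supabase table schema is not shown in the source.
interface StoredChunk {
  content: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

The retrieved chunks would then be passed to the chosen LLM as context for answering the user's question.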

What are the benefits of the project?

  • Efficient Information Retrieval: Quickly find answers within websites.
  • Local Processing: The extension runs in the user's browser, and inference can stay local when paired with Ollama.
  • Flexibility: Supports various LLMs and indexing options.
  • Persistence: Indexed data can be stored for later use.
  • Contextual Awareness: Maintains context in follow-up questions.

What are the use cases of the project?

  • Researching information on a specific website.
  • Quickly finding answers to questions while browsing.
  • Creating a knowledge base from a website for repeated querying.
  • Comparing information across multiple pages or websites.
  • Summarizing the content of web pages or entire sites.