Maxun: Open-Source No-Code Web Data Extraction Platform

What is the project about?

Maxun is a platform that allows users to extract data from websites without writing any code. It lets you train a "robot" to scrape web data automatically.

What problem does it solve?

It simplifies web scraping so that users without programming skills can do it, and it automates recurring extraction to save time and effort. It also handles common scraping chores such as pagination, infinite scrolling, and scheduled runs.

What are the features of the project?

No-Code Data Extraction: Extract data visually without coding.
Pagination & Scrolling Handling: Automatically handles websites with multiple pages or infinite scrolling.
Scheduled Runs: Run robots on a specific schedule to extract data regularly.
Website to API: Effectively turn websites into APIs by extracting data in a structured way.
Website to Spreadsheet: Export extracted data to spreadsheets (Google Sheets integration available).
Adapt to Website Layout Changes: (Coming soon) Robots will be able to adapt to changes in website structure.
Extract Behind Login: (Coming soon) Support for scraping data behind login forms, including two-factor authentication.
Integrations: Google Sheets.
BYOP (Bring Your Own Proxy): Connect external proxies to bypass anti-bot protection.
Robot Actions: Capture List, Capture Text, Capture Screenshot.
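
To make the pagination feature above more concrete, here is a minimal sketch of the general technique: capture a list from each page and follow the "next" link until there is none. It is not Maxun's implementation. The FAKE_SITE data, fetch_page, and extract_all_pages are hypothetical names invented for illustration, and an in-memory dictionary stands in for a real browser session.

```python
from typing import Callable, Optional

# Hypothetical in-memory "site": each page holds rows plus the id of the next page.
FAKE_SITE = {
    "page-1": {"rows": [{"name": "Widget", "price": "9.99"}], "next": "page-2"},
    "page-2": {"rows": [{"name": "Gadget", "price": "19.50"}], "next": "page-3"},
    "page-3": {"rows": [{"name": "Gizmo", "price": "4.25"}], "next": None},
}


def fetch_page(page_id: str) -> dict:
    """Stand-in for loading a page in a browser and capturing a list from it."""
    return FAKE_SITE[page_id]


def extract_all_pages(
    start: str,
    fetch: Callable[[str], dict],
    max_pages: int = 100,
) -> list[dict]:
    """Collect rows from every page reachable through 'next' links."""
    rows: list[dict] = []
    seen: set[str] = set()  # guards against pagination loops
    page_id: Optional[str] = start
    while page_id is not None and page_id not in seen and len(seen) < max_pages:
        seen.add(page_id)
        page = fetch(page_id)
        rows.extend(page["rows"])
        page_id = page["next"]  # None on the last page ends the loop
    return rows


records = extract_all_pages("page-1", fetch_page)
```

The resulting list of uniform records is the kind of structured output that the "Website to API" and "Website to Spreadsheet" features describe. The max_pages cap and the seen set are defensive choices for this sketch, so a site whose "next" link loops back cannot run forever.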

What are the technologies used in the project?

Node.js
PostgreSQL
MinIO (for storing screenshots)
Redis (for scheduling using BullMQ)
Playwright (for browser automation)
Docker Compose (for simplified deployment)

What are the benefits of the project?

Accessibility: Makes web scraping accessible to non-programmers.
Automation: Automates data extraction, saving time and effort.
Scalability: Can be self-hosted or used via a managed cloud service (coming soon) for larger-scale data extraction.
Flexibility: Offers various extraction methods (list, text, screenshot) and supports proxies.
Open Source: Licensed under AGPLv3, which allows community contributions and customization.

What are the use cases of the project?

E-commerce: Scraping product information (prices, descriptions, etc.) from online stores.
Market Research: Gathering data for competitive analysis or market trends.
Lead Generation: Extracting contact information from websites.
Data Aggregation: Collecting data from multiple sources for analysis or reporting.
Content Monitoring: Tracking changes on websites (e.g., price changes, news updates).
General: Any task that requires regular extraction of data from websites.
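
As an illustration of the content monitoring use case, the sketch below compares the rows from two scheduled runs and reports what changed. It assumes each run yields a list of records with a "name" and a "price" field. The diff_snapshots function and the sample data are hypothetical and are not part of Maxun.

```python
def diff_snapshots(previous, current, key="name", field="price"):
    """Return entries whose `field` changed, appeared, or disappeared between two runs."""
    old = {row[key]: row[field] for row in previous}
    new = {row[key]: row[field] for row in current}
    changes = []
    for k, value in new.items():
        if k not in old:
            changes.append({"key": k, "change": "added", "old": None, "new": value})
        elif old[k] != value:
            changes.append({"key": k, "change": "changed", "old": old[k], "new": value})
    for k, value in old.items():
        if k not in new:
            changes.append({"key": k, "change": "removed", "old": value, "new": None})
    return changes


# Made-up sample data standing in for two consecutive extraction runs.
yesterday = [{"name": "Widget", "price": "9.99"}, {"name": "Gadget", "price": "19.50"}]
today = [{"name": "Widget", "price": "8.99"}, {"name": "Gizmo", "price": "4.25"}]

changes = diff_snapshots(yesterday, today)
```

With this sample data, the result lists a price change for Widget, a new entry for Gizmo, and a removal for Gadget.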