
Camoufox: Stealthy Firefox for Web Scraping

What is the project about?

Camoufox is a customized, minimalistic build of Firefox specifically designed for web scraping. It focuses on avoiding bot detection and providing robust fingerprint injection and rotation.

What problem does it solve?

Camoufox addresses the challenge of websites detecting and blocking web scraping bots. It makes the browser look like an ordinary, human-operated browser rather than an automated tool, so that scraping sessions pass anti-bot systems and WAFs (Web Application Firewalls) instead of being blocked.

What are the features of the project?

  • Anti-Bot Evasion: According to the project, it outperforms many commercial anti-detect browsers at avoiding detection.
  • Comprehensive Fingerprint Injection & Rotation: Spoofs a wide range of browser and device characteristics without relying on detectable JavaScript injection. This includes:
    • Navigator properties (User-Agent, OS, hardware, etc.)
    • Screen and window properties
    • Geolocation, timezone, locale
    • Fonts
    • WebGL parameters
    • WebRTC IP addresses (at the protocol level)
    • Media devices, voices
    • AudioContext parameters
    • Battery Status
  • Quality of Life Features:
    • Human-like cursor movement.
    • Ad blocking and circumvention.
    • Disables CSS animations for speed.
  • Optimized Performance: Debloated and optimized for memory efficiency.
  • Python Package: Provides a PyPI package (camoufox) for easy updates and automatic fingerprint injection.
  • Up-to-Date: Keeps pace with the latest Firefox versions.
  • Playwright Integration: Offers a custom, stealth-enhanced implementation of Playwright for Firefox, so existing Playwright scripts can be reused with few changes.
  • Addon Support: Supports Firefox addons, and includes uBlock Origin and B.P.C. by default.
  • Font Metric Fingerprinting Prevention: Randomly offsets letter spacing to prevent font metric fingerprinting.
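The idea behind the last feature can be illustrated with a toy model. The sketch below is not Camoufox's implementation (which lives in patched Firefox C++ code); the glyph widths, probe strings, offset range, and all function names are assumptions made up for illustration. It shows why a small, per-session letter-spacing offset changes the text widths a page can measure, and therefore the fingerprint derived from them.

```python
import hashlib
import random
from typing import Dict, List

# Hypothetical per-glyph advance widths in pixels (illustrative values only).
GLYPH_ADVANCE: Dict[str, float] = {"a": 7.2, "b": 7.5, "c": 6.8, "m": 11.1, "w": 10.4}

# Hypothetical probe strings a fingerprinting script might measure.
PROBES: List[str] = ["abc", "mmww", "wcab"]


def measure_text(text: str, letter_spacing: float = 0.0) -> float:
    """Simplified width model: glyph advances plus spacing between adjacent glyphs."""
    advances = sum(GLYPH_ADVANCE[ch] for ch in text)
    gaps = max(len(text) - 1, 0)
    return advances + gaps * letter_spacing


def metric_fingerprint(letter_spacing: float = 0.0) -> str:
    """Hash the measured widths of the probe strings into a short identifier."""
    widths = ",".join(f"{measure_text(p, letter_spacing):.4f}" for p in PROBES)
    return hashlib.sha256(widths.encode()).hexdigest()[:16]


def session_offset(seed: int, max_offset: float = 0.05) -> float:
    """Pick one random letter-spacing offset per session (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.uniform(-max_offset, max_offset)
```

Without an offset, every session on the same machine yields the same hash. With a different offset per session, the measured widths shift slightly and the hash no longer links sessions together, while the text still renders almost identically to a human reader.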

What are the technologies used in the project?

  • Firefox (C++, JavaScript): The core browser is a modified version of Firefox. Patches are applied at the C++ level for maximum stealth.
  • Python: A Python library (camoufox) provides a user-friendly interface for controlling the browser and injecting fingerprints. This is the primary way users interact with Camoufox.
  • Playwright: A customized version of Playwright (a browser automation framework with Node.js and Python bindings, among others) is integrated.
  • Juggler: A patched version of Juggler, the Firefox automation layer that Playwright relies on, is used for Playwright support.
  • Docker: The project provides a Dockerfile for building Camoufox in a consistent environment, regardless of the host OS.
  • Make: The build system uses Makefiles.
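A minimal sketch of how the Python interface and the Playwright-style API fit together might look like the following. It assumes the camoufox package exposes a `Camoufox` context manager in `camoufox.sync_api` that accepts `headless`, `os`, and `humanize` keyword arguments; check these names against the project's documentation before relying on them. The helpers `build_launch_options` and `fetch_title` are hypothetical and exist only for this example.

```python
from typing import Any, Dict, Optional


def build_launch_options(headless: bool = True,
                         target_os: Optional[str] = None,
                         humanize: bool = False) -> Dict[str, Any]:
    """Collect launcher keyword arguments (names assumed, see lead-in)."""
    options: Dict[str, Any] = {"headless": headless, "humanize": humanize}
    if target_os is not None:
        # Assumed to select the OS family the generated fingerprint imitates.
        options["os"] = target_os
    return options


def fetch_title(url: str, **launch_options: Any) -> str:
    """Open a page in Camoufox and return its title via Playwright-style calls."""
    # Imported lazily so this module can be loaded without camoufox installed.
    from camoufox.sync_api import Camoufox

    with Camoufox(**launch_options) as browser:
        page = browser.new_page()   # same call shape as a Playwright browser
        page.goto(url)
        return page.title()


# Example usage (requires camoufox and its browser binary to be installed):
#   title = fetch_title("https://example.com",
#                       **build_launch_options(target_os="windows"))
```

Keeping the option-building separate from the launch makes it easy to vary the requested fingerprint between runs without touching the scraping logic.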

What are the benefits of the project?

  • Improved Scraping Success: Higher success rates when scraping websites due to advanced anti-detection capabilities.
  • Reduced Blocking: Minimizes the risk of being blocked by websites.
  • Stealth: Fingerprint spoofing is done at a low level (C++), making it very difficult for websites to detect.
  • Ease of Use: The Python interface simplifies configuration and usage.
  • Performance: Faster and more memory-efficient than standard Firefox.
  • Open Source: Transparent and customizable.
  • Maintainability: The PyPI package and regular updates help the project stay effective against evolving anti-bot techniques.

What are the use cases of the project?

  • Web Scraping: The primary use case is extracting data from websites that employ anti-bot measures.
  • Data Collection: Gathering data for research, analysis, or other purposes.
  • Price Monitoring: Tracking prices on e-commerce sites.
  • Competitive Analysis: Gathering intelligence on competitors.
  • Automation: Automating tasks on websites that would normally require a human user.
  • Testing: Testing website security and anti-bot systems (with permission).
  • Bypassing CAPTCHAs: The project reports high scores on reCAPTCHA tests, which suggests its sessions are less likely to be flagged as automated and challenged.

In summary, Camoufox is a powerful, open-source tool for developers and researchers who need a reliable and stealthy way to scrape data from websites that actively try to prevent automated access. It combines a heavily modified Firefox build with a convenient Python interface and Playwright integration to provide a robust and user-friendly solution.
