HivisionIDPhoto

Project Description

HivisionIDPhoto is a project focused on creating a practical and systematic AI-powered ID photo generation tool. It leverages a pipeline of AI models to recognize various user photo scenarios, perform background removal (matting), and generate standard ID photos.

What is the project about?

The project is about developing an intelligent system for creating ID photos from user-submitted images. It automates the process of background removal, resizing, and formatting to meet specific ID photo requirements.

What problem does it solve?

It lets users turn casual photos into compliant ID photos quickly and easily, without professional photography equipment or editing skills. It also meets the need for on-demand ID photo generation, especially in urgent situations.

What are the features of the project?

Lightweight Matting: Performs background removal efficiently, even on CPUs (offline).
Standard ID Photo Generation: Creates ID photos conforming to different size specifications.
Layout Photo Generation: Produces print-ready layouts (e.g., 6-inch photo paper layouts) with multiple ID photos.
Flexible Inference: Supports both purely offline (CPU-based) and client-server inference.
Beautification: Includes a beautification feature to enhance the appearance of the subject.
Smart Formal Attire Change: (Planned feature) Will digitally change the subject's clothing to formal attire.
Face Alignment: Rotates the photo so that a tilted face is aligned.
Custom Background Color: Supports custom background colors via HEX input.
Print Layout: Supports five layout sizes: 6-inch, 5-inch, A4, 3R, and 4R.
Beauty Parameters: The API accepts beauty parameters.
DPI and Face Alignment Parameters: The API accepts DPI and face alignment parameters.
Base64 Image Input: The API accepts base64-encoded images as input.
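
To illustrate how matting output and a custom HEX background color fit together, the following is a minimal sketch of the usual alpha-compositing step. It is not taken from the project's code: the function names hex_to_rgb and composite_pixel are hypothetical, and the sketch assumes the matte is an 8-bit alpha value per pixel.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def hex_to_rgb(hex_color: str) -> RGB:
    """Parse 'RRGGBB', '#RRGGBB', or shorthand '#RGB' into an (R, G, B) tuple."""
    value = hex_color.strip().lstrip("#")
    if len(value) == 3:
        # Expand shorthand, e.g. 'f0a' becomes 'ff00aa'.
        value = "".join(ch * 2 for ch in value)
    if len(value) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {hex_color!r}")
    # int(..., 16) raises ValueError on non-hex characters.
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def composite_pixel(fg_rgb: RGB, alpha: int, bg_rgb: RGB) -> RGB:
    """Blend one foreground pixel over a solid background color.

    alpha is the matte value in [0, 255] (an assumption of this sketch):
    255 keeps the subject pixel, 0 shows only the background.
    """
    a = alpha / 255.0
    return tuple(round(f * a + b * (1.0 - a)) for f, b in zip(fg_rgb, bg_rgb))


background = hex_to_rgb("#438EDB")                   # (67, 142, 219)
print(composite_pixel((200, 100, 50), 255, background))  # (200, 100, 50)
print(composite_pixel((200, 100, 50), 0, background))    # (67, 142, 219)
```

In practice the same blend would be applied to whole image arrays rather than pixel by pixel; the per-pixel form is used here only to keep the arithmetic visible.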

What are the technologies used in the project?

Python: The primary programming language.
ONNX Runtime: Used for running the matting models (MODNet, hivision_modnet, rmbg-1.4, birefnet-v1-lite).
Gradio: Used for creating the interactive web demo.
Docker: Used for containerization and deployment.
Face Detection Models: MTCNN (default, lightweight), RetinaFace (optional, more accurate), Face++ (optional, online API).
CUDA/cuDNN (Optional): For GPU acceleration of the birefnet-v1-lite model.
SwanLab: A deep learning training tracking and visualization tool.

What are the benefits of the project?

Convenience: Users can create ID photos anytime, anywhere.
Speed: The process is fast, even with CPU-only matting.
Cost-Effectiveness: Eliminates the need for professional photo services.
Flexibility: Supports various ID photo sizes and background colors.
Accessibility: Can be used offline, making it accessible even without an internet connection.
Ease of Use: The Gradio demo provides a user-friendly interface.
Open Source: Allows for community contributions and extensions.

What are the use cases of the project?

Generating ID photos for official documents: Passports, driver's licenses, visas, etc.
Creating photos for online profiles: Social media, job applications, school registrations.
Emergency ID photo needs: When a quick and readily available solution is required.
Printing ID photos at home: Using the layout photo generation feature.
Developing custom applications: The API allows integration into other software or services.
Community-built applications: WeChat mini-programs, ComfyUI workflows, web applications, and Windows GUI applications.
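
For integrations that use the base64 image input mentioned in the feature list, a client first has to encode the image bytes as text. The sketch below shows that step using only the Python standard library. The JSON field name image_base64 and the helper names are hypothetical placeholders, not the project's actual API; consult the project's API documentation for the real parameter names.

```python
import base64
import json


def encode_image_bytes(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image_string(encoded: str) -> bytes:
    """Decode a base64 string back to bytes.

    Tolerates an optional data-URI prefix such as 'data:image/jpeg;base64,'.
    """
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(encoded, validate=True)


def build_payload(image_bytes: bytes, field_name: str = "image_base64") -> str:
    """Wrap the encoded image in a JSON body (field name is hypothetical)."""
    return json.dumps({field_name: encode_image_bytes(image_bytes)})


sample = b"hello"  # stand-in for real image bytes read from a file
print(encode_image_bytes(sample))                              # aGVsbG8=
print(decode_image_string("data:image/png;base64,aGVsbG8="))   # b'hello'
```

Sending the image as text avoids multipart file uploads, at the cost of a payload roughly one third larger than the raw bytes.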