SCUDA: GPU-over-IP Project Description
What is the project about?
SCUDA is a "GPU over IP" bridge. It allows GPUs located on remote machines to be used by machines that don't have their own dedicated GPUs. Essentially, it makes a remote GPU appear as if it's locally connected.
What problem does it solve?
SCUDA addresses the problem of limited access to GPU resources. It lets developers and applications use GPUs that are not physically present in the local machine, opening up possibilities such as distributed GPU pools, remote training and inference, and local testing with remote GPU acceleration. It removes the need for every machine to have a powerful, dedicated GPU.
What are the features of the project?
- Remote GPU Access: The core feature is the ability to use a GPU over a network (TCP) connection.
- Unified Memory Support: Demonstrated to work with CUDA's Unified Memory, simplifying memory management between the CPU and GPU (a minimal managed-memory example follows this list).
- CUDA Compatibility: Designed to work with CUDA applications, including those using libraries like cuBLAS and cuDNN.
- Codegen for RPC: Uses code generation to create the necessary remote procedure calls (RPCs) for interacting with the remote GPU.
- Docker Support: Provides Dockerfiles for building and running SCUDA-enabled applications.
- Server/Client Architecture: Uses a server component running on the machine with the GPU and a client library (`libscuda`) that intercepts CUDA calls on the client machine; a sketch of the interception idea also follows this list.
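To make the Unified Memory point concrete: a standard managed-memory program like the one below needs no SCUDA-specific changes. The intent is that the managed allocation and kernel launch are serviced by the remote GPU; the kernel name and values here are illustrative, not taken from the project.

```cpp
// Standard CUDA Unified Memory example (nothing SCUDA-specific):
// under SCUDA, a program like this is intended to run unchanged,
// with the allocation and kernel launch handled by the remote GPU.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1024;
    float* data = nullptr;

    // Unified Memory: one pointer usable from both host and device.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = float(i);

    add_one<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();  // wait for the (remote) GPU to finish

    printf("data[0] = %f, data[n-1] = %f\n", data[0], data[n - 1]);
    cudaFree(data);
    return 0;
}
```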
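The last two features combine in practice: each intercepted CUDA entry point is a small generated wrapper that serializes its arguments, ships them to the server over TCP, and returns the server's result. The following is a rough illustration of that idea for `cudaMalloc`. The opcode value, wire format, port number, and `SCUDA_SERVER` environment variable are assumptions made for this sketch, not SCUDA's actual protocol.

```cpp
// Illustrative sketch of a client-side shim forwarding a CUDA call
// over TCP. Opcode, wire format, port, and the SCUDA_SERVER variable
// are assumptions for illustration only. Build as a shared library
// and load it in place of the real runtime (e.g. via LD_PRELOAD).
#include <cstdint>
#include <cstdlib>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

namespace {
int remote_fd = -1;

// Lazily connect to the GPU server named by SCUDA_SERVER
// (assumed convention), on a fixed illustrative port.
int connect_to_server() {
    if (remote_fd >= 0) return remote_fd;
    const char* host = std::getenv("SCUDA_SERVER");
    if (!host) return -1;
    addrinfo hints{}, *res = nullptr;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "14833", &hints, &res) != 0) return -1;
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return remote_fd = fd;
}
}  // namespace

// Shadow of the CUDA runtime's cudaMalloc: instead of allocating
// locally, send the request to the server and hand back the remote
// device pointer. The plain int return stands in for cudaError_t.
// Error handling (partial reads/writes) is omitted for brevity.
extern "C" int cudaMalloc(void** devPtr, size_t size) {
    int fd = connect_to_server();
    if (fd < 0) return 1;  // cudaErrorInvalidValue-style failure

    uint32_t opcode = 1;  // hypothetical "malloc" opcode
    uint64_t n = size;
    if (write(fd, &opcode, sizeof opcode) < 0 ||
        write(fd, &n, sizeof n) < 0)
        return 1;

    int32_t status = 1;
    uint64_t remote_ptr = 0;
    if (read(fd, &status, sizeof status) <= 0 ||
        read(fd, &remote_ptr, sizeof remote_ptr) <= 0)
        return 1;

    *devPtr = reinterpret_cast<void*>(static_cast<uintptr_t>(remote_ptr));
    return status;  // 0 == cudaSuccess on the server side
}
```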
What are the technologies used in the project?
- CUDA: NVIDIA's parallel computing platform and programming model.
- C++: The primary programming language (implied by the use of CMake and `.cu` files).
- Python: Used for code generation and potentially for scripting/testing.
- CMake: Build system for generating the server and client binaries.
- TCP/IP: The networking protocol used for communication between the client and server.
- Docker: Containerization technology for deployment and testing.
- cuBLAS, cuDNN, NVML: CUDA libraries for linear algebra, deep neural networks, and GPU management, respectively (see the cuBLAS sketch after this list).
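Because interception happens at the level of library calls, code written against higher-level CUDA libraries is meant to work the same way. The sketch below is ordinary cuBLAS usage, nothing SCUDA-specific; the claim being illustrated is that with `libscuda` in place, the handle, device memory, and SAXPY would be backed by the remote GPU.

```cpp
// Ordinary cuBLAS usage (standard API, nothing SCUDA-specific).
// Compile with: nvcc saxpy.cu -lcublas
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 4;
    float hx[n] = {1, 2, 3, 4};
    float hy[n] = {10, 20, 30, 40};

    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // y = alpha * x + y on the (remote) device
    const float alpha = 2.0f;
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", hy[i]);  // 12 24 36 48
    printf("\n");

    cublasDestroy(handle);
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```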
What are the benefits of the project?
- Increased GPU Accessibility: Makes GPUs more accessible to developers and applications without requiring physical proximity.
- Resource Pooling: Enables the creation of shared GPU pools, improving resource utilization.
- Flexibility: Allows developers to work on GPU-accelerated tasks from various devices (laptops, low-power machines).
- Simplified Development: Potentially simplifies development workflows by allowing local testing and development with remote GPU acceleration.
- Scalability: Facilitates scaling of GPU-intensive applications by leveraging distributed GPU resources.
What are the use cases of the project?
- Local Testing: Testing CUDA applications locally while using a remote GPU for acceleration.
- Aggregated GPU Pools: Centralized management and allocation of GPU resources.
- Remote Model Training: Training machine learning models on remote GPUs from a local machine.
- Remote Inference: Running inference workloads on a remote GPU server.
- Remote Data Processing: Performing data processing tasks (filtering, aggregation, etc.) on a remote GPU.
- Remote Fine-tuning: Fine-tuning pre-trained models on a remote GPU.
