Bend

Project Description

What is the project about?

Bend is a high-level, massively parallel programming language designed for efficient execution on parallel hardware such as GPUs. It combines the expressiveness of languages like Python and Haskell with the scalability of CUDA, achieving near-linear speedup as the number of cores increases.

What problem does it solve?

Bend addresses the challenge of writing programs that make efficient use of massively parallel hardware (like GPUs) without explicit parallelism annotations such as thread spawning, locks, or mutexes. Instead of requiring developers to write low-level CUDA (or similar) code to benefit from GPU acceleration, it parallelizes code automatically wherever the structure of the computation allows.
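To illustrate the kind of code this applies to, the sketch below sums a range of integers by recursively splitting it in half. It is written in Python purely as familiar notation (it is not Bend syntax), and the function name `range_sum` is hypothetical. The two recursive calls share no data, so a runtime like Bend's could in principle evaluate them in parallel without any annotation; CPython itself runs them one after the other.

```python
def range_sum(start: int, end: int) -> int:
    """Sum the integers in [start, end] (requires start <= end).

    The range is split in half and each half is summed recursively.
    The two recursive calls are independent of each other, which is
    the structure an implicitly parallel runtime can exploit.
    CPython evaluates them sequentially.
    """
    if start == end:
        return start
    mid = (start + end) // 2
    left = range_sum(start, mid)      # does not depend on `right`
    right = range_sum(mid + 1, end)   # does not depend on `left`
    return left + right


if __name__ == "__main__":
    print(range_sum(1, 100_000))
```

Nothing in the function mentions threads or locks; the available parallelism follows from the data dependencies alone, which is the kind of structure Bend is designed to exploit.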

What are the features of the project?

High-level syntax: Similar to Python and Haskell, making programs easy to write and understand.
Massively parallel execution: Runs on GPUs with near-linear speedup as the core count increases.
Automatic parallelization: No need for explicit thread creation, locks, or mutexes.
Functional programming features: Supports higher-order functions, full closures, unrestricted recursion, and continuations.
Fast object allocations: Objects can be allocated quickly, as in other high-level languages.
Multiple runtimes: Programs can run on a sequential Rust interpreter, a parallel C interpreter, or a massively parallel CUDA interpreter.
Standalone compilation: Can be compiled to C/CUDA files.
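Whether near-linear speedup is achievable depends on how much independent work a program exposes. The sketch below applies standard work-span analysis with Brent's bound (a general technique, not something taken from Bend or HVM2) to a balanced recursive sum over `n` values; the function names are hypothetical. Work is the total number of additions, span is the longest chain of additions that must run in order, and the bound T_p <= (work - span) / p + span gives a lower bound on the speedup with `p` workers, assuming every addition costs one time unit.

```python
def work_and_span(n: int) -> tuple[int, int]:
    """Work and span of summing n >= 1 values with a balanced binary tree.

    work: total number of additions (one per internal tree node), n - 1.
    span: length of the longest chain of dependent additions, i.e. the
          tree height, which equals ceil(log2(n)).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    work = n - 1
    span = (n - 1).bit_length()  # integer-exact ceil(log2(n)) for n >= 1
    return work, span


def brent_speedup_bound(n: int, p: int) -> float:
    """Lower bound on the speedup of the tree sum with p workers.

    Uses Brent's bound T_p <= (work - span) / p + span, with T_1 = work.
    """
    work, span = work_and_span(n)
    if work == 0:
        return 1.0  # a single value needs no additions
    t_p = (work - span) / p + span
    return work / t_p


if __name__ == "__main__":
    n = 2 ** 20
    for p in (1, 16, 1024, 16384):
        print(f"p={p:>5}: speedup >= {brent_speedup_bound(n, p):.1f}")
```

For a sum over 2^20 values the span is only 20 additions deep, so the bound stays above 1,000 with 1,024 workers; a program whose span is large relative to its work cannot scale this way on any runtime.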

What are the technologies used in the project?

HVM2: The runtime environment powering Bend.
Rust: The language the Bend compiler is implemented in; also used for the sequential interpreter.
C: Used for one of the parallel interpreters and as a compilation target.
CUDA: Used for the massively parallel interpreter and as a compilation target (for NVIDIA GPUs).
GCC: C compiler used for the C interpreter and for compiling generated C files.

What are the benefits of the project?

Simplified parallel programming: Developers can write high-level code without worrying about low-level parallelization details.
High performance: Achieves significant speedups on parallel hardware.
Scalability: Designed to scale with the number of cores.
Expressiveness: Offers the ease of use of high-level languages.
Portability (with limitations): Supports multiple runtimes (C, Rust, CUDA), although Windows support is still under development.

What are the use cases of the project?

Data-parallel algorithms: Algorithms that can be broken down into independent computations on different data elements (e.g., image processing, simulations, machine learning).
Divide-and-conquer algorithms: Problems that can be solved by recursively breaking them down into smaller subproblems (e.g., sorting, searching).
Scientific computing: Applications requiring high computational power.
Other GPU-accelerated workloads: Any application where performance is critical and the problem can be parallelized.
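Sorting is a typical divide-and-conquer case. The merge sort below, again in Python as neutral notation and with a hypothetical function name, sorts the two halves of its input independently before merging them. Those two recursive calls are the part a parallel runtime could run concurrently, while the merge step has to wait for both results.

```python
def merge_sort(xs: list[int]) -> list[int]:
    """Return a sorted copy of xs using top-down merge sort.

    The two recursive calls operate on disjoint halves and do not
    depend on each other; only the final merge needs both results.
    """
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = merge_sort(xs[:mid])    # independent of `right`
    right = merge_sort(xs[mid:])   # independent of `left`

    # Merge step: combine the two sorted halves in order.
    merged: list[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    print(merge_sort([5, 3, 8, 1, 9, 2]))
```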
Bitonic Sorter