Grok-1 Project Description
What is the project about?
The project provides example code in JAX for loading and running the Grok-1 open-weights large language model.
What problem does it solve?
It lets users test and experiment with Grok-1, a large language model with 314 billion parameters, and provides a starting point for working with the released weights.
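As a concrete illustration of what "running" the model involves, here is a minimal, self-contained JAX sketch of autoregressive temperature sampling, the kind of decoding the example code performs on a test input. The toy linear "model" and all names below are illustrative stand-ins, not code from the repository.

```python
import jax
import jax.numpy as jnp

VOCAB = 32  # toy vocabulary; Grok-1's real tokenizer has 131,072 tokens

def toy_logits(last_token: int, params: jnp.ndarray) -> jnp.ndarray:
    """Stand-in forward pass: a linear map from the last token id to logits."""
    return params[last_token]

def sample(params, prompt_ids, steps, key, temperature=0.8):
    """Autoregressive temperature sampling from next-token logits."""
    ids = list(prompt_ids)
    for _ in range(steps):
        key, subkey = jax.random.split(key)
        logits = toy_logits(ids[-1], params)
        # Scale logits by the temperature, then draw the next token id.
        next_id = jax.random.categorical(subkey, logits / temperature)
        ids.append(int(next_id))
    return ids

params = jax.random.normal(jax.random.PRNGKey(0), (VOCAB, VOCAB))
print(sample(params, [1, 2, 3], steps=5, key=jax.random.PRNGKey(42)))
```

In the actual project, the same loop runs with Grok-1's transformer forward pass and SentencePiece tokenizer in place of these toy pieces.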
What are the features of the project?
- Loads and runs the Grok-1 model.
- Samples from the model on a test input.
- Mixture-of-Experts (MoE) architecture with 8 experts.
- 2 experts used per token (see the routing sketch after this list).
- 64 layers.
- 48 attention heads for queries, 8 for keys/values.
- Embedding size of 6,144.
- SentencePiece tokenizer with a 131,072-token vocabulary.
- Rotary positional embeddings (RoPE).
- Supports activation sharding and 8-bit quantization.
- Maximum sequence length (context) of 8,192 tokens.
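The MoE routing named above (8 experts, 2 selected per token) can be sketched in a few lines of JAX. This is a self-contained toy with deliberately small dimensions; it shows the top-2 gating idea, not the repository's actual implementation.

```python
import jax
import jax.numpy as jnp

NUM_EXPERTS = 8   # Grok-1 uses 8 experts...
TOP_K = 2         # ...and routes each token to 2 of them
EMB = 16          # toy size; Grok-1's embedding size is 6,144

def route(tokens: jnp.ndarray, gate_w: jnp.ndarray):
    """Return the top-k expert indices and mixing weights per token."""
    logits = tokens @ gate_w                       # [n_tokens, NUM_EXPERTS]
    top_logits, top_idx = jax.lax.top_k(logits, TOP_K)
    weights = jax.nn.softmax(top_logits, axis=-1)  # renormalize over chosen experts
    return top_idx, weights

tokens = jax.random.normal(jax.random.PRNGKey(0), (4, EMB))
gate_w = jax.random.normal(jax.random.PRNGKey(1), (EMB, NUM_EXPERTS))
idx, w = route(tokens, gate_w)
print(idx.shape, w.shape)  # (4, 2) (4, 2): 2 experts and 2 weights per token
```

In the full model, each routed token is processed by its two selected expert MLPs, and their outputs are combined using these gating weights.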
What are the technologies used in the project?
- JAX
- Python
- SentencePiece (for tokenization)
- Torrent or Hugging Face Hub (for downloading the weights; see the download sketch below)
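Below is a minimal sketch of fetching the weights with the huggingface_hub client and loading the SentencePiece tokenizer. The repo id xai-org/grok-1, the ckpt-0/ checkpoint directory, and the tokenizer.model path are assumptions based on the project's published distribution; verify them before downloading, as the checkpoint is several hundred gigabytes.

```python
# Sketch only: the repo id and paths below are assumptions to verify.
from huggingface_hub import snapshot_download
import sentencepiece as spm

# Fetch just the checkpoint shards (several hundred GB).
snapshot_download(
    repo_id="xai-org/grok-1",
    repo_type="model",
    allow_patterns=["ckpt-0/*"],
    local_dir="checkpoints",
)

# Load the 131,072-token SentencePiece vocabulary.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
print(sp.vocab_size())  # expected: 131072
```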
What are the benefits of the project?
- Provides access to the open-weights Grok-1 model.
- Allows for experimentation and validation of the model.
- Open-source (Apache 2.0 license).
What are the use cases of the project?
- Testing and validating the Grok-1 model.
- Research and development using the Grok-1 architecture.
- Potentially building applications on top of Grok-1 (though the example implementation prioritizes correctness and is not optimized for inference performance).
- Experimenting with large language models.
