
Grok-1 Project Description

What is the project about?

The project provides example code in JAX for loading and running the Grok-1 open-weights large language model.

What problem does it solve?

It lets users test and experiment with Grok-1, a large language model with 314 billion parameters, and provides a reference starting point for working with the model's released weights.
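The example code loads the checkpoint and samples a completion for a fixed test prompt. As a rough illustration of that sampling flow (not the repository's actual API; `apply_fn` and its signature are hypothetical stand-ins for the real forward pass), a greedy decoding loop in JAX might look like:

```python
import jax.numpy as jnp

def greedy_sample(apply_fn, params, prompt_ids, max_new_tokens=32):
    # apply_fn(params, ids) -> logits is a hypothetical forward pass
    # standing in for whatever the example code actually exposes.
    ids = jnp.asarray(prompt_ids)[None, :]       # (1, seq)
    for _ in range(max_new_tokens):
        logits = apply_fn(params, ids)           # (1, seq, vocab)
        next_id = jnp.argmax(logits[0, -1])      # pick the most likely next token
        ids = jnp.concatenate([ids, next_id[None, None]], axis=-1)
    return ids[0]
```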

What are the features of the project?

  • Loads and runs the Grok-1 model.
  • Samples from the model on a test input.
  • Mixture-of-Experts (MoE) architecture with 8 experts.
  • Activates 2 experts per token (see the routing sketch after this list).
  • 64 transformer layers.
  • 48 attention heads for queries, 8 for keys/values.
  • Embedding size of 6,144.
  • SentencePiece tokenizer with a vocabulary of 131,072 tokens.
  • Rotary positional embeddings (RoPE; see the sketch after this list).
  • Supports activation sharding and 8-bit quantization.
  • Maximum sequence length (context) of 8,192 tokens.
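
The MoE bullets above correspond to a gating network that scores all 8 experts and routes each token through its top 2. A minimal sketch of that routing in JAX, with toy dimensions instead of Grok-1's real ones (`gate_w`, `expert_w`, and the single-matmul experts are illustrative assumptions, not the repository's implementation):

```python
import jax
import jax.numpy as jnp

def top2_moe(x, gate_w, expert_w):
    # x: (tokens, d_model), gate_w: (d_model, n_experts),
    # expert_w: (n_experts, d_model, d_model) -- one toy matmul per expert.
    logits = x @ gate_w                           # score every expert per token
    top_vals, top_idx = jax.lax.top_k(logits, 2)  # keep the 2 best experts
    weights = jax.nn.softmax(top_vals, axis=-1)   # renormalize over the chosen 2

    def per_token(xt, idx, w):
        outs = jnp.stack([xt @ expert_w[idx[0]], xt @ expert_w[idx[1]]])
        return (w[:, None] * outs).sum(axis=0)    # weighted mix of the 2 experts

    return jax.vmap(per_token)(x, top_idx, weights)

# Toy usage: 4 tokens, d_model=16, 8 experts (Grok-1 itself uses d_model=6,144).
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (4, 16))
gate_w = jax.random.normal(key, (16, 8))
expert_w = jax.random.normal(key, (8, 16, 16))
print(top2_moe(x, gate_w, expert_w).shape)  # (4, 16)
```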
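
Rotary embeddings encode position by rotating query/key dimensions through position-dependent angles rather than adding a position vector. A minimal sketch of one common RoPE formulation (the repository's exact variant may differ):

```python
import jax.numpy as jnp

def rope(x, base=10000.0):
    # x: (seq, n_heads, head_dim); rotate the two halves of each head
    # by angles that grow with position and shrink with dimension index.
    seq, _, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-jnp.arange(half) / half)          # (half,)
    angles = jnp.arange(seq)[:, None] * freqs[None, :]  # (seq, half)
    cos = jnp.cos(angles)[:, None, :]                   # broadcast over heads
    sin = jnp.sin(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return jnp.concatenate([x1 * cos - x2 * sin,
                            x1 * sin + x2 * cos], axis=-1)
```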

What are the technologies used in the project?

  • JAX
  • Python
  • SentencePiece (for tokenization)
  • Torrent or Hugging Face Hub (for downloading the weights; see the download sketch after this list)
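
For the Hugging Face route, a minimal sketch using `huggingface_hub`'s `snapshot_download`; the `xai-org/grok-1` repo id and `ckpt-0/*` pattern are assumptions taken from the published release, so verify them against the repository's README before starting the download (the checkpoint is hundreds of gigabytes):

```python
from huggingface_hub import snapshot_download

# Repo id and checkpoint pattern are assumptions based on the published
# release; check the project README before starting the very large download.
snapshot_download(
    repo_id="xai-org/grok-1",
    repo_type="model",
    allow_patterns="ckpt-0/*",
    local_dir="checkpoints",
)
```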

What are the benefits of the project?

  • Provides access to the open-weights Grok-1 model.
  • Allows for experimentation and validation of the model.
  • Open-source (Apache 2.0 license).

What are the use cases of the project?

  • Testing and validating the Grok-1 model.
  • Research and development using the Grok-1 architecture.
  • Potentially building applications on top of Grok-1 (though the provided implementation is not optimized for performance).
  • Experimenting with large language models.