Artificial Intelligence

GGUF

A binary file format for packaging a language model's weights and metadata, typically quantized, in a single file designed for fast loading and efficient CPU inference. GGUF is the standard format used by llama.cpp, where it succeeded the older GGML format, and is popular for local LLM deployment.
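A GGUF file begins with a small fixed header: the 4-byte magic "GGUF", a version number, and counts of the tensors and metadata entries that follow, all little-endian. A minimal sketch of checking that header in Python (the function name and return shape are illustrative, not a real library API):

```python
import struct

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size header at the start of a GGUF file's bytes."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # After the magic: uint32 version, uint64 tensor count,
    # uint64 metadata key-value count (all little-endian).
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": n_tensors,
        "metadata_kv_count": n_kv,
    }
```

Reading only these first 24 bytes is enough to tell whether a download is a valid GGUF file before committing gigabytes of RAM to loading it.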

Why It Matters

GGUF made running LLMs on consumer hardware practical. Combined with aggressive quantization, it is why enthusiasts can run models with tens of billions of parameters on a gaming laptop.

Example

Downloading a GGUF-quantized version of Llama 3 that runs on a MacBook with 32 GB of RAM, processing queries locally without any cloud API.

Think of it like...

Like MP3 compression for music — it makes large files small enough to use on consumer devices while preserving most of the quality.

Related Terms