Tokenizer
A component that splits raw text into tokens (units such as words or subwords, each mapped to a numeric ID) that a language model can process. Different tokenizers split the same text differently, which affects model performance and efficiency.
Why It Matters
The tokenizer determines how efficiently a model processes text and directly impacts costs, as API pricing is based on token count.
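Because pricing is per token, the cost of a request is simple arithmetic over token counts. A minimal sketch with hypothetical prices (the function name and the $3/$15 per-million-token rates are illustrative, not any real provider's pricing):

```python
def api_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Cost of one request, given hypothetical per-million-token prices."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical pricing: $3 per million input tokens, $15 per million output.
cost = api_cost(2000, 500, 3.0, 15.0)
print(f"${cost:.4f}")  # $0.0135
```

A tokenizer that needs fewer tokens for the same text therefore translates directly into lower cost per request.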
Example
The word 'unhappiness' might be split into ['un', 'happiness'] by one tokenizer or ['un', 'happ', 'iness'] by another; each split changes how the model represents the text and how many tokens it is billed for.
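The example above can be reproduced with a minimal greedy longest-match tokenizer. This is a simplified sketch, and the two vocabularies are hypothetical; real tokenizers use learned vocabularies with thousands of entries:

```python
def greedy_tokenize(text, vocab):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            # Fall back to a single character if nothing matches.
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

# Two hypothetical vocabularies produce two different splits.
vocab_a = {"un", "happiness"}
vocab_b = {"un", "happ", "iness"}
print(greedy_tokenize("unhappiness", vocab_a))  # ['un', 'happiness']
print(greedy_tokenize("unhappiness", vocab_b))  # ['un', 'happ', 'iness']
```

The same input costs two tokens under one vocabulary and three under the other, which is exactly why tokenizer choice affects efficiency and price.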
Think of it like...
Like breaking chocolate into pieces before sharing — the size and shape of the pieces determine how many you get and how easy they are to work with.
Related Terms
Token
The basic unit of text that language models process. A token can be a word, part of a word, or a punctuation mark. Text is broken into tokens before being fed into an LLM, and the model generates output one token at a time.
Byte-Pair Encoding
A subword tokenization algorithm that starts with individual characters and iteratively merges the most frequent pairs to create a vocabulary of subword units. It balances vocabulary size with handling of rare words.
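The merge loop described above can be sketched in a few lines. This is a toy version trained on three hypothetical words, showing only the core idea (count adjacent pairs, merge the most frequent, repeat); production BPE implementations add word-frequency weighting, special tokens, and byte-level handling:

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent adjacent pair."""
    # Start from character-level sequences.
    seqs = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across all sequences.
        pairs = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        merged = best[0] + best[1]
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return merges, seqs

merges, seqs = bpe_train(["low", "lower", "lowest"], num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
print(seqs)    # [['low'], ['low', 'e', 'r'], ['low', 'e', 's', 't']]
```

After two merges the shared stem "low" has become a single vocabulary unit, while the rarer suffixes stay as characters, illustrating the frequency/vocabulary-size trade-off.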
Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single interaction. It includes both the input prompt and the generated output. Larger context windows allow models to handle longer documents.
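Since the window covers both input and output, the prompt budget is the window size minus the tokens reserved for the reply. A minimal sketch with hypothetical numbers (the 8,192-token window and 1,024-token output reservation are illustrative):

```python
def fits_context(prompt_tokens, max_output_tokens, context_window):
    """Check whether a prompt plus the requested output fits the window."""
    return prompt_tokens + max_output_tokens <= context_window

# Hypothetical: an 8,192-token window with 1,024 tokens reserved for the
# reply leaves room for a prompt of at most 7,168 tokens.
print(fits_context(7168, 1024, 8192))  # True
print(fits_context(7500, 1024, 8192))  # False
```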