Artificial Intelligence

Tokenizer

A component that splits raw text into tokens (words, subwords, or characters) and maps each token to an integer ID that a language model can process. Different tokenizers split the same text differently, which changes how many tokens an input needs and affects model performance and efficiency.
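
The second half of that job, turning tokens into integer IDs and back, can be sketched as a simple vocabulary lookup. This is a minimal illustration, assuming a hypothetical five-entry vocabulary (VOCAB) and taking the splitting step as already done; real tokenizers ship vocabularies with tens of thousands of entries.

```python
# Hypothetical toy vocabulary: each token string maps to an integer ID.
VOCAB = {"[UNK]": 0, "un": 1, "happ": 2, "iness": 3, "happiness": 4}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}


def tokens_to_ids(tokens):
    """Map token strings to IDs, using the [UNK] ID for anything not in VOCAB."""
    return [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]


def ids_to_text(ids):
    """Map IDs back to token strings and join them (pieces of one word, so no spaces)."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


ids = tokens_to_ids(["un", "happiness"])
print(ids)               # [1, 4]
print(ids_to_text(ids))  # unhappiness
```

The model itself only ever sees the list of IDs; the strings exist for humans and for decoding the output.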

Why It Matters

The tokenizer determines how many tokens a given text becomes, which directly affects efficiency and cost: many commercial language-model APIs bill per token, and a model's context window is also measured in tokens.
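
Because billing scales with token count, a request's cost is simple arithmetic. The sketch below assumes per-million-token pricing with separate input and output rates, a common scheme; the function name estimate_cost and the prices used are hypothetical, not any provider's actual rates.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one request, assuming prices are quoted per million tokens."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical rates: $0.50 per million input tokens, $1.50 per million output tokens.
cost = estimate_cost(2_000, 500, 0.50, 1.50)
print(f"${cost:.5f}")  # $0.00175
```

A tokenizer that splits the same text into 20% more tokens raises this figure by roughly the same proportion.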

Example

The word 'unhappiness' might be split into ['un', 'happiness'] (2 tokens) by one tokenizer or ['un', 'happ', 'iness'] (3 tokens) by another. The second split costs more tokens for the same word and hands the model different pieces to interpret.
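
Both splits can be reproduced with a greedy longest-match-first rule run against two different vocabularies. This is a simplified sketch: greedy_split, vocab_a, and vocab_b are hypothetical, the rule is similar in spirit to WordPiece's longest-match lookup, and BPE tokenizers instead apply learned merge rules. Unmatched characters are kept here as single-character pieces; real tokenizers handle unknown input in their own ways.

```python
def greedy_split(word, vocab):
    """Split `word` by repeatedly taking the longest prefix found in `vocab`.

    Characters with no matching vocabulary entry become single-character pieces.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is a vocabulary entry (or one character long).
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


vocab_a = {"un", "happiness"}       # hypothetical vocabulary of tokenizer A
vocab_b = {"un", "happ", "iness"}   # hypothetical vocabulary of tokenizer B

print(greedy_split("unhappiness", vocab_a))  # ['un', 'happiness']
print(greedy_split("unhappiness", vocab_b))  # ['un', 'happ', 'iness']
```

The algorithm is identical in both calls; only the vocabulary differs, and that alone changes the token count from 2 to 3.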

Think of it like...

Breaking a chocolate bar into pieces before sharing it: the size and shape of the pieces determine how many you get and how easy they are to work with.

Related Terms