The component that splits raw text into tokens (and converts tokens back to text) for a language model.
Friendly Description: A tokenizer is the tool that chops your text into tokens before the model can read it, and stitches the model's tokens back into normal text afterward. It's like a translator that converts between human writing and the bite-sized pieces the AI thinks in. Different models use different tokenizers, which is one reason the same sentence can take a different number of tokens depending on which model you're using.
Example: If you paste "unbelievable" into a tokenizer, it might split the word into "un," "believ," and "able." The model handles each piece, then the tokenizer stitches the model's response back together so you see clean, readable text on your screen.
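The split-and-stitch idea above can be sketched in a few lines of Python. This is a toy illustration, not any real model's tokenizer: it does a greedy longest-match against a tiny hand-picked vocabulary, whereas real tokenizers learn vocabularies of tens of thousands of pieces from data (for example, with byte-pair encoding). The vocabulary contents here are made up for the "unbelievable" example.

```python
# Toy subword tokenizer: greedy longest-match against a small,
# hand-picked vocabulary (hypothetical, chosen for this example).
VOCAB = {"un", "believ", "able"}

def tokenize(text: str) -> list[str]:
    """Split text into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece starting at position i first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # Character not covered by the vocabulary: keep it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

def detokenize(tokens: list[str]) -> str:
    """Stitch tokens back together into normal text."""
    return "".join(tokens)

print(tokenize("unbelievable"))                  # ['un', 'believ', 'able']
print(detokenize(tokenize("unbelievable")))      # unbelievable
```

Because `detokenize` is the exact inverse of `tokenize`, the round trip always returns the original text, which is the property that lets you see clean, readable output even though the model only ever saw token pieces.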