Token

Technical

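The basic unit of text that LLMs process. A tokenizer splits text into tokens (typically subword pieces, though a token can also be a whole word, a single character, or a byte), and the model generates text by predicting the next token in a sequence.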

Explained at 5 levels

👶 5-Year-Old

A tiny piece of a word that the AI reads — like breaking "butterfly" into "butter" and "fly".

📚 Middle Schooler

The small chunks that AI breaks text into before reading it — usually parts of words. A sentence might be 10–20 tokens.

🎓 College Student

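The basic unit of text that LLMs process. A tokenizer splits text into tokens (typically subword pieces, though a token can also be a whole word, a single character, or a byte), and the model generates text by predicting the next token in a sequence.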

🧑 Adult

A subword unit produced by a tokenizer (e.g., one trained with BPE or a unigram model, as implemented in libraries such as SentencePiece) that maps text to integer IDs consumed by the model. Context window limits are measured in tokens, and cost and latency both grow with token count.
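
To make the text-to-ID mapping concrete, here is a minimal sketch in Python. It assumes a tiny hand-written vocabulary (`VOCAB` is hypothetical) and uses greedy longest-match lookup, which is a simplification and not BPE or SentencePiece itself; real tokenizers learn their vocabulary from data.

```python
# Hypothetical toy vocabulary: token string -> integer ID.
# Real vocabularies are learned from data and hold tens of thousands of entries.
VOCAB = {
    "butter": 0, "fly": 1,
    "b": 2, "u": 3, "t": 4, "e": 5, "r": 6,
    "f": 7, "l": 8, "y": 9, " ": 10,
}


def encode(text, vocab=VOCAB):
    """Map text to token IDs by repeatedly taking the longest matching piece."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary entry matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


def decode(ids, vocab=VOCAB):
    """Map token IDs back to text by concatenating their strings."""
    id_to_piece = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(id_to_piece[token_id] for token_id in ids)


print(encode("butterfly"))   # [0, 1]
print(encode("fury"))        # [7, 3, 6, 9]
```

In this sketch "butterfly" costs two tokens while "fury" costs four, because "fury" falls back to single characters. This is why token count, not character count, determines how much of the context window a text uses.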

🧠 Genius

A discrete symbol from a finite vocabulary constructed via byte-pair encoding or unigram language modeling, serving as the atomic unit of the autoregressive factorization P(x₁, …, xₙ) = ∏ᵢ₌₁ⁿ P(xᵢ | x₁, …, xᵢ₋₁).
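
The factorization can be illustrated with a short Python sketch. It assumes a hypothetical table of conditional probabilities (`BIGRAM`) in which each token depends only on the one before it; an actual LLM conditions on the entire prefix, so this is a deliberately reduced stand-in. The sequence probability is the product of the per-token conditionals, accumulated here as a sum of logs.

```python
import math

# Hypothetical conditional probabilities P(next token | previous token).
# "<s>" is an assumed start-of-sequence marker; the numbers are made up.
BIGRAM = {
    "<s>":    {"butter": 0.5, "fly": 0.5},
    "butter": {"butter": 0.2, "fly": 0.8},
    "fly":    {"butter": 0.9, "fly": 0.1},
}


def sequence_log_prob(tokens, model=BIGRAM):
    """Sum log P(x_i | x_{i-1}) over the sequence (chain rule in log space)."""
    log_p = 0.0
    prev = "<s>"
    for tok in tokens:
        log_p += math.log(model[prev][tok])
        prev = tok
    return log_p


# P(butter, fly) = P(butter | <s>) * P(fly | butter) = 0.5 * 0.8
prob = math.exp(sequence_log_prob(["butter", "fly"]))
```

Working in log space avoids numerical underflow, since multiplying many probabilities below 1 quickly produces values too small to represent for long sequences.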
