The basic unit of text that LLMs process. Text is split into tokens (subwords) by a tokenizer, and the model predicts the next token in a sequence.
A tiny piece of a word that the AI reads โ like breaking "butterfly" into "butter" and "fly".
The small chunks that AI breaks text into before reading it โ usually parts of words. A sentence might be 10โ20 tokens.
The basic unit of text that LLMs process. Text is split into tokens (subwords) by a tokenizer, and the model predicts the next token in a sequence.
A subword unit produced by a tokenizer (e.g., BPE or SentencePiece) that maps text to integer IDs consumed by the model. Context window size, cost, and latency all scale with token count.
A discrete symbol from a finite vocabulary constructed via byte-pair encoding or unigram language modeling, serving as the atomic unit of the autoregressive factorization P(xโ,...,xโ) = โP(xแตข|x<แตข).
Want to explore Token in depth?
Ask SeekBox and get answers from 7 AI engines at once.
Try it in SeekBox โ