A neural network architecture that uses self-attention mechanisms to process sequential data in parallel, forming the foundation of most modern LLMs.
The special design inside modern AI that lets it pay attention to all parts of a sentence at once โ like reading a whole page instead of one word at a time.
The type of AI architecture behind ChatGPT, Claude, and other modern AI. It's really good at understanding the relationships between words in a sentence.
A neural network architecture that uses self-attention mechanisms to process sequential data in parallel, forming the foundation of most modern LLMs.
The dominant sequence modeling architecture based on multi-head self-attention and position-wise feed-forward layers, enabling parallel computation and capturing long-range dependencies more effectively than RNNs.
An architecture employing scaled dot-product attention over queries, keys, and values with multi-head projections, achieving O(nยฒd) complexity per layer โ foundational to the scaling hypothesis and emergent capability literature.
Want to explore Transformer in depth?
Ask SeekBox and get answers from 7 AI engines at once.
Try it in SeekBox โ