Transformer

Applies to: python, general

A transformer is a neural architecture built around attention, feed-forward layers, residual connections, and normalization.

tokens -> attention -> MLP -> logits

See also: attention, embedding, tokenization