Attention

Applies to: python, general

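Attention lets a model compute the representation of one position as a weighted sum over the other positions, where each weight reflects how relevant that position is. Queries (Q) are scored against keys (K), the scores are normalized with a softmax, and the resulting weights are applied to the values (V).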

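# Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v); softmax is taken over each row
weights = softmax(Q @ K.T / sqrt(d_k))  # scaled dot-product form; each row sums to 1
out = weights @ V                       # shape (n_q, d_v)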
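
The two lines above can be made runnable with NumPy. What follows is a minimal sketch, not a reference implementation: the function names `softmax` and `attention`, the 2-D single-head shapes, and the random inputs are all assumptions chosen for illustration. It divides the scores by sqrt(d_k), the scaling used in Transformer-style attention; drop that division for plain dot-product attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis before exponentiating, for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def attention(Q, K, V):
    # Hypothetical helper. Assumed shapes: Q (n_q, d_k), K (n_k, d_k), V (n_k, d_v).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_q, n_k) relevance scores
    weights = softmax(scores, axis=-1)  # each row is a distribution over the n_k positions
    return weights @ V, weights         # output has shape (n_q, d_v)

# Illustrative random inputs: 2 queries, 3 key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 5))

out, weights = attention(Q, K, V)
print(out.shape)              # (2, 5)
print(weights.sum(axis=-1))   # each row sums to 1, up to floating-point error
```

Each output row is a convex combination of the rows of V, so a query that scores every key equally simply returns the mean of the values.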

See also: transformer, embedding