Julian Henry

Attention is All You Need

03 Sep 2024

This paper changed the world. Given its impact on LLM development, the transformer with attention is the $E = mc^2$ of our era.

Two good entry points: a visualization of the architecture, and the full original paper. The heart of the paper is the attention formula. The softmax inside it is a normalized exponential, a smooth approximation of picking the max: $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$.
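A minimal, numerically stable softmax in NumPy, just to pin the definition down (subtracting the max is a standard stability trick; it cancels in the ratio):

```python
import numpy as np

def softmax(z, axis=-1):
    # Shift by the max so np.exp never overflows; the shift cancels out.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)
```

For example, softmax(np.array([1.0, 2.0, 3.0])) is roughly [0.09, 0.24, 0.67]: the largest entry dominates, but nothing is zeroed out.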

The two key formulas. Key formula 1, multi-head attention: $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O$, where $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$. Key formula 2, scaled dot-product attention: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(QK^\top / \sqrt{d_k}\right) V$.
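A NumPy sketch of both formulas under illustrative assumptions: random (untrained) projection matrices, d_model divisible by num_heads, and a small local softmax so the block stands alone.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Key formula 2: softmax(Q K^T / sqrt(d_k)) V.
    d_k = K.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

def multi_head_attention(Q, K, V, num_heads, rng):
    # Key formula 1: project into num_heads subspaces, attend in each,
    # concatenate the heads, and mix them back with W_O.
    d_model = Q.shape[-1]
    d_k = d_model // num_heads  # assumes d_model % num_heads == 0
    heads = []
    for _ in range(num_heads):
        W_q = rng.standard_normal((d_model, d_k))  # illustrative, untrained
        W_k = rng.standard_normal((d_model, d_k))
        W_v = rng.standard_normal((d_model, d_k))
        heads.append(attention(Q @ W_q, K @ W_k, V @ W_v))
    W_o = rng.standard_normal((num_heads * d_k, d_model))
    return np.concatenate(heads, axis=-1) @ W_o
```

Self-attention is just Q = K = V = x: with rng = np.random.default_rng(0) and x = rng.standard_normal((5, 64)), multi_head_attention(x, x, x, 8, rng) returns a (5, 64) array, one re-mixed vector per input position.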

Queries and keys live in a shared vector space: each attention weight comes from the dot product of a query vector with a key vector, so geometry in that space decides where the model attends. Position enters through sinusoidal encodings: $PE_{(pos,\,2i)} = \sin\!\left(pos / 10000^{2i/d_{model}}\right)$ and $PE_{(pos,\,2i+1)} = \cos\!\left(pos / 10000^{2i/d_{model}}\right)$. The hand-wavy reason for the sinusoids: relative offsets become simple (linear) functions of the encodings, and since nothing is learned they extend to sequences of arbitrary length, beyond anything seen in training.
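A sketch of the sinusoidal encoding as read off the formulas above (assuming an even d_model so the sine and cosine columns interleave cleanly):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe
```

Nothing here is learned or capped: row 100000 is as well defined as row 0, which is what the infinite-length claim amounts to.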