This paper changed the world. Given its impact on LLM development, the transformer's attention mechanism is the E = mc^2 of our era.
References: a visualization/illustrated walkthrough of the architecture, plus the full original paper ("Attention Is All You Need"). The key formula is the attention formula. Softmax is a differentiable ("soft") version of max: it exponentiates each score and normalizes, so the largest score gets the most weight while every entry keeps some probability mass.
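A minimal sketch of softmax as "exponentiate then normalize" (names are my own, not from the paper):

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # softmax is invariant to adding a constant to every score.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
# probs sums to 1, and the largest score receives the largest weight
```

Note how the output is a probability distribution: this is what lets attention treat scores as mixing weights over values.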
Key formulas, scaled dot-product attention and multi-head attention:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
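The scaled dot-product formula can be sketched in a few lines of NumPy (the toy shapes and variable names here are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # scores[i, j] = similarity of query i to key j, scaled by sqrt(d_k)
    # so the dot products don't grow with the key dimension.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    weights = softmax(scores)          # each row sums to 1
    return weights @ V                 # convex combination of value rows

# toy example: 4 tokens, dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
# out.shape == (4, 8)
```

Multi-head attention simply runs h copies of this with separate learned projections W_i^Q, W_i^K, W_i^V, concatenates the heads, and applies W^O.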
Queries and keys live in the same vector space, so each query vector can be scored against every key vector by a dot product (visualize the query vector sitting in the key vector space). For the sinusoidal encoding of the position index there is a hand-wavy justification: sine and cosine are bounded and defined for any position, so the encoding extends to arbitrarily long sequences, in principle infinite length, including lengths never seen in training.
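A small sketch of the sinusoidal positional encoding from the paper, PE[pos, 2i] = sin(pos / 10000^(2i/d_model)) and PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model)); the function name and even d_model are my own choices:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    # Assumes d_model is even: sin goes in even columns, cos in odd ones.
    pos = np.arange(seq_len)[:, None]           # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]       # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_encoding(50, 16)
# every entry stays in [-1, 1] no matter how large pos gets,
# which is the hand-wavy reason it generalizes to any length
```

Because each dimension is a wave of a different frequency, positions get distinct, smoothly varying codes without any learned parameters or a fixed maximum length.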