“Attention Is All You Need” demands attention because of its profound impact on LLM development: the transformer with attention is a bona fide E = mc² of our time.
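If one equation earns that comparison, it is the paper's own scaled dot-product attention, where Q, K, and V are the query, key, and value matrices and d_k is the key dimension:

    Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V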
At the moment, I can't do the topic more justice than the attached resources do.
Good luck, and I hope your attention mechanism finds the right weights to understand the topic!