What is Attention in LLMs? Why are large language models so powerful?