Why Minibatch Gradient Descent in Transformers?