GPU Memory-Efficient Training with Transformers