Posted by Mingxing Tan and Zihang Dai, Research Scientists, Google Research

As neural network models and training data size grow, training efficiency is becoming an important focus for deep learning. For example, GPT-3 demonstrates remarkable capability in few-shot learning, but…