Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Published in ICML, 2023
Recommended citation: Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M Susskind. "Stabilizing Transformer Training by Preventing Attention Entropy Collapse." ICML, 2023. https://arxiv.org/abs/2303.06296
