Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Published in ICML, 2023

Recommended citation: Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M Susskind. "Stabilizing Transformer Training by Preventing Attention Entropy Collapse." ICML, 2023. https://arxiv.org/abs/2303.06296

Download paper here
