Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Published in ICML, 2023

Recommended citation: Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M. Susskind. "Stabilizing Transformer Training by Preventing Attention Entropy Collapse." In Proceedings of the International Conference on Machine Learning (ICML), 2023. https://arxiv.org/abs/2303.06296

Download paper here

BibTeX:

@inproceedings{zhai2023sigmareparam,
  title={Stabilizing Transformer Training by Preventing Attention Entropy Collapse},
  author={Zhai, Shuangfei and Likhomanenko, Tatiana and Littwin, Etai and Busbridge, Dan and Ramapuram, Jason and Zhang, Yizhe and Gu, Jiatao and Susskind, Joshua M},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  year={2023}
}