Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Published in ICML, 2023
Recommended citation: Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M Susskind. "Stabilizing Transformer Training by Preventing Attention Entropy Collapse." ICML, 2023. https://arxiv.org/abs/2303.06296
BibTeX:
@inproceedings{zhai2023sigmareparam,
  title={Stabilizing Transformer Training by Preventing Attention Entropy Collapse},
  author={Zhai, Shuangfei and Likhomanenko, Tatiana and Littwin, Etai and Busbridge, Dan and Ramapuram, Jason and Zhang, Yizhe and Gu, Jiatao and Susskind, Joshua M},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  year={2023}
}
