Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse

Publication
In International Conference on Machine Learning
Meng Li
Assistant Professor

I am currently a tenure-track assistant professor jointly affiliated with the Institute for Artificial Intelligence and the School of Integrated Circuits at Peking University. My research interests focus on efficient and secure multi-modality AI acceleration algorithms and hardware.