Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet

Posted on Mon, May 24, 2021 · TLDR Paper Review · CV

arXiv link, Cite.GG link


The strong performance of vision transformers on image classification and other vision tasks is often attributed to the design of their multi-head attention layers. However, the extent to which attention is responsible for this strong performance remains unclear. In this short report, we ask: is the attention layer even necessary?
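The question the abstract raises is concrete: what happens if the self-attention layer in each Vision Transformer block is replaced by a simple feed-forward layer applied across the patch dimension, so that tokens are mixed by a linear layer and nonlinearity rather than by attention? Below is a minimal PyTorch-style sketch of such an attention-free block; the class name, hidden sizes, and expansion factor are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class FeedForwardOnlyBlock(nn.Module):
    """Transformer-style block with attention swapped for a feed-forward
    layer over the patch (token) dimension. Shapes are illustrative."""

    def __init__(self, dim=384, num_patches=196, expansion=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Feed-forward across patches: mixes information between tokens,
        # playing the role that self-attention plays in a ViT block.
        self.token_ff = nn.Sequential(
            nn.Linear(num_patches, num_patches * expansion),
            nn.GELU(),
            nn.Linear(num_patches * expansion, num_patches),
        )
        self.norm2 = nn.LayerNorm(dim)
        # Standard feed-forward across channels, as in an ordinary ViT block.
        self.channel_ff = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.GELU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x):  # x: (batch, num_patches, dim)
        # Transpose so the patch axis is last, apply the feed-forward,
        # transpose back; residual connections as in a transformer block.
        x = x + self.token_ff(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_ff(self.norm2(x))
        return x


if __name__ == "__main__":
    block = FeedForwardOnlyBlock()
    out = block(torch.randn(2, 196, 384))
    print(out.shape)  # torch.Size([2, 196, 384])
```

Stacking blocks like this yields a model whose only mechanism for exchanging information between patches is a feed-forward layer, which is the attention-free architecture the report evaluates on ImageNet.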


TL;DR