Abstract: Vision Transformer (ViT) has gained increasing attention in the computer vision community in recent years. However, the core component of ViT, Self-Attention, lacks explicit spatial priors ...