- Title
TransMCGC: a recast vision transformer for small-scale image classification tasks.
- Authors
Xiang, Jian-Wen; Chen, Min-Rong; Li, Pei-Shan; Zou, Hao-Li; Li, Shi-Da; Huang, Jun-Jie
- Abstract
Multi-stage hierarchical structure is a basic and effective design pattern in convolutional neural networks (CNNs). Recently, Vision Transformers (ViTs) have achieved impressive performance as a new architecture for various vision tasks. However, many properties of ViTs remain unexplored. In this paper, we empirically find that, despite having no explicit multi-stage hierarchical design like CNNs, ViT models automatically organize layers into stages (or block groups) that gradually extract different levels of feature information. Moreover, ViT models group highly similar Transformer blocks in the last stage, where multi-head self-attention becomes less effective at learning useful concepts for feature learning and may thus prevent the model from attaining the expected performance gain. To this end, we recast the ViT into a new framework, named TransMCGC, which replaces the inefficient Transformer blocks in the last stage of the Vision Transformer with the proposed convolutional MCGC blocks. The MCGC block consists of two sub-modules in parallel: a Multi-branch Convolution module that integrates local neighborhood features and multi-scale context information, and a Global Context module that captures global dependencies with negligible parameters. In this way, the proposed MCGC block collaboratively integrates convolutional locality and global dependencies to enhance the feature-learning ability of the model. Finally, extensive experiments on six standard small-scale benchmark datasets, including CIFAR10, CIFAR100, Stanford Cars, Oxford102flowers, DTD and Food101, demonstrate the effectiveness of the proposed MCGC block and show that our TransMCGC models outperform the baseline ViT while achieving competitive performance compared with state-of-the-art ViT variants.
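The abstract's description of the MCGC block (two parallel sub-modules whose outputs are fused, wrapped in a residual connection) can be illustrated with a shape-level sketch. This is not the authors' implementation: the multi-branch convolution is simplified here to averaged pointwise (1x1) branches, and the Global Context module follows the common attention-pooling pattern (a single softmax map over all spatial positions, then a channel transform); all weight names and shapes are assumptions for illustration.

```python
import numpy as np

def conv1x1(x, w):
    # Pointwise (1x1) convolution: x is (C_in, H, W), w is (C_out, C_in).
    c, h, wd = x.shape
    return (w @ x.reshape(c, h * wd)).reshape(w.shape[0], h, wd)

def global_context(x, w_key, w_val):
    # Attention-pooled global context with negligible parameters:
    # one softmax map over all H*W positions, then a channel transform.
    c, h, wd = x.shape
    logits = (w_key @ x.reshape(c, h * wd)).ravel()      # (H*W,), w_key: (1, C)
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                                   # softmax over positions
    ctx = x.reshape(c, h * wd) @ attn                    # (C,) pooled context
    return (w_val @ ctx)[:, None, None]                  # broadcastable (C, 1, 1)

def mcgc_block(x, branch_ws, w_key, w_val, w_out):
    # Two sub-modules in parallel: a multi-branch convolution (simplified to
    # averaged pointwise branches) and a global context module, fused and
    # passed through an output projection with a residual connection.
    local = sum(conv1x1(x, w) for w in branch_ws) / len(branch_ws)
    ctx = global_context(x, w_key, w_val)
    return conv1x1(local + ctx, w_out) + x
```

The parallel structure mirrors the abstract's claim: `local` carries convolutional locality, `ctx` carries the global dependencies, and the fusion plus residual keeps the block drop-in compatible with the Transformer blocks it replaces (same input and output shape).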
- Subjects
CONVOLUTIONAL neural networks; IMAGE recognition (Computer vision); CONCEPT learning; LEARNING ability
- Publication
Neural Computing & Applications, 2023, Vol. 35, Issue 10, p. 7697
- ISSN
0941-0643
- Publication type
Article
- DOI
10.1007/s00521-022-08067-7