- Title
TransMCGC: a recast vision transformer for small-scale image classification tasks.
- Authors
Xiang, Jian-Wen; Chen, Min-Rong; Li, Pei-Shan; Zou, Hao-Li; Li, Shi-Da; Huang, Jun-Jie
- Abstract
Multi-stage hierarchical structure is a basic and effective design pattern in convolutional neural networks (CNNs). Recently, Vision Transformers (ViTs) have achieved impressive performance as a new architecture for various vision tasks. However, many properties of ViTs remain unexplored. In this paper, we empirically find that, despite having no explicit multi-stage hierarchical design like CNNs, ViT models automatically organize layers into stages (or block groups) that gradually extract different levels of feature information. Moreover, ViT models group highly similar Transformer blocks in the last stage, where multi-head self-attention becomes less effective at learning useful concepts for feature learning and may thus prevent the model from attaining the expected performance gain. To this end, we recast the ViT into a new framework, named TransMCGC, which replaces the inefficient Transformer blocks in the last stage of the Vision Transformer with the proposed convolutional MCGC blocks. The MCGC block consists of two sub-modules in parallel: a Multi-branch Convolution module that integrates local neighborhood features and multi-scale context information, and a Global Context module that captures global dependencies with negligible parameters. In this way, the proposed MCGC block collaboratively integrates convolutional locality and global dependencies to enhance the feature-learning ability of the model. Finally, extensive experiments on six standard small-scale benchmark datasets, including CIFAR10, CIFAR100, Stanford Cars, Oxford102flowers, DTD and Food101, demonstrate the effectiveness of the proposed MCGC block and show that our TransMCGC models outperform the baseline ViT while achieving competitive performance compared with state-of-the-art ViT variants.
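The abstract's description of the MCGC block (two parallel sub-modules whose outputs are fused, wrapped in a residual connection) can be illustrated with a shape-level sketch. This is not the authors' implementation: the multi-branch convolution is simplified here to averaged pointwise (1x1) branches, and the Global Context module follows the common attention-pooling pattern (a single softmax map over all spatial positions, then a channel transform); all weight names and shapes are assumptions for illustration.

```python
import numpy as np

def conv1x1(x, w):
    # Pointwise (1x1) convolution: x is (C_in, H, W), w is (C_out, C_in).
    c, h, wd = x.shape
    return (w @ x.reshape(c, h * wd)).reshape(w.shape[0], h, wd)

def global_context(x, w_key, w_val):
    # Attention-pooled global context with negligible parameters:
    # one softmax map over all H*W positions, then a channel transform.
    c, h, wd = x.shape
    logits = (w_key @ x.reshape(c, h * wd)).ravel()      # (H*W,), w_key: (1, C)
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                                   # softmax over positions
    ctx = x.reshape(c, h * wd) @ attn                    # (C,) pooled context
    return (w_val @ ctx)[:, None, None]                  # broadcastable (C, 1, 1)

def mcgc_block(x, branch_ws, w_key, w_val, w_out):
    # Two sub-modules in parallel: a multi-branch convolution (simplified to
    # averaged pointwise branches) and a global context module, fused and
    # passed through an output projection with a residual connection.
    local = sum(conv1x1(x, w) for w in branch_ws) / len(branch_ws)
    ctx = global_context(x, w_key, w_val)
    return conv1x1(local + ctx, w_out) + x
```

The parallel structure mirrors the abstract's claim: `local` carries convolutional locality, `ctx` carries the global dependencies, and the fusion plus residual keeps the block drop-in compatible with the Transformer blocks it replaces (same input and output shape).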
- Subjects
CONVOLUTIONAL neural networks; IMAGE recognition (Computer vision); CONCEPT learning; LEARNING ability
- Publication
Neural Computing & Applications, 2023, Vol. 35, Issue 10, p. 7697
- ISSN
0941-0643
- Publication type
Article
- DOI
10.1007/s00521-022-08067-7