- Title
CSFNet: a compact and efficient convolution-transformer hybrid vision model.
- Authors
Feng, Jian; Wu, Peng; Xu, Renjie; Zhang, Xiaoming; Wang, Tao; Li, Xuan
- Abstract
The Vision Transformer (ViT) has demonstrated impressive performance on a variety of visual tasks, but its high computational cost limits its applicability on edge devices. Conversely, convolutional neural networks (CNNs) are widely used in mobile applications, but their static kernels and weak global modeling hinder their performance. In this work, we propose CSFNet, a lightweight convolution-transformer hybrid model for classification and dense prediction that combines a strong local inductive bias with long-range modeling capability. To link local and global information, we introduce two hierarchical structures. First, a Local-Attention Block (LAB) with adaptive kernels and channel expansion ratios aggregates n × n local information layer by layer, capturing multi-stage detail features and providing an efficient local inductive bias. Second, a linear-complexity Channel-Spatial Fusion Attention (CSFA) projects the attention matrix along both the channel and token dimensions; token relationships are aggregated stage by stage to encode contextual information efficiently, using low-rank matrices and element-wise operations to reduce computational complexity. Experimental results show that our proposed CSFNet-XXS/XS/S models, with 1.4M/2.4M/5.6M parameters and 0.3G/0.5G/1.1G multiply-adds (MAdds), achieve 70.23%/74.91%/78.82% top-1 accuracy on ImageNet-1k, competitive with recent mainstream methods. Furthermore, CSFNet performs well on small-scale datasets as well as on MS-COCO2017 and ADE-20K.
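
The record does not include the authors' code. As a rough illustration of the linear-complexity channel-spatial attention idea the abstract describes, the PyTorch sketch below combines a low-rank token-attention branch with a channel gate and fuses them element-wise. The class name `ChannelSpatialFusionAttention`, the `rank` parameter, and the specific fusion scheme are assumptions made for illustration; they are not the paper's CSFA implementation.

```python
import torch
import torch.nn as nn

class ChannelSpatialFusionAttention(nn.Module):
    """Illustrative sketch of linear-complexity attention that mixes
    information along both the token (spatial) and channel dimensions.
    NOT the authors' code: names, rank, and fusion are assumptions."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        # Low-rank projections keep token interaction O(N * rank), not O(N^2)
        self.to_q = nn.Linear(dim, rank, bias=False)
        self.to_k = nn.Linear(dim, rank, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Channel branch: squeeze-style gate over the channel dimension
        self.channel_gate = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(dim // 4, dim),
            nn.Sigmoid(),
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        q = self.to_q(x).softmax(dim=-1)   # (B, N, r)
        k = self.to_k(x).softmax(dim=1)    # (B, N, r)
        v = self.to_v(x)                   # (B, N, C)
        # Contract keys with values first: (B, r, C). Computing k^T v
        # before applying q avoids materializing an N x N attention map.
        context = torch.einsum("bnr,bnc->brc", k, v)
        spatial = torch.einsum("bnr,brc->bnc", q, context)
        # Gate each channel by a global token average, then fuse the
        # spatial and channel branches element-wise.
        gate = self.channel_gate(x.mean(dim=1, keepdim=True))  # (B, 1, C)
        return self.proj(spatial * gate)

# Usage: 14 x 14 = 196 tokens with 64 channels (shapes are illustrative)
x = torch.randn(2, 196, 64)
attn = ChannelSpatialFusionAttention(dim=64, rank=8)
print(attn(x).shape)  # torch.Size([2, 196, 64])
```

Because the token-token interaction is factored through a rank-`r` bottleneck, cost grows linearly with the number of tokens, which is consistent with the linear complexity the abstract claims for CSFA.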
- Subjects
TRANSFORMER models; CONVOLUTIONAL neural networks; IMAGE recognition (Computer vision); LOW-rank matrices; MOBILE apps
- Publication
Multimedia Tools & Applications, 2024, Vol. 83, Issue 29, p. 72679
- ISSN
1380-7501
- Publication type
Article
- DOI
10.1007/s11042-024-18417-3