A method based on an improved generative adversarial network (GAN) and a vision transformer is introduced to address the problem of imbalanced fault samples degrading performance in gear fault diagnosis. A least squares loss function is applied to enhance the GAN's ability to learn the distribution of limited, normalized 1D fault data, enabling it to generate a sufficient number of high-quality fault samples. The vision transformer is used to extract global fault features, further improving diagnostic accuracy. Time-frequency (TF) feature maps, generated by wavelet transform, serve as the inputs for training the diagnosis model. Finally, two experiments are conducted to validate the proposed method. The results indicate that the proposed strategy effectively mitigates sample imbalance and improves model stability and diagnostic accuracy compared with conventional approaches, achieving at least 25% higher accuracy under extremely imbalanced conditions.
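
As an illustration of the least squares loss mentioned above, the following minimal sketch computes the standard least squares GAN objectives from discriminator scores. It is not taken from the paper: the function names, the 0.5 scaling, and the target labels (1 for real, 0 for generated) are assumptions that follow the common least squares GAN formulation, and the authors' exact variant may differ.

```python
def _mean_squared_error(scores, target):
    """Mean of (score - target)^2 over a batch of discriminator scores."""
    return sum((s - target) ** 2 for s in scores) / len(scores)


def lsgan_discriminator_loss(real_scores, fake_scores,
                             real_label=1.0, fake_label=0.0):
    """Least squares loss for the discriminator.

    Pushes scores on real samples toward real_label and scores on
    generated samples toward fake_label (labels are assumed values).
    """
    return (0.5 * _mean_squared_error(real_scores, real_label)
            + 0.5 * _mean_squared_error(fake_scores, fake_label))


def lsgan_generator_loss(fake_scores, target_label=1.0):
    """Least squares loss for the generator.

    Pushes discriminator scores on generated samples toward target_label,
    i.e. the generator is rewarded when its samples are scored as real.
    """
    return 0.5 * _mean_squared_error(fake_scores, target_label)


if __name__ == "__main__":
    # Hypothetical discriminator outputs for a small batch.
    real = [0.9, 0.8, 1.0]
    fake = [0.2, 0.1, 0.3]
    print("discriminator loss:", lsgan_discriminator_loss(real, fake))
    print("generator loss:", lsgan_generator_loss(fake))
```

Unlike the sigmoid cross-entropy loss of the original GAN, the quadratic penalty grows with the distance of a score from its target, so generated samples that are already classified as real but still lie far from the target continue to produce a gradient.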