Abstract: [Objective] To achieve rapid identification of green tea types. [Methods] A rapid detection method was proposed that combines an electronic tongue and an electronic nose with a CNN-Transformer composite model. The electronic tongue and electronic nose were used to collect taste and smell fingerprint information from five types of green tea. The one-dimensional electronic tongue and electronic nose signals were transformed into two-dimensional time-frequency maps using the short-time Fourier transform (STFT), which revealed how the signal energy was distributed in the time-frequency domain. The CNN-Transformer composite model fused the electronic tongue and electronic nose information and performed pattern recognition. Its convolution module used selective kernel convolution and normalized attention in place of the convolution layer of a traditional CNN, which enabled dynamic extraction of local features from the time-frequency maps. The multi-head self-attention mechanism in the Transformer encoder extracted global temporal information from the electronic tongue and electronic nose features and performed their weighted fusion. Finally, a fully connected layer carried out classification. [Results] The information fusion method based on the electronic tongue and electronic nose effectively extracted deep features from the taste and smell signals of the green tea samples and provided richer fused feature representations for the model, giving highly accurate recognition of the different tea types: accuracy, precision, recall, and F1-score on the test set were 99.00%, 99.05%, 99.00%, and 99.00%, respectively. [Conclusion] This study provides a low-cost, fast, and efficient detection method for green tea type recognition.
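The STFT step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stft_magnitude`, the Hann window, the frame length, the hop size, the sampling rate, and the synthetic sine signal are all assumptions chosen for illustration, since the abstract does not state the actual STFT parameters or sensor data format.

```python
import numpy as np


def stft_magnitude(signal, frame_len=64, hop=16):
    """Return the STFT magnitude as a 2-D array (frequency bins x time frames).

    frame_len and hop are illustrative values, not taken from the study.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)  # assumed window type
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the 1-D signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of each frame gives the spectrum at that time step.
    spectrum = np.fft.rfft(frames, axis=1)
    return np.abs(spectrum).T


# Synthetic stand-in for one sensor channel (hypothetical data):
# a 5 Hz sine sampled at 100 Hz for 10 s.
fs = 100.0
t = np.arange(0, 10, 1 / fs)
sensor_signal = np.sin(2 * np.pi * 5 * t)

tf_map = stft_magnitude(sensor_signal)
print(tf_map.shape)  # (frequency bins, time frames)
```

Each column of `tf_map` is the spectrum of one short time window, so the array can be treated as a single-channel image and passed to a convolutional model. In a multi-sensor setting, one such map per sensor channel would presumably be stacked along a channel axis, though the abstract does not specify this.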