Abstract: Objective To propose an intelligent detection method that balances accuracy and speed for packaging defect detection on food production lines, where manual inspection is inefficient and prone to missed and false detections, and traditional machine vision algorithms struggle to recognize small defects and defects on complex-texture packaging. Methods Swin Transformer is used as the core feature extraction module; its global image modeling ability and multi-scale feature fusion are exploited to capture defect features such as small wrinkles and printing offsets on the packaging surface. The YOLOv12 fast detection framework is then introduced, with its neck network and loss function optimized, to localize and classify defect areas quickly, forming an integrated pipeline of high-precision feature extraction and fast object detection. Results The average detection accuracy of the method for common defect types is higher than 96.50%, an improvement of more than 10.00% over the method before optimization. Single-image detection takes less than 10 ms, meeting the production line's real-time requirement of 30 frames per second. The method also maintains stable performance in tests on different foods, showing significantly better robustness than the comparison method. Conclusion By integrating Swin Transformer feature extraction with the fast detection capability of YOLOv12, this study solves the core problem of balancing accuracy and speed in food packaging defect detection.
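
The real-time claim in the Results can be checked with simple arithmetic: at 30 frames per second each frame has a budget of 1000/30 ≈ 33.3 ms, so a detection time under 10 ms fits with margin. The sketch below is illustrative only; the function names are hypothetical and not taken from the study, and the 10 ms figure is the upper bound reported above.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def meets_realtime(latency_ms: float, fps: float) -> bool:
    """True if the per-image latency fits within the per-frame budget."""
    return latency_ms <= frame_budget_ms(fps)


# Hypothetical check using the figures quoted in the abstract.
budget = frame_budget_ms(30)
print(f"budget at 30 fps: {budget:.1f} ms")
print("10 ms fits the budget:", meets_realtime(10.0, 30))
```

This compares only the detection step against the budget; image acquisition and transfer time, which the abstract does not report, would also count against it on a real line.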