Abstract:
Objective: To establish a non-destructive method for the precise identification of mechanical damage in Solanum muricatum fruits.
Methods: Mechanical damage of varying severity is induced in S. muricatum fruit samples by free-fall collisions, and hyperspectral data are then collected from each sample. The effects of four preprocessing methods on the performance of a random forest (RF) classification model are evaluated. The successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS) are used to extract feature wavelengths from the preprocessed spectral data. Three machine learning classification models, namely partial least squares-discriminant analysis (PLS-DA), support vector machine (SVM), and RF, are constructed and compared. Bayesian optimization (BO) is then used to tune the hyperparameters of the best-performing model.
Results: Among the preprocessing methods, standard normal variate (SNV) yields the highest classification accuracy, 78.89%. Feature wavelength extraction further improves accuracy, and the SNV-CARS-RF model performs best, reaching 92.78% accuracy on the prediction set. Finally, optimizing four hyperparameters of the SNV-CARS-RF model with BO raises the prediction accuracy to 100%.
Conclusion: Combining hyperspectral technology with machine learning enables accurate detection of different degrees of damage in S. muricatum fruits.
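As a brief illustration of the SNV preprocessing named above, the following sketch centers each spectrum on its own mean and scales it by its own standard deviation. It assumes NumPy and uses hypothetical toy data rather than the study's spectra. The function name `snv` and the choice of the sample standard deviation (`ddof=1`) are assumptions; the abstract does not state how the authors implemented this step.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: center and scale each spectrum (row) on its own.

    spectra: array of shape (n_samples, n_wavelengths), e.g. reflectance values.
    Assumption: the sample standard deviation (ddof=1) is used for scaling.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)        # per-sample mean over all bands
    std = spectra.std(axis=1, ddof=1, keepdims=True)  # per-sample standard deviation
    return (spectra - mean) / std


# Hypothetical toy data: 3 "spectra" with 5 bands each (not data from the study)
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 0.9, size=(3, 5))
X_snv = snv(X)

print(X_snv.mean(axis=1))          # each row now has mean ~0
print(X_snv.std(axis=1, ddof=1))   # each row now has standard deviation ~1

# A positive multiplicative factor plus an additive offset (a crude stand-in
# for scatter and baseline effects) leaves the SNV output unchanged.
print(np.allclose(snv(2.0 * X + 0.5), X_snv))
```

The last check shows why SNV is commonly used for scatter correction: if a spectrum x is transformed to a·x + b with a > 0, its mean becomes a·m + b and its standard deviation a·s, so (a·x + b − a·m − b)/(a·s) reduces to (x − m)/s.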