To identify the most sensitive features, the ReliefF algorithm was used because it can handle multi-class classification problems and places no restrictions on data types. In each iteration, a sample R was randomly drawn from the training set; its k nearest neighbors in the same class (near Hits) were found, and k nearest neighbors were also found in each of the other classes (near Misses). The weight of each feature was then updated, decreasing when the feature differed between R and its near Hits and increasing when it differed between R and its near Misses, and this was repeated for a set number of iterations. The input features were ranked by weight in descending order. Correlation analysis was then carried out among the features, and the combination with the smallest correlation coefficients was selected as the best combination for model construction. The superiority and efficiency of ReliefF have been demonstrated in remote sensing-based classification and object recognition
[20-23]. Consequently, it was adopted here for feature selection, assigning each feature a weight according to the correlation between that feature and the different disease samples
[24]. Specifically, all the VI variables were sorted in descending order of ReliefF weight, and eight VIs were selected using a weight threshold of 0.075. Correlation analysis was then conducted among the selected features. Starting from the feature with the highest weight, any feature whose correlation coefficient (r) with it was greater than 0.9 was eliminated; the same procedure was then applied to the feature with the second-highest weight, and so on. In addition, disease incidence is closely related to meteorological factors such as temperature, precipitation and humidity, and the sensitivity of the VIs to the disease also varied with the growth stage at which they were calculated. Considering these temporal features and the cumulative effect of temperature, three VIs were finally selected, namely the SAVI on 26 May 2014 and the SIPI and EVI on 17 May 2014.
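The ReliefF weighting step described above can be illustrated with a minimal pure-Python sketch. It assumes numeric features, a Manhattan distance on range-normalised features for the neighbour search, and the prior-weighted update for near Misses from Kononenko's formulation; the function name `relieff` and its parameters (`m` iterations, `k` neighbours, `seed`) are illustrative and not the implementation used in this study. The toy data are invented: feature 0 separates the two classes and feature 1 is noise.

```python
import random


def relieff(X, y, m=None, k=3, seed=0):
    """Minimal ReliefF sketch for numeric features (illustrative only).

    X: list of samples (each a list of floats); y: list of class labels.
    m: number of randomly drawn samples R (defaults to len(X)); k: neighbours.
    Returns one weight per feature; a larger weight means more discriminative.
    """
    n, p = len(X), len(X[0])
    m = n if m is None else m
    rng = random.Random(seed)
    # Per-feature range, used to scale differences into [0, 1].
    span = []
    for a in range(p):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(a, u, v):
        return abs(u[a] - v[a]) / span[a]

    def dist(u, v):
        # Manhattan distance on range-normalised features (an assumption).
        return sum(diff(a, u, v) for a in range(p))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * p
    for _ in range(m):
        i = rng.randrange(n)
        r, cr = X[i], y[i]

        def nearest(c):
            # k nearest samples of class c, excluding R itself.
            pool = [j for j in range(n) if y[j] == c and j != i]
            pool.sort(key=lambda j: dist(r, X[j]))
            return pool[:k]

        hits = nearest(cr)
        for a in range(p):
            # Near Hits that differ on feature a lower its weight.
            w[a] -= sum(diff(a, r, X[j]) for j in hits) / (m * k)
        for c in classes:
            if c == cr:
                continue
            misses = nearest(c)
            # Misses from each other class are weighted by that class's prior.
            scale = prior[c] / (1.0 - prior[cr])
            for a in range(p):
                # Near Misses that differ on feature a raise its weight.
                w[a] += scale * sum(diff(a, r, X[j]) for j in misses) / (m * k)
    return w


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [0.9, 0.8], [0.8, 0.1]]
y = [0, 0, 0, 1, 1, 1]
w = relieff(X, y, k=2)
ranking = sorted(range(len(w)), key=lambda a: w[a], reverse=True)
print(w, ranking)
```

On these toy data every sampled R raises the weight of feature 0 and lowers that of feature 1, so the separating feature ranks first, which is the ordering the descending-weight ranking of the VIs relies on.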
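The weight-threshold and correlation-elimination step can be sketched as a greedy pass over the weight-ranked features: a feature is kept only if its correlation with every already-kept, higher-weighted feature does not exceed 0.9. This is one reading of the elimination rule above; comparing |r| rather than r, keeping weights at or above 0.075, and the hypothetical feature names `vi_a` to `vi_e` and their values are all assumptions made for illustration.

```python
def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)  # assumes neither column is constant


def select_features(weights, columns, w_thresh=0.075, r_thresh=0.9):
    """Greedy weight-threshold plus correlation filter (illustrative).

    weights: dict mapping feature name -> ReliefF weight.
    columns: dict mapping feature name -> list of sample values.
    """
    # Keep features whose weight reaches the threshold, highest weight first.
    ranked = sorted(
        (f for f in weights if weights[f] >= w_thresh),
        key=lambda f: weights[f],
        reverse=True,
    )
    kept = []
    for f in ranked:
        # Drop f if |r| with any already-kept (higher-weighted) feature exceeds r_thresh.
        if all(abs(pearson(columns[f], columns[g])) <= r_thresh for g in kept):
            kept.append(f)
    return kept


# Hypothetical features and values, invented for illustration only.
weights = {"vi_a": 0.12, "vi_b": 0.10, "vi_c": 0.09, "vi_d": 0.08, "vi_e": 0.05}
columns = {
    "vi_a": [0.1, 0.2, 0.3, 0.4, 0.5],
    "vi_b": [0.5, 0.1, 0.4, 0.2, 0.3],
    "vi_c": [0.11, 0.21, 0.29, 0.41, 0.52],
    "vi_d": [0.3, 0.5, 0.1, 0.2, 0.4],
    "vi_e": [1.0, 2.0, 3.0, 4.0, 5.0],
}
selected = select_features(weights, columns)
print(selected)
```

Here `vi_c` is dropped because it is almost perfectly correlated (r ≈ 0.998) with the higher-weighted `vi_a`, and `vi_e` never enters the ranking because its weight falls below the threshold. Keeping features greedily in weight order is equivalent to eliminating, for each retained feature in turn, every lower-weighted feature that is highly correlated with it.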