Because it can extract extensive fine-grained features, deep semantic segmentation enables end-to-end automated learning and identification of both high-level semantics and low-level disease details at tissue sites
[12-14]. It enables targeted classification while suppressing irrelevant noise and focusing on key semantic features
[15]. The most commonly used models include fully convolutional networks (FCN)
[16], DeepLab
[17], SegNet
[18], and U-Net
[19]. These models can diagnose the severity of leaf disease accurately even in images captured under challenging brightness and background conditions
[20, 21]. A growing number of studies have achieved accurate and efficient disease segmentation by leveraging the strong transferability and high accuracy of these models. HU et al.
[22] developed a lightweight DeepLabV3+ variant for precise segmentation of premium tea buds in automated harvesting. The model integrates MobileNetV2 with an ECA-ASPP (Efficient Channel Attention-Atrous Spatial Pyramid Pooling) fusion module, achieving 93.71% mIoU (mean Intersection over Union) while reducing parameters by 89.4%. ZHOU et al.
[23] enhanced DeepLabV3+ and introduced a new segmentation model, GS-DeepLabV3+, which achieved an mIoU of 87.77% and an average pixel accuracy of 94.55% on a self-constructed oil tea leaf disease dataset. CHEN et al.
[24] introduced a novel multi-feature fusion module based on the YOLOv7 model to capture local and global dependencies, thereby obtaining more comprehensive feature information of tea bud leaves and achieving a final average detection accuracy of 94.43%. SUN et al.
[25] designed TeaDiseaseNet, which also used a multi-scale self-attention mechanism to enhance disease detection performance. HU et al.
[26] used U-Net to segment diseased spots and proposed an ellipse recovery method for occluded or damaged leaves, employing conditional random field (CRF) optimization. The algorithm achieved an average precision (AP) of 91.22%. In a subsequent study, the initial disease severity (IDS) coefficient was used to assess the severity of tea wilt, which improved estimation accuracy
[27]. As these studies show, encoder-decoder-based semantic segmentation can effectively learn feature representations at various scales and integrate contextual information. Therefore, for multi-size and multi-target leaf blob features within a wide field of view, segmentation accuracy can be significantly improved by utilizing features from different receptive fields.
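
For reference, the two segmentation metrics quoted above (mIoU and average pixel accuracy) can be computed from a per-class confusion matrix. The following is a minimal sketch, not the evaluation code of any cited study: the function names `confusion_matrix` and `miou_and_mpa` are hypothetical, and "average pixel accuracy" is assumed here to mean the per-class pixel accuracy averaged over classes, which individual papers may define differently.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index the ground-truth class, columns the predicted class.
    Labels are assumed to be integers in [0, num_classes).
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    flat_index = y_true * num_classes + y_pred
    counts = np.bincount(flat_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def miou_and_mpa(cm):
    """Return (mean IoU, mean per-class pixel accuracy) from a confusion matrix."""
    tp = np.diag(cm).astype(float)      # correctly labeled pixels per class
    gt_pixels = cm.sum(axis=1)          # ground-truth pixels per class
    pred_pixels = cm.sum(axis=0)        # predicted pixels per class
    union = gt_pixels + pred_pixels - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union                # NaN for classes absent from both masks
        acc = tp / gt_pixels            # NaN for classes absent from ground truth
    # Classes that never occur are skipped rather than counted as zero.
    return np.nanmean(iou), np.nanmean(acc)

# Toy 2x2 masks with two classes (0 = background, 1 = lesion), illustrative only.
truth = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
cm = confusion_matrix(truth, pred, num_classes=2)
miou, mpa = miou_and_mpa(cm)
```

In this toy case one background pixel is mislabeled as lesion, so the per-class IoU values are 1/2 and 2/3, and the per-class accuracies are 1/2 and 1.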