Objective In the global agricultural economy, China's tea industry is of paramount importance, serving both as a treasure of traditional culture and as a key driver of rural economic growth. As the market expands, traditional manual tea picking has become incompatible with modernization because of its low efficiency and high cost, making the adoption of intelligent tea-picking technology an inevitable trend. Although DeepLabV3+ has demonstrated outstanding performance in semantic segmentation, its application in complex tea garden environments is limited by its large model size and long training cycles. In this study, an image dataset of tea leaves under various conditions was constructed at the Xiqing Tea Garden in Hunan Province, providing a solid data foundation for intelligent tea picking. Building on this dataset, lightweight improvements were made to the DeepLabV3+ model to preserve high accuracy while reducing resource consumption, enhancing the model's flexibility and efficiency in practical applications and contributing to the intelligent transformation of the tea industry.

Methods The core technical innovation was the combination of a lightweight backbone, MobileNetV2, with the efficient channel attention network (ECANet) and the atrous spatial pyramid pooling (ASPP) module. First, MobileNetV2 was employed as the feature extractor, substituting depthwise separable convolutions for standard convolutions; this markedly reduced the model's parameter count and shortened training. Second, ECANet and ASPP were fused into an ECA_ASPP module to strengthen multi-scale feature fusion, which is especially relevant to recognizing delicate tea shoots; the fusion helps the model capture finer shoot features and thereby raises segmentation accuracy. Concretely, an input image was passed through the improved network, with MobileNetV2 extracting both shallow and deep features. The deep features were processed by the ECA_ASPP module for multi-scale integration, reinforcing the model's robustness to complex backgrounds and to variation in tea shoot morphology, while the shallow features went directly to the decoder, where their channels were reduced before concatenation with the upsampled deep features. This divide-and-conquer strategy exploited features at different levels of abstraction and improved recognition performance through careful feature fusion. Finally, a sequence of convolutions and upsampling operations produced a prediction map at the resolution of the original image, enabling precise localization of tea shoot harvesting points. The sketches below illustrate, in turn, the depthwise separable convolution, the ECA_ASPP module, the overall encoder-decoder flow, and the conventional computation of the evaluation metrics.
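To make the parameter saving concrete, below is a minimal PyTorch sketch of a depthwise separable convolution, the building block MobileNetV2 substitutes for standard convolution. It is an illustrative layer with arbitrary channel sizes, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)  # MobileNetV2's activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Weight count for a 64 -> 128 channel 3x3 layer:
# standard conv: 64 * 128 * 3 * 3 = 73,728 weights
# separable:     64 * 3 * 3 + 64 * 128 = 8,768 weights (+ 256 BN parameters)
std = nn.Conv2d(64, 128, 3, padding=1, bias=False)
sep = DepthwiseSeparableConv(64, 128)
print(sum(p.numel() for p in std.parameters()))  # 73728
print(sum(p.numel() for p in sep.parameters()))  # 9024
```

Applied throughout the backbone, savings of this order are what drive the roughly ninefold parameter reduction reported in the results.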
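The ECA_ASPP module can be pictured as a standard ASPP head whose fused output is reweighted by efficient channel attention. The sketch below follows the published ECA-Net kernel-size rule and the usual DeepLabV3+ atrous rates (6, 12, 18); exactly where ECA attaches relative to the ASPP branches is an assumption, since the abstract does not specify it.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ECA(nn.Module):
    """Efficient channel attention: a 1-D convolution over per-channel
    descriptors instead of the fully connected layers of an SE block."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Kernel size adapts to the channel count, as in ECA-Net.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = x.mean(dim=(2, 3))                  # (N, C) global average pool
        y = torch.sigmoid(self.conv(y.unsqueeze(1)).squeeze(1))
        return x * y.view(x.size(0), -1, 1, 1)  # channel-wise reweighting

class ECA_ASPP(nn.Module):
    """ASPP with rates (6, 12, 18) plus image pooling, then ECA on the
    projected multi-scale features."""
    def __init__(self, in_ch: int, out_ch: int = 256):
        super().__init__()
        def branch(rate: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=rate, dilation=rate,
                          bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.b0 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, bias=False),
                                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.b1, self.b2, self.b3 = branch(6), branch(12), branch(18)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1, bias=False),
                                  nn.ReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(5 * out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.eca = ECA(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        img = F.interpolate(self.pool(x), size=x.shape[2:],
                            mode='bilinear', align_corners=False)
        y = torch.cat([self.b0(x), self.b1(x), self.b2(x), self.b3(x), img],
                      dim=1)
        return self.eca(self.project(y))
```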
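The encoder-decoder flow described above can be summarized as follows. Module names, the backbone's two-output interface, and the channel widths (24 shallow, 256 deep, 48 after reduction, matching common MobileNetV2-based DeepLabV3+ variants) are hypothetical placeholders, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeaShootSegmenter(nn.Module):
    """Improved DeepLabV3+: MobileNetV2 encoder, ECA_ASPP on deep features,
    shallow-feature channel reduction, fusion, and upsampling to full size."""
    def __init__(self, backbone: nn.Module, eca_aspp: nn.Module,
                 num_classes: int = 2, low_ch: int = 24, high_ch: int = 256):
        super().__init__()
        self.backbone = backbone    # assumed to return (shallow, deep) maps
        self.eca_aspp = eca_aspp
        # 1x1 convolution reduces shallow-feature channels before fusion.
        self.reduce = nn.Sequential(nn.Conv2d(low_ch, 48, 1, bias=False),
                                    nn.BatchNorm2d(48), nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(
            nn.Conv2d(high_ch + 48, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        low, high = self.backbone(x)       # shallow and deep features
        high = self.eca_aspp(high)         # multi-scale context + attention
        high = F.interpolate(high, size=low.shape[2:], mode='bilinear',
                             align_corners=False)
        y = self.fuse(torch.cat([self.reduce(low), high], dim=1))
        # Upsample logits back to the input resolution.
        return F.interpolate(y, size=(h, w), mode='bilinear',
                             align_corners=False)
```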
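The results that follow are reported as mean IoU and mean pixel accuracy. Since the abstract does not show the evaluation code, the sketch below gives the standard confusion-matrix definitions of these metrics, assumed rather than confirmed to match the paper's.

```python
import numpy as np

def confusion_matrix(pred: np.ndarray, gt: np.ndarray, n: int) -> np.ndarray:
    """Accumulate an n x n confusion matrix from integer class maps."""
    mask = (gt >= 0) & (gt < n)
    return np.bincount(n * gt[mask].astype(int) + pred[mask].astype(int),
                       minlength=n * n).reshape(n, n)

def mean_iou(cm: np.ndarray) -> float:
    # Per-class IoU = TP / (TP + FP + FN), averaged over classes.
    tp = np.diag(cm)
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)
    return float(np.nanmean(iou))

def mean_pixel_accuracy(cm: np.ndarray) -> float:
    # Per-class accuracy = TP / (pixels labelled with that class),
    # averaged over classes.
    return float(np.nanmean(np.diag(cm) / cm.sum(axis=1)))
```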
Results and Discussions The experiments showed that the improved DeepLabV3+ model achieved a mean intersection over union (mIoU) of 93.71% and a mean pixel accuracy of 97.25% on the tea shoot dataset. Compared with the original Xception-based model, the parameter count fell from 54.714 million to 5.818 million, a substantial lightweight redesign. Further comparison with other mainstream semantic segmentation networks showed that the improved model held clear advantages in parameter count, training time, and mIoU, underscoring its efficiency and precision for tea shoot recognition. This considerable decrease in parameters not only enabled more resource-economical deployment but also shortened training, making the model well suited to real-time use in tea garden environments. The high mIoU and pixel accuracy attested to the model's ability to precisely delineate and identify tea shoots even in complex and varied scenes, demonstrating robustness and adaptability in practical settings. Overall, the improved DeepLabV3+ model demonstrates the potential of lightweight designs to raise both the performance and the applicability of deep learning models for specialized tasks such as tea shoot recognition in agriculture.

Conclusions This study implements an efficient and accurate tea shoot recognition method through targeted model improvements and optimization, furnishing crucial technical support for the practical deployment of intelligent tea-picking robots. The lightweight DeepLabV3+ not only substantially improves recognition speed and segmentation accuracy but also lowers hardware requirements, promoting the practical adoption of intelligent picking technology in the tea industry. In the future, with the continued evolution of deep learning algorithms and the iterative upgrading of hardware, this work is expected to further advance the automation of tea picking, injecting fresh vitality into the sustainable development of the tea industry and providing a reference for intelligent solutions to the fine-grained management of other crops.