[Objective] This study aims to address the insufficient recognition accuracy of fruit tree branches and the imprecise localization of pruning points that intelligent pruning robots currently exhibit in complex field environments. To this end, a deep learning method based on the fusion of images and point clouds was proposed, enabling non-contact segmentation of dormant high-spindle apple tree branches, measurement of phenotypic parameters, and automatic identification and accurate localization of pruning points.

[Methods] Localized RGB-D data were gathered from apple trees using an Intel RealSense D435i camera, a device capable of effective depth measurement within a range of 0.3–3.0 m. Data collection took place between early and mid-January 2024, from 9:00 AM to 4:00 PM daily; the weather remained sunny throughout this period, ensuring high-quality data acquisition. To maintain consistency, the camera was mounted on a stand 0.4–0.5 m from the main stem of each apple tree. After data collection, trunks and branches were manually labeled with Labelme, and the OpenCV library was used to augment the image data, which helped prevent overfitting during model training. To improve segmentation accuracy for tree trunks and branches in RGB images, an enhanced U-Net model was introduced. This model used VGG16 (Visual Geometry Group 16) as its backbone feature extraction network and incorporated the Convolutional Block Attention Module (CBAM) at the up-sampling stage. Based on the segmentation results, a multimodal data processing pipeline was established. First, skeleton lines were extracted from the segmented branch mask maps using OpenCV. The first-level branch connection points were identified from their positions relative to the trunk, and potential pruning points were then searched for within local neighborhoods through coordinate translation. An edge detection algorithm was applied to locate the edge pixels nearest to these potential pruning points. By extending the diameter line through the branch pixels in the image and combining it with depth information, the actual branch diameter could be estimated. Branch spacing was likewise calculated from the differences in the vertical pixel coordinates of potential pruning points together with depth information. Meanwhile, the trunk point cloud was obtained by merging the trunk mask map with the depth map; after preprocessing, the average trunk diameter in the local view was estimated through cylinder fitting with the Random Sample Consensus (RANSAC) algorithm. Finally, an intelligent pruning decision-making algorithm was developed by investigating orchardists' pruning experience, analyzing the relevant literature, and integrating the phenotypic parameter acquisition methods, thereby achieving accurate prediction of apple tree pruning points. Minimal code sketches of several of these processing steps are provided after the Conclusion.

[Results and Discussion] The improved U-Net model achieved a mean pixel accuracy (mPA) of 95.52% for branch segmentation, a 2.74% improvement over the original architecture, with corresponding increases in mean intersection over union (mIoU) and precision. Comparative evaluations against DeepLabV3+, PSPNet, and the baseline U-Net were conducted under both backlight and front-light illumination conditions, and the enhanced model demonstrated superior segmentation performance and robustness across all tested scenarios. Ablation experiments indicated that replacing the original feature extractor with VGG16 yielded a 1.52% mPA improvement, accompanied by simultaneous gains in mIoU and precision, while integrating CBAM at the up-sampling stage further strengthened the model's ability to resolve fine branch structures. Phenotypic parameters estimated from the segmented branch masks combined with depth maps correlated strongly with manual measurements: the coefficients of determination (R²) for primary branch diameter, branch spacing, and trunk diameter were 0.96, 0.95, and 0.91, respectively, and the mean absolute errors (MAE) were 1.33, 13.96, and 5.11 mm, surpassing the accuracy of visual assessment by human pruning operators. The intelligent pruning decision system achieved an 87.88% correct identification rate for pruning points, with an average processing time of 4.2 s per viewpoint. These results validate the practical feasibility and operational efficiency of the proposed methodology in real-world agricultural applications.

[Conclusion] In summary, an efficient and accurate method for identifying pruning points on apple trees was proposed, based on the deep-learning fusion of image and point cloud data. The method proved feasible, efficient, and accurate in practical application, providing significant support for deploying intelligent pruning robots in modern agriculture and laying a solid foundation for smarter, more efficient agricultural production.
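The sketches below are illustrative only; they are not the authors' released code, and every hyperparameter, file name, and numeric value in them is an assumption unless it appears in the abstract above. First, a minimal PyTorch rendering of the CBAM block added at the U-Net up-sampling stage, following the widely used channel-then-spatial formulation; the reduction ratio and 7×7 spatial kernel are conventional defaults from the original CBAM paper, not values reported by this study.

```python
# Minimal CBAM sketch (channel attention followed by spatial attention).
# Hyperparameters are conventional defaults, not values from this study.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both global-average- and global-max-pooled features.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        # 2 input channels: channel-wise mean map and channel-wise max map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Applies channel attention, then spatial attention, to a feature map."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)
```

In an improved U-Net of the kind described, such a block would typically be applied to the decoder feature maps after each up-sampling step; the abstract does not specify the exact insertion points.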
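Next, a hedged sketch of the mask post-processing: skeleton-line extraction from a binary branch mask and a nearest-edge-pixel search of the kind used to refine candidate pruning points. The file name, Canny thresholds, and candidate coordinates are placeholders, and the study's exact coordinate-translation and neighborhood-search rules are not reproduced here.

```python
# Sketch: skeletonize a binary branch mask and find the mask-edge pixel
# nearest to a candidate pruning point. All concrete values are placeholders.
import cv2
import numpy as np
from skimage.morphology import skeletonize

mask = cv2.imread("branch_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# One-pixel-wide skeleton of the branch region.
skeleton = skeletonize(mask > 0).astype(np.uint8) * 255

# Edge map of the mask, then nearest edge pixel to an illustrative candidate.
edges = cv2.Canny(mask, 50, 150)
candidate = np.array([240, 310])  # (row, col), illustrative
edge_pts = np.column_stack(np.nonzero(edges))  # (K, 2) array of (row, col)
nearest = edge_pts[np.argmin(np.linalg.norm(edge_pts - candidate, axis=1))]
```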
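The branch diameter and branch spacing estimates follow from the standard pinhole-camera relation: a span of w pixels observed at depth Z corresponds to roughly w·Z/f metres for a focal length of f pixels. A minimal sketch with hypothetical intrinsics (real values should be queried from the camera, e.g. via pyrealsense2, not hard-coded):

```python
# Pinhole-model conversion from pixel measurements plus depth to metric sizes.
# The intrinsics below are illustrative placeholders for a D435i colour stream.
FX = 615.0  # horizontal focal length in pixels, hypothetical
FY = 615.0  # vertical focal length in pixels, hypothetical

def pixel_length_to_metres(length_px: float, depth_m: float, focal_px: float) -> float:
    """Back-project an image-plane length at a given depth to metres."""
    return length_px * depth_m / focal_px

# Branch diameter: width in pixels along the extended diameter line,
# at the branch's measured depth.
diameter_m = pixel_length_to_metres(18.0, 0.45, FX)

# Branch spacing: difference of two pruning-point row coordinates
# (vertical pixel axis), converted with the vertical focal length.
spacing_m = pixel_length_to_metres(abs(412 - 268), 0.47, FY)

print(f"diameter ≈ {diameter_m * 1000:.1f} mm, spacing ≈ {spacing_m * 1000:.1f} mm")
```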
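Finally, a simplified stand-in for the RANSAC cylinder fit used to estimate trunk diameter: the trunk axis is estimated by PCA (via SVD), the points are projected onto the plane perpendicular to that axis, and a circle is RANSAC-fitted there. This deliberately reduces the full five-parameter cylinder fit that the study applies to the trunk point cloud (libraries such as PCL or pyransac3d provide one directly); the inlier threshold and iteration count are assumptions.

```python
# Simplified RANSAC-style trunk-diameter estimate: PCA for the cylinder axis,
# then a RANSAC circle fit in the perpendicular plane. Parameters are
# illustrative assumptions.
import numpy as np

def fit_circle_3pts(p):
    """Circle centre/radius through three 2-D points (None if collinear)."""
    (x1, y1), (x2, y2), (x3, y3) = p
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    ux = ((x1**2 + y1**2) * (y2 - y3) + (x2**2 + y2**2) * (y3 - y1)
          + (x3**2 + y3**2) * (y1 - y2)) / d
    uy = ((x1**2 + y1**2) * (x3 - x2) + (x2**2 + y2**2) * (x1 - x3)
          + (x3**2 + y3**2) * (x2 - x1)) / d
    c = np.array([ux, uy])
    return c, np.linalg.norm(p[0] - c)

def trunk_diameter(points, thresh=0.005, iters=500, rng=np.random.default_rng(0)):
    """points: (N, 3) preprocessed trunk cloud in metres. Returns diameter in metres."""
    centred = points - points.mean(axis=0)
    # Principal axis of the trunk = direction of largest variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[1:3]                 # orthonormal basis of the perpendicular plane
    pts2d = centred @ basis.T       # (N, 2) projection onto that plane
    best_r, best_inliers = None, -1
    for _ in range(iters):
        fit = fit_circle_3pts(pts2d[rng.choice(len(pts2d), 3, replace=False)])
        if fit is None:
            continue
        c, r = fit
        inliers = np.sum(np.abs(np.linalg.norm(pts2d - c, axis=1) - r) < thresh)
        if inliers > best_inliers:
            best_inliers, best_r = inliers, r
    if best_r is None:
        raise ValueError("circle fit failed on all samples")
    return 2 * best_r

# Usage, assuming trunk_points was built by masking the depth map and
# back-projecting to 3-D:
# diameter_m = trunk_diameter(trunk_points)
```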