
Smart Agriculture



A Transfer Learning-Based Multimodal Model for Grape Detection and Counting

XU Wenwen1,2, YU Kejian3, DAI Zexu1,2, WU Yunzhi1,2

  1. School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
  2. Anhui Beidou Precision Agriculture Information Engineering Research Center, Hefei 230036, China
  3. School of Computer Science and Technology, Donghua University, Shanghai 201620, China
  • Received: 2025-04-06 Online: 2025-06-16
  • Foundation items: 2024 Anhui Provincial Science and Technology Innovation Plan Project (202423k09020031)
  • About author: XU Wenwen, Master's degree candidate, research interests: computer vision. E-mail: wenwenxu@stu.ahau.edu.cn
  • Corresponding author: WU Yunzhi, Ph.D., Associate Professor, research interests: computer vision. E-mail: wuyzh@ahau.edu.cn


Abstract:

[Objective] Grapes are among the cash crops with the largest combined production value in the world, so grape yield estimation is of great importance to agricultural and economic development. At present, however, grape yield prediction is difficult and costly: existing methods struggle with green grape varieties, whose berries are similar in color to the leaves, and perform poorly on grape bunches composed of small berries. To solve these problems, a multimodal detection framework based on transfer learning is proposed, which aims to realize the detection and counting of different grape varieties and thereby provide reliable technical support for grape yield prediction and intelligent orchard management. [Methods] A multimodal grape detection framework based on transfer learning was proposed. Transfer learning exploited the feature representation capabilities of pretrained models, requiring only a small number of grape images for fine-tuning to adapt to the task; this reduced labeling costs while enhancing the ability to capture grape features. The framework adopted a dual-encoder-single-decoder structure consisting of three core modules: an image and text feature extraction and enhancement module, a language-guided query selection module, and a cross-modality decoder module. In the feature extraction stage, the framework employed models pretrained on public datasets, which significantly reduced training time and cost on the target task while improving the capture of grape features. A feature enhancement module achieved cross-modality fusion between grape images and text, and attention mechanisms were applied to enhance both the image and text features, facilitating cross-modality feature learning between the two.
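The attention-based cross-modality feature enhancement described above can be sketched as a pair of cross-attention blocks in which image tokens attend to text tokens and vice versa. This is a minimal illustration under assumed dimensions and layer choices, not the authors' implementation:

```python
import torch
import torch.nn as nn

class CrossModalEnhancer(nn.Module):
    """Minimal feature-enhancement sketch: image tokens attend to text
    tokens and vice versa, each stream keeping a residual connection.
    Hidden size and head count are illustrative assumptions."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_feats, txt_feats):
        # image tokens query the text tokens (cross-attention)
        img_enh, _ = self.img2txt(img_feats, txt_feats, txt_feats)
        # text tokens query the image tokens
        txt_enh, _ = self.txt2img(txt_feats, img_feats, img_feats)
        return img_feats + img_enh, txt_feats + txt_enh
```

In practice such a block would be stacked several times inside the dual encoder; the residual additions keep the original unimodal features available alongside the fused ones.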
During the cross-modality query selection phase, the framework used a language-guided query selection strategy to filter queries from the grape images. This strategy made more effective use of the input text to guide object detection, selecting the features most relevant to the input text as queries for the decoder. The cross-modality decoder combined features from the grape image and text modalities to achieve more accurate modality alignment, enabling a more effective fusion of grape image and text information and ultimately producing the corresponding grape predictions. Finally, to comprehensively evaluate model performance, mean average precision (mAP) and average recall (AR) were adopted as evaluation metrics for the detection task, while the counting task was quantified using mean absolute error (MAE) and root mean square error (RMSE). [Results and Discussions] This method exhibited optimal performance in both detection and counting when compared with nine baseline models. Specifically, on the WGISD public dataset the method achieved an mAP50 of 80.3% in the detection task, a 2.7 percentage point improvement over the second-best model. It also reached 53.2% mAP and 58.2% mAP75, surpassing the second-best models by 13.4 and 22 percentage points, respectively, and achieved an mAR of 76.5%, a 9.8 percentage point increase over the next-best model. In the counting task, the method achieved an MAE of 1.65 and an RMSE of 2.48, outperforming all baseline models. Furthermore, experiments on a total of nine grape varieties drawn from the WGISD dataset and field-collected data yielded an mAP50 of 82.5%, 58.5% mAP, 64.4% mAP75, 77.1% mAR, an MAE of 1.44, and an RMSE of 2.19.
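The language-guided query selection step can be illustrated as choosing the image tokens most similar to any text token as decoder queries. A hypothetical sketch: dot-product similarity and the `num_queries` value are assumptions for illustration, not details from the paper:

```python
import torch

def language_guided_select(img_feats, txt_feats, num_queries=100):
    """Select the image tokens best matched by the text as decoder queries.

    img_feats: (B, N_img, D) image tokens; txt_feats: (B, N_txt, D) text tokens.
    Returns (B, num_queries, D) selected query features."""
    sim = img_feats @ txt_feats.transpose(1, 2)      # (B, N_img, N_txt) similarities
    scores = sim.max(dim=-1).values                  # best text match per image token
    top = scores.topk(num_queries, dim=1).indices    # indices of top-scoring tokens
    idx = top.unsqueeze(-1).expand(-1, -1, img_feats.size(-1))
    return torch.gather(img_feats, 1, idx)
```

The selected features then serve as the decoder's initial queries, so that regions described by the text phrase (e.g. "grape cluster") dominate the detection heads.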
These results demonstrated the model's strong adaptability and effectiveness across diverse grape varieties. Notably, the method not only performed well on large grape clusters but also showed superior performance on smaller ones, achieving an mAP_s of 74.2% in the detection task, a 9.5 percentage point improvement over the second-best model. Additionally, to provide a more intuitive assessment of model performance, grape images from the test set were selected for visual comparison; the model's detection and counting results for grape clusters closely matched the original annotations in the label dataset. Overall, the method demonstrated strong generalization and higher accuracy under various environmental conditions and across grape varieties. This technology has the potential to be applied to estimating total orchard yield and reducing pre-harvest measurement errors, thereby effectively enhancing the precision management of vineyards. [Conclusions] The proposed method achieved higher accuracy and better adaptability in detecting five grape varieties than the other baseline models, and further demonstrated substantial practicality and robustness across nine different grape varieties. These findings suggest that the method has significant application potential in grape detection and counting tasks, providing strong technical support for the intelligent development of precision agriculture and the grape cultivation industry.
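The MAE and RMSE reported for the counting task follow their standard definitions over per-image cluster counts, e.g.:

```python
import math

def counting_errors(pred_counts, true_counts):
    """Mean absolute error and root mean square error between predicted
    and ground-truth grape-cluster counts (one value per image)."""
    diffs = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```

RMSE penalizes large per-image miscounts more heavily than MAE, which is why both are reported together.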

Key words: transfer learning, counting, multimodal, detection, grape
