
Smart Agriculture

An Underwater In-situ Mass Estimation Method for Chinese Mitten Crab Based on Binocular Vision and an Improved YOLOv11-pose

LI Aoqiang1,2,3, DAI Hangyu1,2,3, GUO Ya1,2,3

  1. International Joint Research Center for Intelligent Optical Sensing and Applications at Jiangnan University, Wuxi 214122, China
  2. Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi 214122, China
  3. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China
  • Received: 2025-05-19  Online: 2025-07-23
  • Foundation items: International Cooperation and Exchange Program of the National Natural Science Foundation of China (51961125102); Jiangsu Provincial Modern Agriculture Key and General Programs (BE2022366)
  • About author: LI Aoqiang, E-mail:
  • Corresponding author: GUO Ya, E-mail:

Abstract:

[Objective] With the accelerating development of large-scale, intelligent aquaculture, accurate estimation of the body mass of individual Chinese mitten crabs is critical for tasks such as precision feeding, disease prevention, and optimization of harvest decisions. Traditional methods of manually catching and weighing crabs are time-consuming and labor-intensive, can stress or injure the animals, and cannot provide real-time monitoring. To address the poor image quality and difficult feature extraction caused by turbid aquaculture water, a method for estimating the mass of Chinese mitten crabs is proposed that combines binocular vision with deep-learning-based keypoint detection. The approach achieves high-precision detection of anatomical keypoints on the crab, providing new technical support for precision aquaculture and intelligent management.

[Methods] The method builds on a lightweight YOLOv11 framework. In its C3K2 module, MBConv depthwise-separable convolutions were incorporated to significantly reduce computational complexity and improve feature-extraction efficiency, and an EffectiveSE channel attention mechanism was introduced to adaptively emphasize informative channels. To further enhance cross-scale information fusion, a spatial dynamic feature fusion module (SDFM) was added; the SDFM performs adaptive, weighted fusion of local spatial attention and global channel attention, enabling fine-grained extraction of carapace edges and anatomical keypoints. The improved model, YOLOv11-ES, outputs the crab's bounding box, the positions of four anatomical keypoints, and the crab's sex classification in a single forward pass. In the 3D reconstruction stage, calibrated stereo-camera parameters were used together with a sparse keypoint-matching strategy guided by the crab's sex and by spatial geometric constraints: high-confidence keypoint pairs were selected from the left and right views, the 3D coordinates of the keypoints were recovered by triangulation, and the true carapace length and width were computed from them. Finally, the carapace length, carapace width, and sex label were fed into a two-layer back-propagation (BP) neural network to regress the mass of the individual crab.
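The abstract gives no implementation code; as a minimal PyTorch sketch (an assumption, not the authors' released code), the EffectiveSE (eSE) channel attention referenced above is conventionally implemented as a single channel-preserving 1x1 convolution gated by a hard sigmoid, following the original eSE design:

    import torch
    import torch.nn as nn

    class EffectiveSE(nn.Module):
        """Effective Squeeze-Excitation (eSE) channel attention.

        Unlike classic SE, eSE uses one channel-preserving 1x1 conv
        instead of a reduce/expand pair, so no channel information
        is lost in a bottleneck.
        """

        def __init__(self, channels: int):
            super().__init__()
            self.fc = nn.Conv2d(channels, channels, kernel_size=1)
            self.gate = nn.Hardsigmoid()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            w = x.mean(dim=(2, 3), keepdim=True)  # global average pool -> (N, C, 1, 1)
            w = self.gate(self.fc(w))             # per-channel attention weights
            return x * w                          # re-weight feature map channel-wise

How this block is wired into the C3K2 module alongside the MBConv depthwise-separable convolutions and the SDFM is specific to YOLOv11-ES and is not reproduced here.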
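The triangulation step can be illustrated with OpenCV. A minimal sketch, assuming 3x4 projection matrices obtained from stereo calibration; the assignment of the four keypoints to carapace length and width below is a hypothetical layout for illustration, not taken from the paper:

    import cv2
    import numpy as np

    def triangulate_keypoints(pts_left, pts_right, P_left, P_right):
        """Triangulate matched 2D keypoints from a calibrated stereo pair.

        pts_left, pts_right: (N, 2) pixel coordinates of the same keypoints
        in the left/right views; P_left, P_right: (3, 4) projection matrices.
        Returns (N, 3) points in the left-camera frame.
        """
        pts_l = np.asarray(pts_left, dtype=np.float64).T   # -> (2, N)
        pts_r = np.asarray(pts_right, dtype=np.float64).T  # -> (2, N)
        pts_4d = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)
        return (pts_4d[:3] / pts_4d[3]).T                  # homogeneous -> 3D

    # Hypothetical keypoint layout: points 0/1 span the carapace length,
    # points 2/3 span the carapace width.
    # xyz = triangulate_keypoints(kpts_l, kpts_r, P1, P2)
    # carapace_length = np.linalg.norm(xyz[0] - xyz[1])
    # carapace_width = np.linalg.norm(xyz[2] - xyz[3])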
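Similarly, the two-layer BP regressor over the three features (carapace length, carapace width, sex) can be sketched in PyTorch; the hidden width and activation are illustrative assumptions, since the abstract does not specify them:

    import torch
    import torch.nn as nn

    class MassRegressor(nn.Module):
        """Two-layer BP network: [length, width, sex] -> body mass (g)."""

        def __init__(self, hidden: int = 16):  # hidden width is an assumed value
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3, hidden),
                nn.Sigmoid(),          # classic BP networks use sigmoid units
                nn.Linear(hidden, 1),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    # Training sketch: minimize MSE by back-propagation.
    # model = MassRegressor()
    # opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # loss = nn.MSELoss()(model(features), masses)  # features: (N, 3)
    # loss.backward(); opt.step()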
[Results and Discussion] To validate the effectiveness and robustness of the proposed method, a dataset of Chinese mitten crab images with annotated keypoints was constructed under varying water turbidity and lighting conditions, and both ablation and comparative experiments were conducted. YOLOv11-ES achieved a mean average precision at an intersection-over-union (IoU) threshold of 0.5 (mAP@50) of 97.2% on the test set, 4.4 percentage points higher than the original YOLOv11 model; its keypoint-detection branch reached an mAP@50 of 96.7%, 3.6 percentage points higher than the original model. In comparative experiments, YOLOv11-ES also showed clear advantages over other models in the same series. In a full-system evaluation on images of 30 individual crabs, the mean absolute percentage error (MAPE) was only 2.68% for carapace width and 1.48% for carapace length, and the Pearson correlation coefficients between the measured values and the manually obtained true values exceeded 0.977 for both dimensions, indicating accurate 3D reconstruction with minimal measurement error. Experiments on the influence of image quality showed that when the underwater image quality measure (UIQM) reached at least 1.5, the combined MAPE of the carapace length and width measurements stayed below 5%; when UIQM reached at least 2.2, it dropped to about 1.9%. These results confirm the method's robustness to variations in water turbidity and lighting. For mass regression, the BP network trained on carapace length, width, and sex achieved a mean absolute error (MAE) of 2.39 g and a MAPE of 7.1% on an independent test set, demonstrating high-precision estimation of individual crab mass.

[Conclusions] The proposed method, which combines an improved YOLOv11 detection network, binocular sparse keypoint matching, and a two-layer BP regression network, enables high-precision, low-error, real-time, non-contact estimation of Chinese mitten crab mass in complex, turbid aquatic environments. It features a lightweight model, high computational efficiency, excellent measurement accuracy, and strong adaptability to varying environmental conditions, and it provides key technical parameters for intelligent Chinese mitten crab farming. In the future, the approach could be extended to other aquaculture species and to more complex farming scenarios; combined with transfer learning and online adaptive calibration, its generalization capability could be further improved, and integration with intelligent monitoring platforms could enable large-scale, all-weather underwater estimation of crab mass, contributing to the sustainable development of smart aquaculture.
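For reference, the MAE, MAPE, and Pearson correlation figures quoted above follow their standard definitions; a minimal NumPy sketch for computing them from paired measured and ground-truth values:

    import numpy as np

    def evaluate(y_true, y_pred):
        """Return MAE, MAPE (%), and Pearson r for paired measurements."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        mae = np.mean(np.abs(y_pred - y_true))
        mape = 100.0 * np.mean(np.abs(y_pred - y_true) / y_true)
        r = np.corrcoef(y_true, y_pred)[0, 1]
        return mae, mape, r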

Key words: Chinese mitten crab, keypoint detection, binocular vision, YOLOv11, weight estimation

CLC Number: