Zero-Shot Pest Identification Based on Generative Adversarial Networks and Visual-Semantic Alignment

doi:10.12133/j.smartag.SA202312014

Abstract

Abstract:

Objective Accurate identification of insect pests is crucial for the effective prevention and control of crop infestations. However, existing pest identification methods primarily rely on traditional machine learning or deep learning techniques that are trained on seen classes. These methods falter when they encounter unseen pest species not included in the training set, due to the absence of image samples. An innovative method was proposed to address the zero-shot recognition challenge for pests. Methods The novel zero-shot learning method proposed in this study was capable of identifying unseen pest species. First, a comprehensive pest image dataset was assembled, sourced from field photography conducted around Beijing over several years, and from web crawling. The final dataset consisted of 2 000 images across 20 classes of adult Lepidoptera insects, with 100 images per class. During data preprocessing, a semantic dataset was manually curated by defining attributes related to color, pattern, size, and shape for six parts: antennae, back, tail, legs, wings, and overall appearance. Each image was annotated to form a 65-dimensional attribute vector for each class, resulting in a 20×65 semantic attribute matrix with rows representing each class and columns representing attribute values. Subsequently, 16 classes were designated as seen classes, and 4 as unseen classes. Next, a novel zero-shot pest recognition method was proposed, focusing on synthesizing high-quality pseudo-visual features aligned with semantic information using a generator. The wasserstein generative adversarial networks (WGAN) architecture was strategically employed as the fundamental network backbone. Conventional generative adversarial networks (GANs) have been known to suffer from training instabilities, mode collapse, and convergence issues, which can severely hinder their performance and applicability. The WGAN architecture addresses these inherent limitations through a principled reformulation of the objective function. In the proposed method, the contrastive module was designed to capture highly discriminative visual features that could effectively distinguish between different insect classes. It operated by creating positive and negative pairs of instances within a batch. Positive pairs consisted of different views of the same class, while negative pairs were formed from instances belonging to different classes. The contrastive loss function encouraged the learned representations of positive pairs to be similar while pushing the representations of negative pairs apart. Tightly integrated with the WGAN structure, this module substantially improved the generation quality of the generator. Furthermore, the visual-semantic alignment module enforced consistency constraints from both visual and semantic perspectives. This module constructed a cross-modal embedding space, mapping visual and semantic features via two projection layers: One for mapping visual features into the cross-modal space, and another for mapping semantic features. The visual projection layer took the synthesized pseudo-visual features from the generator as input, while the semantic projection layer ingested the class-level semantic vectors. Within this cross-modal embedding space, the module enforced two key constraints: Maximizing the similarity between same-class visual-semantic pairs and minimizing the similarity between different-class pairs. This was achieved through a carefully designed loss function that encourages the projected visual and semantic representations to be closely aligned for instances belonging to the same class, while pushing apart the representations of different classes. The visual-semantic alignment module acted as a regularizer, preventing the generator from producing features that deviated from the desired semantic information. This regularization effect complemented the discriminative power gained from the contrastive module, resulting in a generator that produces high-quality, diverse, and semantically aligned pseudo-visual features. Results and Discussions The proposed method was evaluated on several popular ZSL benchmarks, including CUB, AWA, FLO, and SUN. The results demonstrated that the proposed method achieved state-of-the-art performance across these datasets, with a maximum improvement of 2.8% over the previous best method, CE-GZSL. This outcome fully demonstrated the method's broad effectiveness in different benchmarks and its outstanding generalization ability. On the self-constructed 20-class insect dataset, the method also exhibited exceptional recognition accuracy. Under the standard ZSL setting, it achieved a precise recognition rate of 77.4%, outperforming CE-GZSL by 2.1%. Under the generalized ZSL setting, it achieved a harmonic mean accuracy of 78.3%, making a notable 1.2% improvement. This metric provided a balanced assessment of the model's performance across seen and unseen classes, ensuring that high accuracy on unseen classes does not come at the cost of forgetting seen classes. These results on the pest dataset, coupled with the performance on public benchmarks, firmly validated the effectiveness of the proposed method. Conclusions The proposed zero-shot pest recognition method represents a step forward in addressing the challenges of pest management. It effectively generalized pest visual features to unseen classes, enabling zero-shot pest recognition. It can facilitate pests identification tasks that lack training samples, thereby assisting in the discovery and prevention of novel crop pests. Future research will focus on expanding the range of pest species to further enhance the model's practical applicability.

Key words: pest recognition, semantic knowledge, visual features, generative adversarial networks, contrastive learning, generalized zero-shot learning

LI Tianjun, YANG Xinting, CHEN Xiao, HU Huan, ZHOU Zijie, LI Wenyong. Zero-Shot Pest Identification Based on Generative Adversarial Networks and Visual-Semantic Alignment[J]. Smart Agriculture, doi: 10.12133/j.smartag.SA202312014.

Figures/Tables 17

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig 5

Fig.6

Table 1

Table 2

ZSL results of VSA-WGAN on five public datasets

方法	零样本
	CUB	AWA1	AWA2	SUN	FLO
	$T 1$
SE-GZSL^［27］	59.6	69.5	69.2	63.4	—
f-CLSWGAN^［12］	57.3	68.2	—	60.8	67.2
cycle-CLSWGAN^［28］	58.4	66.3	—	60.0	70.1
f-VAEGAN^［29］	61.0	—	71.1	64.7	67.7
LisGAN^［30］	58.8	70.6	—	61.7	69.6
OCD-CVAE^［31］	60.3	—	71.3	63.5	—
TF-VAEGAN^［32］	64.9	—	72.2	66.0	70.8
CE-GZSL^［15］	77.5	71.0	70.4	63.3	70.6
VSA-WGAN	79.3	70.9	70.2	62.6	70.7

Table 2

Table 3

Fig. 7

Table 4

The split of the proposed dataset of 20 pest classes

设置	训练集	测试集
零样本	16类 $×$ 100张	$4 类 × 100 张$
广义零样本	$16 类 × 80 张$	$4 类 × 100 张 + 16 类 × 20 张$

Table 4

Table 5

Results of VSA-WGAN on the proposed pest dataset

方法	零样本	广义零样本
方法	$T 1$	U	S	H
CVAE^［9］	59.6	27.3	65.9	38.1
f-CLSWGAN^［12］	73.6	56.7	71.6	63.2
CE-GZSL^［15］	75.3	70.0	86.3	77.1
VSA-WGAN	77.4	72.0	86.4	78.3
VSA-WGAN-微调	78.2	69.2	92.3	79.2

Table 5

Table 6

Ablation results of VSA-WGAN on the proposed pest dataset

方法	零样本	广义零样本
方法	$T 1$	U	S	H
$L W G A N$	66.6	59.3	80.4	67.9
$L W G A N + L A l i g n$	68.2	59.1	82.6	68.7
$L W G A N + L C o n t r a s$	69.6	64.9	86.4	73.8
$L W G A N + L C o n t r a s + L A l i g n$	77.4	72.0	86.4	78.3

Table 6

Fig.8

Fig.9

Table 7

Results of VSA-WGAN on the proposed pest dataset under different semantic features

语义编码	属性维度/维	ZSL	GZSL
语义编码	属性维度/维	$T 1$	U	S	H
Word2Vec	100	63.8	51.9	71.1	59.0
手工定义（本研究）	65	77.4	72.0	86.4	78.3

Table 7

Fig.10

References 36

1	陆宴辉, 赵紫华, 蔡晓明, 等. 我国农业害虫综合防治研究进展[J]. 应用昆虫学报, 2017, 54(3): 349-363.
	LU Y H, ZHAO Z H, CAI X M, et al. Progresses on integrated pest management (IPM) of agricultural insect pests in China[J]. Chinese journal of applied entomology, 2017, 54(3): 349-363.
2	NGUGI L C, ABELWAHAB M, ABO-ZAHHAD M. Recent advances in image processing techniques for automated leaf pest and disease recognition-A review[J]. Information processing in agriculture, 2021, 8(1): 27-51.
3	郑果, 姜玉松, 沈永林. 基于改进YOLOv7的水稻害虫识别方法[J]. 华中农业大学学报, 2023, 42(3): 143-151.
	ZHENG G, JIANG Y S, SHEN Y L. Recognition of rice pests based on improved YOLOv7[J]. Journal of Huazhong agricultural university, 2023, 42(3): 143-151.
4	吴杰, 施磊, 张志安. 基于深度学习的害虫图像识别与分类方法研究[J]. 计算技术与自动化, 2023, 42(1): 166-173.
	WU J, SHI L, ZHANG Z A. Research on recognition and classification method of pest images based on deep learning[J]. Computing technology and automation, 2023, 42(1): 166-173.
5	TANG Z, CHEN Z, QI F, et al. Pest-YOLO: Deep image mining and multi-feature fusion for real-time agriculture pest detection[C]// 2021 IEEE International Conference on Data Mining (ICDM). Piscataway, New Jersey, USA: IEEE, 2021: 1348-1353.
6	STORK N E. How many species of insects and other terrestrial arthropods are there on earth?[J]. Annu rev entomol, 2018, 63: 31-45.
7	LAROCHELLE H, ERHAN D, BENGIO Y. Zero-data learning of new tasks[C]// AAAI'08: Proceedings of the 23rd national conference on Artificial intelligence. New York, USA: ACM, 2008, 2: 646-651.
8	POURPANAH F, ABDAR M, LUO Y, et al. A review of generalized zero-shot learning methods[J]. IEEE trans pattern anal Mach intell, 2023, 45(4): 4051-4070.
9	MISHRA A, REDDY S K, MITTAL A, et al. A generative model for zero shot learning using conditional variational autoencoders[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Piscataway, New Jersey, USA: IEEE, 2018: 2188-2196.
10	SOHN K, YAN X C, LEE H. Learning structured output representation using deep conditional generative models[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2. New York, USA: ACM, 2015: 3483-3491.
11	BOSER B E, GUYON I M, VAPNIK V N. A training algorithm for optimal margin classifiers[C]// Proceedings of The Fifth Annual Workshop on Computational Learning Theory. New York, USA: ACM, 1992: 144–152.
12	XIAN Y Q, LORENZ T, SCHIELE B, et al. Feature generating networks for zero-shot learning[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, New Jersey, USA: IEEE, 2018: 5542-5551.
13	ARJOVSKY M, CHINTALA S, BOTTOU L. Wasserstein generative adversarial networks[C]// Proceedings of the 34th International Conference on Machine Learning - Volume 70. New York, USA: ACM, 2017: 214-223.
14	HAN Z Y, FU Z Y, YANG J. Learning the redundancy-free features for generalized zero-shot object recognition[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2020: 12865-12874.
15	HAN Z Y, FU Z Y, CHEN S, et al. Contrastive embedding for generalized zero-shot learning[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2021: 2371-2381.
16	ZHONG F M, CHEN Z K, ZHANG Y C, et al. Zero- and few-shot learning for diseases recognition of Citrus aurantium L. using conditional adversarial autoencoders[J]. Computers and electronics in agriculture, 2020, 179: ID 105828.
17	WAH C, BRANSON S, WELINDER P, et al. The caltech-ucsd birds-200-2011 dataset[EB/OL]. [2023-12-10].
18	GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2. New York, USA: ACM, 2014: 2672-2680.
19	Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision[C]// International Conference on Machine Learning. New York, USA: PMLR, 2021: 8748-8763.
20	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2016: 770-778.
21	DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, New Jersey, USA: IEEE, 2009: 248-255.
22	KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. arXiv:1412.6980, 2014.
23	LAMPERT C H, NICKISCH H, HARMELING S. Learning to detect unseen object classes by between-class attribute transfer[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, New Jersey, USA: IEEE, 2009: 951-958.
24	XIAN Y, LAMPERT C H, SCHIELE B, et al. Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly[J]. IEEE trans pattern anal Mach intell, 2019, 41(9): 2251-2265.
25	PATTERSON G, XU C, SU H, et al. The SUN attribute database: Beyond categories for deeper scene understanding[J]. International journal of computer vision, 2014, 108(1): 59-81.
26	NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C]// 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing. Piscataway, New Jersey, USA: IEEE, 2008: 722-729.
27	VERMA V K, ARORA G, MISHRA A, et al. Generalized zero-shot learning via synthesized examples[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, New Jersey, USA: IEEE, 2018: 4281-4289.
28	FELIX R, VIJAY KUMAR B G, REID I, et al. Multi-modal cycle-consistent generalized zero-shot learning[M]// Computer Vision-ECCV 2018. Cham: Springer International Publishing, 2018: 21-37.
29	XIAN Y Q, SHARMA S, SCHIELE B, et al. F-VAEGAN-D2: A feature generating framework for any-shot learning[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2019: 10275-10284.
30	LI J J, JING M M, LU K, et al. Leveraging the invariant side of generative zero-shot learning[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2019: 7402-7411.
31	KESHARI R, SINGH R, VATSA M. Generalized zero-shot learning via over-complete distribution[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2020: 13300-13308.
32	NARAYAN S, GUPTA A, KHAN F S, et al. Latent embedding feedback and discriminative features for zero-shot classification[M]// Computer Vision-ECCV 2020. Cham: Springer International Publishing, 2020: 479-495.
33	HUYNH D, ELHAMIFAR E. Fine-grained generalized zero-shot learning via dense attribute-based attention[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2020: 4483-4493.
34	SCHONFELD E, EBRAHIMI S, SINHA S, et al. Generalized zero- and few-shot learning via aligned variational autoencoders[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2019: 8247-8255.
35	CHEN S M, WANG W J, XIA B H, et al. FREE: feature refinement for generalized zero-shot learning[C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway, New Jersey, USA: IEEE, 2021: 122-131.
36	MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. arXiv:1301.3781, 2013.

方法	CUB			AWA1			AWA2			SUN			FLO
方法	U	S	H	U	S	H	U	S	H	U	S	H	U	S	H
SE-GZSL	41.5	53.3	46.7	56.3	67.8	61.5	58.3	68.1	62.8	40.9	30.5	34.9	—	—	—
f-CLSWGAN	43.7	57.7	49.7	57.9	61.4	59.6	—	—	—	42.6	36.6	39.4	59.0	73.8	65.6
DAZLE^［33］	56.7	59.6	58.1	—	—	—	60.3	75.7	67.1	52.3	24.3	33.2	—	—	—
cycle-CLSWGAN	45.7	61.0	52.3	56.9	64.0	60.2	—	—	—	49.4	33.6	40.0	59.2	72.5	65.1
f-VAEGAN	48.4	60.1	53.6	—	—	—	57.6	70.6	63.5	45.1	38.0	41.3	56.8	74.9	64.6
CADA-VAE^［34］	51.6	53.5	52.4	57.3	72.8	64.1	55.8	75.0	63.9	47.2	35.7	40.6	—	—	—
LisGAN	46.5	57.9	51.6	52.6	76.3	62.3	—	—	—	42.9	37.8	40.2	57.7	83.8	68.3
OCD-CVAE	44.8	59.9	51.3	—	—	—	59.5	73.4	65.7	44.8	42.9	43.8	—	—	—
TF-VAEGAN	52.8	64.7	58.1	—	—	—	59.8	75.1	66.6	45.6	40.7	43.0	62.5	84.1	71.7
FREE^［35］	55.7	59.9	57.7	62.9	69.4	66.0	60.4	75.4	67.1	47.4	37.2	41.7	67.4	84.5	75.0
CE-GZSL	63.9	66.8	65.3	65.3	73.4	69.1	63.1	78.6	70.0	48.8	38.6	43.1	69.0	78.7	73.5
VSA-WGAN	69.1	67.2	68.1	63.8	74.8	68.9	63.4	78.3	70.1	51.0	35.2	41.7	66.7	83.6	74.2