[Objective] Detecting corn borer infestations is essential for protecting maize yield and quality, because corn borers pose a significant threat to global maize production. In traditional agricultural practice, infestations are identified through manual field inspections or trapping tools, which are labor-intensive, time-consuming, and difficult to scale to large areas. These methods are also prone to human error and cannot meet the demands of modern precision agriculture. To address these challenges, this study investigates a method for detecting corn borer infestations from low-altitude, close-range imagery captured by unmanned aerial vehicles (UAVs), together with a detection model named you only look once enhanced small object network (YOLO-ESN). Corn borers are nocturnal and often concealed within plant tissues, so the approach detects the boreholes they leave rather than the insects themselves. This makes field-based detection more practical and better aligned with real field conditions. [Methods] YOLO-ESN was built on the you only look once version 11 (YOLOv11) object detection algorithm and optimized through several modifications. In the Backbone, an enhanced lightweight attention (ELA) mechanism was incorporated to increase sensitivity to small visual features such as boreholes and to improve their extraction; ELA models spatial dependencies in the horizontal and vertical directions using one-dimensional convolutions. In the Neck, a C3k2-spatial and channel reconstruction convolution (C3k2-SCConv) module was introduced to reduce the number of model parameters and improve feature fusion efficiency, using spatial and channel reconstruction to suppress redundant information.
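The direction-wise attention idea behind ELA (pool along each spatial axis, then apply a 1D convolution) can be illustrated with the minimal sketch below. It assumes a single-channel feature map stored as nested Python lists and a hypothetical fixed 3-tap kernel; the function names are illustrative only. The actual module operates on batched multi-channel tensors with learned kernels and also applies group normalization, which this sketch omits.

```python
import math


def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length (odd kernel)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def ela_like_attention(feature, kernel):
    """Reweight a single-channel H x W map with vertical and horizontal attention.

    1. Strip pooling: average each row (profile of length H) and each column
       (profile of length W).
    2. Run the same 1D convolution over each profile and apply a sigmoid.
    3. Multiply every cell by its row weight and its column weight.
    Hypothetical simplification: no group normalization, no learned weights.
    """
    h, w = len(feature), len(feature[0])
    row_profile = [sum(row) / w for row in feature]
    col_profile = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    a_h = [sigmoid(v) for v in conv1d_same(row_profile, kernel)]
    a_w = [sigmoid(v) for v in conv1d_same(col_profile, kernel)]
    return [[feature[i][j] * a_h[i] * a_w[j] for j in range(w)] for i in range(h)]


# Usage: a 5 x 5 map with one strong response (e.g. a borehole) at the centre.
fmap = [[0.0] * 5 for _ in range(5)]
fmap[2][2] = 1.0
out = ela_like_attention(fmap, kernel=[0.25, 0.5, 0.25])
```

Because each weight is computed from a whole row or column profile, the attention costs only two 1D convolutions per channel instead of a full 2D attention map, which is consistent with the lightweight design goal described above.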
In the Head, a small-object detection branch, termed the P2 detection head, was added so that YOLO-ESN can directly use shallow, high-resolution features from early network layers to improve the detection of fine-grained targets such as boreholes. In addition, a combined loss function of normalized Wasserstein distance (NWD) and efficient intersection over union (EIoU) was used to optimize bounding box regression, mitigating the vanishing-gradient problem for small targets and improving the stability and robustness of target localization. A decision tree algorithm was applied to classify infestation severity levels from the borehole detection results, and heatmaps were generated to visualize the spatial distribution of corn borer infestations across the field. [Results and Discussions] Multiple experiments were conducted on a purpose-built dataset of corn borer infestation images. YOLO-ESN achieved an mAP@50 of 88.6% and an mAP@50:95 of 40.5%, improvements of 7.6 and 4.9 percentage points, respectively, over the original YOLOv11 model. Its total parameter count was 11.52% lower than that of YOLOv11, making the model lighter and better suited to UAV deployment. Ablation studies evaluated the contribution of each modification: incorporating the ELA mechanism alone improved mAP@50 by 0.3 percentage points and reduced parameters by 10.57%; replacing the C3k2 module with C3k2-SCConv reduced parameters by 2.5% and increased mAP@50 by 0.9 percentage points; adding the P2 detection head increased mAP@50 and mAP@50:95 by 4.1 and 1.2 percentage points, respectively; and introducing the NWD+EIoU loss function increased mAP@50 and mAP@50:95 by 1.9 and 1.2 percentage points, respectively. Comparative experiments showed that YOLO-ESN outperforms a range of mainstream object detection models in detection accuracy, including Faster R-CNN, SSD, YOLOv8, YOLOv11, and YOLOv12.
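To see why a stride-4 P2 head helps with borehole-sized targets, the sketch below computes the prediction-grid size of each head. It assumes a 640 × 640 input and the usual strides of 8, 16, and 32 for the P3 to P5 heads plus 4 for P2; the input resolution used in this study is not stated here, and the 12-pixel borehole width is a hypothetical example.

```python
def head_grid_sizes(input_size=640, strides=(4, 8, 16, 32)):
    """Side length of the square prediction grid for each head stride (P2..P5)."""
    return {stride: input_size // stride for stride in strides}


def cells_per_side(object_px, stride):
    """How many grid cells a square target of object_px pixels spans along one side."""
    return object_px / stride


grids = head_grid_sizes()
print(grids)                                          # {4: 160, 8: 80, 16: 40, 32: 20}
print(cells_per_side(12, 4), cells_per_side(12, 8))   # 3.0 1.5
```

Under these assumptions, a 12-pixel borehole covers three P2 cells per side but only one and a half P3 cells, so the shallow, high-resolution branch has more spatial detail to work with for such targets.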
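A minimal sketch of an NWD + EIoU box-regression loss follows, assuming boxes given as (cx, cy, w, h). NWD models each box as a 2D Gaussian and uses the closed-form 2-Wasserstein distance; EIoU adds center-distance, width, and height penalties to 1 − IoU. The weighting `alpha` and the normalizing constant `c` are hypothetical placeholders (NWD treats `c` as a dataset-dependent constant), and how this study actually combines the two terms is not specified here.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes (cx, cy, w, h).

    Each box is modeled as a Gaussian with mean (cx, cy) and covariance
    diag(w^2/4, h^2/4); the squared 2-Wasserstein distance then has the
    closed form below. c is a placeholder dataset-dependent constant.
    """
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    w2_sq = (xa - xb) ** 2 + (ya - yb) ** 2 + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)


def eiou_loss(box_a, box_b, eps=1e-9):
    """EIoU loss: 1 - IoU + center-distance, width, and height penalties."""
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    ax1, ay1, ax2, ay2 = xa - wa / 2, ya - ha / 2, xa + wa / 2, ya + ha / 2
    bx1, by1, bx2, by2 = xb - wb / 2, yb - hb / 2, xb + wb / 2, yb + hb / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    iou = inter / (wa * ha + wb * hb - inter + eps)
    # Width and height of the smallest box enclosing both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    center = ((xa - xb) ** 2 + (ya - yb) ** 2) / (cw ** 2 + ch ** 2 + eps)
    return 1 - iou + center + (wa - wb) ** 2 / (cw ** 2 + eps) + (ha - hb) ** 2 / (ch ** 2 + eps)


def combined_loss(pred, target, alpha=0.5, c=12.8):
    """alpha * (1 - NWD) + (1 - alpha) * EIoU; alpha is a hypothetical weight."""
    return alpha * (1 - nwd(pred, target, c)) + (1 - alpha) * eiou_loss(pred, target)
```

For two small, non-overlapping boxes such as (10, 10, 4, 4) and (20, 10, 4, 4), IoU is exactly zero, yet NWD still varies smoothly with the distance between them; this is the property that lets the NWD term keep supplying a useful training signal for tiny targets like boreholes.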
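The severity-classification step can be sketched as a small CART-style decision tree that splits on Gini impurity. The features used in this study are not listed here; this sketch assumes, hypothetically, a single feature (the number of detected boreholes per image) and uses synthetic toy values, not data from the study. The function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(xs, ys):
    """Threshold on one numeric feature that minimizes weighted Gini, or None."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    return best


def build_tree(xs, ys, depth=0, max_depth=2):
    """Grow a depth-limited tree; internal nodes are (threshold, left, right)."""
    split = best_split(xs, ys)
    if depth == max_depth or split is None or gini(ys) == 0.0:
        return Counter(ys).most_common(1)[0][0]  # leaf: majority label
    _, t = split
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build_tree([x for x, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([x for x, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(tree, x):
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


# Usage with synthetic toy data (borehole counts per image), not study data.
counts = [0, 1, 2, 3, 6, 7, 8, 14, 15, 18]
levels = ["mild"] * 4 + ["moderate"] * 3 + ["severe"] * 3
tree = build_tree(counts, levels)
```

On this toy data the learned tree is `(4.5, "mild", (11.0, "moderate", "severe"))`, that is, two count thresholds separating the three severity levels. A tree of this kind keeps the decision rule readable for field managers, which is one plausible reason to prefer it over a black-box classifier.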
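The reported figures also imply the following baseline values. This assumes that the 11.52% parameter reduction and the percentage-point gains are all measured against the same YOLOv11 baseline, and that the 8.37 M parameter count refers to YOLO-ESN. Frame times are simply the reciprocals of the reported inference speeds (32.48 FPS for YOLO-ESN, 75.44 FPS for YOLOv12).

```latex
\[
P_{\text{YOLOv11}} \approx \frac{8.37\ \text{M}}{1 - 0.1152} = \frac{8.37\ \text{M}}{0.8848} \approx 9.46\ \text{M}
\]
\[
\text{mAP@50}_{\text{YOLOv11}} = (88.6 - 7.6)\,\% = 81.0\,\%, \qquad
\text{mAP@50:95}_{\text{YOLOv11}} = (40.5 - 4.9)\,\% = 35.6\,\%
\]
\[
t_{\text{YOLO-ESN}} = \frac{1000\ \text{ms}}{32.48} \approx 30.8\ \text{ms/frame}, \qquad
t_{\text{YOLOv12}} = \frac{1000\ \text{ms}}{75.44} \approx 13.3\ \text{ms/frame}
\]
```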
Its mAP@50 and mAP@50:95 exceed those of Faster R-CNN by 14.9 and 9.7 percentage points, respectively, and those of SSD by 17.8 and 11.4 percentage points, respectively. With a compact size of 8.37 M parameters, YOLO-ESN delivers high detection accuracy and strong generalization, striking a good balance between performance and efficiency. Although its inference speed (32.48 FPS) was well below that of YOLOv12 (75.44 FPS), it offered a better overall trade-off between accuracy and efficiency. These results suggest that YOLO-ESN is a lightweight, high-performing solution for small object detection tasks, such as detecting dense small targets in remote sensing images. The decision tree algorithm classified infestation severity accurately, achieving F1-scores of 0.906, 0.803, and 0.842 for mild, moderate, and severe infestations, respectively. Heatmaps generated from the borehole detection results enabled spatial visualization of infestation severity, providing a scientific basis for quantitative monitoring and targeted pesticide application in infested fields. [Conclusions] The results show that YOLO-ESN achieves a favorable balance between overall detection accuracy and running speed. While making the model lighter and easier to deploy, it also recognizes small targets better: it accurately locates boreholes on corn leaves, improving both bounding box regression accuracy and feature extraction efficiency. Compared with traditional methods that recognize the insects themselves, using boreholes as detection targets better matches actual field conditions, avoids the problems caused by insect occlusion and concealment, and improves both the usability of field image data and the robustness of the algorithm.
Heatmaps generated from the model's detection results can also effectively show how pest distribution changes across farmland, providing a scientific basis for precision pesticide spraying and farmland management. Overall, this study provides an effective solution for the intelligent detection of corn borer pests, shows good generality and potential for wider adoption, and can offer strong technical support for precision agriculture and smart farmland management.
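How the heatmaps are built is not described in detail here. One simple approach is to bin borehole detections into a regular field grid and count them per cell, and a minimal sketch of that follows. It assumes, hypothetically, that each detection has already been georeferenced to non-negative local field coordinates in metres (for example, from the UAV's position data), and the 10 m cell size is a placeholder. The resulting matrix could then be rendered with a plotting call such as matplotlib's `imshow`.

```python
from collections import defaultdict


def infestation_grid(detections, cell_size_m=10.0):
    """Count borehole detections per square field cell.

    detections: iterable of (x_m, y_m) local field coordinates in metres
    (assumed non-negative). Returns {(col, row): count}.
    """
    grid = defaultdict(int)
    for x, y in detections:
        grid[(int(x // cell_size_m), int(y // cell_size_m))] += 1
    return dict(grid)


def to_matrix(grid):
    """Convert sparse cell counts into a dense row-major matrix (rows follow y)."""
    if not grid:
        return []
    max_col = max(c for c, _ in grid)
    max_row = max(r for _, r in grid)
    return [[grid.get((c, r), 0) for c in range(max_col + 1)] for r in range(max_row + 1)]


# Usage with made-up coordinates for four detected boreholes.
grid = infestation_grid([(1.0, 2.0), (3.5, 8.0), (12.0, 4.0), (25.0, 25.0)])
matrix = to_matrix(grid)
```

Per-cell counts like these could also serve as the input feature for a severity classifier, so the same binning supports both the heatmap and cell-level severity labels.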