[Objective] To make full use of and protect farmland, and to lay a solid foundation for sustainable land use, it is particularly important to obtain real-time, precise information on farmland area, distribution, and related attributes. Remote sensing can provide farmland data with the large-scale coverage and timeliness this requires. However, research on and application of deep learning methods for identifying cultivated land from remote sensing imagery still need to improve in both depth and accuracy. This study investigates the potential of deep learning methods for the remote sensing identification of cultivated land in the hilly areas of Southwest China. It aims to provide insights for improving agricultural land use and regulation, and for harmonizing the relationship between cultivated land, the economy, and ecology. [Methods] Santai County, Mianyang City, Sichuan Province, China (30°42'34"~31°26'35"N, 104°43'04"~105°18'13"E) was selected as the study area. Two scenes of high-resolution imagery captured by the Gaofen-6 (GF-6) satellite served as the primary image data source. In addition, 30 m resolution DEM data (2020) from the United States National Aeronautics and Space Administration (NASA) were used. The land cover product SinoLC-1 was also included to compare the accuracy of the different extraction methods. Four deep learning models, namely UNet, PSPNet, DeepLabV3+, and UNet++, were applied to identify cultivated land from the remote sensing imagery, and their identification accuracy on the high-resolution satellite images was analyzed together with the results of the random forest (RF) algorithm. A validation dataset was constructed by randomly generating 1 000 vector validation points within the study area.
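Random validation points inside an irregular study area are commonly drawn by rejection sampling: sample uniformly within the bounding box and keep only points that fall inside the boundary polygon. The minimal sketch below converts the stated coordinate ranges to decimal degrees and applies a ray-casting point-in-polygon test. The county boundary is not given here, so the diamond-shaped `boundary` polygon is a hypothetical stand-in, and the function names are illustrative. Sampling uniformly in longitude/latitude is also an approximation that is reasonable only for a region this small.

```python
import random


def dms_to_deg(d, m, s):
    """Convert degrees/minutes/seconds to decimal degrees."""
    return d + m / 60 + s / 3600


# Bounding box taken from the study-area coordinate ranges stated in the text.
LAT_MIN, LAT_MAX = dms_to_deg(30, 42, 34), dms_to_deg(31, 26, 35)
LON_MIN, LON_MAX = dms_to_deg(104, 43, 4), dms_to_deg(105, 18, 13)


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x0, y0), ...]."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed;
        # this condition also guarantees y1 != y2, so the division is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(poly, n, seed=0):
    """Rejection sampling: draw uniform (lon, lat) in the bbox, keep points inside poly."""
    rng = random.Random(seed)
    pts = []
    while len(pts) < n:
        lon = rng.uniform(LON_MIN, LON_MAX)
        lat = rng.uniform(LAT_MIN, LAT_MAX)
        if point_in_polygon(lon, lat, poly):
            pts.append((lon, lat))
    return pts


# Hypothetical stand-in for the county boundary: a diamond inscribed in the bbox.
cx, cy = (LON_MIN + LON_MAX) / 2, (LAT_MIN + LAT_MAX) / 2
boundary = [(cx, LAT_MIN), (LON_MAX, cy), (cx, LAT_MAX), (LON_MIN, cy)]
points = sample_points(boundary, 1000)
```

Each sampled point would then be labeled by visual interpretation; with a real boundary file, the same rejection loop applies unchanged.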
The land cover type of the pixel containing each validation point was determined by manual visual interpretation of Google Earth imagery with a resolution of 0.3 m. For each model, the identification results were compared with the validation points in a confusion matrix, from which five accuracy metrics were computed: overall accuracy (OA), intersection over union (IoU), mean intersection over union (MIoU), F1-Score, and the Kappa coefficient. These metrics were used to assess the cultivated land identification accuracy of the different models and data products. [Results and Discussions] The deep learning models achieved markedly higher accuracy metrics than both the traditional machine learning approach (RF) and the recent land cover product SinoLC-1. Among them, UNet++ performed best, with F1-Score, IoU, MIoU, OA, and Kappa values of 0.92, 85.93%, 81.93%, 90.60%, and 0.80, respectively, followed by DeepLabV3+, UNet, and PSPNet. These metrics show that UNet++ identified and segmented cultivated land most accurately, with accuracy nearly 20% higher than the machine learning method and 50% higher than the land cover product. Four typical areas (town, water body, forest land, and contiguous cultivated land) were selected for visual comparison of the identification results. The deep learning results were generally consistent with the distribution patterns in the satellite imagery, delineated cultivated land boundaries accurately, and performed satisfactorily overall. However, because of the complex features in remote sensing images, the deep learning models still exhibited some omission and misclassification errors when extracting cultivated land.
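The five metrics follow directly from the confusion-matrix counts. The minimal sketch below assumes a two-class setting (cultivated land = 1, other = 0) with MIoU averaged over those two classes. The abstract does not state the exact class scheme, and the function names and toy labels are hypothetical. With TP, FP, FN, TN and N = TP + FP + FN + TN: OA = (TP + TN)/N, IoU = TP/(TP + FP + FN), F1 = 2TP/(2TP + FP + FN), and Kappa = (OA − pe)/(1 − pe), where pe is the chance agreement computed from the row and column totals.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = cultivated land, 0 = other)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_metrics(y_true, y_pred):
    """OA, IoU (cultivated), MIoU (two-class mean), F1 and Kappa from a confusion matrix.

    Assumes both classes occur, so no denominator is zero.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    oa = (tp + tn) / n
    iou_crop = tp / (tp + fp + fn)    # IoU of the cultivated-land class
    iou_other = tn / (tn + fp + fn)   # IoU of the non-cultivated class
    miou = (iou_crop + iou_other) / 2
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Chance agreement: (predicted pos * actual pos + predicted neg * actual neg) / N^2
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    return {"OA": oa, "IoU": iou_crop, "MIoU": miou, "F1": f1, "Kappa": kappa}


# Hypothetical toy labels for ten validation points.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
metrics = accuracy_metrics(y_true, y_pred)
```

Here TP = 4, FP = 1, FN = 1, TN = 4, so OA = 0.8, IoU = MIoU ≈ 0.667, F1 = 0.8, and Kappa = (0.8 − 0.5)/(1 − 0.5) = 0.6.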
Among them, UNet++ produced the extraction results closest to the ground truth and outperformed the other models in the completeness of extracted cultivated land, the discrimination between cultivated land and other land classes, and boundary delineation. Using UNet++, the most accurate model, cultivated land was then extracted from two types of input images: one built from spectral features only and one combining spectral and terrain features. Compared with the spectral-only input, the combined spectral and terrain input improved IoU and OA by 0.98% and 1.10%, respectively, and the Kappa coefficient by 0.01. This indicates that spectral and terrain features provide complementary information, and fusing them further improves the identification of cultivated land. [Conclusions] This study evaluated the practicality and reliability of automatic cultivated land extraction with four deep learning models applied to GF-6 high-resolution satellite imagery of Santai County, China. Based on the extraction results in Santai County and the differences in network structure among the four models, UNet++, which builds on UNet by redesigning its skip connections into nested, dense skip pathways, was found to effectively improve the accuracy of cultivated land extraction. Overall, this study demonstrates the effectiveness and practical value of deep learning methods for obtaining accurate farmland information from high-resolution remote sensing imagery.
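One common way to fuse spectral and terrain features is to stack terrain layers as extra input channels. The minimal sketch below assumes that elevation and slope are the terrain features (the abstract does not say which were used) and that the DEM has already been resampled and co-registered to the image grid. It also applies per-channel z-score normalization so that elevation values do not dominate reflectance. All function names, the 4-band input, and the synthetic DEM are hypothetical.

```python
import numpy as np


def slope_degrees(dem, pixel_size):
    """Slope in degrees from a DEM using central differences (np.gradient)."""
    dz_dy, dz_dx = np.gradient(dem.astype(np.float64), pixel_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def stack_features(spectral, dem, pixel_size):
    """Concatenate spectral bands (bands, H, W) with elevation and slope channels.

    Assumes dem (H, W) is already resampled to the spectral image grid.
    """
    if spectral.shape[1:] != dem.shape:
        raise ValueError("DEM must be co-registered with the spectral image")
    terrain = np.stack([dem.astype(np.float64), slope_degrees(dem, pixel_size)])
    feats = np.concatenate([spectral.astype(np.float64), terrain], axis=0)
    # Per-channel z-score; constant channels are divided by 1 instead of 0.
    mean = feats.mean(axis=(1, 2), keepdims=True)
    std = feats.std(axis=(1, 2), keepdims=True)
    return (feats - mean) / np.where(std > 0, std, 1.0)


# Hypothetical inputs: 4 spectral bands and a plane rising 8 m per 8 m pixel.
rng = np.random.default_rng(0)
H, W = 32, 32
spectral = rng.random((4, H, W))
dem = np.tile(np.arange(W, dtype=np.float64) * 8.0, (H, 1))
features = stack_features(spectral, dem, pixel_size=8.0)
```

The result is a (bands + 2, H, W) array that a segmentation network such as UNet++ could take as input in place of the spectral-only image.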