Remote Sensing Based Crop Classification of Maize: Improving Model Robustness in State-of-the-Art Machine Learning Models
Agricultural monitoring is necessary. Since the beginning of the Holocene, human agricultural
practices have been shaping the face of the earth, and today around one third of the ice-free land
mass consists of cropland and pastures. While agriculture is necessary for our survival, its
intensity has caused many negative externalities, such as enormous freshwater consumption, the
loss of forests and biodiversity, greenhouse gas emissions, and soil erosion and degradation.
Some of these externalities can potentially be ameliorated by careful allocation of crops and
cropping practices, while at the same time the state of these crops has to be monitored in order
to assess food security. Modern satellite-based earth observation can be an adequate tool to
quantify the abundance of crop types, i.e., to produce spatially explicit crop type maps. The
resources for doing so, in terms of input data, reference data, and classification algorithms, have
been constantly improving over the past 60 years, and we now live in a time where fully
operational satellites produce freely available imagery at high spatial resolution, often with
revisit times of less than a month. At the same time, classification models have been constantly
evolving from distribution-based statistical algorithms, via machine learning, to the now
ubiquitous deep learning.
In this context, we took an exploratory approach to advancing the state of the art in crop
classification. We conducted regional case studies focused on the study region of the Eifelkreis
Bitburg-Prüm, aiming to develop validated crop classification toolchains. Because of their unique
role in the regional agricultural system and their specific phenological characteristics, we
focused solely on maize fields.
In the first case study, we generated reference data for the years 2009 and 2016 in the study
region by digitizing polygons from high-resolution aerial imagery, and used these in
conjunction with RapidEye imagery to produce high-resolution maize maps with a random forest
classifier and a Gaussian blur filter. We were able to highlight the importance of careful residual
analysis, especially with respect to autocorrelation. As a result, we were able to demonstrate that,
in spite of the severe limitations imposed by acquisition windows restricted by cloud cover,
high-quality maps could be produced for both years, and the regional development of maize
cultivation could be quantified.
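The smoothing and residual checks mentioned above can be sketched as follows. This is a minimal illustration, not the toolchain used in the case study: it assumes a classifier (for example a random forest) has already produced a per-pixel maize probability raster, and the smoothing width `sigma`, the 0.5 decision threshold, and the rook (4-neighbour) weights used for Moran's I are assumptions. The helper names `gaussian_blur` and `morans_i` are hypothetical. Blurring the probabilities suppresses isolated misclassified pixels, and a clearly positive Moran's I on the residual raster (reference minus prediction) indicates that errors are spatially clustered rather than independent.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float, truncate: float = 3.0) -> np.ndarray:
    """Normalized 1-D Gaussian kernel spanning +/- truncate * sigma pixels."""
    radius = max(1, int(truncate * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(raster: np.ndarray, sigma: float) -> np.ndarray:
    """Separable 2-D Gaussian blur (hypothetical helper).

    Edges are padded by replication so the output keeps the input shape.
    Assumption: applied to a per-pixel maize probability raster before thresholding.
    """
    kernel = gaussian_kernel_1d(sigma)
    r = len(kernel) // 2
    out = raster.astype(float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        # 'valid' convolution of a length n + 2r vector with a 2r + 1 kernel returns n values
        out = np.apply_along_axis(np.convolve, axis, padded, kernel, mode="valid")
    return out


def morans_i(residuals: np.ndarray) -> float:
    """Global Moran's I of a 2-D raster with binary rook (4-neighbour) weights.

    Values near +1 mean clustered residuals, near 0 random, near -1 dispersed.
    Undefined for a constant raster (zero variance).
    """
    z = residuals.astype(float) - residuals.mean()
    # Sum of z_i * z_j over horizontal and vertical neighbour pairs
    cross = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z[:, :-1].size + z[:-1, :].size
    # Symmetric weights (w_ij = w_ji = 1) count every pair twice
    numerator = 2.0 * cross
    weight_sum = 2.0 * n_pairs
    return float((z.size / weight_sum) * numerator / (z ** 2).sum())
```

For example, `gaussian_blur(prob, sigma=1.0) > 0.5` would give a smoothed binary maize map, and `morans_i(reference - prediction)` a single autocorrelation score for the residuals of that map against reference polygons rasterized to the same grid.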
In the second case study, we used these spatially explicit datasets to link the expansion of biogas
plants with the increase in maize cultivation in the area. In a further step, we overlaid the
maize maps with soil and slope rasters in order to assess spatially explicit risks of soil compaction
and erosion. We were thus able to highlight the potential role of remote sensing-based crop type
classification in environmental protection by producing maps of potential soil hazards, which local
stakeholders can use to reallocate certain crop types to locations with lower associated
risk.
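The overlay step can be sketched as a per-pixel combination of boolean masks. This is a minimal illustration under stated assumptions: the slope threshold, the soil class codes treated as erodible or compaction-prone, and the hazard encoding are hypothetical placeholders rather than the criteria used in the case study, and the maize, slope, and soil rasters are assumed to be co-registered on the same grid.

```python
import numpy as np

# Hypothetical hazard codes: bit 0 = erosion risk, bit 1 = compaction risk
EROSION, COMPACTION = 1, 2


def soil_hazard_map(maize, slope_deg, soil_class,
                    slope_threshold=5.0,          # placeholder slope limit in degrees
                    erodible_classes=(1, 2),      # placeholder soil class codes
                    compaction_classes=(2, 3)):   # placeholder soil class codes
    """Per-pixel hazard code for co-registered rasters.

    0 = no hazard or not maize, 1 = erosion, 2 = compaction, 3 = both.
    """
    maize = np.asarray(maize, dtype=bool)
    erosion = (maize
               & (np.asarray(slope_deg) > slope_threshold)
               & np.isin(soil_class, erodible_classes))
    compaction = maize & np.isin(soil_class, compaction_classes)
    return EROSION * erosion.astype(np.uint8) + COMPACTION * compaction.astype(np.uint8)


def share_of_maize_at_risk(maize, hazard):
    """Fraction of maize pixels carrying any hazard code (0.0 if there is no maize)."""
    maize = np.asarray(maize, dtype=bool)
    n_maize = int(maize.sum())
    return float((hazard[maize] > 0).sum() / n_maize) if n_maize else 0.0
```

Keeping separate codes instead of a single boolean keeps erosion and compaction distinguishable, so each hazard can be mapped and addressed on its own.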
In our third case study, we used Sentinel-1 data as input imagery, and official statistical records
as maize reference data, and were able to produce consistent modeling input data for four
consecutive years. Using these datasets, we could train and validate different models in spatially
and temporally independent random subsets, with the goal of assessing model transferability. We
were able to show that state-of-the-art deep learning models such as U-Net performed
significantly better than conventional models such as random forests when the model was validated
in a different year or a different regional subset. We highlighted and discussed the implications
for model robustness and the potential usefulness of deep learning models in building fully
operational global crop classification models.
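The transferability validation can be sketched as a generator of train/test index splits. This is a minimal illustration, not the exact sampling design of the case study: it assumes each sample (for example an image patch) carries a year label and a spatial tile identifier, and it uses leave-one-year-out splits for temporal transfer and whole-tile hold-out folds for spatial transfer. The function name `transferability_splits`, the number of spatial folds, and the seed are hypothetical.

```python
import numpy as np


def transferability_splits(years, tiles, n_spatial_folds=2, seed=0):
    """Yield (name, train_idx, test_idx) for temporal and spatial hold-out validation.

    Assumptions: `years` and `tiles` are 1-D arrays with one entry per sample;
    tiles are the spatial units that must never be split between train and test.
    """
    years = np.asarray(years)
    tiles = np.asarray(tiles)
    idx = np.arange(len(years))

    # Temporal transfer: leave one whole year out per split
    for y in np.unique(years):
        yield f"year={y}", idx[years != y], idx[years == y]

    # Spatial transfer: randomly assign whole tiles to folds
    rng = np.random.default_rng(seed)
    shuffled_tiles = rng.permutation(np.unique(tiles))
    for k, fold_tiles in enumerate(np.array_split(shuffled_tiles, n_spatial_folds)):
        in_test = np.isin(tiles, fold_tiles)
        yield f"spatial_fold={k}", idx[~in_test], idx[in_test]
```

A random forest or a U-Net would then be fitted on `train_idx` and scored on `test_idx`; comparing these hold-out scores with those from an ordinary random split indicates how much performance degrades when a model is transferred to a new year or region.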
We conclude that the first major barrier for global classification models is the
reference data. Since most research in this area is still conducted with local field surveys, and only
a few countries have access to official agricultural records, more global cooperation is necessary to
build harmonized and regionally stratified datasets. The second major barrier is the classification
algorithm. While much progress has been made in this area, the current proliferation of new deep
learning architectures shows great promise but has not yet consolidated. Much research is still
needed to determine which models perform best and most robustly while also being transparent
and usable by non-experts, so that local and global stakeholders can apply them with little
effort.