1 Introduction
In 2015, all United Nations member states adopted the Sustainable Development Goals (SDGs) to address global challenges such as climate change, environmental degradation, poverty, and inequality (UN DESA, 2023; UN-GGIM:Europe, 2019). This international plan outlines 17 global goals to achieve a better and more sustainable future (UN DESA, 2023; UN-GGIM:Europe, 2019; United Nations, 2024). Now that the SDG timeline has passed its midpoint with significant setbacks, the critical role of timely and high-quality data has never been more apparent (UN DESA, 2023; United Nations, 2024). These data are vital to identifying challenges, formulating evidence-based solutions, monitoring the implementation of solutions, and making essential course corrections (UN-GGIM:Europe, 2019). However, despite this need for high-quality data, traditional monitoring approaches, such as household- or field-level surveys (ground-acquired data), remain the primary means by which National Statistical Institutes (NSIs) collect data for key SDG indicators (Burke et al., 2021; UN-GGIM:Europe, 2019). These methods are expensive and time-consuming to conduct (Burke et al., 2021). As a result, the frequency of ground-acquired data varies significantly around the world; for example, in 24% of the world’s countries the most recent agricultural census was conducted more than 15 years ago (Burke et al., 2021). Recognizing this challenge, both the United Nations SDG Report (2023, p. 49) and the Global Working Group on Big Data for Official Statistics underscore the importance of innovative methodologies and data sources, including remote sensing and machine learning, to enhance the monitoring and implementation of the SDGs (UN-GGIM:Europe, 2019; United Nations, 2017).
Remote sensing (data collected from a distance via satellite, aircraft, or drones) offers a cost-effective approach for monitoring wide-ranging geographic areas (Khatami et al., 2016; Maso et al., 2023; UN-GGIM:Europe, 2019; Zhao et al., 2022). Remote sensing imagery has been used in agricultural and socioeconomic applications for decades (Burke et al., 2021; Lavallin & Downs, 2021; Zhang et al., 2022). For instance, the Laboratory for Applications of Remote Sensing (LARS) has utilized satellite data and machine learning methods for crop identification since the 1960s (Holloway & Mengersen, 2018). In recent years, however, the spatial, spectral, and temporal resolution of remote sensing data has increased considerably, alongside a significant increase in freely available sensor data and in the computational power available for complex data analysis (Burke et al., 2021; Thapa et al., 2023; Zhang et al., 2022). The breadth of possible applications and the increased availability of remote sensing data have led to a rapid increase in the number of published research papers in this field (Burke et al., 2021; Khatami et al., 2016). Earth observation satellites alone can measure 42% of the SDG targets (Zhang et al., 2022).
Despite the increased research and data availability, the uptake of remote sensing data by NSIs has been slow. However, many NSIs are now capitalizing on the potential of new and consistent data sources and methodologies to support and inform official statistics (United Nations, 2017). Such statistics can be generated by combining geospatial information, remote sensing, and other big data sources, which makes it possible to fill data gaps, provide information where no measurements were previously made, and improve the temporal and spatial resolution of data (e.g., daily updates on crop area and yield statistics). This paradigm shift from traditional statistical methods, such as counting and measuring by humans, towards estimation from sensors, simulation, and modelling presents challenges. It requires convincing, statistically sound results, rigorous validation, and a significant shift in resources within institutions to adapt to the higher spatial and temporal resolutions necessary to address emerging policy questions (United Nations, 2017).
Given the wide variety of methodologies and contexts in previous studies, a critical question arises: What factors influence the performance of machine learning models using remote sensing data for SDG monitoring? A meta-analysis statistically combines the body of evidence on a specific topic, aiming to produce unbiased summaries of evidence (Iliescu et al., 2022). There are many potential methods for combining results. One choice made when conducting a meta-analysis is whether to use each study’s sample size to weight its result (sample-weighted estimate) or to take an unweighted approach, which treats all results equally regardless of sample size (Hall & Rosenthal, 2018). The current standard in meta-analysis research is to use the sample-weighted estimate (Hall & Rosenthal, 2018). However, previous meta-analyses investigating the performance of machine learning models on remote sensing data have relied exclusively on unweighted approaches. While these studies have found that certain models, such as Support Vector Machines (SVM) and deep learning methods, often outperform traditional classifiers, the magnitude of these differences can vary across applications. For example, Khatami et al. (2016) selected studies with more than one model and, by making pairwise comparisons, concluded that SVM consistently outperformed other classification models. Because these meta-analyses did not weight results by sample size, they may have overlooked whether such variations in results are due to differences in sample sizes, which could affect the reliability and precision of the findings, as larger studies generally provide more accurate estimates.
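To make the distinction between the two estimates concrete, the following is a minimal sketch in Python, assuming a simple weighting scheme in which each study's result is weighted directly by its sample size. The class `StudyResult`, the function names, and all numeric values are hypothetical illustrations and are not taken from any study discussed here; the weighting actually used in a given meta-analysis may differ (for example, inverse-variance weights).

```python
from dataclasses import dataclass


@dataclass
class StudyResult:
    """One reported result (hypothetical structure, for illustration only)."""
    performance: float  # reported performance, e.g. overall accuracy in [0, 1]
    sample_size: int    # number of samples behind that reported performance


def unweighted_mean(results):
    """Treat every result equally, disregarding sample size."""
    return sum(r.performance for r in results) / len(results)


def sample_weighted_mean(results):
    """Weight each result by its sample size (assumed weighting scheme)."""
    total_n = sum(r.sample_size for r in results)
    return sum(r.performance * r.sample_size for r in results) / total_n


# Hypothetical values: two small studies with high reported performance
# and one large study with lower reported performance.
results = [
    StudyResult(performance=0.95, sample_size=100),
    StudyResult(performance=0.90, sample_size=300),
    StudyResult(performance=0.80, sample_size=1600),
]

unweighted = unweighted_mean(results)
weighted = sample_weighted_mean(results)

print(f"unweighted estimate:      {unweighted:.4f}")
print(f"sample-weighted estimate: {weighted:.4f}")
```

In this illustrative case the unweighted estimate is pulled upward by the two small studies, whereas the sample-weighted estimate lies closer to the result of the large study. This is the kind of discrepancy that an unweighted summary could conceal.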
Therefore, this study seeks to address the question of how machine learning models perform when applied to remote sensing data for SDG monitoring. By conducting a meta-analysis of peer-reviewed research articles in this domain, the study aims to: (1) estimate the average performance (summary effect size), (2) determine the degree of heterogeneity within and across studies, (3) assess whether specific study features influence model performance, and (4) compare the sample-weighted and unweighted estimates of the summary effect.