Active dataset
Intro

The analysis on this dashboard updates daily (at 12:00 CET). Users can contribute data to this analysis and see how it affects the results. For information on how to add to the dataset, see the Contribute data page. The table below summarizes the collected data; a column is added for each contributor.

Table 1: Summary table
| Feature | nina_2024, N = 86 | test, N = 1 |
|---|---|---|
| Overall Accuracy | 0.90 (0.65 - 1.00) | 0.90 (0.90 - 0.90) |
| Sample Size | 6,401,352 (259 - 75,782,016) | 100 (100 - 100) |
| Majority-class Proportion | 0.72 (0.14 - 1.00) | 0.90 (0.90 - 0.90) |
| Ancillary Data | | |
| Ancillary Data Included | 15 (17%) | 0 (0%) |
| Remote Sensing Only | 71 (83%) | 1 (100%) |
| Indices | | |
| Not Used | 23 (27%) | 0 (0%) |
| Used | 63 (73%) | 1 (100%) |
| Number of Spectral Bands | | |
| Low | 16 (19%) | 0 (0%) |
| Mid | 28 (33%) | 0 (0%) |
| Not Reported | 42 (49%) | 1 (100%) |
| Confusion Matrix | | |
| Not Reported | 23 (27%) | 0 (0%) |
| Reported | 63 (73%) | 1 (100%) |
| Model Group | | |
| NN | 41 (48%) | 0 (NA%) |
| Other | 4 (4.7%) | 0 (NA%) |
| RF | 36 (42%) | 0 (NA%) |
| SVM | 5 (5.8%) | 0 (NA%) |
| Unknown | 0 | 1 |

Note:
Mean (Min - Max); n (%)
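
The summary statistics follow the Mean (Min - Max) and n (%) style used by the gtsummary package. As an illustration only, the sketch below shows how a similar table could be built; the data frame and the dataset and overall_accuracy columns are assumptions and are not confirmed by this page.

# A sketch of a Table 1 style summary with gtsummary (assumed column names)
library(gtsummary)

tbl_summary(
  data,
  by = dataset,   # assumed column separating the nina_2024 and test contributions
  include = c(overall_accuracy, sample_size, fraction_majority_class,
              ancillary, indices, no_band_group, Confusion_matrix, model_group),
  statistic = all_continuous() ~ "{mean} ({min} - {max})"
)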
Meta-analysis
Methods
About the fit model

The model fitted for this analysis is a multilevel meta-regression using the rma.mv function from the metafor package. This is a random-effects meta-regression that accounts for variability both within and between studies. The model is:

\[\text{overall accuracy}_{\text{transformed}} = \text{proportion majority class} + \text{ancillary} + \text{indices} + \text{confusion matrix} + \text{model group}\]

# Load metafor for effect size calculation and model fitting
library(metafor)

# Calculate effect sizes and sampling variances using the escalc function
ies.da <- escalc(
  measure = "PFT",                 # Freeman-Tukey double arcsine transformation
  xi = event, ni = sample_size,
  data = data,
  slab = paste(AuthorYear, "Estimate", esid)
)

# Fit a multilevel random-effects meta-regression model
meta_reg <- rma.mv(yi, vi,
  data = ies.da,
  random = ~ 1 | AuthorYear / esid,
  tdist = TRUE,
  method = "REML",
  test = "t",
  dfs = "contain",
  mods = ~ fraction_majority_class + ancillary +
    indices + no_band_group + Confusion_matrix +
    model_group
)

I chose the most important study features; see the lay summary for more information. I also included model group because Khatami, Mountrakis, and Stehman (2016) found significant differences when comparing model groups.

Results
Heterogeneity

The table below shows the heterogeneity in overall accuracy for the dataset, both with and without study features. Heterogeneity is a measure of the variability in effect sizes across studies; it helps to understand the extent to which the included studies are similar to or different from one another.

Table 2: Heterogeneity results
Variance components:

| | $\sigma^2_{\text{level2}}$ | $\sigma^2_{\text{level3}}$ |
|---|---|---|
| Without study features | 0.01 | 0.017 |
| With study features | 0.009 | 0.006 |

Heterogeneity and moderator tests:

| $Q_E$ | df | $p_Q$ | $F$ | df | $p_F$ | $I^2_{\text{level2}}$ | $I^2_{\text{level3}}$ | $R^2_{\text{level2}}$ | $R^2_{\text{level3}}$ |
|---|---|---|---|---|---|---|---|---|---|
| 11369118 | 76 | 0 | 4 | 9 | 0.018 | 60.53 | 39.47 | 6.3 | 63.8 |

Without study features: the heterogeneity estimates when no study features are considered. With study features: the heterogeneity estimates when the study features (the model defined above) are included in the analysis. The metrics reported are listed below; a code sketch showing how they can be derived from the fitted models follows the list.

  • \(\sigma^2\): The variance estimates at different levels (level 2: within-study; level 3: between-study).

  • Q and p-values: Test statistics for heterogeneity, where a significant p-value (\(p_Q\)) suggests significant heterogeneity.

  • \(I^2\): The percentage of variability due to heterogeneity rather than chance.

  • \(R^2\): The proportion of variance explained by the model.
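
As a rough sketch of where these numbers come from, assuming an intercept-only model (called meta_reg_null here; it is not part of the code above) is fitted alongside the moderator model meta_reg:

# Intercept-only model: heterogeneity "without study features"
meta_reg_null <- rma.mv(yi, vi,
  data = ies.da,
  random = ~ 1 | AuthorYear / esid,
  method = "REML",
  test = "t",
  dfs = "contain"
)

# Variance components: sigma2[1] is between-study (level 3), sigma2[2] is
# within-study (level 2) for the AuthorYear/esid nesting
meta_reg_null$sigma2   # without study features
meta_reg$sigma2        # with study features

# Multilevel I^2 based on a "typical" sampling variance (Higgins-Thompson style)
W <- diag(1 / ies.da$vi)
X <- matrix(1, nrow = meta_reg_null$k)   # intercept-only design matrix
P <- W - W %*% X %*% solve(t(X) %*% W %*% X) %*% t(X) %*% W
v_typ <- (meta_reg_null$k - 1) / sum(diag(P))
100 * meta_reg_null$sigma2 / (sum(meta_reg_null$sigma2) + v_typ)

# Pseudo-R^2 per level: proportional reduction in each variance component
# after adding the study features
pmax(0, 100 * (meta_reg_null$sigma2 - meta_reg$sigma2) / meta_reg_null$sigma2)

# Q_E, p_Q (residual heterogeneity) and F, p_F (omnibus test of the study
# features) are reported directly by summary(meta_reg)
summary(meta_reg)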

Which study features explain the heterogeneity

The coefficient table shows the impact of each study feature on overall accuracy. The sign of each value indicates the direction of the effect and its size indicates the magnitude. Note that the results are shown on a transformed scale and may not have the same interpretation when back-transformed. The columns are described below, and a short code sketch for extracting this table from the fitted model follows the list.

  • Feature Name: Each row represents a study feature.

  • Estimate: The coefficient (or beta) for each feature, representing the strength and direction of its influence.

  • Standard Error: The standard deviation of the coefficient, which helps understand the uncertainty around the estimate.

  • p-value: This indicates whether the feature is statistically significant. If the p-value is less than 0.05, the feature likely has a meaningful impact on the heterogeneity.
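
As a minimal sketch, the values in Table 3 can be extracted from the fitted meta_reg object via metafor's coefficient summary (the rounding below is only for display):

# Extract the coefficient table (estimate, se, tval, df, pval, ci.lb, ci.ub)
coef_tab <- coef(summary(meta_reg))

# Keep the columns shown in Table 3
round(coef_tab[, c("estimate", "se", "pval")], 4)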

Table 3: Coefficients table
| term | estimate | std.error | p |
|---|---|---|---|
| intercept | 0.8412431 | 0.1053986 | <0.001 |
| fraction_majority_class | 0.3895363 | 0.1019149 | <0.001 |
| ancillaryRemote Sensing Only | 0.0966632 | 0.0600436 | 0.112 |
| indicesUsed | 0.0215525 | 0.0540252 | 0.691 |
| no_band_groupMid | 0.0802330 | 0.0550866 | 0.149 |
| no_band_groupNot Reported | 0.0702445 | 0.0750367 | 0.371 |
| Confusion_matrixReported | 0.0413344 | 0.0520051 | 0.445 |
| model_groupOther | 0.0626488 | 0.0818846 | 0.447 |
| model_groupRF | -0.0275940 | 0.0605971 | 0.65 |
| model_groupSVM | 0.0325765 | 0.0599318 | 0.588 |
Note:
When the analysis is done with only the nina_2024 data, including model group means that the study feature ancillary is not significant. This differs from the conclusions in the thesis manuscript because the model fitted here is not the 'best' model.
Bubble Plot

The bubble plot visualizes the relationship between the proportion of the majority class and overall accuracy. Each bubble represents a result from a study, with the size of the bubble indicating the study's weight or sample size.
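
One way such a plot could be drawn (a sketch, not necessarily the dashboard's own code) is with metafor's regplot(), which plots the observed outcomes against a single moderator of a fitted model and sizes the points by their model weights:

# Bubble plot of transformed overall accuracy against the majority-class
# proportion; point sizes reflect the weight of each estimate in the model
regplot(meta_reg,
        mod  = "fraction_majority_class",
        xlab = "Proportion of majority class",
        ylab = "Overall accuracy (FT transformed)")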