On the spatio-temporal representativeness of observations

Nick Schutgens (1), Svetlana Tsyro (2), Edward Gryspeerdt (3), Daisuke Goto (4), Natalie Weigum (1), Michael Schulz (2), and Philip Stier (1)

(1) Department of Physics, University of Oxford, Parks Road, OX1 3PU, England
(2) Norwegian Meteorological Institute, P.O. Box 43 Blindern, Oslo, NO-0312, Norway
(3) Institute for Meteorology, Universität Leipzig, Stephanstr. 3, 04103 Leipzig, Germany (now at Space and Atmospheric Physics Group, Imperial College London, London, SW7 2AJ, United Kingdom)
(4) National Institute for Environmental Studies, 16-2 Onogawa, Tsukuba, 305-8506, Japan

Correspondence to: Nick Schutgens (schutgens@physics.ox.ac.uk)

to 50 km) and show that even after substantial averaging of data significant representation errors may remain, larger than typical measurement errors. Our study considers a variety of observations: ground-site remote sensing or in-situ (PM2.5, black carbon mass or number concentrations), satellite remote sensing with imagers or LIDARs (extinction). We show that observational coverage (a measure of how dense the spatio-temporal sampling of the observations is) is not an effective metric to limit representation errors. Different strategies to construct monthly satellite L3 data are assessed and temporal averaging of spatially aggregated observations (super-observations) is found to be the best, although it still allows for significant representation errors. However, temporal collocation of data (possible when observations are compared to model data or other observations), combined with [...]

1 Introduction

The intermittent temporal sampling and limited field-of-view of observations reduce their representativeness for the actual weather or climate system they are intended to explore (Nappo et al., 1982). Yet relatively little work has been done on estimating these sampling impacts and how to mitigate them. At the root of this issue lies the spatio-temporal variability of the natural system, but the large variety in sampling strategies of observing systems adds significantly to the complexity of the problem. A representation error can be used to describe the ability of measurements to represent a larger area over an arbitrary (but specified) length of time. If the observations are used to evaluate models, these represented areas would coincide with the model's grid-boxes.

Hakuba et al. (2014a, b) studied the spatial representativeness of ground-sites for solar surface radiation measurements and Bulgin et al. (2016) parametrised the spatial sampling uncertainty in gridded SST (sea surface temperature) measurements (cloud masked) from satellite. Climate statistics were shown to differ between point data and gridded data in theoretical studies by Cavanaugh and Shen (2015) and Director and Bornn (2015). Sampling issues in trace gas measurements from either satellites or ground networks have been studied by Sofieva et al. (2014), Coldewey-Egbers et al. (2015), Lin et al. (2015) and Boersma et al. (2016). Recently, Diedrich et al. (2016) studied the impact of cloud-masking in water vapour measurements from satellite and found a 25% lower monthly global mean water vapour path.
In this paper, we will focus on aerosol but our results can be expected to have wider implications. Since the landmark study by Anderson et al. (2003) we know that aerosol varies on scales of hours and tens of km; see also Kovacs (2006), Santese et al. (2007), Shinozuka and Redemann (2011), Schutgens et al. (2013) and Weigum et al. (2016). Aerosol studies are therefore likely to show a very clear impact from spatio-temporal sampling. Kaufman et al. (2000), Smirnov (2002) and Remer et al. (2006) attempted to assess the impact of diurnal cycles on the representativeness of satellite observations for daily averages. Similarly, Sayer et al. (2010) and Geogdzhayev et al. (2014) estimated the impact of satellite sampling on monthly and yearly regional averages. These studies showed that significant differences might result from temporal sampling alone. Levy et al. (2009) studied different algorithms to create monthly MODIS (MODerate resolution Imaging Spectro-radiometer) gridded data (so-called L3) and showed that large differences might result. A major issue for Levy et al. (2009) was the absence of an objective truth.
The term representation error (or representativity or representativeness error) is often used in data assimilation, where a growing body of research exists, e.g. Desroziers et al. (2005), Waller et al. (2014), Hodyss and Nichols (2015), van Leeuwen (2015) and Waller et al. (2016). In data assimilation the representation error concerns very short time scales: observations are compared against model data at specific times. In this paper we are also interested in representation errors after averaging over months or even years. Conceptually, representation errors in data assimilation have evolved to include model errors due to poorly represented sub-grid processes. In this paper, we are only concerned with the spatio-temporal representativeness of observations.
In two recent studies, we explored temporal and spatial sampling issues using aerosol models as a truth. In Schutgens et al. (2016a) (henceforth S16a) spatial sampling issues (when evaluating global models with grid-boxes of a few hundred km) were explored on time-scales of hours to a month using high-resolution model data. S16a is a study of representation errors for continuously measuring (in-situ) ground-sites or incidental flight campaigns. It shows that different observations can lead to very different representation errors. It includes sensitivity studies for various strategies in comparing a global model to the observations. In Schutgens et al. (2016b) (henceforth S16b) temporal sampling issues are explored on time-scales of days to a year using global model data and real remote sensing datasets. S16b compares representation errors to actual model errors and finds them to be of similar magnitude. It shows that models compare better with real observations after temporal collocation. Also, it finds that the representation errors for visual remote sensing data depend on longitude when using daily model data. Both intensive (e.g. single scattering albedo) and extensive (e.g. aerosol optical depth) observables suffer from representativeness issues.
In S16a, we assumed that observations were made continuously in time, while in S16b we assumed that global model data and observations had the same spatial extent (the model's grid-box). Both assumptions are idealistic and limited our analysis. In the current paper, we will study the combined impact of spatio-temporal sampling on representation errors for a wide variety of observing systems (ground-site in-situ, ground-site passive remote sensing, satellite passive and active remote sensing) on a range of time-scales from hourly to semi-annually. This allows us to study e.g. sampling issues in satellite L3 data, or the magnitude of remaining representation errors after temporal collocation. It also allows us to elucidate the interplay of spatial and temporal sampling in creating representation errors.
Section 2 describes the high-resolution model data and how they were used to create simulated observations. Section 3 explains how representation errors are calculated from these data. Results for semi-annual averages (Sect. 4), monthly averages (Sect. 5), daily averages (Sect. 6) and sub-daily data (Sect. 7) follow. The impact of precipitation on sampling issues is discussed in Sect. 8. An overview of the lessons learned for different observing systems is given in Sect. 9 and the paper concludes with a summary (Sect. 10). Note that Sect. 3.2 contains some general guidelines to interpreting many of the figures and statistics that appear in this paper.

The regional models
The same simulations as in S16a are used in the current study and for details we refer to that paper. Briefly, the models WRF-Chem (Grell et al., 2005; Fast et al., 2006), EMEP/MSC-W (Simpson et al., 2012) and NICAM-SPRINTARS (see Goto et al. (2015) and references therein) were used to simulate common observables (aerosol optical thickness, extinction, PM2.5, black carbon mass concentration, number densities and cloud condensation nuclei) on a 10 km grid with hourly resolution (snapshots, only precipitation data are accumulated). All models nudged wind speeds to reanalysis meteorology and used emissions with diurnal profiles where relevant. Fig. 1 shows the simulation regions, and Table 1 summarises the most important information on these simulations.
As precipitation is potentially a major cause of spatio-temporal variability in aerosol, we evaluated the models against GPCP (Global Precipitation Climatology Project; Adler et al., 2003; Huffman et al., 2009) 1-degree daily combination v1.2 data (Huffman et al., 2001; see also http://precip.gsfc.nasa.gov/gpcp_daily_comb.html). Histograms of daily precipitation in the models compare quite well to these observations (see Fig. 2). At higher daily precipitation, there is quite a bit of statistical noise due to the low number of cases, as can be seen by comparing the observations over W-Europe and Europe. The most notable differences from the observations are found for Congo, where the model tends to overestimate precipitation, and Ocean & Japan, where the models tend to underestimate low precipitation cases.

Observable parameters
The simulated fields examined in this paper are, for obvious reasons, all observables, see Table 2. All of the models provided AOT (Aerosol Optical Thickness), AE (Ångström Exponent), SSA (Single Scattering Albedo), extinction and (dry) PM2.5, although WRF-Chem calculates AOT and extinction for 600 nm and EMEP and NICAM-SPRINTARS for 550 nm. WRF-Chem MADE provided CCN (Cloud Condensation Nuclei) at varying degrees of super-saturation S. Converting WRF-Chem output into observables of black carbon concentration (BC) or number densities (N10 and N50, number densities for particles with diameters exceeding 10 and 50 nm, respectively) required some further assumptions that are detailed in S16a.
The spatio-temporal sampling of real observations is determined by their operational parameters and by adverse conditions. For simplicity's sake, we created a number of idealised scenarios for different observing systems. Additional model information like local times, cloud fraction and precipitation was used to create spatio-temporal samplings for the observations.
Ground-site in-situ measurements are assumed to occur at all times, irrespective of conditions but constrained by operational parameters, e.g. IMPROVE (Interagency Monitoring of PROtected Visual Environments) measures only a full day every three days. Note that this is a best-case scenario and most ground-sites will suffer down-time due to maintenance or malfunction. In particular, we assume that these measurements will occur irrespective of precipitation since this usually does not prevent measurements. Obviously, in-situ ground-sites only observe a small part (here 10 by 10 km) of the atmosphere near the surface.
Ground-site remote sensing observations of AOT will occur during the day-light portion of each day (here 10 hours straddling local noon), provided there are no clouds. These ground-sites will observe only a small portion (10 by 10 km) of an atmospheric column. Again, down-time due to maintenance or malfunction is not considered.
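As an illustration, the idealised ground-site sampling rules described above can be sketched in a few lines of Python. This is a minimal sketch and not the code used in this study: the function names, the assumption that the time axis is hourly and starts at a known local hour, and the cloud threshold `max_cloud` (here: any cloud blocks the measurement) are all hypothetical choices.

```python
import numpy as np

def daylight_mask(n_hours, first_local_hour=0, window=10):
    """True for hourly steps inside a `window`-hour span straddling local noon."""
    local_hour = (first_local_hour + np.arange(n_hours)) % 24
    half = window / 2.0
    return (local_hour >= 12 - half) & (local_hour < 12 + half)

def remote_sensing_sampling(cloud_fraction, first_local_hour=0, max_cloud=0.0):
    """Sampling of a remote sensing ground-site: day-light AND cloud-free.

    `max_cloud` is an assumed threshold, not a value taken from the paper.
    """
    cloud_fraction = np.asarray(cloud_fraction, dtype=float)
    daylight = daylight_mask(cloud_fraction.size, first_local_hour)
    return daylight & (cloud_fraction <= max_cloud)

def one_day_in_n_sampling(n_hours, period_days=3):
    """Sampling of an in-situ site that measures one full day every `period_days`."""
    day_index = np.arange(n_hours) // 24
    return day_index % period_days == 0
```

Each function returns a boolean array along the hourly time axis; such arrays play the role of the observational sampling used in the remainder of the analysis.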
Passive satellite measurements (imager data) on polar-orbiting satellites are assumed to occur once a day at local noon, provided there are no clouds. Imagers will have swaths wide enough to allow aggregation of individual measurements over the represented area. Due to their orbital parameters and swath width, these satellites will have repeat cycles of 1, 2, 4 or 8 days. Imagers on geostationary satellites allow measurements during the day-light portion of each day (10 hours straddling local noon [...]

This section briefly describes the main methodology used in this paper. The high-resolution regional model data v can be thought of as a 3-dimensional data cube $v_{xyt}$ (either a column or layer property) where $x = 1 \ldots n_x$ and $y = 1 \ldots n_y$ are indices to the horizontal coordinates, and $t = 1 \ldots n_t$ is an index to the time coordinate. As the model data have been transformed to a regular grid, equations can conveniently be written down with reference to indices only. Using this data cube $v_{xyt}$, we will generate both a truth (an average over a wider area that is to be represented) and a sampled but otherwise noiseless (i.e. without measurement error) observation. At a single time, the truth for a represented area can be written as

$$T_{xyt} = \frac{1}{(2L_x+1)(2L_y+1)} \sum_{i=x-L_x}^{x+L_x} \sum_{j=y-L_y}^{y+L_y} v_{ijt}, \qquad (1)$$

where $L_x$ and $L_y$ define the half-lengths of the represented area. A time average of this is given by

$$\bar{T}_{xyt} = \frac{1}{2L_t+1} \sum_{k=t-L_t}^{t+L_t} T_{xyk}, \qquad (2)$$

where $(2L_t + 1)$ defines the averaging period. Note that a capital variable name denotes a spatial average and an overbar a temporal average.
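The construction of the truth can be sketched as follows, assuming the data cube is held as a NumPy array of shape (n_x, n_y, n_t) with zero-based indices. The function names are hypothetical and the sketch assumes that the averaging box lies entirely inside the domain.

```python
import numpy as np

def spatial_truth(v, x, y, Lx, Ly):
    """Area average around grid cell (x, y) for every time step.

    v has shape (nx, ny, nt); the box spans 2*Lx+1 by 2*Ly+1 cells and is
    assumed to lie fully inside the domain.
    """
    box = v[x - Lx:x + Lx + 1, y - Ly:y + Ly + 1, :]
    return box.mean(axis=(0, 1))

def time_average(series, t, Lt):
    """Average of a time series over the 2*Lt+1 steps centred on t."""
    return series[t - Lt:t + Lt + 1].mean()

# Tiny synthetic cube: v[i, j, t] = 35*i + 7*j + t
v = np.arange(5 * 5 * 7, dtype=float).reshape(5, 5, 7)
T = spatial_truth(v, x=2, y=2, Lx=1, Ly=1)   # 3 x 3 cell represented area
T_bar = time_average(T, t=3, Lt=2)           # 5-step averaging period
```

For the 10 km grid used here, a represented area of 210 × 210 km² would correspond to Lx = Ly = 10 in this sketch.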

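In a very similar way, a spatio-temporal average of the observations may be written as

$$\bar{O}_{xyt} = \frac{\sum_{i=x-l_x}^{x+l_x} \sum_{j=y-l_y}^{y+l_y} \sum_{k=t-l_t}^{t+l_t} f_{ijk}\, v_{ijk}}{\sum_{i=x-l_x}^{x+l_x} \sum_{j=y-l_y}^{y+l_y} \sum_{k=t-l_t}^{t+l_t} f_{ijk}}, \qquad (3)$$

where $l_x$, $l_y$ and $l_t$ serve a similar purpose as $L_x$, $L_y$ and $L_t$. The observational sampling $f_{xyt}$ is defined as:

$$f_{xyt} = \begin{cases} 1 & \text{if an observation is present,} \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$

Note that this is a very general formulation that can be used to simulate both individual ground-sites and satellite measurements.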
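A sampled observation average of this kind can be sketched as below. This is an illustrative implementation under the same assumptions as before (NumPy array of shape (n_x, n_y, n_t), box inside the domain); the function name and the choice to return NaN when nothing is observed are hypothetical.

```python
import numpy as np

def observation_average(v, f, x, y, t, lx, ly, lt):
    """Average of v over the sampled (f == 1) cells in a space-time box.

    v and f have shape (nx, ny, nt). Returns NaN if the box holds no
    observations at all.
    """
    box = (slice(x - lx, x + lx + 1),
           slice(y - ly, y + ly + 1),
           slice(t - lt, t + lt + 1))
    weights = np.asarray(f[box], dtype=float)
    n_obs = weights.sum()
    if n_obs == 0:
        return np.nan
    return float((weights * v[box]).sum() / n_obs)

# Synthetic cube: v[i, j, t] = 35*i + 7*j + t
v = np.arange(5 * 5 * 7, dtype=float).reshape(5, 5, 7)

# A ground-site (lx = ly = 0) that observes only at time steps 1 and 5
f_site = np.zeros(v.shape, dtype=bool)
f_site[2, 2, [1, 5]] = True
site_mean = observation_average(v, f_site, x=2, y=2, t=3, lx=0, ly=0, lt=3)
```

Setting lx = ly = 0 mimics a single ground-site, while larger lx and ly with a partially filled sampling mimic aggregated satellite pixels.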
The relative spatio-temporal representation error in an observation for arbitrary time and length-scales is now given by

$$\frac{\bar{O}_{xyt} - \bar{T}_{xyt}}{\bar{T}_{xyt}}. \qquad (5)$$

When observations are used to evaluate models, it is possible to temporally collocate model data with observations. We simulate this by constructing $\bar{T}_{xyt}$ from a sub-sampled number of $T_{xyt}$ and the resulting error will be called "representation error with collocation".

Note that it is possible to aggregate observations spatially before temporally averaging them:

$$O_{xyt} = \frac{\sum_{i=x-l_x}^{x+l_x} \sum_{j=y-l_y}^{y+l_y} f_{ijt}\, v_{ijt}}{\sum_{i=x-l_x}^{x+l_x} \sum_{j=y-l_y}^{y+l_y} f_{ijt}}, \qquad (6)$$

$$\bar{O}_{xyt} = \frac{\sum_{k=t-l_t}^{t+l_t} G_{xyk}\, O_{xyk}}{\sum_{k=t-l_t}^{t+l_t} G_{xyk}}, \qquad (7)$$

where $G_{xyt}$ is 1 if a valid super-observation $O_{xyt}$ exists and 0 otherwise. Actually, the two expressions for $\bar{O}_{xyt}$ may be related to alternative averages that were proposed by Levy et al. (2009) for satellite L3 products. Their "Pixel Weighting" procedure corresponds to Eq. 3, while procedures "Equal Day Weighting" and "Threshold Equal Day Weighting" correspond to Eq. 7. The difference between the latter two is in the construction of $G_{xyt}$ (requiring a minimum number of pixels for a valid super-observation or not).
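The difference between averaging all pixels at once and averaging super-observations can be made concrete with a small sketch. It is illustrative only: the function names and the `min_pixels` argument (standing in for the minimum number of pixels required for a valid super-observation) are assumptions, and the arrays are laid out as (pixels in the represented area, time steps).

```python
import numpy as np

def pixel_weighted_mean(v, f):
    """All sampled pixels weighted equally, regardless of their time step."""
    f = np.asarray(f, dtype=float)
    return float((f * v).sum() / f.sum())

def super_observation_mean(v, f, min_pixels=1):
    """Aggregate spatially per time step first, then average over time.

    A time step contributes only if it has at least `min_pixels` pixels.
    """
    f = np.asarray(f, dtype=float)
    counts = f.sum(axis=0)
    valid = counts >= min_pixels
    if not valid.any():
        return np.nan
    supers = (f * v).sum(axis=0)[valid] / counts[valid]
    return float(supers.mean())

def relative_error(obs, truth):
    """Relative representation error of an observation w.r.t. the truth."""
    return (obs - truth) / truth

# Two overpasses: four clear pixels with value 1 on the first day,
# a single clear pixel with value 2 on the second day.
v = np.array([[1.0, 2.0]] * 4)
f = np.array([[1, 1], [1, 0], [1, 0], [1, 0]])
pixel_mean = pixel_weighted_mean(v, f)       # dominated by the first day
day_mean = super_observation_mean(v, f)      # both days count equally
```

In this toy case the pixel-weighted mean is pulled towards the day with many clear pixels, whereas the super-observation mean treats both days alike; raising `min_pixels` to 2 discards the second day altogether.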
To conclude, we introduce three metrics of the abundance of measurements that go into $\bar{O}$ as this will affect how well it compares to the truth. The spatial coverage of a single super-observation is

$$c_\mathrm{spat} = \frac{\sum_{i=x-l_x}^{x+l_x} \sum_{j=y-l_y}^{y+l_y} f_{ijt}}{(2l_x+1)(2l_y+1)}. \qquad (8)$$

The temporal coverage of a time-averaged super-observation is defined differently because many observations are made not continuously but nevertheless regularly in time (e.g. satellite overpass times):

$$c_\mathrm{temp} = \frac{\sum_{k=t-l_t}^{t+l_t} G_{xyk}}{\sum_{k=t-l_t}^{t+l_t} G^{*}_{xyk}}, \qquad (9)$$

where $G^{*}$ is a sampling entirely defined by the observational cycle of the observing system. This includes orbital and daylight constraints but not cloudiness. Note that in real life, these coverages are known and can be used to select observations; e.g. only aggregated satellite data with a required minimum spatial coverage will be used to compare against model results or only ground-sites with a required minimum temporal coverage will be used to construct monthly averages.
(Henceforth we will refer to required coverage and drop the word 'minimum').
Each data cube $v_{xyt}$ will allow us to generate $n_T$ cases of the truth $\bar{T}_{xyt}$, because the simulated regions are much larger than the represented areas. The number $n_O$ of possible $\bar{O}_{xyt}$ cases will be less, depending on both $f_{xyt}$ and $G_{xyt}$. This leads to the definition of a case coverage $n_O/n_T$. Ideally the case coverage is 100%, which is possible even if $f_{xyt}$ and $G_{xyt}$ are not always 1; it indicates that there are sufficient observations to construct valid $\bar{O}_{xyt}$ anywhere and anytime.
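The three coverage metrics can be summarised in a short sketch. It assumes boolean (or 0/1) NumPy arrays for the samplings; the function names are hypothetical and the handling of an empty observational cycle (returning NaN) is a choice made here for illustration.

```python
import numpy as np

def spatial_coverage(f_box):
    """Fraction of the represented area observed at a single time.

    f_box is the 2-D sampling over the cells of the represented area.
    """
    return float(np.asarray(f_box, dtype=float).mean())

def temporal_coverage(G, G_star):
    """Valid super-observations relative to those the observing cycle allows.

    G marks time steps with a valid super-observation, G_star the time steps
    at which the system could observe (orbit and daylight, no clouds).
    """
    possible = np.asarray(G_star, dtype=float).sum()
    if possible == 0:
        return np.nan
    return float(np.asarray(G, dtype=float).sum() / possible)

def case_coverage(n_obs_cases, n_truth_cases):
    """Fraction of truth cases for which a valid observation average exists."""
    return n_obs_cases / n_truth_cases

# 30 daily overpasses, of which 12 yield a valid super-observation
G_star = np.ones(30, dtype=bool)
G = np.zeros(30, dtype=bool)
G[:12] = True
```

With these inputs the temporal coverage is 40%, irrespective of how many hours per day fall outside the overpass time.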
As explained in S16a, the first two days of the high-resolution simulations and the outer part of the spatial domain were excluded from analysis to prevent boundary effects from impacting our results.

Some terminology
Representation error will refer to the representativeness of an observation (possibly aggregated over an area and averaged over a time period) in describing the natural system. If observations are used to evaluate temporally collocated model data, we will refer to a representation error with collocation. We will consider two collocation methodologies: to the hour or to the day. In the first case, hourly model data is temporally collocated to the hour of the observation. In the second case, daily model data is collocated to the day of the observation (and the observation itself is a daily average, to the extent that is possible).
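The two collocation methodologies can be illustrated with a minimal sketch. It assumes an hourly area-average series that starts at midnight and covers a whole number of days, plus a boolean hourly sampling of the observation; both function names are hypothetical.

```python
import numpy as np

def collocate_to_hour(T_hourly, f_hourly):
    """Average the hourly area data over the observed hours only."""
    return float(T_hourly[f_hourly].mean())

def collocate_to_day(T_hourly, f_hourly):
    """Average daily means of the area data over days with any observation.

    Assumes the series starts at midnight and spans whole days.
    """
    T_daily = T_hourly.reshape(-1, 24).mean(axis=1)
    observed_days = f_hourly.reshape(-1, 24).any(axis=1)
    return float(T_daily[observed_days].mean())

# Two days of hourly area data with a steady upward trend,
# observed only once, at noon on the first day.
T_hourly = np.arange(48, dtype=float)
f_hourly = np.zeros(48, dtype=bool)
f_hourly[12] = True

to_hour = collocate_to_hour(T_hourly, f_hourly)   # value at the observed hour
to_day = collocate_to_day(T_hourly, f_hourly)     # mean of the observed day
no_collocation = float(T_hourly.mean())           # mean of the full period
```

In this toy example collocation to the hour reproduces exactly what the observation could see, collocation to the day still mixes in unobserved hours, and no collocation additionally mixes in the unobserved day.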

Common characteristics of the figures in this paper
This paper contains many figures of representation error distributions. Instead of repeating the same information in each caption, some aspects of those figures are explained here. We use the so-called parametric 7-number summary of the 2, 9, 25, 50, 75, 91 and 98% quantiles q of the errors because, for a normal distribution, these quantiles will be equally spaced. Any skewness or extended wings in a distribution will be readily visible. In addition to quantiles, we will provide RMSD (root mean square differences) and RMSE (root mean square errors, essentially RMSD after removing any bias).
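These statistics are straightforward to compute; the sketch below shows one way to do so with NumPy. The function names are hypothetical, the median (50%) is included as the central value of the 7-number summary, and the bias is taken to be the mean error, which is an assumption about the exact convention.

```python
import numpy as np

QUANTILES = [2, 9, 25, 50, 75, 91, 98]   # parametric 7-number summary (%)

def seven_number_summary(errors):
    """Quantiles of the error distribution at the 7 summary levels."""
    return np.percentile(errors, QUANTILES)

def bias(errors):
    """Mean error (assumed definition of the bias)."""
    return float(np.mean(errors))

def rmsd(errors):
    """Root mean square difference, bias included."""
    return float(np.sqrt(np.mean(np.square(errors))))

def rmse(errors):
    """Root mean square error: RMSD after removing the bias."""
    errors = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(np.square(errors - errors.mean()))))
```

With these definitions RMSD² = bias² + RMSE², so a distribution with a large region-wide bias can have a large RMSD even when its RMSE is small.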

Figures with grey shading
In Fig. 5 different shades of grey are used to denote these inter-quantile ranges: light grey for $q_{98} - q_{2}$, medium grey for $q_{91} - q_{9}$ and dark grey for $q_{75} - q_{25}$. The solid blue line represents the median error.

Figures with box-whiskers
In Fig. 6, box-whisker plots are shown of the error distributions for each of the regions. Different widths of the bars are used to denote different inter-quantile ranges: narrow for $q_{98} - q_{2}$, medium for $q_{91} - q_{9}$ and wide for $q_{75} - q_{25}$. The [...]

In Fig. 9 error distributions for two different experiments are shown side-by-side (much like a violin plot), for each region and usually as a function of an independent parameter (e.g. represented area size in this example). The value above each box-whisker is the ratio of the right error distribution's RMSD to the left one's.

Figures with line graphs
A very different figure is Fig. 8, where error statistics are summarised as a function of required spatial or temporal coverage. The coloured lines represent RMSE (solid) and bias (dashed) using the left-hand axis. The colours are identical to the ones used in the box-whisker plots to help identify different experiments. The black lines use the right-hand axis and denote the case coverage (solid), and achieved spatial (dashed) and temporal (dotted) coverage. The latter have of course been averaged over all relevant cases.

Representativeness of semi-annual data
Only the EMEP simulation (Table 1) allows us to explore sampling issues in semi-annual data, assuming ground-sites representing an area of 210 × 210 km². Figure 3 shows relative representation errors in AOT and surface BC mass concentrations. The surface BC measurements are continuous through the 6 months while the AOT measurements are only made during day-time and cloud-free conditions, see Sect. 2.1.
Representation errors in surface BC measurements are clearly related to emission sources (notice major cities like Paris and Madrid) and orography (notice the Alps, the Apennines and the Carpathian mountains). On the other hand, representation errors in AOT are dominated by temporal sampling and show a clear region-wide bias as observable AOT tends to be lower than average AOT (mostly due to increased humidity in cloudy columns). In both cases, representation errors can be several tens of percent. If the AOT measurements are used for model evaluation, temporal collocation of model data to the observations (as advocated in S16b) is possible and the errors are reduced significantly. In particular, the region-wide bias is much reduced and the remaining error pattern is more similar to that for BC, see Fig. 4.
Table 3 shows representation errors for several ACTRIS (Aerosol, Clouds & Trace gases Research Infra-Structure) sites within the Europe domain, not just for long-term averages but daily RMSD as well. Representation errors driven by spatial sampling often benefit from temporal averaging, unlike errors due to temporal sampling. Collocation removes the difference in temporal sampling and allows remaining representation errors to be reduced through temporal averaging. Note that sources and orography can create conditions where temporal averaging is not very beneficial.
The impact of averaging period on spatial representation (AOT is now assumed to be measured continuously) can be seen in Fig. 5. This suggests that averaging over less than 10 hours or more than 1000 hours (6 weeks) has little impact on spatial representation errors.
Note that in S16a we showed that the EMEP simulation yielded smaller spatial representation errors than the WRF-Chem simulation (although they agreed in magnitude and spatial patterns).

Representativeness of monthly data
The following analysis was made for a represented area of 210 × 210 km², with exceptions noted. All data were averaged over a month.

Remote sensing ground-site
We start with the case of a remote sensing ground-site, see Sect. 2.1. Figure 6 shows representation errors for different regions as box-whisker plots. The figure shows that temporal sampling significantly increases representation errors. Over Ocean and Japan, that even leads to region-wide biases. Temporal sampling is dominated by cloudiness, and cloudy AOT (included in the area data) is larger than clear-sky AOT for these regions.
When evaluating models, Fig. 7 shows that temporal collocation of area data with the observations can substantially reduce representation errors. Here we limited ourselves to locations with at least 25% temporal coverage. Note that temporal coverage is 100% if each day during the month yields 10 hours of observations. Obviously, representation errors after collocation can never be smaller than purely spatial representation errors. Interestingly, collocation to the day is much less beneficial than collocation to the hour, even after averaging over a month.
Figure 8 shows error estimates as a function of required temporal coverage for two typical regions. As a rule, case coverage goes down as the required temporal coverage increases. This means that the number of ground-sites supplying sufficient observations goes down. Representation errors may go down (Japan) but it is also possible that they remain constant (Oklahoma). For all regions, collocation to the hour allows smaller representation errors at lower temporal coverage and higher case coverage than no collocation.
Representation errors are remarkably insensitive to the size of the represented area, unless area data can be temporally collocated, see Fig. 9. This is unsurprising as we earlier pointed out that temporal sampling dominates the representation error.
Figure 10 shows maps of the monthly representation errors. It shows that without collocation, or with collocation to the day, representation errors may strongly correlate over a large part of the region. Although Fig. 7 suggested that representation errors without collocation were unbiased for Oklahoma, this is only because those errors are positive in the lower half of the region and negative in the upper part. With collocation to the hour, not only are the representation errors smaller but they also correlate over smaller distances. Hence collocation to the hour makes it more likely that subsequent spatial averaging (e.g. over multiple ground-sites) will further reduce representation errors.

Passive remote sensing measurements from polar-orbiting satellites
Next we turn to polar-orbiting satellite measurements with repeat cycles of 1 or 8 days, see Sect. 2.1. For now, we will assume that individual pixel measurements are averaged together (i.e. no super-obbing), see Eq. 3. Fig. 11 shows representation errors for different regions as box-whisker plots. Due to the aggregation of measurements, purely spatial representation errors are zero. But the spatio-temporal errors are substantial. Depending on the repeat cycle, either cloudiness or the observational cycle is more important to these errors, although it is cloudiness that leads to region-wide biases in the errors (see Ocean & Japan). Note also the very similar spatio-temporal representation errors, despite very different spatio-temporal sampling, for a ground-site (Fig. 6) and a satellite with a repeat cycle of 1 day.
The strong impact of cloudiness on temporal sampling and hence representation errors, shown both here and in the previous sub-section, suggests that area data calculated for clear skies only would yield smaller representation errors. This indeed reduces the region-wide biases over Ocean and Japan for a 1 day repeat cycle, but the representation RMSE are much the same. We will continue to calculate area data as a total sky average.
Figure 12 shows the impact of temporal collocation. Again, collocating area data to the hour yields smaller representation errors than collocating to the day. For longer repeat cycles, monthly representation errors after collocation will be larger because there is less data to average out spatial representation errors. Spatial and temporal coverage requirements were set at 25%, meaning that at each of at least 25% of the overpasses at least 25% of the represented area was observed.
Alternative methods exist to construct monthly observations, for example by temporally averaging super-observations, see Eq. 7. This has a small but beneficial impact on representation errors. Figure 13 shows representation errors when using super-observations, either straight as in Eq. 7 or log-transformed before temporal averaging. Neither method is capable of achieving the small representation errors obtained after temporal collocation.
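The two ways of averaging super-observations can be sketched as follows. The back-transformation with an exponential after averaging the logarithms (i.e. a geometric mean) is an assumption made here for illustration; the function name is hypothetical and the input values must be strictly positive, as is the case for AOT.

```python
import numpy as np

def mean_of_super_observations(supers, log_transform=False):
    """Temporal average of valid super-observations.

    With log_transform=True the logarithms are averaged and the result is
    transformed back, which down-weights occasional high values.
    """
    supers = np.asarray(supers, dtype=float)
    if log_transform:
        return float(np.exp(np.log(supers).mean()))
    return float(supers.mean())

aot_supers = [0.1, 0.4]   # two daily super-observations of AOT
straight = mean_of_super_observations(aot_supers)
logged = mean_of_super_observations(aot_supers, log_transform=True)
```

For skewed quantities the log-transformed average is never larger than the straight average, which is one reason the two choices can lead to different representation errors.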
Adjusting required temporal coverage has a similar impact as for ground-sites, see Fig. 14. Case coverage (percentage of the region observed by the satellite) goes down as temporal coverage increases. But there is no unequivocal impact on representation errors: they may remain similar (e.g. Oklahoma) or decrease (e.g. Japan). On the other hand, increasing required spatial coverage has a detrimental effect on representation errors. The reason is that increasing spatial coverage is accompanied by reduced temporal coverage, which makes the observations less representative for the full month. The obvious exception is representation errors with collocation (to the hour), which decrease with increasing spatial coverage. We conclude that generally coverage is not a good measure for representation errors but spatial coverage provides a good control on representation errors with collocation to the hour.
Currently satellite super-observation products (L3) for AOT are usually produced at 1° × 1° (110 × 110 km² at the equator). Using such a product to represent the natural system at different spatial scales yields similar representation errors (as temporal sampling issues dominate), see Fig. 15. But when using it to evaluate collocated model data, representation errors can be expected to be smallest for 1° × 1° model grid-boxes. Note that larger grid-boxes may be filled with multiple super-observations, and so reduce representation errors with collocation.
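The quoted size of a 1° × 1° box follows from simple spherical geometry, as the short calculation below illustrates. It assumes a spherical Earth with a mean radius of 6371 km; the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius

def degree_box_km(lat_deg, size_deg=1.0):
    """East-west and north-south extent (km) of a lat-lon box at a latitude."""
    north_south = math.radians(size_deg) * EARTH_RADIUS_KM
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south

ew_equator, ns_equator = degree_box_km(0.0)   # roughly 111 km by 111 km
ew_60, ns_60 = degree_box_km(60.0)            # east-west extent halves at 60 degrees
```

Away from the equator a fixed 1° × 1° product therefore represents a progressively smaller area in the east-west direction.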
Finally, we return to the work by Levy et al. (2009) as several of their strategies for calculating monthly L3 data are easily evaluated in the context of our work (Sect. 3). The aforementioned Fig. 13 shows that "Pixel Weighting" (brown) generally allows larger representation errors than "Equal Day Weighting" (dark blue). "Threshold Equal Day Weighting" is studied in Fig. 14 (dark blue line as function of spatial coverage) and also shown to allow larger errors than "Equal Day Weighting" (which is identical to "Threshold Equal Day Weighting" with $c_\mathrm{spat} > 0$). Thus we conclude that "Equal Day Weighting" is, from a spatio-temporal sampling perspective, the best choice. This will nevertheless allow monthly representation RMSD of 10 to 40%.

Passive remote sensing measurements from geostationary satellites
Geostationary satellites with passive remote sensing instrumentation allow for spatial aggregation of observations and multiple measurements per day. Consequently, sampling issues are entirely dominated by cloudiness. Figure 16 shows that even for an imager in geostationary orbit, monthly representation errors are quite substantial. Actually, they are not that different from those for an imager on a polar-orbiting satellite (Fig. 12) with a 1 day repeat cycle or a ground-site (Fig. 7). The reason is, of course, that cloudiness is the main cause of representativeness issues (in monthly averages, for platforms with high repeat frequencies). Note that representation errors after collocation are substantially lower for the geostationary imager than for a ground-site but again similar to those for a polar-orbiting imager.

LIDAR measurements from polar orbiting satellites
An idealised polar-orbiting LIDAR, see Sect. 2.1, allows for limited aggregation (along its track) but will have a long repeat cycle (here: 12 days). Figure 17 shows the resulting representation errors with and without collocation. These errors are large, even with collocation, and may preclude the use of satellite LIDAR data on monthly and 100 km scales. However, further averaging of temporally collocated data over larger regions (say Europe or the Atlantic dust outflow region) is likely to reduce representation errors as they are often not strongly correlated over distances exceeding the size of the represented area (e.g. see Fig. 3 or Fig. 10).

In-situ ground-sites
The IMPROVE network operates on a regular schedule of measuring one day out of three. Figure 18 shows that this has a relatively mild impact on representation errors. Still, errors may increase two-fold and collocation will usually bring representation errors down to the level of purely spatial errors. Due to the observing cycle, it does not matter whether this is collocation to the hour or to the day. Similar results can be shown for BC concentration or number density measurements.
6 Representativeness of daily remote sensing data

The following analysis was made for a represented area of 210 × 210 km², with exceptions noted. All data were averaged over a day.

Remote sensing data
Figure 19 shows daily representation errors for either ground-sites or imagers on polar-orbiting satellites with a repeat cycle of 1 day. Spatial representation errors are quite large for ground-sites but they are zero for the satellite. Yet spatio-temporal representation errors (without collocation) are very similar (although a bit smaller for the imager). Collocation to the hour reduces representation errors, but more so for the aggregated satellite observations. Actually, collocation for ground-sites allows for still significant spatial sampling issues in daily data.

Typical impacts of observational coverage are shown in Fig. 20. For the ground-sites, more stringent conditions on temporal coverage of the observations are relatively ineffective, irrespective of collocation: the spatial sampling issue always remains. In model evaluations, collocation to the hour will allow representation errors in satellite data to be arbitrarily reduced by specifying a spatial coverage requirement. Note however that case coverage drops steadily as required spatial coverage is increased.
The imager on a geostationary satellite again shows similar representation errors to the other observing systems, with the exception of W-Europe where an RMSD of 20% was found, a significant improvement over ground-sites (37%) and polar-orbiting satellites (29%).

In-situ ground-sites
In-situ ground-sites that observe continuously during the day will have identical daily representation errors, with or without collocation. Here we find daily representation RMSD for PM2.5 to range from 7% (Ocean) to 100% (Congo) with most values between 10 and 30%, and for surface BC concentrations from 40 to 100%.
7 Improving representativeness for data at less than daily time-scales

So far we have tacitly assumed that daily averages over a larger area are best represented by daily observations. Here we will determine the optimal averaging time-scales for observations (from ground-sites) when the represented area consists of hourly or daily data. In particular, slightly longer averaging time-scales for the observations allow a larger part of the atmosphere to be advected over the measurement site, possibly resulting in smaller representation errors. Remote sensing observations will be treated as uninterrupted by clouds or night-time, to allow easier comparison to in-situ measurements.
When considering represented areas at daily time-scales, the optimal period for averaging observations (at which the representation RMSD is minimal) is usually slightly more than a day, see Fig. 21 and Table 4. However, using 24 hours for averaging observations does not result in significant increases in representation error and justifies the analysis in Sect. 6.
Figure 22 shows hourly representation errors as a function of averaging period of surface PM2.5 observations. It is obvious that hourly observations do not guarantee the smallest representation error. Averaging the observations over several hours results in substantially better representation. There is quite a bit of variety in the optimal averaging period but it turns out that 6 hours is a good recommendation, also for other observables, see Table 5. This optimal period is a compromise: for both short and long periods, large representation errors due to spatial or temporal sampling issues, respectively, may be expected. In between there is a fairly large range of periods (including 6 hours) for which the representation error is close to minimal.
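The search for an optimal averaging period can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for Fig. 22: the area data are a smooth synthetic signal with a 48-hour period, the site observation adds an alternating small-scale perturbation, and a centred averaging window is used. All names are hypothetical.

```python
import math

def centred_mean(series, t, half_width):
    """Mean of series over hours t-half_width .. t+half_width (clipped at the ends)."""
    lo = max(0, t - half_width)
    hi = min(len(series), t + half_width + 1)
    window = series[lo:hi]
    return sum(window) / len(window)

def relative_rmsd(obs, area, half_width):
    """Relative RMSD between time-averaged observations and hourly area data."""
    errors = [
        (centred_mean(obs, t, half_width) - area[t]) / area[t]
        for t in range(len(area))
    ]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Synthetic, purely illustrative data for 240 hours (assumption).
hours = range(240)
# Area average: slow variation with a 48-hour period.
area = [1.0 + 0.5 * math.sin(2.0 * math.pi * t / 48.0) for t in hours]
# Site observation: area signal plus alternating small-scale variability.
obs = [a + 0.3 * (-1) ** t for t, a in zip(hours, area)]

# Averaging period is 2 * half_width + 1 hours.
rmsd_by_half_width = {w: relative_rmsd(obs, area, w) for w in range(13)}
best_half_width = min(rmsd_by_half_width, key=rmsd_by_half_width.get)
```

Short windows leave the small-scale variability at the site in place, while long windows smooth away the temporal signal of the area itself, so the minimum RMSD falls at an intermediate averaging period.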
In a few cases optimal averaging periods can be linked to the time needed for aerosol to drift a distance similar to the extent of the represented area (so-called transit time), see Fig. 23. But this was possible only for a few observables and seldom for surface measurements (N10 at 2 km is the best example we found). We surmise that turbulent flow and evolving aerosol make the link between transit times and optimal averaging periods rather tenuous.
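As a rough order-of-magnitude check on the transit time, the sketch below divides the extent of the represented area by a wind speed. The wind speed of 10 m/s is an assumption chosen for illustration only, and the function name is hypothetical.

```python
def transit_time_hours(extent_km, wind_speed_m_per_s):
    """Time for air to drift across extent_km at a constant wind speed."""
    if wind_speed_m_per_s <= 0:
        raise ValueError("wind speed must be positive")
    return extent_km * 1000.0 / wind_speed_m_per_s / 3600.0

# Assumed wind speed of 10 m/s across a 210 km wide represented area.
transit_210 = transit_time_hours(210.0, 10.0)
# The same wind speed across a 110 km wide represented area.
transit_110 = transit_time_hours(110.0, 10.0)
```

For this assumed wind speed the transit times are roughly 5.8 and 3.1 hours, but as noted above the link with the optimal averaging period was found to be tenuous in most cases.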
For smaller represented areas of 110 × 110 km², an averaging period of 4 hours is recommended.
8 Impact of precipitation on representation errors for in-situ measurements
Due to its importance in removing aerosol from the atmosphere, precipitation is expected to be a leading cause of spatio-temporal variability in aerosol. In this section we explore whether it is feasible to control representation errors by selecting observations for dry days only.
Precipitation is measured either locally by directly measuring the rain flux (e.g. rain buckets), or regionally through remote sensing measurements (e.g. scanning rain radar). This suggests two potential predictors for the impact of precipitation on representation errors: 1) a local precipitation measurement sited near the in-situ aerosol measurement can be used to identify cases of strong precipitation; 2) regional measurements can be used to identify cases where precipitation over the ground-site and the wider represented area differ greatly.
Figure 24 shows a rather typical example of how daily representation errors for in-situ measurements correlate with local precipitation. It is obvious that the impact is not overly large considering the already sizeable representation errors at low precipitation. Most observables and regions show even less dependence on precipitation. Over the Congo, higher local precipitation actually leads to smaller representation errors. The second predictor, the relative difference in precipitation over the wider area and at the ground-site, shows even less conclusive results.
Fig. 25 examines how monthly representation errors change due to the discarding of observations with potentially high representation errors (based on the aforementioned predictors). This has only a marginal impact and quite often that impact is to increase representation errors, albeit slightly. This happens because the temporal averaging over less data leads to larger representation errors, similar to what we saw for remote sensing observations. These results do not depend on the chosen observable, region or (arbitrarily chosen) threshold for the predictor. Only surface aerosol extinction over Japan showed a small but beneficial impact on representation errors due to filtering out high precipitation events. Note that the area data were collocated to the hour with available observations before monthly averaging, to provide a best case.
In conclusion, our analyses suggest that no systematic beneficial impact due to discarding cases of high precipitation or strong spatial gradients in precipitation can be found. This also holds at smaller sizes of the represented area (down to 50 × 50 km²). Studying movies of the evolving aerosol in our simulations offers an explanation: precipitation is seldom limited to the ground-site and the represented area will be affected as well; also, precipitation does not necessarily correlate with loss of aerosol, as converging air motions near updrafts or the sulfate production in associated cloud fields may actually increase aerosol; finally, the spatio-temporal distribution of emission sources combined with turbulent or shearing wind-fields are strong drivers of spatial variability by themselves.
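The dry-day filtering experiment can be sketched as below. This is a hypothetical illustration, not the code behind Fig. 25: the six daily values, the precipitation amounts and the threshold of 1 mm per day are invented, and the area data are assumed to be collocated with the retained observations before averaging.

```python
def dry_day_mask(precipitation, threshold):
    """True for days whose local precipitation does not exceed the threshold."""
    return [p <= threshold for p in precipitation]

def averaged_error(obs, area, keep):
    """Relative error of the time-mean observation vs. the collocated area mean."""
    kept_obs = [o for o, k in zip(obs, keep) if k]
    kept_area = [a for a, k in zip(area, keep) if k]
    if not kept_obs:
        raise ValueError("no observations left after filtering")
    obs_mean = sum(kept_obs) / len(kept_obs)
    area_mean = sum(kept_area) / len(kept_area)
    return (obs_mean - area_mean) / area_mean

# Invented daily values: site observation, area average, local rain (mm/day).
obs = [1.2, 0.8, 1.1, 0.9, 1.3, 0.7]
area = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
precipitation = [0.0, 0.0, 5.0, 0.0, 12.0, 0.0]

all_days = [True] * len(obs)
dry_days = dry_day_mask(precipitation, threshold=1.0)

err_all = averaged_error(obs, area, all_days)
err_dry = averaged_error(obs, area, dry_days)
```

In this constructed case the daily errors cancel when all days are averaged, whereas discarding the two wet days leaves an error of about -10%. The example only illustrates the mechanism described above (averaging over less data can increase the error); it says nothing about how often this occurs.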

Lessons learned
While representation errors can be significant, they behave differently depending on whether spatial or temporal sampling dominates the error. In the case of spatial sampling, representation errors can often be reduced through spatio-temporal averaging (see also S16a). In the case of temporal sampling, representation errors are unlikely to be reduced through such averaging (see also S16b). If observations are used for model evaluation, it is possible to temporally collocate the model data with the observations; subsequent temporal averaging then reduces representation errors.
Typical representation RMSD errors and other numerical results quoted below refer to a represented area of 210 × 210 km². For other area sizes, see S16a or this paper. For model evaluation, we used a required spatial and/or temporal coverage of 25% and collocation to the hour.
To have observations optimally represent a larger area, they will need to be averaged over time.While monthly area data is best represented by monthly observations, hourly area data is better represented by observations averaged over 6 hours.

In-situ ground-sites
If such sites allow for continuous operation, their measurements only suffer representation errors due to spatial sampling. Temporal averaging may reduce such errors, but emission sources and orography may cause a constant component in the representation error that cannot be eliminated. We found errors up to 40% in 6-month averages of surface BC mass concentrations, see Sect. 4. We suggest vetting such observations for location.
For model evaluation: Averaging both model data and observations over multiple sites can be used to increase representativity (see also S16a).

Passive remote sensing ground-sites
These observations suffer from both spatial and temporal sampling issues and the latter is usually more important. A representation error driven by temporal sampling is unlikely to be reduced through temporal averaging, see Sect. 4 and also S16b. Further study is required to validate the use of such observations to construct climatologies. Using a minimum required number of observations cannot be relied upon to control representation errors (see Sect. 5) or only has a weak impact (see Sect. 6). Representation errors in AOT are typically 10-40% (monthly) and 20-50% (daily).
For model evaluation: Collocating model data to the hour of observations should be a first step to reduce representation errors. The representation error due to spatial sampling may be reduced by temporally averaging the collocated data. In this case, a minimum required number of observations can be used to control representation errors. Representation errors in AOT are typically 5-15% (monthly) and 10-30% (daily). Collocation to the day of observation is sub-optimal; we found very similar representation errors as when no collocation is used (see Sect. 5). See also in S16b how collocation to the day creates a longitude dependence in representation errors.

Passive remote sensing imagers on satellites
These observations suffer from both spatial and temporal sampling issues but often allow spatial aggregation over the represented area. Temporal sampling will dominate representation errors and prove insensitive to temporal averaging, see Sect. 4 and also S16b. Further study is required to validate the use of such observations to construct climatologies. Using a minimum required number of super-observations cannot be relied upon to control monthly representation errors (see Sect. 5). For imagers on polar-orbiting satellites, monthly representation errors in AOT are typically 10-40% (repeat cycle: 1 day) or 35-55% (repeat cycle: 8 days). Daily representation errors in AOT are 25-40%. For imagers on geostationary satellites, representation errors are similar to those for polar-orbiting satellites with a 1 day repeat cycle.
For model evaluation: temporal collocation of model data to the hour of super-observations is the best strategy. In principle, the representation error due to spatial sampling can be arbitrarily reduced through a required minimum spatial coverage of the super-observations. Monthly representation errors can also be reduced through a minimum required temporal coverage. The flip side will be a lower case coverage. Monthly representation errors in AOT are typically 5-15% (repeat cycle: 1 day) or 10-15% (repeat cycle: 8 days). Daily representation errors in AOT are 10-15%. This daily representation error is significantly lower than that for ground-sites due to the spatial aggregation. As in the case of remote sensing ground-site observations, collocation to the day of observation is sub-optimal (see Sect. 5).
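The construction of a super-observation with a required minimum spatial coverage can be sketched as follows. This is a minimal illustration with hypothetical names; missing pixels (for instance due to cloud) are marked as None, and the thresholds used in the example calls are arbitrary.

```python
def super_observation(pixels, min_coverage):
    """Aggregate the pixels inside one represented area into a super-observation.

    pixels       : pixel values, with None where no retrieval is available
    min_coverage : required fraction of valid pixels (0..1)
    Returns the mean of the valid pixels, or None if coverage is insufficient.
    """
    if not pixels:
        return None
    valid = [p for p in pixels if p is not None]
    coverage = len(valid) / len(pixels)
    if not valid or coverage < min_coverage:
        return None
    return sum(valid) / len(valid)

# Half of the pixels are valid: accepted at a required coverage of 25%.
accepted = super_observation([0.2, None, 0.4, None], min_coverage=0.25)
# Only a quarter of the pixels is valid: rejected at a required coverage of 50%.
rejected = super_observation([0.2, None, None, None], min_coverage=0.5)
```

Raising the required coverage rejects more overpasses, which is the trade-off against case coverage and temporal coverage noted in the text.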

Active remote sensing satellites
Due to their narrow swath, LIDAR observations from space will have long repeat-cycles causing significant representation errors. Monthly representation errors in aerosol extinction are 70-160% with significantly skewed error distributions. Note that we only considered a single atmospheric level near the top of the boundary layer in our very limited study.
For model evaluation: monthly representation errors after collocation to the hour were still 20-40%, with one region (Ocean) showing errors of 140%. Further reduction of representation errors should be possible by averaging data over larger geographic regions.

Conclusions
Measurements always have a discontinuous spatio-temporal sampling, unlike the natural system they are trying to observe. As a consequence, actual daily, monthly and yearly averages over areas may be very different from those based on the undersampled observations. This limits the information present in observations and their usefulness in describing nature or evaluating models. In this paper, we have estimated these representation errors using high-resolution models to generate an objective truth and synthetic observations for a range of idealised observing systems (in-situ ground-sites, remote sensing ground-sites, passive and active remote sensing satellites). For a wide range of time-scales (hourly, daily, monthly to semi-annually) and length-scales (50-300 km), representation errors were shown to be significant, ranging from 10-100%.
In particular, we study typical aerosol observables like AOT, PM2.5, BC concentrations and number concentrations for idealised observing systems that capture the essence of real-life observing systems like AERONET (AErosol RObotic NETwork), SKYNET, IMPROVE, EMEP (European Monitoring & Evaluation Programme), MODIS, AATSR (Advanced Along-Track Scanning Radiometer), MISR (Multi-angle Imaging SpectroRadiometer) and CALIOP. Typical length-scales at which we estimate representation errors (hundreds of km) are based on the grid-resolution of the global models often used in our field.
Our study not only allows us to estimate representation errors but also to assess various ways in which to reduce them. In particular, we were able to assess the usefulness of different methods to generate gridded satellite L3-data (Levy et al., 2009). Our results suggest that the current practice of unconditional averaging of super-observations into a monthly product is a good procedure but still allows for significant monthly representation errors (10-40% at best). Small improvements are possible if the super-observations are log-transformed before averaging.
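The difference between unconditional averaging and averaging of log-transformed super-observations can be illustrated with a short sketch. The back-transformation by exponentiation used here is an assumption made for this example (the text above only states that the values are log-transformed before averaging), and the four AOT values are invented.

```python
import math

def arithmetic_mean(values):
    """Unconditional (plain) average of the super-observations."""
    return sum(values) / len(values)

def mean_of_logs(values):
    """Average the log-transformed values and transform back (assumed step).

    All values must be strictly positive.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Invented monthly sample: three clean overpasses and one strong aerosol event.
aot = [0.1, 0.1, 0.1, 0.8]

plain = arithmetic_mean(aot)
logged = mean_of_logs(aot)
```

With few samples per month, the log-transformed average is less sensitive to a single high value than the plain average (about 0.17 against 0.275 here).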
When using observations to evaluate models, it is possible to temporally collocate model data with the observations and we showed this to be a very efficient way to reduce representation errors, especially if this is followed up by temporal averaging. However, such collocation should use hourly model data collocated to the hour of the observation. Currently, daily model data is often collocated to the day of the observation and this is sub-optimal (and sometimes no better than no collocation). Also, collocation allows some control on representation errors through the number of observations used.
Some other interesting findings are: 1) to better represent hourly data for a larger area, observations should be averaged over 6 hours (210 × 210 km²) or 4 hours (110 × 110 km²); 2) representation errors for either remote-sensing ground sites or imagers on polar-orbiting (1 day repeat cycle) or geostationary satellites are very similar on daily and monthly scales, despite very different sampling; 3) representation errors often depend counter-intuitively on observational coverage (the number of observations used); 4) temporal sampling issues clearly dominate representation errors in remote sensing data on monthly scales and less clearly dominate on daily scales; 5) local precipitation does not appear to be a major cause of representation errors, and vetting observations based on precipitation measurements does not improve representativity; 6) emission sources and orography can give rise to persistent and significant representation errors.
Since we used simulations to assess representation errors, our results depend on the quality of the numerical models. In Schutgens et al. (2016a) we showed that two different models estimated very similar representation errors over the same region. A more fundamental issue is that we only have simulations over 6 different regions for a few months. Obviously we cannot claim our results are universal. We surmise that error values will be different in detail for other regions or months but still be of similar magnitude. The consistency across our 6 regions and 3 models in this study, and similarly the consistency of temporal representation errors estimated in Schutgens et al. (2016b) for global model data, support this. In particular, our simulations consistently showed that increasing required spatial coverage of satellite observations leads to decreasing temporal coverage and increasing representation errors, unless collocation can be used.
It is possible that the representation errors estimated in this paper are under-estimates. As argued in S16a, 1) model variability tends to increase with increasing resolution, 2) at 10 km resolution, we cannot resolve the fine-structure at the scale of in-situ sampling volumes, 3) we use assumed temporal profiles of our emission that do not capture day-to-day or week-to-week variations, and 4) our models offer only a bulk abstraction of aerosol without all the detail nature has to offer.

Code availability
Copies of the code used in our analysis are readily available from the corresponding author.

Figure 1. Three models were used in this study to simulate a variety of aerosol fields. The regional names used to identify these simulations are given in large font, while the models are denoted in small font. MADE and GOCART refer to the WRF-Chem version used.

Figure 22. Hourly representation errors as a function of averaging period ∆T used for surface PM2.5 observations. In the top-left corner, the ratio of q98 − q2, q91 − q9 and q75 − q25 for ∆T = 0 to optimal ∆T is given. Results for a 210 × 210 km² grid-box. Further explanation in Sect. 3.2.
$f_{x+i;\,y+j;\,t+k}\; v_{x+i;\,y+j;\,t+k}$, (3)

black rectangle represents the median error and the black circle the mean error. On top of each bar, the RMSD is shown. The colours of the bars refer to different experiments and are explained in the caption of each separate figure. If a required spatial or temporal coverage was used, this will be shown in the lower left and right corners of the figure. Case coverages per region are shown just above the region names.

Figure 3. Relative representation errors in AOT and surface BC concentrations in 6-month averages. The black dots show the locations of major ACTRIS measurement sites. Results for a 10 × 10 km² observation against a 210 × 210 km² area.

Figure 4. Relative representation errors in AOT in 6-month averages. The represented area data were temporally collocated to the hour with the observations. The black dots show the locations of major ACTRIS measurement sites. Results for a 10 × 10 km² observation against a 210 × 210 km² area.

Figure 5. Relative spatial representation errors in AOT and surface BC mass concentrations as a function of averaging period. Both AOT and BC measurements were assumed to be continuous in time. Results for a 10 × 10 km² observation against a 210 × 210 km² area. Further explanation in Sect. 3.2.

Figure 7. Monthly representation errors after collocation for remote sensing ground-sites: purely spatial sampling (grey), no collocation (brown), area data collocated to the day of observations (bright orange), and area data collocated to the hour (red). The grey and brown error estimates are similar to Fig. 6, except for a required temporal coverage of 25%. Further explanation in Sect. 3.2.

Figure 10. Relative monthly representation errors in AOT for a remote sensing ground-site over Oklahoma. From left to right, the following scenarios are considered: 1) only spatial sampling contributes to the representation error; 2) both temporal and spatial sampling contribute; 3) both temporal and spatial sampling contribute but data are collocated to the day; 4) both temporal and spatial sampling contribute but data are collocated to the hour. Results for a 10 × 10 km² observation against a 210 × 210 km² area. See also Fig. 7.

Figure 14. Monthly mean (dashed) and RMS (solid) of representation errors for an imager on a polar-orbiting satellite as a function of required spatial or temporal coverage of the observations. Results are shown for no collocation (brown), no collocation but using super-observations (dark blue), collocation to the day (orange), and finally model data collocated to the hour (red). Results for a repeat cycle of 1 day. Further explanation in Sect. 3.2.

Figure 20. Daily representation errors for remote sensing instruments as a function of required coverage. Results shown for no collocation (brown), and area data collocated to the hour (red). The left panel is for a ground-site, the right panel for an imager on a polar-orbiting satellite with a 1 day repeat cycle. Further explanation in Sect. 3.2.

Figure 21. Daily representation errors as a function of averaging period ∆T used for surface PM2.5 observations. In the top-left corner, the ratio of q98 − q2, q91 − q9 and q75 − q25 for ∆T = 0 to optimal ∆T is given. Results for a 210 × 210 km² grid-box. Further explanation in Sect. 3.2.

Figure 23. Relative representation RMSD for N10 measurements as a function of transit time over averaging period, for W-Europe (red), Oklahoma (blue) and Congo (green). Further explanation in Sect. 3.2.

Figure 24. Impact on daily representation errors from precipitation. The symbols use the left-hand axis (colours indicate relative difference in precipitation between observation and wider area), the grey quantile boxes the right-hand axis. Results for a 210 × 210 km² grid-box for Ocean.
Satellite LIDAR (LIght Detection And Ranging) measurements observe a narrow north-south transect (see also S16a) within the represented area once a day at local noon with a repeat cycle of 12 days. CALIOP (Cloud-Aerosol LIDAR with Orthogonal Polarization) has a repeat cycle of 16 days but allowing the LIDAR swath to revisit different parts of the same 210 by 210 km² area brings the typical cycle down to about 12 days. As we do not consider measurement errors, it matters little if the LIDAR measurement is made during the day or night. Down-time due to malfunction is not considered.

Table 1. Simulations analysed in this study

Table 3. Semi-annual relative representation errors for ACTRIS sites

Table 5. Optimal averaging periods for ground-site measurements used to represent a 210 × 210 km² area (hourly). The colours indicate an increase of the representation RMSD by less than 5%, less than 10% or less than 20% when using the recommended period of 6 hours instead.