Comment on "Quantitative performance metrics for stratospheric-resolving chemistry-climate models" by Waugh and Eyring (2008)
V. Grewe and R. Sausen
Deutsches Zentrum für Luft- und Raumfahrt, Institut für Physik der Atmosphäre, Oberpfaffenhofen, 82230 Wessling, Germany
Received: 15 May 2009 – Published in Atmos. Chem. Phys. Discuss.: 26 June 2009
Abstract. This comment focuses on the statistical limitations of the model grading
applied by D. Waugh and V. Eyring (2008) (WE08). The grade g is calculated
for a specific diagnostic and essentially relates the difference between the means of the model
and observational data to the standard deviation of the observational data.
We performed Monte Carlo simulations, which show that this method has the potential to lead to
large 95%-confidence intervals for the grade. Moreover, the difference
between two model grades often has to be very large to become statistically
significant. Since confidence intervals were not considered in detail
for all diagnostics, the grading in WE08 cannot be interpreted
without further analysis. The results of the statistical tests performed in WE08
agree with our findings. However, most of those tests are based on special
cases, which implicitly assume that observations are available without any
errors and that the interannual variability of the observational data and the
model data are equal. Without these assumptions, the 95%-confidence intervals become even larger.
Hence, the case in which we assumed perfect observations (ignoring observational errors) provides
a good estimate of an upper bound on the threshold below which a
grade becomes statistically significant. Examples have shown that the
95%-confidence interval may even span the whole
grading interval [0, 1]. Without considering
confidence intervals, the grades presented in WE08 do not allow one to decide whether
a model result deviates significantly from
reality. Neither WE08 nor our comment identifies
which of the grades presented in
WE08 exhibit such a significant deviation.
However, our analysis of the grading method demonstrates the unacceptably high
potential for these grades to be insignificant.
This implies that the grades given by WE08 cannot be interpreted by the reader.
We further show that including confidence intervals
in the grading approach is necessary,
since otherwise even a perfect model
may get a low grade.
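
The Monte Carlo argument summarised in the abstract can be illustrated with a minimal sketch. It assumes the grade takes the form g = max(0, 1 - |mean_model - mean_obs| / (n_g * sigma_obs)) with n_g = 3, which is our reading of WE08 and is not spelled out in this abstract. The sample length of 10 years, the Gaussian and independent yearly values, the equal variability of model and observations, and the function names `grade` and `grade_interval` are all illustrative assumptions, not the setup used in the comment itself.

```python
import random
import statistics


def grade(model, obs, n_g=3.0):
    """Grade in the assumed WE08 form, floored at 0.

    The difference of the sample means is scaled by n_g times the
    sample standard deviation of the observations.
    """
    sigma_obs = statistics.stdev(obs)
    g = 1.0 - abs(statistics.mean(model) - statistics.mean(obs)) / (n_g * sigma_obs)
    return max(0.0, g)


def grade_interval(n_years=10, bias=0.0, sigma=1.0, n_trials=2000, seed=0):
    """Empirical 95% range of the grade for a model with a known true bias.

    Each trial draws a fresh observational sample and a fresh model sample,
    so sampling noise in both enters the grade (assumption: Gaussian,
    independent years, equal variability in model and observations).
    """
    rng = random.Random(seed)
    grades = []
    for _ in range(n_trials):
        obs = [rng.gauss(0.0, sigma) for _ in range(n_years)]
        model = [rng.gauss(bias, sigma) for _ in range(n_years)]
        grades.append(grade(model, obs))
    grades.sort()
    lo = grades[int(0.025 * n_trials)]
    hi = grades[int(0.975 * n_trials) - 1]
    return lo, hi


if __name__ == "__main__":
    # bias is given in units of the true observational standard deviation
    for bias in (0.0, 1.0, 2.0):
        lo, hi = grade_interval(bias=bias)
        print(f"true bias = {bias:.1f} sigma: 95% of grades fall in [{lo:.2f}, {hi:.2f}]")
```

With `bias=0.0` the simulated model is perfect by construction, yet its sampled grades still spread below 1 because both means are estimated from short records. Comparing the ranges obtained for different biases gives a rough sense of how far apart two grades must be before they can be distinguished.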
Revised: 23 November 2009 – Accepted: 25 November 2009 – Published: 01 December 2009
Citation: Grewe, V. and Sausen, R.: Comment on "Quantitative performance metrics for stratospheric-resolving chemistry-climate models" by Waugh and Eyring (2008), Atmos. Chem. Phys., 9, 9101-9110, doi:10.5194/acp-9-9101-2009, 2009.