Best estimate for the extinction bin value

The problem of choosing a best estimate for the bin value and uncertainty has been carefully considered using various techniques of statistical analysis.

In the case of aerosol extinction, the problem is to remove outliers arising from poor measurements, and mainly from extinction measurements affected by a cloud component.

Poor measurements, possibly due to a lack of sensitivity or to saturation of the instrument, may greatly affect the calculated bin value and its error if the statistical estimator is poorly chosen. Among others, the very common choice of the mean (error-weighted or not) may be very sensitive to the presence of outliers, and proves to be a rather poor estimator. Part of the study carried out in the framework of this PROMOTE Service consisted of investigating the influence of the choice of estimator on the bin value and its uncertainty, for a set of statistical estimators and a wide set of typical bin populations (low/high volcanic, cloudy/cloud-free, etc.), in order to obtain the most robust estimate and the lowest sensitivity to outliers.
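The sensitivity of the mean to outliers can be illustrated with a minimal sketch. The numbers below are invented log10 extinction values, not data from the study, and the helper name `median_and_mad` is hypothetical.

```python
import statistics

def median_and_mad(values):
    """Return the median and the median absolute deviation (MAD) of a sample."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return center, mad

# Hypothetical log10(extinction in km^-1) values for one bin (illustrative only).
clear_sky = [-3.52, -3.50, -3.49, -3.51, -3.48, -3.50, -3.53]
# Same bin with two cloud-like events added as outliers.
with_outliers = clear_sky + [-2.4, -2.3]

for label, sample in (("clear sky", clear_sky), ("with outliers", with_outliers)):
    mean = statistics.mean(sample)
    stdev = statistics.stdev(sample)
    center, mad = median_and_mad(sample)
    print(f"{label:14s} mean={mean:.3f} stdev={stdev:.3f} "
          f"median={center:.3f} MAD={mad:.3f}")
```

In this toy sample the two added events shift the mean by roughly a quarter of a decade, while the median stays where it was.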

Clouds are known to present a high extinction value, with a rather flat spectral response due to the presence of particles that are large with respect to the wavelength in the UV-visible range. Therefore, standard detection methods use the ratio between the extinction at two spectral channels (for instance, around 500 nm and 1 µm) to identify cloud occurrence in a set of events [Kent et al., 1993].
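A two-channel ratio test of this kind might be sketched as follows. The function name, the argument names and the threshold of 0.8 are assumptions made for illustration. They are not taken from Kent et al. or from this study, where the separation criterion is more elaborate.

```python
def is_cloud_like(ext_short, ext_long, ratio_threshold=0.8):
    """Flag an event whose spectral response is nearly flat.

    ext_short: extinction at the short-wavelength channel (around 500 nm), km^-1
    ext_long:  extinction at the long-wavelength channel (around 1 um), km^-1
    ratio_threshold: illustrative cut-off, NOT a value from the source
    """
    if ext_short <= 0 or ext_long <= 0:
        raise ValueError("extinction values must be positive to form a ratio")
    # Large particles give a ratio close to 1; small aerosol particles
    # extinguish more strongly at the short wavelength, giving a lower ratio.
    return ext_long / ext_short >= ratio_threshold

# Hypothetical events: (extinction near 500 nm, extinction near 1 um)
events = [(8.0e-4, 2.0e-4), (1.0e-3, 9.5e-4)]
flags = [is_cloud_like(short, long_) for short, long_ in events]
print(flags)
```

A fixed threshold is the simplest possible choice. In practice the cut-off would have to be tuned per altitude and aerosol loading.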

Figure 1 illustrates the influence of the choice of statistical estimator on the bin value estimate. The histograms on the upper side and on the right side show, respectively, the distribution of the extinction values in the bin population and the distribution of their error estimates as provided by the original data set. The central panel combines both pieces of information and makes it possible to identify the region of the event population in which the best estimate of the bin value and uncertainty is expected to lie. Several estimators are tested for the considered event population: the common mean and standard deviation, the median and median absolute deviation (MAD, i.e. the median of the absolute differences between each event and the median value), the bisquare estimator, the Hampel estimator, and the wave estimator, with their respective standard deviations. In all cases, we consider unweighted “mean” values in this example. For a description of all these statistical estimators, see V. Barnett and T. Lewis, Outliers in Statistical Data. This example shows that the mean values are greatly affected by outliers, and may poorly describe the most probable value of the bin extinction.

figure1

Figure 1. Choice of an estimator for an event population corresponding to the SAGE II extinction bin at 1020 nm, January 1985, z = [18 km, 18.5 km], latitude bin [20°, 30°]. The central panel shows the spread of the events as a function of the extinction logarithm log(b) and its error d(log(b)). Histograms show the distribution of extinction values (above) and their errors (right). The value of each estimator and its spread are listed in the text box and plotted on the extinction histogram.
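The bisquare, Hampel and wave estimators compared in Figure 1 are M-estimators of location, which can be computed by iteratively reweighted averaging. The sketch below is one common way of doing this and is not necessarily the implementation used in the study. The tuning constants (4.685, 1.339, and 1.7/3.4/8.5) are widely used defaults assumed here, the scale is fixed at 1.4826 × MAD, and the sample values are invented.

```python
import math
import statistics

def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight; zero beyond |u| = c."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0

def wave_weight(u, c=1.339):
    """Andrews wave weight sin(u/c)/(u/c); zero beyond |u| = c*pi."""
    if u == 0.0:
        return 1.0
    return math.sin(u / c) / (u / c) if abs(u) <= c * math.pi else 0.0

def hampel_weight(u, a=1.7, b=3.4, c=8.5):
    """Hampel three-part redescending weight."""
    au = abs(u)
    if au <= a:
        return 1.0
    if au <= b:
        return a / au
    if au <= c:
        return a * (c - au) / ((c - b) * au)
    return 0.0

def m_estimate(values, weight, tol=1e-9, max_iter=100):
    """Location M-estimate by iterative reweighting, started at the median."""
    center = statistics.median(values)
    # Robust scale, kept fixed during the iterations (an assumption of this sketch).
    scale = 1.4826 * statistics.median([abs(v - center) for v in values])
    if scale == 0.0:
        return center
    for _ in range(max_iter):
        weights = [weight((v - center) / scale) for v in values]
        total = sum(weights)
        if total == 0.0:
            break
        new_center = sum(w * v for w, v in zip(weights, values)) / total
        converged = abs(new_center - center) < tol
        center = new_center
        if converged:
            break
    return center

# Hypothetical log10(extinction) sample with two outlying events.
sample = [-3.52, -3.50, -3.49, -3.51, -3.48, -3.50, -3.53, -2.4, -2.3]
for name, fn in (("bisquare", bisquare_weight), ("Hampel", hampel_weight),
                 ("wave", wave_weight)):
    print(f"{name:9s} {m_estimate(sample, fn):.4f}")
print(f"{'mean':9s} {statistics.mean(sample):.4f}")
```

All three weight functions fall to zero for events far from the bulk of the population, which is why these estimators stay close to the median while the plain mean is pulled toward the outliers.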

In the following example (Figure 2), the spread of the event population for the considered bin shows two aerosol modes, around extinction values of about 10^-3.5 km^-1 and 10^-3.2 km^-1 respectively. Error-weighted values are computed for the same estimators. Using a bimodal distribution allows the characteristics of both modes to be retrieved (see Figure 3). Nevertheless, as only one bin value is considered for the description of the extinction value, we use a monomodal distribution for the overall characterization of the aerosol extinction bin.

figure2

Figure 2: Statistical analysis of an extinction bin characterized by 2 aerosol modes. Event population corresponding to the SAGE II extinction bin at 1020 nm, November 1985, 5 km above the tropopause, latitude bin [-70°, -60°]. For a description of the different panels, see Figure 1.

figure3

Figure 3: Same as Figure 2, with characterization of the extinction bin using bimodal distribution functions.
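One way to characterize two modes is to fit a two-component Gaussian mixture to the logarithm of the extinction with the expectation-maximization (EM) algorithm. The sketch below assumes Gaussian modes in log10 space and uses synthetic data centred on the two mode positions quoted above. The functional form actually used in the study is not stated in the text, so this is only an illustration, and `fit_two_modes` is a hypothetical name.

```python
import math
import random
import statistics

def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_modes(values, n_iter=200, min_sigma=1e-3):
    """Fit a two-component Gaussian mixture by EM.

    Returns a list of (mean, sigma, fraction) tuples sorted by mean.
    """
    data = sorted(values)
    n = len(data)
    # Start the two modes at the lower and upper quartiles.
    means = [data[n // 4], data[(3 * n) // 4]]
    sigmas = [max(statistics.pstdev(data), min_sigma)] * 2
    fractions = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: membership probability of each event in each mode.
        resp = []
        for x in data:
            p = [fractions[k] * normal_pdf(x, means[k], sigmas[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow
            resp.append([pk / total for pk in p])
        # M-step: update mode parameters from the weighted events.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk <= 0.0:
                continue
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(math.sqrt(var), min_sigma)  # floor avoids collapse
            fractions[k] = nk / n
    return sorted(zip(means, sigmas, fractions))

# Synthetic population with modes at log10(extinction) = -3.5 and -3.2.
rng = random.Random(0)
sample = ([rng.gauss(-3.5, 0.05) for _ in range(200)]
          + [rng.gauss(-3.2, 0.05) for _ in range(200)])
for mean, sigma, fraction in fit_two_modes(sample):
    print(f"mode at {mean:.3f}, width {sigma:.3f}, fraction {fraction:.2f}")
```

The same fit can be applied when the second mode is a cloud mode, in which case only the lower-extinction component would be kept as the aerosol estimate.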

The next case (Figure 4) shows a bin where two modes are present in the extinction population: one aerosol mode around an extinction value of about 10^-3.7 km^-1, corresponding to measurements in clear-sky conditions, and one cloud mode with extinction values higher than about 10^-2.5 km^-1. Note the high value of the cloud extinction compared to the aerosol value. In such a case, the use of a bimodal distribution gives the best estimate of the aerosol mode, and the median and MAD estimates are used to characterize the extinction bin value. Figure 5 shows, for comparison, the median and Hampel estimators computed with a monomodal distribution function.

figure4

Figure 4: Statistical analysis of an extinction bin characterized by an aerosol mode and a cloud mode. Event population corresponding to the SAGE III extinction bin at 1019 nm, July 2003, at the tropopause, latitude bin [40°, 50°]. The event population is fitted using a bimodal distribution function. For a description of the different panels, see Figure 1.

figure5

Figure 5: Same as Figure 4, but the event population is modelled using monomodal distribution functions.

The determination of extinction values has been performed using 1-month bins covering 10° latitude intervals, at a given altitude. In each case, an analysis is performed to detect a possible cloud mode. When a cloud mode is identified, a bimodal distribution is used to describe the event population. In the other cases, a monomodal distribution function is considered. The median estimator and the MAD are used, respectively, for the determination of the bin value and its uncertainty.
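The binning step can be sketched as below for the monomodal case. The event tuple layout, the function names and the sample events are hypothetical, and the cloud-mode analysis described above is deliberately left out.

```python
import math
import statistics
from collections import defaultdict

def latitude_band(lat, width=10.0):
    """Return the (lower, upper) edges of the latitude band containing lat."""
    # The min() keeps lat = 90 in the last band instead of opening a new one.
    lower = min(math.floor(lat / width) * width, 90.0 - width)
    return (lower, lower + width)

def bin_statistics(events):
    """Group events into monthly, 10-degree latitude bins at each altitude.

    events: iterable of (year, month, latitude_deg, altitude_km, log10_extinction)
    Returns {bin_key: (median, MAD)} with bin_key = (year, month, band, altitude).
    """
    groups = defaultdict(list)
    for year, month, lat, alt, log_ext in events:
        groups[(year, month, latitude_band(lat), alt)].append(log_ext)
    result = {}
    for key, values in groups.items():
        center = statistics.median(values)                       # bin value
        mad = statistics.median([abs(v - center) for v in values])  # uncertainty
        result[key] = (center, mad)
    return result

# Hypothetical events (not real SAGE data).
events = [
    (1985, 1, 22.0, 18.0, -3.5),
    (1985, 1, 27.5, 18.0, -3.4),
    (1985, 1, 25.0, 18.0, -2.3),
    (1985, 1, -65.0, 18.0, -3.2),
]
for key, (value, uncertainty) in sorted(bin_statistics(events).items()):
    print(key, round(value, 3), round(uncertainty, 3))
```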

Problems are encountered in some cases where it is difficult to distinguish the aerosol and cloud modes. This is particularly the case for the high volcanic load after the Pinatubo eruption, when aerosol particles and clouds show similar spectral characteristics. In such a case, it may not be possible to make a clear decision about the right extinction value, and the most probable value is assessed, possibly using comparisons with the situation in the closest bins.

References:

  • Barnett, V. and Lewis, T., "Outliers in Statistical Data", 3rd edition, Wiley Series in Probability and Mathematical Statistics.
  • Kent, G.S. and McCormick, M.P., "Separation of cloud and aerosol in two-wavelength satellite occultation data", Geophys. Res. Lett., 18, 428-431, 1991.
  • Kent, G.S., Winker, D.M., Osborn, M.T. and Skeens, K.M., "A model for the separation of cloud and aerosol in SAGE II occultation data", J. Geophys. Res., 98, 20,725-20,735, 1993.
  • Wang, P.-H., Minnis, P., McCormick, M.P., Kent, G.S. and Skeens, K.M., "A 6-year climatology of cloud occurrence frequency from Stratospheric Aerosol and Gas Experiment II observations (1985-1990)", J. Geophys. Res., 101, 29,407-29,429, 1996.