ISM Symposium on Environmental Statistics 2018

22-23 March, 2018

Admission Free, No Booking Necessary

The Institute of Statistical Mathematics, Tokyo, Japan.
Host Organization
The Institute of Statistical Mathematics
Koji Kanefuji (ISM)
Alan Welsh (ANU)
Kunio Shimizu (ISM)
Kenichiro Shimatani (ISM)

【22 March】

Opening Address
Tomoyuki Higuchi (Director General, ISM)

Session 1 (10:10-12:10)
Chaired by Yoshinori Kawasaki (ISM)


Modelling rainfall in the Murray-Darling Basin
Alan H. Welsh (Australian National University)

Gen Nowak, A.H. Welsh, T.J. O'Neill and Lingbing Feng
(Australian National University, Bond University, Jiangxi University of Finance and Economics)

The Murray-Darling Basin (MDB) is a large geographical region in southeastern Australia that contains many rivers and creeks, including Australia’s three longest rivers: the Murray, the Murrumbidgee and the Darling. Understanding rainfall patterns in the MDB is very important because major events such as droughts and floods have a significant impact on agricultural and resource productivity. We discuss our experience of building a model for monthly rainfall data obtained from weather stations in the MDB, and of producing predictions in both the spatial and temporal dimensions. The model is a hierarchical spatio-temporal model, fitted to geographical data, that combines deterministic and data-derived components: rainfall at a given location is modelled as a linear combination of these components. We describe our step-by-step approach, our use of a block bootstrap to produce standard errors, and the way we used the model to make predictions.
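The abstract mentions a block bootstrap for standard errors. Below is a minimal sketch of the moving block bootstrap for a single time series, assuming NumPy. The function name `moving_block_bootstrap_se`, the block length and the number of replicates are illustrative choices. They are not details of the authors' spatio-temporal implementation.

```python
import numpy as np


def moving_block_bootstrap_se(x, stat, block_len, n_boot=1000, seed=0):
    """Standard error of stat(x) via the moving block bootstrap (illustrative sketch).

    Overlapping blocks of length block_len are drawn with replacement and
    concatenated until the resampled series has the original length, so
    short-range serial dependence inside each block is preserved.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(x)")
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_len))
    n_starts = n - block_len + 1  # admissible block start positions
    reps = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        # Expand each start into consecutive indices, then trim to length n.
        idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()[:n]
        reps[b] = stat(x[idx])
    return reps.std(ddof=1)
```

For a hypothetical monthly series `rain`, `moving_block_bootstrap_se(rain, np.mean, block_len=12)` resamples year-long blocks. The block length trades off how much serial dependence is preserved against how many distinct resamples are possible.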


Multi-fractal analysis, high-dimensional data, and natural structures, or How to analyze statistically computed tomography scanning data collected for crowns of miniature conifers
Pierre Dutilleul (McGill University)

The structural complexity of branching patterns lies at the heart of many fundamental natural processes, such as photosynthesis in plants (excluding algae), even though photosynthesis itself takes place in the leaves that the branches deploy in 3D. Computed tomography (CT) scanning technology, originally designed for medical diagnostics, has recently been adapted to plant material as large as a human being. It therefore allows the collection of original, complex data sets whose statistical analysis requires new, appropriate methods.
My talk will be in three parts: The Data; The Mathematical and Statistical Methods; and The Results and Discussion. First, I will explain how CT scanning data (numbers and images) were collected and processed for 15 miniature conifers, 10 with needle-like leaves and 5 with scale-like leaves. A specimen of Picea glauca Pixie (common name: White Spruce) will be used for illustration, together with specimens of three Japanese Cedar cultivars (Cryptomeria japonica Compressa; Cryptomeria japonica Gyokuryu; Cryptomeria japonica Monstrosa Nana). Second, the multi-fractal singularity and Rényi spectra will be presented as a new way to analyze the structural complexity of plant branching patterns from appropriately processed CT images. A variety of questions will be addressed: statistical inference (precision in estimation), technical (interpolation) and applied (skeletal vs. point-pattern images). Last but not least, results for several of the 15 miniature conifers will be discussed in relation to leaf type, the relevance of the mono-fractality hypothesis, and which image type (skeletal or point pattern) is better suited, depending on the criterion, for applying multi-fractal analysis to the branching structure of plants.
This phytometric work is joint work with Liwen Han (Dept. of Plant Science, McGill University). The literature review draws on publications in plant CT scanning and on applications of multi-fractal analysis in soil science.
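The Rényi spectrum can be estimated by box counting. For box size ε and box masses p_i, log Σ p_i^q ≈ (q − 1) D_q log ε when q ≠ 1, and Σ p_i log p_i ≈ D_1 log ε when q = 1. The sketch below estimates the generalized dimension D_q from the slope across dyadic box sizes. It assumes NumPy and a 2D point pattern rescaled to the unit square. This is a generic textbook estimator, not the processing pipeline used for the CT images, and the choice of scales is illustrative.

```python
import numpy as np


def renyi_dimension(points, q, levels=range(1, 7)):
    """Box-counting estimate of the generalized (Rényi) dimension D_q.

    points: (n, 2) array with coordinates in [0, 1]; boxes have side 2**-k
    for each k in levels.
    """
    points = np.asarray(points, dtype=float)
    log_eps, log_z = [], []
    for k in levels:
        n_side = 2 ** k
        idx = np.clip(np.floor(points * n_side).astype(int), 0, n_side - 1)
        _, counts = np.unique(idx, axis=0, return_counts=True)  # occupied boxes only
        p = counts / counts.sum()
        if q == 1:
            log_z.append(np.sum(p * np.log(p)))       # information (entropy) sum
        else:
            log_z.append(np.log(np.sum(p ** q)))      # log partition sum
        log_eps.append(np.log(1.0 / n_side))
    slope = np.polyfit(log_eps, log_z, 1)[0]
    return slope if q == 1 else slope / (q - 1)
```

A mono-fractal pattern gives roughly the same D_q for every q. A spread of D_q values across q indicates multi-fractality.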


Spatial distribution of coefficients of variation for earthquake recurrence intervals in Japan
Shunichi Nomura (ISM)

A Bayesian method is proposed for probabilistic forecasting of recurrent earthquakes on inland active faults in Japan. The Headquarters for Earthquake Research Promotion (HERP) of Japan applies renewal processes with the Brownian Passage Time (BPT) distribution to over half of the active faults in Japan. Long-term forecasting with the BPT distribution requires two parameters: the mean and the coefficient of variation (COV) of the recurrence intervals. The HERP applies a common COV parameter to all of these faults because most of them have only one or a few identified paleoseismic events. These are not enough to estimate reliable COV values for individual faults. However, the COVs of recurrence intervals depend on stress perturbations from nearby seismicity and show spatial trends. We therefore introduce spatial structure into the COV parameter through Bayesian modeling with a Gaussian process prior. The spatial trends in the estimated COV values are found to coincide with the density of active faults in Japan. We also present Bayesian forecasts from the proposed model, computed using MCMC methods.
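The BPT distribution with mean μ and COV α is the inverse Gaussian distribution with shape parameter μ/α². A renewal-process forecast is the conditional probability of an event in the next Δ years, given t years since the last event. The sketch below computes both using only the Python standard library. The function names are illustrative. It plugs in fixed parameter values, whereas a Bayesian forecast would average this probability over posterior draws of μ and α.

```python
import math


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def bpt_cdf(t, mu, alpha):
    """CDF of the Brownian Passage Time distribution with mean mu and COV alpha."""
    if t <= 0:
        return 0.0
    u1 = (math.sqrt(t / mu) - math.sqrt(mu / t)) / alpha
    u2 = (math.sqrt(t / mu) + math.sqrt(mu / t)) / alpha
    # exp(2 / alpha**2) overflows a float for alpha below about 0.053.
    return std_normal_cdf(u1) + math.exp(2.0 / alpha**2) * std_normal_cdf(-u2)


def conditional_probability(elapsed, horizon, mu, alpha):
    """P(next event within `horizon` years | no event during the `elapsed` years so far)."""
    f_now = bpt_cdf(elapsed, mu, alpha)
    f_later = bpt_cdf(elapsed + horizon, mu, alpha)
    return (f_later - f_now) / (1.0 - f_now)
```

For a hypothetical fault with a 100-year mean recurrence whose last event was 80 years ago, `conditional_probability(80, 30, mu=100, alpha=0.24)` gives the 30-year probability. The values 100 and 0.24 are illustrative only.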

Lunch Break

Session 2 (13:10-15:10)
Chaired by Jiancang Zhuang (ISM)


Air quality assessment with spatial and temporal adjustment to meteorological confounding
Song Xi Chen (Peking University)

Motivated by the task of quantifying the change in the underlying pollution concentration, free of meteorological confounding, over a 5180 square kilometer region around Beijing, we propose a spatial and temporal adjustment of PM2.5 and other pollutants with respect to meteorological conditions that removes the meteorological confounding. The adjusted mean pollution concentration is shown to capture changes in the underlying emissions rather than changes in weather conditions. An estimator of the adjusted mean is proposed, together with asymptotic and numerical results. We apply the approach to assess six pollutants in the Beijing region over the years 2013-2016.

This is joint work with Shuyi Zhang, Bin Guo, Hengfang Wang and Wei Lin.
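The adjusted mean takes the pollution level expected under each meteorological condition and averages it over one fixed reference distribution of conditions. A deliberately simplified, stratified version is sketched below. It assumes NumPy, weather already discretized into categories, and every category occurring in every year. The pooled frequency of each category across all years serves as the reference distribution. This direct standardization is a stand-in for illustration, not the authors' estimator.

```python
import numpy as np


def adjusted_means(pollution, weather, year):
    """Weather-adjusted yearly means by direct standardization (illustrative sketch)."""
    pollution = np.asarray(pollution, dtype=float)
    weather = np.asarray(weather)
    year = np.asarray(year)
    categories = np.unique(weather)
    # Reference distribution: pooled frequency of each weather category over all years.
    reference = np.array([np.mean(weather == c) for c in categories])
    adjusted = {}
    for y in np.unique(year):
        in_year = year == y
        # Mean concentration under each weather category within this year.
        cond_means = np.array(
            [pollution[in_year & (weather == c)].mean() for c in categories]
        )
        adjusted[y.item()] = float(reference @ cond_means)
    return adjusted
```

Consider two years with identical concentrations under each weather type but different mixes of calm and windy days. Their raw means differ, while their adjusted means coincide, so only changes in the underlying emissions would move the adjusted series.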


Statistical modelling of atmospheric profiles and uncertainty
Alessandro Fassò (University of Bergamo)

Measurement uncertainty of atmospheric profiles obtained by remote sensing and radiosoundings is crucial in climate change studies. This talk discusses some modelling issues related to the functional data representation of temperature and humidity profiles, which arise in two applications within the GAIA-CLIM Horizon 2020 research project.
The first case study concerns the co-location mismatch of two atmospheric observations, typically a satellite profile and a radiosonde profile. The objective is to assess the vertical smoothing mismatch uncertainty in this profile comparison. To this end, radiosonde profiles are harmonised to match the satellite data in a two-step procedure, which is based on a maximum likelihood approach and exploits the measurement uncertainties in a natural way. In the first step, radiosonde profiles are transformed into continuous functions using splines. In the second step, they are harmonised using weighting functions based on the generalised extreme value probability density function, with parameters depending on altitude. The difference between harmonised and non-harmonised radiosonde profiles is then informative about the vertical smoothing mismatch.
The second case study concerns geographic gaps in radiosonde monitoring networks. In particular, a gap region is defined as an atmospheric region where the spatial prediction uncertainty is high. To identify such regions, global twice-daily radiosonde profiles are modelled as a spatio-temporal process with functional values, and a functional kriging variance is used to locate the gaps. Techniques for large data sets are considered.
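A scalar analogue of the gap criterion can be sketched with simple kriging. With a known covariance function, the prediction variance at a location is σ² − kᵀK⁻¹k. This is near zero next to a station and approaches the sill σ² far from all stations. The sketch below assumes NumPy, planar coordinates, an exponential covariance and a variance threshold, all chosen for illustration. The talk itself uses a functional kriging variance for whole profiles on the globe.

```python
import numpy as np


def exp_cov(a, b, sill, range_):
    """Exponential covariance between two sets of 2D locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)


def kriging_variance(stations, targets, sill=1.0, range_=1.0):
    """Simple-kriging prediction variance at each target location."""
    K = exp_cov(stations, stations, sill, range_)  # station-station covariances
    k = exp_cov(stations, targets, sill, range_)   # station-target covariances
    w = np.linalg.solve(K, k)                      # kriging weights, one column per target
    return sill - np.sum(k * w, axis=0)


def find_gaps(stations, targets, threshold=0.5, sill=1.0, range_=1.0):
    """Flag targets whose kriging variance exceeds threshold * sill."""
    var = kriging_variance(stations, targets, sill, range_)
    return var > threshold * sill, var
```

Evaluated on a grid of target locations, the flagged cells form the gap regions. For large networks the dense solve would be replaced by the large-data techniques the talk mentions.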


Latent mixture modeling for clustered data
Shonosuke Sugasawa (ISM)

Clustered data, which have a grouping structure (e.g. postal area, school, individual, species), appear in a variety of scientific fields. The goal of statistical analysis of clustered data is to model the response as a function of covariates while accounting for heterogeneity among clusters. In this talk, we focus on cluster-wise conditional distributions and develop a mixture modeling approach to estimating them. We propose a structure in which the cluster-wise conditional distributions are represented by finite mixtures of latent conditional distributions with cluster-wise mixing proportions. We develop an iterative algorithm for parameter estimation. The performance of the proposed approach is investigated through simulations and an application to ecological data.
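The cluster-wise mixing structure can be made concrete with a simplified version fitted by the EM algorithm. Every cluster shares the same K Gaussian components, but each cluster has its own mixing proportions. The sketch assumes NumPy, a scalar response and no covariates; the talk's latent distributions are conditional on covariates. Component means are initialized at sample quantiles. This illustrates the mixture structure and is not the authors' algorithm.

```python
import numpy as np


def normal_pdf(y, mu, sd):
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def fit_clusterwise_mixture(y, groups, n_comp, n_iter=200):
    """EM for f_g(y) = sum_k pi[g, k] * N(y; mu[k], sd[k]^2).

    groups holds integer cluster labels 0..G-1; every cluster is assumed
    to have at least one observation.
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n_groups = groups.max() + 1
    mu = np.quantile(y, (np.arange(n_comp) + 0.5) / n_comp)
    sd = np.full(n_comp, y.std())
    pi = np.full((n_groups, n_comp), 1.0 / n_comp)
    loglik = []
    for _ in range(n_iter):
        # E-step: responsibility of each shared component for each observation.
        dens = pi[groups] * normal_pdf(y[:, None], mu[None, :], sd[None, :])
        total = dens.sum(axis=1, keepdims=True)
        loglik.append(float(np.log(total).sum()))
        resp = dens / total
        # M-step: cluster-specific proportions, shared component parameters.
        for g in range(n_groups):
            pi[g] = resp[groups == g].mean(axis=0)
        nk = resp.sum(axis=0)
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-6)  # guard against a collapsing component
    return pi, mu, sd, loglik
```

Sharing the components while letting the proportions vary is what lets small clusters borrow strength from the others. The rows of `pi` then summarize how each cluster differs.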


Session 3 (15:30-17:30)
Chaired by Shogo Kato (ISM)


A new test of discordancy in cylindrical data
Adriana Irawati Nur Ibrahim (University of Malaya)

Nurul Hidayah Sadikon*, Adriana Irawati Nur Ibrahim*, Ibrahim Bin Mohamed*, and Kunio Shimizu**
(*Institute of Mathematical Sciences, University of Malaya, Kuala Lumpur, Malaysia
**School of Statistical Thinking, The Institute of Statistical Mathematics, Tokyo, Japan)

Cylindrical data are bivariate data formed by combining a circular and a linear variable. However, no work has yet been done on detecting outliers in cylindrical data. We introduce a definition of an outlier for cylindrical data and present a new test of discordancy, based on the k-nearest-neighbor distance, to detect outliers in this type of data. Cut-off points of the new test statistic are calculated under the Johnson-Wehrly distribution, and the statistic's performance is examined using simulation. A practical example is presented using wind speed and wind direction data obtained from the Malaysian Meteorological Department.
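A k-nearest-neighbor discordancy statistic can be sketched in three steps. First, compute a distance between cylindrical observations. Second, score each point by its mean distance to its k nearest neighbors. Third, take the largest score as the test statistic. The distance used here is an assumption made for illustration: the square root of the squared chord distance on the circle, 2(1 − cos Δθ), plus the squared difference of the standardized linear variable. The default k = 3 is also illustrative. The sketch assumes NumPy and does not compute the cut-off points, which the authors obtain under the Johnson-Wehrly distribution.

```python
import numpy as np


def cylindrical_distances(theta, x):
    """Pairwise distances for (angle in radians, linear value) observations.

    Assumed metric: squared chord distance on the unit circle plus squared
    difference of the standardized linear variable, then a square root.
    """
    theta = np.asarray(theta, dtype=float)
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    circ = 2.0 * (1.0 - np.cos(theta[:, None] - theta[None, :]))
    lin = (z[:, None] - z[None, :]) ** 2
    return np.sqrt(circ + lin)


def knn_discordancy(theta, x, k=3):
    """Return (statistic, index of most discordant point) from mean k-NN distances."""
    d = cylindrical_distances(theta, x)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    knn_mean = np.sort(d, axis=1)[:, :k].mean(axis=1)
    i = int(np.argmax(knn_mean))
    return float(knn_mean[i]), i
```

The candidate outlier is declared discordant when the statistic exceeds the cut-off point for the sample size and significance level. Under this sketch's assumptions, that cut-off would come from simulation under the chosen null distribution.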


Hidden Markov random fields for segmenting circular and cylindrical spatial series
Francesco Lagona (University of Roma Tre)

Hidden Markov random fields are convenient tools for segmenting environmental spatial data according to a finite number of regimes, which represent the conditional distributions of the data under specific environmental conditions. In this setting, the data are modelled by a finite mixture of parametric densities whose parameters vary across space according to a latent Markov random field. The model can thus be viewed as an extension of a mixture model to the spatial setting. Motivated by environmental studies that require the segmentation of angular data, I describe two hidden Markov random fields. The first is for a spatial series of angular measurements, and the second is for a cylindrical spatial series, i.e. a bivariate spatial series of directions and intensities. Both models are estimated by composite-likelihood methods because the likelihood function is numerically intractable. The core of the estimation procedure is a computationally efficient expectation-maximization algorithm that alternates between maximizing a weighted composite likelihood function and updating the weights. These proposals are illustrated on two case studies, of wildfire seasonality and of sea current circulation. In the first, the model indicates where fires are most likely to occur in specific periods of the year and captures the association between fire occurrence and land cover within each season. In the second, the model offers a clear-cut segmentation of sea current dynamics that reflects the orography of the study area. It also captures regime-specific, non-linear relationships between the speed and the direction of the currents.


Direct sampler from A-hypergeometric distribution for count data analyses
Shuhei Mano (ISM)

Count data appear frequently in environmental problems. In statistical data analyses such as hypothesis testing with small samples, we sometimes have to sample from a given probability function. In such situations, Markov chain Monte Carlo (MCMC) sampling may be the standard choice. A great advantage of an MCMC sampler is that it does not require the normalization constant; however, its implementation must deal with non-stationarity and autocorrelation. Therefore, if one exists, a direct sampler may be a better choice. In this presentation, I will introduce a direct sampler from the A-hypergeometric distributions, a family of distributions that extends the distribution of contingency tables with fixed marginal sums. Thanks to holonomic gradient methods, which are grounded in computational algebra, we can compute the normalization constants. Some distributional properties of the A-hypergeometric distribution are also introduced.
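The simplest case of a contingency-table distribution with fixed marginal sums is the (1,1) cell of a 2 × 2 table with fixed row and column totals. There, the normalization constant is a finite sum that can be computed exactly. With the constant in hand, an inverse-CDF sampler produces independent draws with no burn-in or autocorrelation. The sketch below uses only the Python standard library and covers just this 2 × 2 case with an odds-ratio parameter `psi`. The function names are illustrative. It does not implement the holonomic gradient method that makes larger cases tractable, and for large margins the float weights can overflow.

```python
import math
import random


def table_pmf(r1, r2, c1, psi=1.0):
    """Exact pmf of x = n11 in a 2x2 table with row sums (r1, r2) and first column sum c1.

    The weight of x is C(r1, x) * C(r2, c1 - x) * psi**x; dividing by their sum
    (the normalization constant) gives Fisher's noncentral hypergeometric law.
    """
    lo, hi = max(0, c1 - r2), min(r1, c1)
    support = list(range(lo, hi + 1))
    weights = [math.comb(r1, x) * math.comb(r2, c1 - x) * psi**x for x in support]
    z = sum(weights)  # normalization constant
    return support, [w / z for w in weights]


def direct_sample(r1, r2, c1, psi=1.0, size=1, rng=None):
    """Independent draws by inverting the exact CDF (no Markov chain involved)."""
    rng = rng or random.Random()
    support, probs = table_pmf(r1, r2, c1, psi)
    draws = []
    for _ in range(size):
        u, cum = rng.random(), 0.0
        for x, p in zip(support, probs):
            cum += p
            if u < cum:
                draws.append(x)
                break
        else:  # guard against floating-point shortfall in the cumulative sum
            draws.append(support[-1])
    return draws
```

With `psi=1.0` this is the ordinary hypergeometric distribution used in Fisher's exact test. Any other value of `psi` tilts the distribution toward larger or smaller values of n11.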


【23 March】

Session 4 (9:30-11:45)
Chaired by Kenichiro Shimatani (ISM)


The environmental Kuznets curve for China PM2.5 emissions: A panel functional least squares analysis
Yundong Tu (Peking University)

Yundong Tu*, Yingqian Lin*, Ying Wang** and Siwei Wang* (*Peking University and **University of Auckland)

This paper uses a panel of 31 provinces in China from 2000Q1 to 2014Q4 to study the Environmental Kuznets Curve (EKC) for PM2.5 emissions. Our study shows that PM2.5 emissions increase with economic development at low levels of income but decrease with further economic development at high levels of income. The results are robust when the provinces are classified into high- and low-income groups. The PM2.5 emissions of the low-income group increase monotonically with income, while those of the high-income group maintain the inverted U-shaped relationship described by the EKC. Our analysis shows that most of the provinces that have not yet reached the turning point of the inverted U-shaped curve will do so around 2024. The panel data model is estimated with a new estimator, the Panel Functional Least Squares (PFLS) estimator, constructed from the characteristic functions of the residuals. The PFLS estimator is shown to be asymptotically normal and adaptive to the residual distribution. Simulation results demonstrate its good performance in finite samples.
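A common EKC specification, assumed here for illustration, is log emissions = β0 + β1·x + β2·x², where x is log income. When β2 < 0 the curve is an inverted U with its turning point at x* = −β1/(2β2). The sketch below, assuming NumPy, fits a pooled quadratic by ordinary least squares and returns the implied turning point. OLS and this functional form are stand-ins; this is not the PFLS estimator or the authors' panel specification.

```python
import numpy as np


def ekc_turning_point(log_income, log_emissions):
    """Fit log_emissions ~ b0 + b1*x + b2*x**2 by OLS; return (b0, b1, b2, turning point in x)."""
    b2, b1, b0 = np.polyfit(log_income, log_emissions, 2)  # highest degree first
    if b2 >= 0:
        raise ValueError("no inverted U: quadratic coefficient is not negative")
    return b0, b1, b2, -b1 / (2.0 * b2)
```

Under this assumed specification, the turning-point income is `np.exp(x_star)`. A province's projected date of reaching it would then depend on an assumed income growth path.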


Non-Gaussian bivariate spatio-temporal statistical models for atmospheric trace gas inversions
Andrew Zammit Mangion (University of Wollongong)

Atmospheric trace gas inversion is a method for assessing the spatial distribution of gas emissions/sinks, or flux, from (i) mole fraction measurements and (ii) atmospheric simulations from deterministic computer models. Studies to date are predominantly of a data assimilation flavour, implicitly considering univariate statistical models with the spatially distributed flux as the variate of interest. Here, we show that a more appropriate approach to the problem is through a non-Gaussian bivariate statistical model constructed via a conditional approach. In this model, the atmospheric simulator is used to explicitly elicit the distribution of the mole fraction field (variate #2) given the flux field (variate #1). This model offers several advantages in interpretability and computation. First, through the conditional approach we can explicitly cater for inaccuracies in the spatio-temporal simulator, which are usually ignored. Second, following a Box-Cox transformation of the flux field, we obtain interpretable quantities for the expectations of, and the covariance between, the mole fraction field and the flux field. We also obtain all the auto- and cross-cumulant functions of the joint process. Third, decoupling the observation locations from those at which the processes are evaluated allows us to use computationally efficient spatio-temporal model representations. These offer a way forward for the inversion of large to massive datasets, which will soon be available from remote sensing instruments. We show how a Markov chain Monte Carlo (MCMC) scheme can be used to make inferences on the model parameters and the non-Gaussian flux field using moderate computational resources. The approach is illustrated on a simple one-dimensional simulation study as well as a case study of methane (CH4) emissions in the United Kingdom and Australia. This is joint work with Noel Cressie and Anita Ganesan.
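The Box-Cox transformation with parameter λ maps a positive value y to (y^λ − 1)/λ for λ ≠ 0, and to log y for λ = 0. One way such a construction can work: a Gaussian field on the transformed scale yields a positive, non-Gaussian flux after back-transformation. The sketch below, assuming NumPy, shows the transform and its inverse. The Gaussian draws and λ = 0.5 are illustrative choices, not values from the talk.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values y."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Inverse Box-Cox; for lam > 0 it requires lam * z + 1 > 0."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


# Illustrative use: Gaussian values on the transformed scale become a
# positive, right-skewed "flux" after back-transformation.
rng = np.random.default_rng(0)
latent = rng.normal(loc=4.0, scale=0.5, size=1000)
flux = inv_box_cox(latent, lam=0.5)
```

Because the back-transformation is a known polynomial or exponential map, moments of the flux can be expressed through those of the Gaussian field. This is what makes the resulting expectations and covariances interpretable.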


Statistical methods for null model analysis
Louis-Paul Rivest (Université Laval)

In ecology, null model analysis is used to investigate the association between species. Suppose that c sites are sampled and that r species are seen at least once on these sites. Then the r × c data matrix has entry 1 in cell (i,j) if species i was seen at site j, and 0 otherwise. A null model analysis uses a statistic T to characterize the aggregation between species. Several T statistics are available; some measure species nestedness while others try to capture species co-occurrence. The statistic T is first calculated on the data matrix. To evaluate its significance, several random 0-1 data matrices are generated by a Monte Carlo method and T is evaluated on each one. A Monte Carlo p-value for the test of no association between species, based on T, can then be computed. First, the methods used to generate the Monte Carlo data matrices will be reviewed. Then a statistical model for the 0-1 data matrices will be proposed, and the null model analysis tests will be shown to be score tests for parameters of this model. Methods to fit non-null models to these data matrices, and to produce graphs summarizing the association between species, will be presented.
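A common instance of this procedure uses the C-score co-occurrence statistic with the "swap" algorithm. The swap algorithm repeatedly flips 2 × 2 checkerboard submatrices, which keeps every row (species) total and column (site) total fixed. The sketch below, assuming NumPy, computes a one-sided Monte Carlo p-value this way. The burn-in, thinning and number of simulations are illustrative, and consecutive matrices from the swap chain are dependent. The speaker's model-based score tests are not reproduced.

```python
import numpy as np


def c_score(m):
    """Mean checkerboard score over all species pairs (rows = species, cols = sites)."""
    r = m.sum(axis=1)
    shared = m @ m.T  # number of sites shared by each pair of species
    i, j = np.triu_indices(m.shape[0], k=1)
    return float(np.mean((r[i] - shared[i, j]) * (r[j] - shared[i, j])))


def swap_step(m, rng, max_tries=1000):
    """Flip one random 2x2 checkerboard in place; row and column sums are unchanged."""
    n_rows, n_cols = m.shape
    for _ in range(max_tries):
        rows = rng.choice(n_rows, 2, replace=False)
        cols = rng.choice(n_cols, 2, replace=False)
        sub = m[np.ix_(rows, cols)]
        # Checkerboards are [[1, 0], [0, 1]] and [[0, 1], [1, 0]].
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
            m[np.ix_(rows, cols)] = 1 - sub
            return True
    return False


def null_model_pvalue(m, n_sim=200, burn_in=1000, thin=20, seed=0):
    """Observed C-score and Monte Carlo p-value estimating P(T_sim >= T_obs)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=int)
    t_obs = c_score(m)
    sim = m.copy()
    for _ in range(burn_in):
        swap_step(sim, rng)
    exceed = 0
    for _ in range(n_sim):
        for _ in range(thin):
            swap_step(sim, rng)
        exceed += int(c_score(sim) >= t_obs)
    return t_obs, (exceed + 1) / (n_sim + 1)
```

Adding one to both the numerator and the denominator counts the observed matrix among the simulated ones. This keeps the p-value strictly positive.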

Closing Address
Alan Welsh (ANU)

Koji Kanefuji (ISM) :