ISM Symposium on Environmental Statistics 2024

22 March, 2024
Auditorium @ The Institute of Statistical Mathematics
This symposium will be held in person. Attendance is limited to 50 participants, and registration will close once this limit is reached. If you wish to attend, please register using the Google form below.

Please note that only those who have registered will receive a PDF copy of the proceedings.
There is no registration fee.
【Main Subjects】
Statistical methods supporting environmental statistics
・Spatial statistics
・Space-time modeling
・Model selection
・Bayesian inference
・Markov chain Monte Carlo
・Directional statistics
【Organizers】
Daisuke Murakami (The Institute of Statistical Mathematics)
Alan H. Welsh (The Australian National University)
Shogo Kato (The Institute of Statistical Mathematics)
Koji Kanefuji (The Institute of Statistical Mathematics)
【Senior Adviser】
Kunio Shimizu (The Institute of Statistical Mathematics)
【Invited Speakers】
Alan H. Welsh (Australian National University, Australia)
Andrew Zammit-Mangion (University of Wollongong, Australia)
Gianluca Mastrantonio (Polytechnic of Turin, Italy)
Song Xi Chen (Peking University, China)
Keiichi Fukaya (National Institute for Environmental Studies, Japan)
Daisuke Murakami (The Institute of Statistical Mathematics, Japan)
Shogo Kato (The Institute of Statistical Mathematics, Japan)
Program (* means the presenter)
10:00—10:10 Opening Address
Hiroe Tsubaki (Director-General, The Institute of Statistical Mathematics)
〈Session 1〉 Chairperson: Yoshinori Kawasaki (The Institute of Statistical Mathematics)
10:15—11:00 Insights into Small Area Estimation using the Nested Error Regression Model
Ziyang Lyu (University of New South Wales) and Alan H. Welsh* (Australian National University)

Estimating characteristics of domains (referred to as small areas) within a population from sample surveys of that population is an important problem in survey statistics. In this presentation, we consider model-based small area estimation under the nested error regression model. We discuss the construction of mixed model estimators (empirical best linear unbiased predictors, EBLUPs) of small area means and of conditional linear predictors of small area means. We obtain asymptotic results for these estimators that allow us to establish asymptotic equivalences between them, approximate their sampling distributions, derive simple expressions for their asymptotic mean squared errors and construct simple estimators of them, and justify asymptotic prediction intervals. We then present simulations that explore both the model-based and the randomization-based (or design-based) properties of the model-based procedures. These results give insight into the consequences of treating the area effects as random or as fixed (i.e., conditioning on them) and into the use and nature of shrinkage methods.
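To make the shrinkage idea concrete, here is a minimal Python sketch of the best linear unbiased predictor (BLUP) of small area means under the simplest nested error regression model, y_ij = β + u_i + e_ij (intercept only, no covariates). It assumes the variance components σ_u² and σ_e² are known; plugging in estimates instead would give the EBLUP. The function name and example data are hypothetical, and finite-population corrections are ignored, so this illustrates the form of the estimator rather than the speakers' procedure.

```python
from statistics import fmean


def blup_small_area_means(samples, sigma_u2, sigma_e2):
    """BLUP of area means mu_i = beta + u_i under y_ij = beta + u_i + e_ij.

    samples: one list of sampled y values per small area.
    sigma_u2, sigma_e2: area-effect and unit-error variances, assumed known here
    (plugging in estimates instead would give the EBLUP).
    Returns (predictions, shrinkage factors, GLS estimate of beta).
    """
    ybar = [fmean(ys) for ys in samples]
    n = [len(ys) for ys in samples]
    # Shrinkage factor gamma_i = sigma_u^2 / (sigma_u^2 + sigma_e^2 / n_i):
    # areas with more sampled units rely more on their own sample mean.
    gamma = [sigma_u2 / (sigma_u2 + sigma_e2 / ni) for ni in n]
    # GLS estimate of beta: area means weighted by 1 / Var(ybar_i).
    w = [1.0 / (sigma_u2 + sigma_e2 / ni) for ni in n]
    beta_hat = sum(wi * yb for wi, yb in zip(w, ybar)) / sum(w)
    # Each prediction shrinks the direct estimate ybar_i toward beta_hat.
    preds = [beta_hat + g * (yb - beta_hat) for g, yb in zip(gamma, ybar)]
    return preds, gamma, beta_hat
```

With σ_u² = σ_e² = 1, an area with a single sampled unit gets γ = 0.5, while an area with four units gets γ = 0.8, so the former is pulled much more strongly toward the common mean.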

11:00—12:10 Lunch Break
〈Session 2〉 Chairperson: Shonosuke Sugasawa (Keio University)
12:10—12:55 From Dolphins to Lemurs: Two Applications of Statistical Modeling in Animal Sound Analysis
Gianluca Mastrantonio* (Polytechnic of Turin), Enrico Bibbona (Polytechnic of Turin), Hiu Ching Yip (Polytechnic of Turin), Giovanna Jona Lasinio (Sapienza University of Rome), Alessio Pollice (University of Bari Aldo Moro), Daniela Silvia Pace (University of Rome Tor Vergata), Maria Silvia Labriola (University of Rome Tor Vergata), Daria Valente (University of Turin), and Marco Gamba (University of Turin)

Sound plays a crucial role in the communication and behavior of many animal species. Understanding and analyzing the sounds emitted by a species can provide valuable insights into its ecological dynamics and social interactions. This presentation covers two distinct applications that use statistical modeling techniques to address specific questions and challenges in animal sound analysis. We propose Gaussian processes as a powerful statistical modeling tool for capturing the complex patterns and variability of these sounds: by treating the sound emission process as a continuous function, we can leverage the inherent smoothness and correlation properties of Gaussian processes to capture the underlying acoustic patterns. Throughout the presentation, we will outline the methodologies applied to tackle these challenges and discuss the data collection process for both the dolphin whistle recordings and the lemur spectrograms. In the first application, we focus on the whistles emitted by dolphins, each represented by its most important frequency as a function of time. A mixture model is employed to determine the number of distinct animals and to predict future observations. In the second application, we shift our attention to spectrograms, viewing them as processes on a two-dimensional space. Notably, these spectrograms contain discretization artifacts that must be accounted for in the modeling process. Our primary objective with this dataset is to accurately describe the underlying spectrogram, particularly for the distinctive call of a specific lemur species. The proposed approaches hold great promise for advancing our understanding of these animal species.
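As a rough illustration of treating a sound as a continuous function, the plain-Python sketch below computes the posterior mean of a Gaussian process, with a squared-exponential covariance, from noisy observations of a whistle's frequency over time. The kernel choice, hyperparameter values, use of the sample mean as a constant mean, and all function names are assumptions made for illustration. The speakers' models (a mixture model for whistles and a two-dimensional process for spectrograms) are considerably richer.

```python
import math
from statistics import fmean


def sq_exp_kernel(s, t, variance=1.0, length=1.0):
    """Squared-exponential covariance k(s, t) = variance * exp(-(s - t)^2 / (2 * length^2))."""
    return variance * math.exp(-((s - t) ** 2) / (2.0 * length ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(t_obs, y_obs, t_new, noise=1e-2, **kern):
    """Posterior mean of a GP (constant mean = sample mean) at the times t_new.

    noise: observation-noise variance added to the diagonal of the kernel matrix.
    kern: keyword arguments passed on to sq_exp_kernel (variance, length).
    """
    m = fmean(y_obs)
    K = [[sq_exp_kernel(a, b, **kern) + (noise if i == j else 0.0)
          for j, b in enumerate(t_obs)] for i, a in enumerate(t_obs)]
    alpha = chol_solve(cholesky(K), [y - m for y in y_obs])
    return [m + sum(sq_exp_kernel(t, s, **kern) * w for s, w in zip(t_obs, alpha))
            for t in t_new]
```

Near the observed times the posterior mean follows the data closely when the noise variance is small. Far from them it reverts to the constant mean, which is one way the smoothness and correlation assumptions show up.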

12:55—13:40 The trivariate wrapped Cauchy copula - a multi-purpose model for angular data
Shogo Kato* (The Institute of Statistical Mathematics), Christophe Ley (University of Luxembourg), Sophia Loizidou (University of Luxembourg), and Kanti V. Mardia (University of Leeds)

Toroidal data consist of observations comprising multiple angles; they arise commonly in the environmental sciences, for example as wind directions and wave directions. In this talk, we propose a new distribution for three-dimensional toroidal data, which we call a trivariate wrapped Cauchy copula. The proposed copula has the following benefits: (i) a simple form of density, (ii) an adjustable degree of dependence between every pair of variables, (iii) interpretable and well-estimable parameters, (iv) well-known conditional distributions, (v) a simple data-generating mechanism, and (vi) unimodality. As with copula models in general, the proposed copula can be extended to have any specified marginal distributions and hence can be used for flexible modeling. Moreover, our construction allows for linear marginals, so the copula can also model cylindrical data, which consist of both angular and linear observations. We explain parameter estimation via maximum likelihood, compare the copula with competing models in the existing literature, and analyze a real meteorological dataset obtained by a buoy in the Adriatic Sea.
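The trivariate copula density itself is not reproduced here. As a hedged illustration of its building block, the Python sketch below evaluates the univariate wrapped Cauchy density f(θ) = (1 − ρ²) / {2π(1 + ρ² − 2ρ cos(θ − μ))} for 0 ≤ ρ < 1. It also draws samples by wrapping a Cauchy variate with scale −log ρ onto the circle, which yields a wrapped Cauchy distribution with mean resultant length ρ. The function names are hypothetical.

```python
import math
import random


def wrapped_cauchy_pdf(theta, mu, rho):
    """Univariate wrapped Cauchy density on [0, 2*pi), location mu, 0 <= rho < 1."""
    return (1 - rho ** 2) / (2 * math.pi * (1 + rho ** 2 - 2 * rho * math.cos(theta - mu)))


def wrapped_cauchy_sample(mu, rho, rng):
    """Draw one angle by wrapping a Cauchy(mu, gamma) variate, gamma = -log(rho).

    Requires 0 < rho < 1 (rho = 0 would give an infinite Cauchy scale).
    """
    gamma = -math.log(rho)
    x = mu + gamma * math.tan(math.pi * (rng.random() - 0.5))  # Cauchy draw on the line
    return x % (2 * math.pi)  # wrap onto the circle
```

Because E[cos(θ − μ)] = ρ for this distribution, the sample average of cos(θ − μ) gives a quick moment check on ρ when μ is known.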

〈Session 3〉 Chairperson: Daisuke Kurisu (University of Tokyo)
13:40—14:25 Amortised inference in environmental applications
Andrew Zammit-Mangion (University of Wollongong)

Neural networks can provide solutions to tasks that were inconceivable just a few years ago and have benefitted society in numerous ways. These benefits stem primarily from a property often referred to as "amortisation": training a neural network usually requires significant effort and resources but, once trained, the network can solve similar problems repeatedly and rapidly at virtually no additional computational cost. The substantial initial cost of training is thus said to be "amortised" over time. Amortisation can also enable fast inference with parametric statistical models: once a network has been trained with observational data as input and parameters as output, it can make inferences from future data in a tiny fraction of the computing time needed by conventional likelihood-based or Monte Carlo methods. These amortised inferential tools have several compelling advantages over classical methods: they do not require knowledge of the likelihood function, they are relatively easy to implement, and they facilitate inference at a substantially reduced computational cost. This makes them ideal for environmental applications involving complex models and big data, where fitting the statistical model to data is typically computationally demanding. In this talk, I first justify why amortisation is possible and why it works in practice, and then discuss a geophysical spatial application that leverages this property for fast statistical inference. I then evaluate the merits and drawbacks of amortised inference and conclude by outlining the challenges that must be overcome for these inferential tools to gain widespread acceptance.
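A toy version of amortisation fits in a few lines of Python. Here a least-squares line from one summary statistic (the lag-1 sample autocorrelation) to the parameter φ of an AR(1) series stands in for the neural network. This is a deliberate simplification, not the method discussed in the talk. The expensive step, simulating many (parameter, data) pairs from the prior and the model, is done once. After that, estimating φ for a new series costs only one summary statistic and one multiply-add. The prior U(0, 0.9), the series length, and all names are assumptions.

```python
import math
import random


def simulate_ar1(phi, n, rng):
    """Stationary AR(1) series x_t = phi * x_{t-1} + e_t with N(0, 1) innovations."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))  # start in the stationary law
    series = []
    for _ in range(n):
        series.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return series


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation, used here as the only summary statistic."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def train_amortised_estimator(n_train=1000, n=200, seed=1):
    """One-off 'training': simulate (phi, data) pairs and fit a map data -> phi.

    A least-squares line on one summary statistic stands in for a neural network.
    """
    rng = random.Random(seed)
    feats, params = [], []
    for _ in range(n_train):
        phi = rng.uniform(0.0, 0.9)  # draw from the assumed prior
        feats.append(lag1_autocorr(simulate_ar1(phi, n, rng)))
        params.append(phi)
    mf, mp = sum(feats) / n_train, sum(params) / n_train
    slope = (sum((f - mf) * (p - mp) for f, p in zip(feats, params))
             / sum((f - mf) ** 2 for f in feats))
    intercept = mp - slope * mf
    # Applying the trained estimator to new data is essentially free.
    return lambda series: intercept + slope * lag1_autocorr(series)
```

Once `train_amortised_estimator()` has run, the returned function can be applied to any number of new series of the same length without further simulation.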

14:25—15:10 Scalable space-time model selection through reluctant interaction modeling
Daisuke Murakami (The Institute of Statistical Mathematics)

Spatially and temporally varying coefficient (STVC) models have been used to estimate regression coefficients that vary smoothly over space and time. Existing studies typically assume a single process for each coefficient. However, in many real-world cases, multiple processes may influence each coefficient. For example, in crime modeling, the coefficients may vary periodically in time to explain increases in crime during the day and decreases at night, and they may also vary cyclically to explain the dynamics of crime risk across years. Unfortunately, considering multiple processes is challenging in terms of both computational cost and overfitting. The objective of this study is to develop a fast and stable model selection method that appropriately specifies the STVC model, considering purely spatial, purely temporal, and space-time interaction processes with or without time cyclicity. To reduce the computational cost, we introduce a fast model selection procedure inspired by reluctant interaction modeling, which distinguishes between interaction and non-interaction terms. Monte Carlo experiments show that the proposed method outperforms alternatives in terms of modeling accuracy and/or computational efficiency. Finally, the proposed method is applied to an analysis of air pollution during the COVID-19 pandemic.
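The general idea behind reluctant interaction modeling can be illustrated with a simple two-stage screen: an interaction should enter the model only if it explains variation that the main effects leave behind. The Python sketch below fits the main effects by least squares, scores each candidate interaction by its absolute correlation with the main-effect residuals, and keeps those above a threshold. Plain linear features stand in for the spatial, temporal, and space-time processes of an STVC model. The threshold and all names are hypothetical, and this is not the speaker's selection procedure.

```python
import math


def ols_residuals(columns, y):
    """Residuals from a least-squares fit of y on an intercept plus the given columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T y], solved by Gauss-Jordan elimination.
    A = [[sum(row[j] * row[k] for row in X) for k in range(p)]
         + [sum(row[j] * yi for row, yi in zip(X, y))] for j in range(p)]
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))  # partial pivoting
        A[j], A[piv] = A[piv], A[j]
        for r in range(p):
            if r != j:
                f = A[r][j] / A[j][j]
                A[r] = [a - f * b for a, b in zip(A[r], A[j])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((z - mb) ** 2 for z in b)
    return sab / math.sqrt(saa * sbb)


def reluctant_screen(main_effects, candidates, y, threshold=0.2):
    """Main effects first; keep an interaction only if it explains the residuals.

    main_effects, candidates: dicts mapping a term name to its column of values.
    Returns (sorted names of retained candidates, dict of scores).
    """
    resid = ols_residuals(list(main_effects.values()), y)
    scores = {name: abs(corr(col, resid)) for name, col in candidates.items()}
    kept = sorted(name for name, score in scores.items() if score > threshold)
    return kept, scores
```

For example, with s and t drawn uniformly on [−1, 1] and y depending on s·t, the product term receives a high score while s² and t² score near zero, so only s·t is retained.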

15:10—15:30 Coffee Break
〈Session 4〉 Chairperson: Koji Kanefuji (The Institute of Statistical Mathematics)
15:30—16:15 Ensemble Kalman Filter for High Resolution Data Assimilation
Shouxia Wang (Peking University), Song Xi Chen* (Peking University), Hao-Xuan Sun (Peking University), and Xiaogu Zheng (Shanghai Zhangjiang Mathematics Institute and International Global Change Institute, Hamilton)

The ensemble Kalman filter (EnKF), a fundamental data assimilation approach, has been widely used in many fields of earth science, engineering and beyond. However, several theoretical aspects of the EnKF remain unknown, especially when the state variable is high-dimensional and accompanied by high-resolution observations and physical models. This paper first proposes several high-dimensional EnKF methods that provide consistent estimators of the important forecast error covariance matrix and the Kalman gain matrix. It then studies the theoretical properties of the EnKF for both fixed and high-dimensional state variables, deriving the mean squared errors of the analysis states relative to the underlying oracle states given by the Kalman filter and giving much-needed insight into the role played by the forecast error covariance in the accuracy of the EnKF. The accuracy of data assimilation under a misspecified physical model is also considered. Simulation studies on the Lorenz-96 and the Shallow Water Equation models show that the proposed high-dimensional EnKF algorithms outperform the standard EnKF methods, providing more robust and accurate assimilation results.
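For readers unfamiliar with the EnKF, the following Python sketch shows the textbook stochastic (perturbed-observation) analysis step for a scalar state observed directly, i.e. with observation operator H = 1. The forecast error variance is estimated from the ensemble and plugged into the Kalman gain K = P/(P + R). The high-dimensional covariance estimators proposed in the talk are not reproduced. The function name and the scalar setting are assumptions for illustration.

```python
import random
from statistics import fmean, variance


def enkf_analysis(forecast, y, obs_var, rng):
    """One stochastic EnKF update for a scalar state observed directly (H = 1).

    forecast: list of forecast ensemble members.
    y: the observation; obs_var: its error variance R.
    Returns (analysis ensemble, estimated Kalman gain).
    """
    p_f = variance(forecast)          # ensemble estimate of forecast error variance P
    gain = p_f / (p_f + obs_var)      # Kalman gain K = P H^T (H P H^T + R)^(-1)
    sd = obs_var ** 0.5
    # Perturbing the observation for each member keeps the analysis spread
    # consistent with the Kalman filter's analysis variance (1 - K) P.
    analysis = [x + gain * (y + rng.gauss(0.0, sd) - x) for x in forecast]
    return analysis, gain
```

With a forecast ensemble drawn from N(0, 4), R = 1 and y = 2, the exact Kalman filter gives K = 0.8, analysis mean 1.6 and analysis variance 0.8. A large ensemble should approximately reproduce these values.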

16:15—17:00 Towards even more efficient biodiversity monitoring through eDNA metabarcoding: an approach based on site occupancy modeling
Keiichi Fukaya*, Natsuko I. Kondo, Shin-ichiro S. Matsuzaki, and Taku Kadoya (National Institute for Environmental Studies)

Environmental DNA (eDNA) analysis is an increasingly widely used method for effectively assessing species distribution and diversity. This talk will introduce a multispecies site occupancy modeling framework that accounts for imperfect detection in eDNA metabarcoding (i.e., the detection of DNA sequences from a broad range of taxa in environmental samples through PCR amplification and high-throughput sequencing of target sequences). The model includes submodels representing ecological processes (species occurrence) and observational processes (detection of species' DNA sequences via metabarcoding), allowing species detectability to be estimated across the sequential eDNA metabarcoding workflow. In addition, the model enables a Bayesian decision analysis, based on predictions of the expected number of species detected, to identify effective study designs for biodiversity assessment. An R package implementing this methodology, occumb, is available from CRAN. The proposed methods and tools will help make the application of metabarcoding more error-tolerant and efficient.
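Site occupancy modeling separates occurrence from detection, and this is easiest to see in the classic single-species likelihood, sketched below in Python. A site with J replicate surveys and d detections contributes ψ p^d (1 − p)^(J−d). If the species was never detected, the term (1 − ψ) is added, because such a site may simply be unoccupied. The function names, the grid-search fit, the simulated data, and the simple expected-detections formula Σ ψ_s{1 − (1 − p_s)^K} for K replicates are illustrative assumptions. The multispecies, multistage model in occumb is far richer, and its R interface is not shown here.

```python
import math
import random
from collections import Counter


def simulate_histories(psi, p, n_sites, replicates, rng):
    """Hypothetical detection histories: occupied with prob psi, detected with prob p."""
    histories = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        histories.append([1 if occupied and rng.random() < p else 0
                          for _ in range(replicates)])
    return histories


def occupancy_loglik(psi, p, counts):
    """Log-likelihood given counts[(detections, replicates)] = number of sites."""
    ll = 0.0
    for (d, J), n_sites in counts.items():
        lik = psi * p ** d * (1 - p) ** (J - d)
        if d == 0:
            lik += 1 - psi  # never detected: the site may simply be unoccupied
        ll += n_sites * math.log(lik)
    return ll


def fit_occupancy_grid(histories, step=0.02):
    """Maximum-likelihood (psi, p) by brute-force grid search over (0, 1)."""
    counts = Counter((sum(h), len(h)) for h in histories)
    grid = [step * k for k in range(1, int(round(1 / step)))]
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: occupancy_loglik(ab[0], ab[1], counts))


def expected_species_detected(psis, ps, replicates):
    """Expected number of species detected at a site with the given replicates."""
    return sum(psi * (1 - (1 - p) ** replicates) for psi, p in zip(psis, ps))
```

For instance, two species with (ψ, p) = (1.0, 0.5) and (0.5, 0.2) give an expected 0.75 + 0.18 = 0.93 species detected with two replicates, a quantity one could compare across candidate designs.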

17:00—17:10 Closing Address
Shogo Kato (Director of Risk Analysis Research Center, The Institute of Statistical Mathematics)