## Research

- **Data assimilation in meteorology and oceanography**
- **Probability projection map of climate change**
- **Data assimilation in space science**
- **Hardware random number generators**
- **Advanced Monte Carlo methods and applications**
- **Statistical analysis system for parallel computing environment**
- **Towards understanding whole-brain activity in C. elegans**
- **Live forecasting and control for biological systems**
- **Mathematical modelling of infectious diseases**

Data assimilation deals with nonlinear state-space models that have a huge number of state variables and observed variables, so model approximation and efficient algorithms are required to make the computation feasible. Insufficient observations make state estimation even harder. Through data assimilation with an El Niño model, we proposed a method for handling high-dimensional observations with a small sample size, and a method for selecting better models for data assimilation. Bayesian statistics and high-performance computing enable us to make better predictions.
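To make the idea of state estimation in a nonlinear state-space model concrete, here is a minimal sketch of a bootstrap particle filter in Python. It is not the method proposed in the work described above: the scalar model, the noise levels `q` and `r`, and the function names `step`, `simulate` and `particle_filter` are all assumptions chosen for illustration. Operational data assimilation involves vastly more state variables than this toy.

```python
import math
import random


def step(x):
    """Hypothetical nonlinear dynamics, chosen only for illustration."""
    return 0.9 * x + math.sin(x)


def simulate(n_steps, q=0.5, r=0.5, seed=1):
    """Generate a synthetic true trajectory and noisy observations of it."""
    rng = random.Random(seed)
    x, truth, obs = 0.5, [], []
    for _ in range(n_steps):
        x = step(x) + rng.gauss(0.0, q)    # system noise
        truth.append(x)
        obs.append(x + rng.gauss(0.0, r))  # observation noise
    return truth, obs


def particle_filter(observations, n_particles=500, q=0.5, r=0.5, seed=0):
    """Return the filtered mean of the state at each observation time."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Forecast: push every particle through the (assumed) model.
        particles = [step(x) + rng.gauss(0.0, q) for x in particles]
        # Analysis: weight particles by the Gaussian observation likelihood.
        log_w = [-0.5 * ((y - x) / r) ** 2 for x in particles]
        top = max(log_w)  # subtract the maximum to avoid underflow
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Calling `truth, obs = simulate(100)` followed by `particle_filter(obs)` yields one state estimate per observation. The number of particles plays the role of the sample size, which is the quantity that becomes scarce relative to the dimension of the observations in large problems.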

As a member of the Program for Risk Information on Climate Change (the SOUSEI Program), run by MEXT in Japan, we have developed a method for probabilistically evaluating climate change using multiple climate simulation models. The method is based on a regression model that incorporates sparsity. It lets us discard a simulation model that adds no explanatory power and estimate a probability density that accounts for redundancy among the simulation models.
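As an illustration of how sparsity can discard uninformative simulation models, the following sketch fits a regression with an L1 (lasso) penalty by coordinate descent. The text above does not say which sparsity penalty is used, so the lasso is an assumption. The function names and the reading of the columns of `X` as outputs of individual simulation models are also hypothetical, and the sketch covers only the selection step, not the probability density estimation.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent.

    X is a list of rows; column j holds the output of hypothetical
    simulation model j, and y holds the reference (observed) values.
    """
    n, p = len(y), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w
```

A weight that is driven exactly to zero means the corresponding simulation model contributes nothing beyond the others at the chosen penalty `lam`, so it can be dropped from the ensemble.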

The magnetosphere is the outermost region of the atmosphere, filled with charged particles. Although a large number of spacecraft are in operation in the Earth’s magnetosphere, their observations cover only a small portion of it, so our knowledge of the magnetosphere remains limited. We therefore aim to obtain a global and comprehensive view of the magnetosphere by incorporating spacecraft observations into a physical model through data assimilation techniques.

The rapid growth of the orbital debris population is a serious concern for space exploration. Even a piece of orbital debris 1 cm in size can destroy an operational satellite. Predicting the orbital debris population is important for demonstrating mitigation and remediation of the debris environment. We apply data assimilation techniques to integrate debris orbit data into population models and improve the prediction accuracy of the orbital debris population.

We have designed various hardware devices that generate random numbers based on the avalanche breakdown of a Zener diode. In January 2010, we set a world record for generation speed, at least 400 MB/sec, using the hardware random number generator Board-A. The latest generator, Board-C, has broken that record, achieving over 640 MB/sec. We plan to speed up the hardware random number generators further and to improve the quality of the generated numbers. Hardware random number sequences are available from our website “Random Number Library”.

We study advanced Monte Carlo methods and their applications. Recently, we showed that multicanonical and replica exchange MCMC are useful tools for sampling very rare events in various complicated systems, such as random matrices, networks, error-correcting codes, and chaotic dynamical systems. We are also interested in developing new algorithms for sequential Monte Carlo and MCMC, as well as their applications in Bayesian statistics and machine learning.
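The following is a minimal sketch of replica exchange MCMC on a one-dimensional double-well energy, to show how swaps between temperatures help a chain cross barriers it would rarely cross on its own. The energy function, the temperature ladder `betas` and the step size are assumptions for illustration, not settings taken from the studies mentioned above.

```python
import math
import random


def energy(x):
    """Illustrative double-well energy with minima at x = -2 and x = +2."""
    return (x * x - 4.0) ** 2 / 4.0


def replica_exchange(n_steps=20000, betas=(1.0, 0.5, 0.25, 0.1),
                     step=0.5, seed=0):
    """Return samples from the replica at inverse temperature betas[0]."""
    rng = random.Random(seed)
    xs = [2.0 for _ in betas]  # every replica starts in the right-hand well
    samples = []
    for _ in range(n_steps):
        # Metropolis update of each replica at its own inverse temperature.
        for k, beta in enumerate(betas):
            proposal = xs[k] + rng.gauss(0.0, step)
            d = energy(proposal) - energy(xs[k])
            if d <= 0.0 or rng.random() < math.exp(-beta * d):
                xs[k] = proposal
        # Propose exchanging the states of one random pair of neighbours.
        k = rng.randrange(len(betas) - 1)
        log_acc = (betas[k] - betas[k + 1]) * (energy(xs[k]) - energy(xs[k + 1]))
        if log_acc >= 0.0 or rng.random() < math.exp(log_acc):
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
        samples.append(xs[0])
    return samples
```

The high-temperature replicas (small `beta`) move freely over the barrier at x = 0, and accepted exchanges carry those configurations down to the target replica. Multicanonical sampling pursues the same goal by reweighting the energy distribution instead of running several temperatures.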

The development of computer technology enables us to perform computer-intensive data analysis, including data assimilation. It has changed how research is conducted, how results are published, and how methods are disseminated. The open-source statistical analysis system R in particular has greatly influenced both theoretical and applied statistics. Our institute hosts several kinds of supercomputers dedicated to statistical research, so it is important to develop an environment for the effective and efficient use of R on these facilities. For this purpose, we improve and tune R for individual machines and develop technologies to run R on different machines simultaneously.

The nervous system of living organisms performs advanced and robust information processing in response to diverse external signals. However, many pieces of the puzzle are still missing in our understanding of the operating principles of neuronal circuits. To clarify the mechanism of this dynamic system, we are developing a live-cell imaging system and a data analysis pipeline that can visualize the fine structure of neural activity in C. elegans at the molecular and cellular levels.

With real-time data acquired by live-cell imaging or omics technologies, we aim to predict and control the spatiotemporal dynamics of complex biological processes. We create innovative technologies in bioimage informatics, omics data analysis, mathematical modeling, and parallel computation. Using Bayesian statistics and data assimilation, we demonstrate a proof of concept of live forecasting and control for biological systems. Another objective is to identify problems from the unique perspective of statistical science and to pioneer new strategic applications in life science.

Simulations of infectious diseases assist in predicting how an epidemic will evolve and in evaluating intervention strategies. Mathematical models of infectious disease have been developed, with data assimilation techniques being incorporated to combine epidemiological data and human activity information.
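As a simple example of the kind of mathematical model involved, the sketch below integrates the classic SIR (susceptible, infected, recovered) compartment model with a forward Euler scheme. The text above does not name a specific model, so SIR is an assumption, as are the function name `simulate_sir` and its parameters. In a data assimilation setting, quantities such as the transmission rate `beta` would be estimated from epidemiological data rather than fixed by hand.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Integrate the SIR equations and return one (S, I, R) tuple per day.

    beta  : transmission rate per day (assumed constant)
    gamma : recovery rate per day (assumed constant)
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    dt = 1.0 / steps_per_day
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory
```

For instance, `simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=160)` describes an outbreak with basic reproduction number `beta / gamma = 3`. Lowering `beta` is a crude way to mimic an intervention that reduces contacts.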