## Dimensionality Reduction and Correlation of Biomolecular Motions

This review explains dimensionality reduction techniques applied in the field of biomolecular simulation, focusing on methods based on linear transformations. We also examine how these methods use correlation to reduce the dimensionality of biomolecular motions. The review covers functional mode analysis, principal component analysis, full correlation analysis, quasi-anharmonic analysis, and independent subspace analysis. We also discuss the physical interpretation and implications of the methods.

Key words: Biopolymers, molecular dynamics, dimensionality reduction, independent component analysis, independent subspace analysis, energy landscape.

## Perturbation Analyses of Biomolecular Simulations

To understand protein functions in atomic detail, we need to investigate the different main-chain and side-chain conformational states, the interaction patterns with other molecules (compounds, DNA, and other proteins), and the contributions of solvents (water and ions). We review our recent approaches for identifying complex conformational states and their interactions by performing perturbation analyses of atomic interactions. First, we introduce PEPCA (potential energy principal component analysis), which is a perturbation analysis of atomic interactions within proteins or complexes. PEPCA is implemented by performing principal component analysis of interaction energies. The PEPCA principal component scores identify conformational states that have different interaction patterns within proteins or complexes, and the PEPCA eigenvector components identify the interactions that differentiate each state. Next, we introduce DIPA (distance-dependent intermolecular perturbation analysis), which is a perturbation analysis of atomic interactions between proteins or complexes and solvents. DIPA is implemented by performing functional principal component analysis on the products of intermolecular forces and average numbers of solvent atoms. The DIPA principal component scores identify conformational states that have different interaction patterns between proteins or complexes and solvents, and the DIPA eigenfunction components identify the interactions that differentiate each state. Finally, we apply PEPCA and DIPA to a chignolin folding simulation. The results are visualized with biplots, which are scatter plots of eigenvector (or eigenfunction) components and principal component scores.

Key words: Molecular dynamics simulation, perturbation analysis, principal component analysis, functional principal component analysis, biplot.
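
The PEPCA idea summarized above, principal component analysis applied to interaction energies, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `energy_pca`, the synthetic two-state energy matrix, and the choice of four atom pairs are assumptions made only for illustration.

```python
import numpy as np

def energy_pca(energies):
    """PCA of an (n_frames, n_pairs) matrix of interaction energies."""
    centered = energies - energies.mean(axis=0)
    cov = centered.T @ centered / (len(energies) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                   # principal component scores per frame
    return eigvals, eigvecs, scores

# Hypothetical data: two conformational states that differ only in the
# interaction energies of pairs 0 and 1 (arbitrary energy units).
rng = np.random.default_rng(0)
n_frames = 200
state = np.repeat([0, 1], n_frames // 2)
energies = rng.normal(0.0, 0.1, size=(n_frames, 4))
energies[:, 0] += np.where(state == 1, -5.0, 0.0)
energies[:, 1] += np.where(state == 1, 3.0, 0.0)

eigvals, eigvecs, scores = energy_pca(energies)
```

In this toy case the first column of `scores` separates the two states, and the largest entries of `eigvecs[:, 0]` point to the pairs whose energies differ between them. These are the two quantities that a biplot would show together.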

## Dynamics of Protein Structure: Analysis with Wavelet Transformation

The structural dynamics of proteins are closely related to their functions. Characterizing and analyzing structural dynamics therefore provides important clues to understanding the essence of these functions. However, protein dynamics exhibit a complex spatiotemporal hierarchy, which makes it difficult to capture motion features that involve multiple temporal and spatial scales. After reviewing previous approaches to protein dynamics, this article presents a new method for extracting features of protein motion from time-series data. The method uses the wavelet transformation together with the singular value decomposition (SVD). The wavelet analysis characterizes time-varying features of the dynamics, and the SVD reduces the degrees of freedom of the data. We apply the method to extract the structural dynamics of *Thermomyces lanuginosa* lipase (TLL) from molecular dynamics time-series data.

Key words: Wavelet transform, molecular dynamics of proteins, time-series analysis, structural dynamics.
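
As a rough illustration of combining a wavelet transformation with SVD, the sketch below computes Morlet scalograms for a few synthetic coordinates and then applies SVD to the stacked result. It is only a sketch under stated assumptions: the Morlet wavelet with `w0 = 6`, the helper name `morlet_scalogram`, the synthetic signal, and the way the scalograms are stacked are all choices made here, not details taken from the article.

```python
import numpy as np

def morlet_scalogram(signal, scales, w0=6.0):
    """Magnitude of a Morlet wavelet transform; returns (n_scales, n_times)."""
    out = np.empty((len(scales), len(signal)))
    for k, s in enumerate(scales):
        half = int(np.ceil(4 * s))                 # truncate the wavelet at +-4 s
        t = np.arange(-half, half + 1)
        wavelet = np.exp(1j * w0 * t / s - 0.5 * (t / s) ** 2) / np.sqrt(s)
        out[k] = np.abs(np.convolve(signal, wavelet, mode="same"))
    return out

# Hypothetical "trajectory": three coordinates that switch from a fast to a
# slow oscillation halfway through, plus noise.
rng = np.random.default_rng(1)
n = 1024
time = np.arange(n)
freq = np.where(time < n // 2, 0.1, 0.025)         # cycles per frame
phase = 2 * np.pi * np.cumsum(freq)
coords = np.array([a * np.sin(phase + p)
                   for a, p in [(1.0, 0.0), (0.7, 1.0), (0.4, 2.0)]])
coords += rng.normal(0.0, 0.1, size=coords.shape)

scales = np.geomspace(4, 64, 25)                   # all wavelets shorter than the signal
features = np.vstack([morlet_scalogram(c, scales) for c in coords])

# SVD of the time-centered feature matrix: rows of Vt are temporal modes.
centered = features - features.mean(axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)
```

Here the leading temporal mode `Vt[0]` changes sign at the point where the oscillation frequency changes, so a single mode summarizes a feature that is spread over many scales and coordinates.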

## Relaxation Mode Analysis of Biopolymer Systems

For simulation methods that satisfy the detailed balance condition, the relaxation rates and relaxation modes of the system concerned can be defined from the eigenvalues and eigenvectors of the time-evolution operator, respectively. The relaxation mode analysis method, which estimates the relaxation rate distribution and the relaxation modes on the basis of the variational method, is explained. Applications of the method to a Monte Carlo simulation of a peptide system and a molecular dynamics simulation of a protein system are reviewed.

Key words: Relaxation modes, relaxation rates, relaxation mode analysis, simulations, time-displaced correlation functions.
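
As a sketch of the variational formulation mentioned above, and assuming notation introduced here rather than taken from the article, let $R_i$ denote a set of mean-subtracted coordinates and $C_{ij}(t)$ their time-displaced correlation functions. The relaxation rates $\lambda_p$ and the coefficients $f_{p,j}$ of the approximate relaxation modes are then commonly obtained from a generalized eigenvalue problem of the following form, where $t_0$ and $\tau$ are parameters chosen by the analyst:

$$
C_{ij}(t) = \langle R_i(t)\, R_j(0) \rangle, \qquad
\sum_j C_{ij}(t_0 + \tau)\, f_{p,j} = e^{-\lambda_p \tau} \sum_j C_{ij}(t_0)\, f_{p,j}.
$$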

## Independent Component Analysis tICA to Unravel Complex Protein Motions

Molecular dynamics (MD) simulation is a powerful tool that is widely used to elucidate the dynamic behavior of proteins and to reveal the molecular mechanisms of their functions at atomic resolution. Protein motions occur over a wide range of time scales, but not all of them are important for protein function. Because the time scales of protein functions are generally long, it is reasonable to consider that the slower motions of proteins are more relevant to their functions. To identify and examine such slow dynamics in simulation results, we recently proposed a method of time-structure-based independent component analysis (tICA). Here, we review the tICA approach and present the results of its application.

Key words: Protein dynamics, molecular dynamics simulation, independent component analysis, tICA, slow motion, rare event.
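
A common way to formulate tICA is as a generalized eigenvalue problem between the time-lagged covariance matrix and the instantaneous covariance matrix. The sketch below follows that formulation on synthetic data. It is a minimal illustration, not the authors' code: the function name `tica`, the symmetrized covariance estimators, and the two-dimensional test signal are assumptions.

```python
import numpy as np

def tica(X, lag):
    """Solve C(lag) v = lambda C(0) v; return eigenvalues and components, slowest first."""
    X = X - X.mean(axis=0)
    A, B = X[:-lag], X[lag:]
    c0 = (A.T @ A + B.T @ B) / (2 * len(A))       # instantaneous covariance
    ct = (A.T @ B + B.T @ A) / (2 * len(A))       # symmetrized time-lagged covariance
    s, U = np.linalg.eigh(c0)
    W = U / np.sqrt(s)                            # whitening transform: W.T @ c0 @ W = I
    lam, V = np.linalg.eigh(W.T @ ct @ W)
    order = np.argsort(lam)[::-1]
    return lam[order], (W @ V)[:, order]

# Hypothetical data: a slow, small-amplitude motion mixed with a fast,
# large-amplitude one. Plain PCA would rank the fast motion first.
rng = np.random.default_rng(2)
n = 20000
slow = np.zeros(n)
for t in range(1, n):
    slow[t] = 0.99 * slow[t - 1] + 0.141 * rng.normal()
fast = 5.0 * rng.normal(size=n)
theta = np.pi / 4
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
X = np.column_stack([slow, fast]) @ mix

lam, comps = tica(X, lag=10)
tic1 = (X - X.mean(axis=0)) @ comps[:, 0]         # projection on the slowest component
```

The leading eigenvalue approximates the autocorrelation of the slowest motion at the chosen lag, and `tic1` recovers the slow signal even though it carries far less variance than the fast one.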

## Discrimination of Molecular States in Single-molecule Experimental Data by Statistical Data Analysis

In recent biological research, it has become increasingly important to understand the behavior and functions of biomolecules such as proteins. Single-molecule measurement enables us to observe individual molecules and has been widely used to investigate molecular dynamics directly. One of the major problems with single-molecule experimental data is that the signal obtained from a single molecule is weak, which leads to significant fluctuations. On the other hand, in many cases the time series can be interpreted as consecutive transitions among a limited number of molecular states. Since both of these features are stochastic, statistical data analysis is well suited to handling such signals. To date, a number of methods have been developed to discriminate molecular states buried in apparently noisy signals. This article introduces some statistical methods for treating experimental data from single-molecule FRET (smFRET) measurements, which can examine the structural dynamics of biomolecules. The hidden Markov model (HMM) reproduces a state transition trajectory from a time series of smFRET data. Maximum likelihood estimation (MLE) or variational Bayes (VB) inference is employed to solve the HMM. The VB-HMM is modified to treat time-dependent time-stamp signals, and some examples of applications to simulated signals and experimental data are shown. Finally, another method for analyzing time series data, change point detection, is briefly mentioned.

Key words: Biomolecules, single-molecule measurement, FRET, hidden Markov model, variational Bayes, state discrimination.
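
To make the idea of recovering a state trajectory from a noisy signal concrete, the sketch below decodes a two-state HMM with Gaussian emissions using the Viterbi algorithm. Note the simplification: the model parameters are assumed to be known, so the parameter estimation by MLE or VB inference described above is not shown. The function name, the FRET efficiencies of 0.3 and 0.7, and the noise level are hypothetical.

```python
import numpy as np

def viterbi_gaussian(obs, means, sigma, trans, init):
    """Most probable state path for an HMM with equal-width Gaussian emissions."""
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    n_states = len(means)
    # Log emission probabilities up to a constant shared by all states.
    log_emit = -0.5 * ((obs[:, None] - means[None, :]) / sigma) ** 2
    log_trans = np.log(trans)
    score = np.log(init) + log_emit[0]
    back = np.zeros((len(obs), n_states), dtype=int)
    for t in range(1, len(obs)):
        cand = score[:, None] + log_trans          # cand[i, j]: come from i, go to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + log_emit[t]
    path = np.empty(len(obs), dtype=int)
    path[-1] = np.argmax(score)
    for t in range(len(obs) - 1, 0, -1):           # trace back the best path
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical smFRET-like trace: low, high, low efficiency with noise.
rng = np.random.default_rng(3)
true_states = np.repeat([0, 1, 0], 100)
means = [0.3, 0.7]
obs = np.array(means)[true_states] + rng.normal(0.0, 0.12, size=300)
trans = np.array([[0.98, 0.02],
                  [0.02, 0.98]])
path = viterbi_gaussian(obs, means, sigma=0.12, trans=trans, init=[0.5, 0.5])
accuracy = np.mean(path == true_states)
```

Because the transition matrix penalizes frequent switching, isolated noisy frames are mostly assigned to the surrounding state, which a simple per-frame threshold would not do.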

## Cascade-type Massive Parallel Simulation for Protein Conformational Transition Pathway Search

The calculation and analysis of free energy landscapes play important roles in elucidating the relationship between protein conformation and function. Calculating the free energy landscape of a protein from molecular dynamics simulations often requires long computation times. This is because the many degrees of freedom in proteins generate a large number of conformational substates, and proteins are frequently confined to a few of them. To explore the protein conformational space widely and efficiently, a novel free energy calculation method is proposed that combines Parallel Cascade Selection Molecular Dynamics (PaCS-MD) with a Markov State Model (MSM). The method is applied to the analysis of the folding free energy landscape of a small protein as a test case.

Key words: Free energy landscape, conformational substate, molecular dynamics, folding, Markov process.
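
The MSM half of this combination can be sketched as follows: many short discretized trajectories are turned into a transition matrix, whose stationary distribution gives relative free energies. The PaCS-MD selection step itself is not implemented here. Short trajectories started from random states stand in for it, and the three-state transition matrix, the function name `msm_free_energy`, and the lag time are assumptions for illustration.

```python
import numpy as np

def msm_free_energy(dtrajs, n_states, lag=1, kT=1.0):
    """Row-normalized MSM from discrete trajectories, with relative free energies."""
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a, b] += 1
    T = counts / counts.sum(axis=1, keepdims=True)   # assumes every state was visited
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])  # eigenvector for eigenvalue 1
    pi = pi / pi.sum()
    F = -kT * np.log(pi)
    return T, pi, F - F.min()

# Hypothetical three-state system; its stationary distribution is (0.25, 0.5, 0.25).
T_true = np.array([[0.90, 0.10, 0.00],
                   [0.05, 0.90, 0.05],
                   [0.00, 0.10, 0.90]])
rng = np.random.default_rng(4)
dtrajs = []
for _traj in range(500):                  # many short runs from arbitrary starting states
    s = int(rng.integers(3))
    traj = [s]
    for _step in range(99):
        s = int(rng.choice(3, p=T_true[s]))
        traj.append(s)
    dtrajs.append(traj)

T_est, pi, F = msm_free_energy(dtrajs, n_states=3)
```

The row-normalized estimate does not depend on how the starting states were distributed, which is why short trajectories launched from selected conformations can still yield equilibrium populations.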

## Conformational Transition Pathways in Proteins Explored by the String Method

Conformational transitions of proteins are essential for the regulation of important chemical reactions in cells. Although all-atom molecular dynamics simulation is a powerful technique to investigate atomic details of protein dynamics, it is hard to access the time-scale of such slow transitions with conventional brute-force simulations. This article reviews a method, called the string method, which is designed for efficiently finding transition pathways in proteins. In particular, practical issues of the method, such as free energy calculations along the pathways, are discussed in detail. The method is applied to the conformational transition of Adenylate Kinase. It is shown that the transition pathway identified by the method reveals atomic details of the transition mechanism.

Key words: Rare event, string method, minimum free energy path, free energy calculation, Adenylate Kinase.
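
The core loop of the string method, moving a chain of images downhill and then redistributing them evenly along the path, can be shown on a two-dimensional toy potential. This sketch uses the zero-temperature variant with an analytic gradient. It is not the mean-force, collective-variable version needed for proteins, and the potential, step size, and number of images are arbitrary choices made here.

```python
import numpy as np

def potential(p):
    """Toy double-well potential with minima at (-1, 0), (1, 0) and a saddle at (0, 1)."""
    x, y = p[..., 0], p[..., 1]
    return (x**2 - 1) ** 2 + 5 * (y - 1 + x**2) ** 2

def gradient(p):
    x, y = p[:, 0], p[:, 1]
    g = y - 1 + x**2
    return np.column_stack([4 * x * (x**2 - 1) + 20 * x * g, 10 * g])

def reparametrize(images):
    """Redistribute the images at equal arc length along the current path."""
    seg = np.linalg.norm(np.diff(images, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    target = np.linspace(0.0, arc[-1], len(images))
    return np.column_stack([np.interp(target, arc, images[:, d])
                            for d in range(images.shape[1])])

# Initial guess: a straight line between the two minima.
string = np.column_stack([np.linspace(-1, 1, 21), np.zeros(21)])
for _ in range(3000):
    string = string - 0.01 * gradient(string)    # steepest-descent step for every image
    string = reparametrize(string)               # prevents images sliding into the minima

energies = potential(string)
```

After the iterations the images trace a curved path over the saddle, and `energies.max()` approximates the barrier height, which is 1 for this potential. Without the reparametrization step all interior images would simply relax into one of the two minima.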

## Molecular Dynamics Using Path Sampling and Bayesian Inference

Recently, "path sampling" methods for generating non-equilibrium path ensembles have been used extensively together with molecular dynamics simulations. These techniques are closely related to parameter estimation by Bayesian inference in statistics. Reviewing related work on molecular dynamics simulations, this article explains how the dynamic parameters and/or reaction coordinates of such a molecular system can be efficiently estimated using both path sampling and Bayesian inference.

Key words: Molecular dynamics, path sampling, Bayesian inference, transition state, transition path, minimum energy path.

## Meta-analysis Based on Individual Participant Data

Meta-analysis based on individual participant data (IPD) involves obtaining and then synthesizing raw individual-level data from multiple studies. IPD meta-analysis uses the most precise information available and is called the "gold standard" of meta-analysis. The use of individual participant data has various statistical and clinical advantages and has been increasingly applied in systematic reviews over the past 10 years. However, recent methodological research has pointed out that several particular biases can arise in IPD reviews. This article provides a comprehensive review of statistical methodologies and several meaningful applications of IPD meta-analysis.

Key words: Systematic review, meta-analysis, individual participant data, aggregate data, bias.
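
One simple way to synthesize individual participant data is a two-stage analysis: each study's raw data are first reduced to an effect estimate, and the estimates are then pooled with inverse-variance weights. The sketch below shows this under strong assumptions (a continuous outcome, a common fixed effect, simulated participants). It illustrates the general idea only and is not a method taken from the article.

```python
import numpy as np

def study_effect(treated, control):
    """Stage 1: mean difference and its variance from one study's raw data."""
    effect = treated.mean() - control.mean()
    variance = treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control)
    return effect, variance

def pool_fixed_effect(effects, variances):
    """Stage 2: inverse-variance weighted pooling under a fixed-effect model."""
    w = 1.0 / np.asarray(variances)
    pooled = np.sum(w * np.asarray(effects)) / np.sum(w)
    return pooled, np.sqrt(1.0 / np.sum(w))

# Hypothetical individual-level outcomes from three studies of different size.
rng = np.random.default_rng(5)
true_effect = 1.0
studies = []
for n_per_arm in (40, 80, 120):
    control = rng.normal(0.0, 2.0, n_per_arm)
    treated = rng.normal(true_effect, 2.0, n_per_arm)
    studies.append((treated, control))

results = [study_effect(t, c) for t, c in studies]
effects, variances = zip(*results)
pooled, pooled_se = pool_fixed_effect(effects, variances)
```

The pooled standard error is smaller than that of any single study. With the raw data in hand, the first stage could also adjust for participant-level covariates, which is not possible when only published summaries are available.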