Proceedings of the Institute of Statistical Mathematics Vol.65, No.1, 5-20 (2017)

## Modeling Intraday Stock Price Dynamics Using Diffusion Processes and Estimating Volatility and Covariation

(The Institute of Statistical Mathematics/PRESTO, Japan Science and Technology Agency)

This paper introduces the modeling of intraday stock price dynamics using diffusion processes and the statistical inference of volatility and covariation. In particular, we study the problems of 'market microstructure noise' and 'nonsynchronous observations'. First, we review the history of nonparametric estimation of volatility and covariation. Then we construct maximum-likelihood-type estimators and show their asymptotic mixed normality. We also study the local asymptotic mixed normality of statistical models, which is important when discussing the asymptotic optimality of estimators. Finally, we construct a Bayes-type estimator and study its asymptotic behavior.

Key words: Asymptotic efficiency, high-frequency data, local asymptotic mixed normality, market microstructure noise, maximum-likelihood-type estimation, nonsynchronous observations.

Proceedings of the Institute of Statistical Mathematics Vol.65, No.1, 21-38 (2017)

## On Stepwise Estimation of Lévy Driven Stochastic Differential Equation

(Graduate School of Mathematics, Kyushu University)
(Graduate School of Mathematics, Kyushu University)

We consider estimation of a non-Gaussian Lévy driven stochastic differential equation. Under high-frequency sampling and exponential ergodicity, we propose a stepwise estimation procedure based on a Gaussian quasi-score function: we first estimate the scale parameter while ignoring the drift coefficient, and then estimate the drift parameter by plugging in the estimated scale parameter. We derive the asymptotic normality and tail probability estimates of the proposed estimators. This stepwise strategy not only reduces computational cost but may also stabilize estimation accuracy. Unlike the diffusion case, the asymptotic covariance matrix associated with the drift parameter takes a different form when the coefficients share a common parameter.

Key words: Ergodicity, Gaussian quasi-score function, high-frequency sampling, Lévy driven stochastic differential equation, stepwise estimation.

Proceedings of the Institute of Statistical Mathematics Vol.65, No.1, 39-69 (2017)

## Hybrid Estimation for Stochastic Differential Equations Based on High-frequency Data

(Graduate School of Engineering Science, Osaka University/MMDS, Osaka University/CREST)

In this paper, we survey previous research on hybrid estimation of unknown parameters of stochastic differential equations based on high-frequency data. Using a Bayes type estimator with a non-optimal rate of convergence as the initial estimator, we obtain a multi-step estimator and an adaptive maximum likelihood type estimator, and show their asymptotic properties. For three kinds of diffusion models (ergodic diffusions, non-ergodic diffusions and small diffusions), we give examples and simulation results.

Key words: Bayes type estimator, diffusion process, hybrid estimator, maximum likelihood type estimator, multi-step estimator, stochastic differential equation.

Proceedings of the Institute of Statistical Mathematics Vol.65, No.1, 71-85 (2017)

## Whittle Estimation for High-frequency Data

(Graduate School of Engineering Science, Osaka University/Center for Mathematical Modeling and Data Science, Osaka University)

In this paper, we consider the statistical estimation of the diffusion term of a continuous Itô process based on high-frequency data. This problem has been studied extensively in the literature, typically under the assumption that the Itô process itself is observed discretely. In particular, it is well known that a quasi-likelihood based on the Euler-Maruyama approximation yields an asymptotically efficient estimator. Here, we study the case where the Itô process is hidden but its integrated process is observed at high frequency. It is known that a naive method that simply uses numerical derivatives of the observed integrated process results in an inconsistent estimator. We prove a central limit theorem for quadratic forms of first-order differences of the numerical derivatives. Using Whittle's approximation of the inverse of a covariance matrix, we construct a consistent estimator of the quadratic variation of the Itô process whose asymptotic variance is smaller than those of previously proposed estimators.

Key words: High-frequency data, Whittle estimation, central limit theorem, stable convergence, Langevin model.

Proceedings of the Institute of Statistical Mathematics Vol.65, No.1, 87-111 (2017)

## Analysis of High Frequency Reactions on Tokyo Stock Exchange

(Mitsubishi UFJ Trust Investment Technology Institute Co., Ltd.)
(Mitsubishi UFJ Trust Investment Technology Institute Co., Ltd.)

In recent years, the presence of high-frequency trading (HFT) in stock markets has increased globally. On the Tokyo Stock Exchange (TSE), the market share of high-frequency trading has increased year after year, and market players can trade stocks faster than before owing to the renewal of the trading system in September 2015.
In this paper, we analyze the high-frequency order book dynamics on the TSE. For this purpose, we focus on the "short interval order", which is submitted immediately after an order submission, and discuss its characteristics. We also analyze whether the high-frequency reaction to an order changed with the renewal. We find that "short interval orders" are likely to be submitted continuously after the submission of an order that has a large impact, such as a market order, and that this continuity is more pronounced after the renewal on the TSE. Moreover, we estimate multivariate Hawkes models that represent the order intervals among the different order types, and we observe two kinds of players: one that reacts to an order immediately and another that reacts about 10 milliseconds later. We then find that, after the renewal, the latter players submit orders of a specific order type more aggressively and change their order activity depending on the characteristics of the limit order book.

Key words: Market microstructure, high-frequency trading, order book dynamics, multivariate Hawkes process.
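
The abstract above does not state which kernel its multivariate Hawkes models use. As a rough illustration only, the sketch below assumes an exponential kernel, so that the intensity of order type i at time t is a baseline plus a decaying contribution from every earlier event of each type j. The function name `hawkes_intensity`, the parameters `baseline`, `alpha` and `beta`, and the numerical values are all hypothetical and are not taken from the paper.

```python
import math


def hawkes_intensity(t, baseline, alpha, beta, events):
    """Conditional intensities of a multivariate Hawkes process at time t.

    Assumes an exponential kernel (an illustrative choice, not the paper's):
        lambda_i(t) = baseline[i] + sum_j sum_{s in events[j], s < t}
                      alpha[i][j] * exp(-beta * (t - s))

    baseline : list of baseline intensities, one per order type
    alpha    : alpha[i][j] is the excitation of type i by an event of type j
    beta     : common decay rate of the kernel (1 / time unit)
    events   : events[j] is the list of past event times of type j
    """
    dim = len(baseline)
    intensities = list(baseline)
    for i in range(dim):
        for j in range(dim):
            for s in events[j]:
                if s < t:  # only strictly earlier events excite the process
                    intensities[i] += alpha[i][j] * math.exp(-beta * (t - s))
    return intensities


# Hypothetical example: two order types, time measured in seconds.
# One type-0 event at t = 0 raises the type-1 intensity, which then decays.
example = hawkes_intensity(
    t=0.01,
    baseline=[0.5, 0.2],
    alpha=[[0.0, 1.0], [0.3, 0.0]],
    beta=100.0,
    events=[[0.0], []],
)
```

In this toy setting the off-diagonal entries of `alpha` play the role of cross-excitation between order types, which is the kind of interaction a multivariate model is meant to capture.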

Proceedings of the Institute of Statistical Mathematics Vol.65, No.1, 113-139 (2017)

## Statistical Analysis of High-frequency Limit-order Book Data: On Cross-market, Single-asset Lead-lag Relationships in the Japanese Stock Market

We are concerned with very short-term lead-lag relationships between market prices of identical stocks traded concurrently on multiple trading venues in Japan, specifically the Tokyo Stock Exchange and two Proprietary Trading Systems, namely Japannext PTS and Chi-X Japan. In this paper, we conduct an empirical analysis with a modified version of the methodology recently proposed by Dobrev and Schaumburg (2015). This methodology focuses on the arrival times of the "events" for the paired point processes. That is, it utilizes only the (irregularly spaced) timestamp records of the trading activities, and hence is not (directly) influenced by "microstructure noise" pertaining to the behavior of the observed, "inefficient" prices. As in our previous work (Hayashi, 2015, 2016) based on the methodology of Hoffmann et al. (2013), we empirically measure the magnitudes of the lead-lag times using high-frequency limit-order book data with millisecond time resolution for major Japanese stocks, obtained from the three venues.

Key words: Dobrev and Schaumburg estimator, high-frequency data, high-frequency trading, Hoffmann, Rosenbaum and Yoshida estimator, lead-lag analysis, market microstructure.

Proceedings of the Institute of Statistical Mathematics Vol.65, No.1, 141-154 (2017)

## Estimating Truncated Realized Volatility and Time Interval: Evidence from Japanese Stock Market

Many studies document significant jumps in asset returns by analyzing high-frequency data. The realized volatility estimator is biased by such jumps, and truncated realized volatility has been proposed to solve this problem. In this paper, the realized volatilities and the truncated realized volatilities of 100 Japanese stocks are estimated using high-frequency data from July 22nd to October 27th, 2014, at sampling intervals from 5 to 1800 seconds. The conclusion is that Brownian motion does not dominate each stock price process. However, the truncated realized volatility becomes progressively smaller as the sampling interval decreases. Zero returns are only one factor behind this decrease. Choosing the optimal sampling interval and threshold level for accurate estimation of truncated realized volatility remains an open issue.

Key words: Truncated realized volatility, high-frequency data, jump diffusion process, sampling interval, micro-price, TOPIX100.
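
To make the distinction in the abstract above concrete, the following minimal sketch computes realized volatility as the sum of squared log-price increments, and a truncated version that drops increments whose absolute value exceeds a threshold. The function names and the fixed threshold are hypothetical; the abstract itself notes that choosing the threshold and sampling interval remains an open issue, so the value used here is purely illustrative.

```python
def log_returns(log_prices):
    """Increments of a log-price series sampled on a fixed grid."""
    return [b - a for a, b in zip(log_prices, log_prices[1:])]


def realized_volatility(log_prices):
    """Sum of squared increments; jumps contribute to this sum."""
    return sum(r * r for r in log_returns(log_prices))


def truncated_realized_volatility(log_prices, threshold):
    """Sum of squared increments, keeping only those with |r| <= threshold.

    Increments larger than the threshold are treated as jumps and discarded.
    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    return sum(r * r for r in log_returns(log_prices) if abs(r) <= threshold)


# Hypothetical path: three small moves of 0.01 and one jump of 0.5.
path = [0.0, 0.01, 0.02, 0.52, 0.53]
rv = realized_volatility(path)                  # includes the jump
trv = truncated_realized_volatility(path, 0.1)  # excludes the jump
```

On this toy path the jump accounts for almost all of the realized volatility, while the truncated version retains only the three small increments.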

Proceedings of the Institute of Statistical Mathematics Vol.65, No.1, 155-180 (2017)

## Volatility Forecasting with Empirical Similarity: Japanese Stock Market Case

(School of Science and Technology, Kwansei Gakuin University)
(The Institute of Statistical Mathematics/Department of Statistical Science, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies)

In this research, we compare the forecasting ability of various volatility models through within-sample and out-of-sample forecasting simulations. The models considered are the heterogeneous autoregressive (HAR) model, a 1/3 model in which the weight coefficients of the HAR model are all set to 1/3 (ES0), and an HAR model in which the weight coefficients are determined by empirical similarity. We also try AR(1), ARCH/GARCH and their variants, and models incorporating the Realized Quarticity (RQ), which are referred to as ARQ, HARQ and ESQ. As stock data, we pick 6 index series from the Tokyo Stock Exchange and 24 individual stock series, all of which had enough liquidity from April 1st 1999 to December 30th 2013. Minute-by-minute data were created from the high-frequency data. The evaluation of forecasts depends on the evaluation function employed; we use Patton's error function. Changing the length of the estimation period and the forecasting period, as well as the parameter of Patton's error function, we try 27,000 patterns of forecasting simulations. We find that ESQ and HARQ are almost comparable in within-sample forecasting, whereas ES0 is outstanding in out-of-sample forecasting experiments. We also compare the models using the pair-wise testing procedure proposed by Hansen et al. The results are similar, but the details differ slightly between the index series and the individual stock series.

Key words: Empirical similarity, realized measures, HARQ, ESQ, model confidence set.
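
As a rough sketch of the 1/3 model (ES0) mentioned in the abstract above, the code below builds the three usual HAR regressors (the latest daily realized measure and its weekly and monthly averages) and averages them with equal weights. The window lengths of 5 and 22 days, the absence of an intercept, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
def har_components(rv, t, week=5, month=22):
    """Daily, weekly-average and monthly-average realized measures at day t.

    rv    : list of daily realized volatilities, oldest first
    t     : index of the most recent observed day
    week  : assumed length of the weekly window (trading days)
    month : assumed length of the monthly window (trading days)
    """
    if t < month - 1 or t >= len(rv):
        raise ValueError("need a full monthly window ending at index t")
    daily = rv[t]
    weekly = sum(rv[t - week + 1 : t + 1]) / week
    monthly = sum(rv[t - month + 1 : t + 1]) / month
    return daily, weekly, monthly


def equal_weight_forecast(rv, t, week=5, month=22):
    """One-step-ahead forecast with all three HAR weights fixed at 1/3."""
    daily, weekly, monthly = har_components(rv, t, week, month)
    return (daily + weekly + monthly) / 3.0


# Hypothetical series 1, 2, ..., 22: the components at the last day are
# 22 (daily), 20 (mean of 18..22) and 11.5 (mean of 1..22).
series = [float(k) for k in range(1, 23)]
forecast = equal_weight_forecast(series, t=21)
```

Because the weights are fixed, nothing has to be estimated, which is one plausible reason such a model can be competitive out of sample; the abstract reports the outcome but does not give this explanation.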