## Basics of Applied Spatial Statistics: Spatial Statistics and Spatial Econometrics

Applied spatial statistics mainly involves applying spatial statistics and spatial econometrics, both of which provide excellent tools for modeling geospatial data. Spatial statistics, which includes elements of geostatistics, originated in mining engineering, whereas spatial econometrics originated in regional science; the two fields thus developed independently of each other. However, the recent diffusion of geographic information systems (GIS) has given researchers access to detailed spatial datasets and allowed them, for example, to apply spatial statistics to socioeconomic data.

This paper reviews the different modeling techniques used in spatial statistics and spatial econometrics and discusses the similarities and differences among them, as well as possible future directions for these fields. First, the two characteristic properties of spatial data, spatial autocorrelation and spatial heterogeneity, are explained, and the methods used to detect these properties are reviewed. Second, the differences between the two fields are highlighted, especially in how the target space is conceptualized. These differences are characterized by discussing in detail the roles of the so-called spatial weight matrix in spatial econometrics and of prediction (spatial interpolation) in spatial statistics; relevant examples are provided. Third, the possible future application of these fields to spatio-temporal data is discussed.
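Spatial autocorrelation, the first of these properties, is commonly detected with Moran's I statistic. As a minimal illustration (the lattice, weights, and values below are invented, not taken from the paper), the following sketch computes Moran's I on a small grid with rook-contiguity weights:

```python
def morans_i(values, weights):
    """Moran's I: I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and W is the sum of all weights."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)

# 2x2 lattice (cells 0..3 in row-major order) with rook contiguity.
W = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]
x = [1, 1, 1, 9]       # one high-value cell bordered by low-value cells
print(morans_i(x, W))  # -1/3: negative spatial autocorrelation
```

Values near +1 indicate clustering of similar values, values near 0 spatial randomness, and negative values (as here, where the single high cell touches only low cells) indicate dissimilar neighbors.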

Key words: Spatial statistics, spatial econometrics, spatial weight matrix, stationarity, prediction.

## Time Course and Spatial Distribution of Selection Pressure on Virus Proteins and Their Adaptation to Host Population

Proteins adapt to their environments and/or gain new functions through amino acid substitutions. Mutations in a protein-coding gene are therefore subject to the selection pressure of their environment. The strength and character of this selection pressure may vary among the spatial regions of the protein structure and across the temporal domains of the evolutionary process. Thus, revealing the spatio-temporal fluctuation of the selection pressure deepens our understanding of the adaptive evolution of the protein. In this work, we first traced the evolutionary process of influenza A hemagglutinin over 35 years and examined its long-term adaptation to its host population. By monitoring changes in the binding ability of hemagglutinin to antibodies, we determined the changes in the selection pressure on the hemagglutinin. Second, we developed a mathematical model that describes the population dynamics of viruses, antibodies, and normal/infected cells within a host. Its coefficients describe the binding affinity between the virus and the induced antibody and that between the virus and its receptor. We estimated the effect of a mutation in a binding region on the binding affinity and, using population genetic theory, evaluated the probability that a mutant becomes fixed in a host population. We simulated the adaptive evolution of the coronavirus that is the etiological agent of severe acute respiratory syndrome, and showed that some mutations in the binding region may have high fixation probabilities in a vaccinated host population.
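The fixation probability of a mutant is classically approximated in population genetic theory by Kimura's diffusion formula. The sketch below illustrates that general formula only; it is not the authors' specific model, and the population size and selection coefficients are arbitrary:

```python
import math

def fixation_probability(s, N, p=None):
    """Kimura's diffusion approximation for a diploid population of size N:
    u(p) = (1 - exp(-4*N*s*p)) / (1 - exp(-4*N*s)),
    with initial frequency p (default: a single new mutant, p = 1/(2N))."""
    if p is None:
        p = 1.0 / (2 * N)
    if abs(s) < 1e-12:          # neutral limit: u(p) -> p
        return p
    return (1 - math.exp(-4 * N * s * p)) / (1 - math.exp(-4 * N * s))

# A weakly advantageous mutant fixes far more often than a neutral one.
print(fixation_probability(0.0, 1000))    # neutral: 1/(2N) = 0.0005
print(fixation_probability(0.01, 1000))   # roughly 2s for small s, large N
```

For a beneficial mutation with small selection coefficient s in a large population, the formula reduces to the familiar approximation u ≈ 2s.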

Key words: Molecular evolution, selection pressure, spatial distribution, fixation probability.

## Maximum Pseudo-likelihood Analyses of Clustering Point Processes and Some Properties of Palm Intensity

The Neyman-Scott point process provides clustering models for spatial point patterns. However, efficient estimation and goodness-of-fit evaluation for these models by the maximum likelihood method have not been implemented, because the observed points carry no labels distinguishing the clusters. The authors considered the point process of difference vectors between the original clustered point coordinates and represented it by a likelihood function, assuming that the difference vectors are distributed according to a non-homogeneous Poisson intensity. We call this pseudo-likelihood the Palm-type likelihood. By maximizing the logarithm of the Palm-type likelihood, we demonstrated the consistency and efficiency of the parameter estimation in numerical experiments. Recently, the maximum Palm-type likelihood method has been supported by an asymptotic theory of point processes.
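As background, a Neyman-Scott process scatters offspring points around latent Poisson parent points, and only the offspring are observed. A minimal sketch of the Thomas process special case (Gaussian offspring displacements) on the unit square; all parameter values are arbitrary illustrations:

```python
import math
import random

def simulate_thomas(lam_parent, mean_offspring, sigma, seed=0):
    """Thomas process on [0,1]^2: parents ~ Poisson(lam_parent), each parent
    gets Poisson(mean_offspring) children displaced by N(0, sigma^2) in each
    coordinate. Cluster labels are discarded -- only offspring are returned,
    which is exactly why ordinary maximum likelihood estimation is hard."""
    rng = random.Random(seed)

    def poisson(mu):
        # Knuth's multiplication method for Poisson sampling.
        limit, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    points = []
    for _ in range(poisson(lam_parent)):
        px, py = rng.random(), rng.random()
        for _ in range(poisson(mean_offspring)):
            points.append((rng.gauss(px, sigma), rng.gauss(py, sigma)))
    return points

pts = simulate_thomas(lam_parent=10, mean_offspring=5, sigma=0.02)
print(len(pts), "offspring points in", "clusters around hidden parents")
```

The difference vectors between all pairs of such offspring points are the raw material for the Palm-type likelihood described above.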

The authors have further considered the more general problem of point coordinates drawn from a superposition of multiple different Neyman-Scott point processes, aiming to estimate the parameters of each component. However, from the Palm intensity alone, the parameter values of each Neyman-Scott process cannot be identified in this general case, because the Palm intensity is a second-order statistic of the point process. We therefore resolved the identification problem by combining the Palm-type likelihood function with a pseudo-likelihood for the nearest-neighbor distance distribution. Some theoretical research on the mixed Neyman-Scott point process is also in progress.

Key words: (Superposed) Neyman-Scott point process, Palm intensity, maximum Palm-type likelihood procedure, identification problem on point process models, maximum NND-type likelihood procedure.

## Statistical Analysis of Large Spatio-temporal Data Sets

We review various time-saving spatio-temporal statistical methodologies and discuss problems to be solved in the future. First, we consider covariance tapering for the best linear unbiased predictor (BLUP), which is called kriging in geostatistics. Second, we consider likelihood approximation in both the spatio-temporal and frequency domains. Third, we describe latent process models, which reduce the number of parameters substantially so that large spatio-temporal data sets can be analyzed in feasible computational time. Finally, we discuss open problems to be solved in the future.
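Covariance tapering multiplies the covariance function by a compactly supported taper, so that entries for pairs of locations farther apart than the taper range become exactly zero and the kriging system becomes sparse. A minimal sketch with an exponential covariance and a Wendland-type taper; the locations and ranges are illustrative, not from the paper:

```python
import math

def exp_cov(d, range_=1.0):
    """Exponential covariance C(d) = exp(-d / range_)."""
    return math.exp(-d / range_)

def wendland_taper(d, theta):
    """Wendland-1 taper: (1 - d/theta)_+^4 * (1 + 4 d/theta); zero for d >= theta."""
    t = d / theta
    return max(0.0, 1.0 - t) ** 4 * (1.0 + 4.0 * t)

def tapered_cov_matrix(locs, theta):
    """Elementwise product of covariance and taper over 1-D locations."""
    n = len(locs)
    return [[exp_cov(abs(locs[i] - locs[j])) *
             wendland_taper(abs(locs[i] - locs[j]), theta)
             for j in range(n)] for i in range(n)]

locs = [0.0, 0.5, 1.0, 3.0, 3.5]
C = tapered_cov_matrix(locs, theta=1.0)
zeros = sum(1 for row in C for c in row if c == 0.0)
print(zeros, "of", len(locs) ** 2, "entries are exactly zero")  # 14 of 25
```

The resulting sparse system can then be solved with sparse Cholesky routines, which is where the computational savings for the BLUP come from.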

Key words: Spatio-temporal statistical analysis, large spatio-temporal data sets, covariance tapering, latent processes.

## Predicting Waveheight Based on Ground-based Monitoring of Wind

Ocean waves are among the physical factors that cause serious sea disasters, and their prediction provides information for various human activities related to the sea. Such prediction has long been a difficult problem, because it is usually hard to carry out constant monitoring of changes in the various meteorological factors relating to a sea area of interest. To address this problem, in the present article we develop a statistical predictor of waveheight based on changes in meteorological factors obtained by ground-based monitoring.

In Japan, the Japan Meteorological Agency has set up many regional stations for ground-based meteorological monitoring under the Automated Meteorological Data Acquisition System (AMeDAS), as well as wave recorders in many coastal areas. In this article, we use measured data on wind speed and wind direction from multiple AMeDAS stations and data on waveheight obtained from a wave recorder at Matsumae-oki, Hokkaido, Japan. Some preliminary statistical analyses suggest that the correlation structure changes over time, in particular which AMeDAS station's wind has the greatest cross correlation with the waveheight. We therefore developed a space-time model for predicting the waveheight. More precisely, assuming that the change of wind direction follows a von Mises process, we developed a nonstationary time series model based on wind speed and wind direction as well as waveheight, which takes into account the possibility that the wind direction changes within a short time.
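Python's standard library includes a von Mises sampler, so the directional component of such a model can be sketched directly. The random-walk formulation and the parameter values below are illustrative assumptions, not the paper's model:

```python
import random

def simulate_wind_directions(n, kappa, theta0=0.0, seed=1):
    """Directional random walk: theta_t ~ vonMises(mean=theta_{t-1}, kappa).
    The concentration kappa plays the role of an inverse variance on the
    circle: larger kappa means the direction changes less between steps."""
    rng = random.Random(seed)
    theta = theta0
    path = []
    for _ in range(n):
        theta = rng.vonmisesvariate(theta, kappa)  # angle in [0, 2*pi)
        path.append(theta)
    return path

directions = simulate_wind_directions(n=24, kappa=20.0)  # e.g. hourly directions
print([round(a, 2) for a in directions[:5]])
```

Because the von Mises distribution lives on the circle, this avoids the discontinuity at 0/360 degrees that a Gaussian random walk on raw wind-direction angles would suffer.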

To examine the applicability of the developed model, we carried out prediction experiments. The results suggest that the model can improve prediction accuracy by using predictors based on observations at the AMeDAS station nearest to the waveheight observation point. They also suggest that this improvement tends to be robust throughout the year, although there is room for improving accuracy in winter. The developed model incorporates the von Mises process into its time series structure; a more flexible directional process for expressing changes in wind direction remains a topic for future work.

Key words: Sea state data, meteorological data, prediction, von Mises process, space-time model.

## Hotspot Detection Using Scan Method Based on Echelon Analysis

There are several approaches to detecting hotspots in different kinds of spatial data. Recently, the spatial scan statistic, which finds hotspot areas based on a likelihood ratio, has become a very common and useful method. However, this method tends to detect hotspots much larger than the true hotspot, and therefore does not always detect hotspots with high relative risk. The problem is how to scan regions that have both a high likelihood ratio and high relative risk. Echelon analysis is a useful technique for systematically and objectively investigating the phase structure of spatial lattice data. In this study, we use an echelon scan method to explore hotspot regions based on spatial structure, and compare them with those detected by a previous study's method. In addition, we propose a new method for scanning all hotspot candidate regions. Finally, we evaluate the validity of the echelon scan by comparing it with all possible scans for simulated data.
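For concreteness, the Poisson version of the spatial scan statistic assigns each candidate zone a log likelihood ratio comparing its observed case count with the count expected under the null. A minimal sketch that scans single-region zones only (real scan methods scan windows of many regions); the case and population figures are invented:

```python
import math

def poisson_llr(c, e, C):
    """Kulldorff-style log likelihood ratio for a zone with c observed and
    e expected cases out of C total (c < C assumed); zero unless elevated."""
    if c <= e:
        return 0.0
    return c * math.log(c / e) + (C - c) * math.log((C - c) / (C - e))

def scan(cases, population):
    """Scan single-region zones and return (best region, its LLR)."""
    C = sum(cases)
    P = sum(population)
    llrs = [poisson_llr(cases[i], C * population[i] / P, C)
            for i in range(len(cases))]
    best = max(range(len(cases)), key=lambda i: llrs[i])
    return best, llrs[best]

cases = [10, 2, 2, 2]            # region 0 is the elevated region
population = [100, 100, 100, 100]
region, llr = scan(cases, population)
print("hotspot candidate:", region, "LLR:", round(llr, 3))
```

The echelon scan described above differs in *which* candidate zones are evaluated: it restricts the scan to zones suggested by the phase structure of the lattice rather than, say, all circular windows.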

Key words: Hotspot, spatial data, spatial scan statistic, Echelon analysis.

## Nonlinear Regression Modeling of Forest-coverage Ratios

The worldwide decrease in forest area is chiefly due to the negative impact of human activities, but this impact is constrained by topographical circumstances. In this research, detailed regression models are proposed to explain forest area ratios observed on a grid-cell system using two covariates: population density and relief energy, the difference between the maximum and minimum altitudes at each site.

Tanaka and Nishii (2009, *IEEE Transactions on Geoscience and Remote Sensing*) explored a regression model of logit-transformed forest area ratios, where the mean function is given by the sum of two non-linear parametric functions of the covariates. Building on their results, we consider a mean structure represented by the sum of two natural spline functions of the respective covariates. Spatial dependency from the first- and second-order neighborhoods is also considered. Applied to real data from Hiroshima prefecture, the proposed models demonstrate clear superiority over the previous models in terms of AIC.
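A natural cubic spline is constrained to be linear beyond its boundary knots, which makes it well behaved at extreme covariate values. A minimal sketch of the standard truncated-power construction of the basis (illustrative knots, not the paper's; the construction follows the usual textbook N_k formulation):

```python
def natural_spline_basis(x, knots):
    """Natural cubic spline basis: N_1(x) = 1, N_2(x) = x, and
    N_{k+2}(x) = d_k(x) - d_{K-1}(x), where
    d_k(x) = [(x - xi_k)_+^3 - (x - xi_K)_+^3] / (xi_K - xi_k).
    The cubic and quadratic terms cancel beyond the boundary knots,
    leaving a linear tail."""
    K = len(knots)
    pos3 = lambda t: max(t, 0.0) ** 3
    def d(k):
        return (pos3(x - knots[k]) - pos3(x - knots[K - 1])) / (knots[K - 1] - knots[k])
    return [1.0, x] + [d(k) - d(K - 2) for k in range(K - 2)]

knots = [0.0, 1.0, 2.0, 3.0]
# Beyond the last knot, each basis function is exactly linear:
vals = [natural_spline_basis(x, knots)[2] for x in (4.0, 5.0, 6.0)]
second_diff = vals[0] - 2 * vals[1] + vals[2]
print(vals, "second difference:", second_diff)  # [14.0, 20.0, 26.0], 0.0
```

In a model like the one above, one such basis is built for each covariate (population density and relief energy) and the two basis expansions are fitted jointly by least squares or maximum likelihood, with AIC comparing the candidate mean structures.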

Key words: Logistic regression, natural spline, deforestation, human population density, relief energy.

## Modeling Spatial Dependence in Origin-destination Flows Based on a Negative Binomial Gravity Model

The spatial econometric model for origin-destination flows proposed by LeSage and Pace (2008) overcomes a shortcoming of the classical gravity model: its failure to account for spatial dependence. Their model is based on the log-normal gravity model, which assumes a log-normal distribution for the observed flow data; however, as Flowerdew and Aitkin (1982) pointed out, assuming a continuous distribution for discrete data leads to statistical problems and results in biased and inefficient estimates, so modeling that assumes a discrete distribution is preferable. This study proposes a spatial econometric model for origin-destination flows based on a negative binomial gravity model: the spatial autoregressive negative binomial model. Estimation using the 2006 interregional migration flow data of Japan shows that, compared with the nonspatial negative binomial gravity model, the proposed model improves the log likelihood and root mean squared error. A similar increase in log likelihood was reported by LeSage and Pace (2008), whose model outperforms the commonly used log-normal gravity model; this parallel suggests that the proposed spatial negative binomial gravity model is credible.
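As an illustration of the likelihood involved (not the authors' estimation code), the negative binomial log-pmf for a flow with a gravity-model mean can be evaluated with the log-gamma function. All populations, distances, flows, and coefficients below are invented:

```python
import math

def nb_logpmf(y, mu, r):
    """Negative binomial log-pmf with mean mu and dispersion r:
    log P(y) = lgamma(y + r) - lgamma(r) - lgamma(y + 1)
             + r * log(r / (r + mu)) + y * log(mu / (r + mu))."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def gravity_mean(pop_o, pop_d, dist, beta):
    """Gravity-model mean: mu = exp(b0 + b1*ln(pop_o) + b2*ln(pop_d) - b3*ln(dist))."""
    b0, b1, b2, b3 = beta
    return math.exp(b0 + b1 * math.log(pop_o) + b2 * math.log(pop_d)
                    - b3 * math.log(dist))

# Toy flows: (origin population, destination population, distance, observed flow)
flows = [(1000, 2000, 50, 30), (1000, 500, 120, 4), (2000, 500, 80, 9)]
beta = (-8.0, 0.8, 0.7, 0.9)
loglik = sum(nb_logpmf(y, gravity_mean(po, pd, d, beta), r=2.0)
             for po, pd, d, y in flows)
print("log likelihood:", round(loglik, 3))
```

The spatial autoregressive extension additionally lets each flow's mean depend on flows at origin- and destination-neighboring pairs through a spatial weight matrix, which is what the nonspatial likelihood above omits.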

Key words: Flow data, negative binomial distribution, gravity model, spatial econometrics, spatial interaction.

## Sequential Simulation of Gaussian Random Fields Based on Block Matrices

Among exact simulation methods for Gaussian random fields (exact at least for bounded-range covariance functions), the most direct is based on the Cholesky decomposition of the covariance matrix. However, this method becomes intractable as the covariance matrix grows. On the other hand, the most frequently used method, based on the FFT (Fast Fourier Transform), can be applied only to simulations on regular grids. In this paper, we derive general sequential formulas for the conditional means and covariances of multivariate Gaussian random vectors and apply them to a sequential (blockwise) simulation method for Gaussian random fields at general types of locations. The computational cost of this method is less than that of the FFT method, at least theoretically.
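The sequential step rests on the standard conditional distribution of a partitioned Gaussian vector; in the (standard, not the paper's) notation below, each new block $\mathbf{x}_2$ is simulated given the already-simulated block $\mathbf{x}_1$ from:

$$
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}
\sim N\!\left( \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix},
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)
\;\Longrightarrow\;
\mathbf{x}_2 \mid \mathbf{x}_1 \sim
N\!\left( \boldsymbol{\mu}_2 + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}_1 - \boldsymbol{\mu}_1),\;
\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \right)
$$

Simulating block by block with these formulas replaces one large Cholesky factorization with a sequence of smaller solves against $\Sigma_{11}$, which is the source of the claimed computational savings.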

Key words: Gaussian random fields, sequential formulas for conditional means and covariances, block matrix, sequential simulation.

## A Spatio-temporal Filtering Method Based on an AR Type Model for Analysis of Single-trial Biological Imaging Data

Regression and cross correlation analyses have been widely used to detect neural activation in dynamic brain imaging data. These analyses require a reference function, assumed in advance, that reflects the temporal changes in neural activation; in other words, only those neural activations whose temporal patterns resemble the reference function can be detected. When a reference function is difficult to define, these analyses are not applicable. In our previous study, we proposed a spatio-temporal filtering method to overcome these disadvantages. This method enables us to detect when and where a dynamical state transition caused by neural activation arises in repeatedly recorded data (multiple-trial data). However, the method cannot be directly applied to single-trial data, such as recordings of spontaneous brain activity. In the present study, we have modified the spatio-temporal filtering method using a sliding time window and shown its capability to detect neural activation in single-trial data.
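The innovation approach behind such filtering can be caricatured in one dimension: fit an autoregressive model and flag times where the one-step prediction error (the innovation) is unusually large. A deliberately simplified AR(1) sketch on a synthetic signal with one state transition; it is not the authors' spatio-temporal filter:

```python
import random

def ar1_innovations(x):
    """Fit AR(1) by the lag-1 sample autocorrelation of the centered series,
    then return the one-step prediction errors (innovations) for t = 1..n-1."""
    n = len(x)
    m = sum(x) / n
    c = [v - m for v in x]
    phi = sum(c[t] * c[t - 1] for t in range(1, n)) / sum(v * v for v in c)
    return [c[t] - phi * c[t - 1] for t in range(1, n)]

rng = random.Random(42)
# Noise around level 0 for t < 50, then a jump to level 5: a transition at t = 50.
x = ([rng.gauss(0.0, 0.1) for _ in range(50)] +
     [5 + rng.gauss(0.0, 0.1) for _ in range(50)])
innov = ar1_innovations(x)
change = max(range(len(innov)), key=lambda t: abs(innov[t])) + 1  # innovations start at t = 1
print("detected transition at t =", change)  # the jump at t = 50 dominates
```

In the actual method, the model is spatio-temporal and, for single-trial data, the model is re-fitted within a sliding time window so that the innovations are judged against locally estimated dynamics.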

Key words: Spatio-temporal filtering, innovation approach, brain functional imaging, optical imaging.

## Generalized Whittle Likelihood Method for Irregularly Spaced Data

It is sometimes difficult to evaluate the likelihood function of large irregularly spaced data. This paper proposes a method that approximates it in the frequency domain, which can be regarded as a generalization of the Whittle likelihood for time series to irregularly spaced data. The generalized Whittle likelihood function provides a computationally efficient algorithm for evaluating likelihood functions for large irregularly spaced data. After establishing the consistency and asymptotic normality of the estimators that maximize the generalized Whittle likelihood function, we examine the empirical performance of the method using simulations and real data on land prices in the Kanto area.
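As a point of reference, the classical Whittle likelihood for an evenly spaced series compares the periodogram with a candidate spectral density at the Fourier frequencies. A minimal sketch for white noise with a grid search over flat spectra; the generalization to irregular spacing in the paper requires considerably more machinery:

```python
import cmath
import math
import random

def periodogram(x):
    """Periodogram I(w_j) = |sum_t x_t e^{-i w_j t}|^2 / (2 pi n) at the
    Fourier frequencies w_j = 2 pi j / n, j = 1, ..., n-1 (mean removed)."""
    n = len(x)
    m = sum(x) / n
    c = [v - m for v in x]
    out = []
    for j in range(1, n):
        w = 2 * math.pi * j / n
        s = sum(c[t] * cmath.exp(-1j * w * t) for t in range(n))
        out.append(abs(s) ** 2 / (2 * math.pi * n))
    return out

def whittle_neg_loglik(x, spec):
    """Whittle approximation: sum_j [ log f(w_j) + I(w_j) / f(w_j) ],
    here for a flat (white noise) spectrum f = sigma^2 / (2 pi)."""
    return sum(math.log(spec) + i_j / spec for i_j in periodogram(x))

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(128)]
# For white noise with unit variance, the minimizer should be near sigma^2 = 1.
best = min((whittle_neg_loglik(x, s / (2 * math.pi)), s)
           for s in (0.25, 0.5, 1.0, 2.0, 4.0))
print("best sigma^2 on the grid:", best[1])
```

The appeal of this form is that no covariance matrix is inverted: the likelihood is a sum over frequencies, which is what makes it attractive for large data sets.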

Key words: Spectral density function, periodogram, finite Fourier transform, Gram-Schmidt orthogonalization, log likelihood function, stationary spatial process.

## State-space Method for Estimating Neuronal Firing Rates

Estimating firing rates from single spike trains is an important task in the field of neural coding. Here, we propose a state-space method for this purpose. Specifically, we develop an approximate method for computing posterior estimates of the state, an EM algorithm for estimating the model parameters, and a model selection technique based on the marginal likelihood. We demonstrate the validity of the proposed method by applying it to simulated spike trains.
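The flavor of a state-space approach can be conveyed with a deliberately crude sketch: bin the spike train into counts and smooth them with a Gaussian local-level (random-walk) Kalman filter. The authors' method instead computes an approximate posterior for point-process observations; the toy below, with invented counts and variances, only illustrates the state-space idea:

```python
def local_level_filter(y, q, r):
    """Kalman filter for the local-level model:
    state:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered state estimates (a smoothed firing-rate proxy)."""
    x, p = y[0], r              # initialize at the first observation
    out = []
    for obs in y:
        p += q                  # predict: the state may have drifted
        k = p / (p + r)         # Kalman gain
        x += k * (obs - x)      # update toward the new observation
        p *= (1 - k)
        out.append(x)
    return out

counts = [0, 1, 0, 2, 3, 5, 4, 6, 5, 2, 1, 0, 0]  # binned spike counts (toy data)
rate = local_level_filter(counts, q=0.1, r=1.0)
print([round(v, 2) for v in rate])
```

The ratio q/r plays the role of a smoothing parameter; in the actual method its analogue is estimated by the EM algorithm, and competing models are compared via the marginal likelihood.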

Key words: Neuronal spike trains, firing rate estimation, state-space models.

## Automatic Segmentation of Mouse States Using Hidden Markov Model and Characterization of Mouse Strain Using 2-state Markov Model

Research has been conducted to find the relation between genetic character and social behavior of mice based on the consomic mouse B6-ChrN$^{\rm MSM}$, a consomic strain between C57BL/6JJcl (B6) and MSM/Ms (MSM). A pair of genetically identical mice were put in a square open field, and an expert identified segments of interactive states between the pair such as “indifference”, “sniffing”, “following”, and so on. With numerous pairs to observe, this manual task becomes laborious and time-consuming. In this study, we automated the segmentation using a hidden Markov model; specifically, we developed a model that recognizes whether the pair is “indifferent” or “interactive.” The plausibility of the obtained segmentation was carefully examined from various aspects. Based on the segmentation by the hidden Markov model, we calculated the Markov transition probability of each consomic strain. The Markov transition probability is a two-dimensional quantity, and we proposed it as a characteristic of the social behavior of each consomic strain. A plot of the Markov transition probabilities of the consomic strains in a two-dimensional plane revealed two interesting facts: (i) the consomic strains are located along the line connecting B6 and MSM, reflecting the nature of consomic strains, and (ii) among them, the Chr 6C consomic strain showed exceptionally singular behavior.
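A minimal Viterbi sketch for a 2-state HMM conveys how such a segmentation works. The state names match the abstract, but the emission alphabet (a hypothetical near/far distance feature) and all probabilities are invented stand-ins:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state path for an HMM, computed in log space."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, V[t - 1][r] + math.log(trans_p[r][s]) + math.log(emit_p[s][obs[t]]))
                 for r in states),
                key=lambda kv: kv[1])
            V[t][s], back[t][s] = score, prev
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

states = ("indifferent", "interactive")
start = {"indifferent": 0.5, "interactive": 0.5}
trans = {"indifferent": {"indifferent": 0.9, "interactive": 0.1},
         "interactive": {"indifferent": 0.1, "interactive": 0.9}}
# 'f' = mice far apart, 'n' = mice near each other (hypothetical feature)
emit = {"indifferent": {"f": 0.8, "n": 0.2},
        "interactive": {"f": 0.2, "n": 0.8}}
print(viterbi("fffnnn", states, start, trans, emit))
```

The sticky diagonal of the transition matrix (0.9) is what makes the decoded path form long segments rather than flickering between states; the strain-level transition probabilities discussed above are then estimated from such decoded segmentations.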

Key words: Mice, hidden Markov Model, indifferent behavior, interactive behavior, Markov transition probability.

## Rank Statistic for the Non-parametric Change Point Problem

A new interpretation for the rank statistic for the non-parametric change point problem introduced by Nishiyama (2011, *Journal of the Japan Statistical Society*) is presented. Also, the statement of the alternative is improved.

Key words: Change point problem, rank statistic, test.

## Surveys on Ideal Persons for Elementary School Pupils at the End of the Meiji Period

— Comparison of Calibration Techniques —

This paper presents the results of surveys on the ideal person for elementary school pupils, reported in the Yomiuri Shimbun in 1908, i.e., the 41st year of the Meiji Period. These surveys were conducted at self-selected volunteer schools that responded to the newspaper's advertisement, so the selection of respondents was biased with regard to gender, age, and region. Hence, the survey results were compared across various calibration weighting techniques in order to uncover the underlying patterns. First, we proposed selecting covariates by considering both the amount of change produced by the calibration estimation and the magnitude of the unequal weighting effect. Second, we illustrated that the estimates remained almost unchanged even when the distance function used in the calibration was changed. In addition, to restrict the maximum weight, it is preferable to use a logit function rather than weight trimming or truncated distance functions. After calibration, the top five names in the surveys on ideal persons for elementary school pupils were Masashige Kusunoki, Sontoku Ninomiya, Hideyoshi Toyotomi, Toju Nakae, and Florence Nightingale.
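The basic calibration idea can be sketched with simple post-stratification: rescale the design weights within each category so that weighted shares match known population shares, then measure the cost of doing so via the unequal weighting effect, UWE = 1 + cv²(w). The sample composition and population shares below are invented, not the survey's:

```python
def poststratify(categories, base_w, pop_share):
    """Calibration by post-stratification: scale each unit's weight by
    (population share of its category) / (weighted sample share)."""
    total = sum(base_w)
    samp_share = {c: sum(w for cat, w in zip(categories, base_w) if cat == c) / total
                  for c in set(categories)}
    return [w * pop_share[cat] / samp_share[cat]
            for cat, w in zip(categories, base_w)]

def uwe(weights):
    """Unequal weighting effect (Kish): 1 + cv^2 = n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

# A boy-heavy volunteer sample, calibrated to a 50/50 population split.
cats = ["boy"] * 8 + ["girl"] * 2
w = poststratify(cats, [1.0] * 10, {"boy": 0.5, "girl": 0.5})
print(w)                 # boys down-weighted (0.625), girls up-weighted (2.5)
print(round(uwe(w), 4))  # 1.5625: the variance price of correcting the bias
```

The trade-off the paper studies is visible even here: the more the weights must move to correct selection bias, the larger the UWE, which is why bounding the maximum weight (e.g. via a logit distance function) matters.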

Key words: Self-selection method, calibration adjustment of weights, unequal weighting effect.