Proc. Inst. Statist. Math. 73-1

Recent Advances in Spatial Statistics

Daisuke Murakami

(The Institute of Statistical Mathematics)

Spatial statistics studies statistical methods for geospatial data. In recent years, as spatio-temporal data have grown larger and more diverse, methods that are both computationally efficient and flexible have developed rapidly. This paper therefore summarizes recent advances in spatial statistics. We first introduce basic spatial statistical models and their challenges. Next, we classify approximations for large samples into low-rank approximation, covariance approximation, and precision-matrix approximation, and review studies in each class. We then review methods for flexibly modeling spatial processes and observations, followed by software packages for implementing spatial statistical methods. Finally, we discuss future directions in spatial statistics.

Key words: Spatial statistics, spatiotemporal modeling, Gaussian process, spatial correlation, neural networks.

Recent Methodological Developments in Spatial Econometrics

Hajime Seya

(Department of Civil Engineering, Graduate School of Engineering, Kobe University)

Masashi Tomari

(Department of Civil Engineering, Graduate School of Engineering, Kobe University)

Almost 50 years have passed since the field of spatial econometrics emerged. Its standard methodologies were largely established by the 2000s, and a vast body of empirical research has been conducted since then. This paper presents our personal views on recent methodological developments in spatial econometrics. Specifically, we review important recent developments in terms of 1) identification and causal inference, 2) specification of spatial weight matrices, 3) flexible model structures, 4) spatiotemporal data modeling, and 5) dirty data modeling.

Key words: Spatial econometrics, spatial weight matrix, statistical causal inference, machine learning, dirty data, spatial panel data.
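A spatial weight matrix W specifies which units are treated as neighbors of each other. The following is a minimal sketch of one common specification, not the specific approaches reviewed in the paper. It builds a row-standardized k-nearest-neighbor W from point coordinates and computes the spatial lag Wy. The coordinates, the choice of k, and the function names are hypothetical.

```python
import math


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor spatial weight matrix.

    W[i][j] = 1/k if j is one of the k nearest neighbors of i (j != i),
    otherwise 0. Ties in distance are broken by index order.
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Euclidean distances from unit i to every other unit.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            W[i][j] = 1.0 / k
    return W


def spatial_lag(W, y):
    """Spatial lag W y: the weighted average of y over each unit's neighbors."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


# Hypothetical example: three nearby points and one distant point.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = knn_weights(coords, k=2)
lag = spatial_lag(W, [1.0, 2.0, 3.0, 10.0])
print(lag)  # [2.5, 2.0, 1.5, 2.5]
```

With row standardization, each entry of Wy is a local average, which makes it a frequent default. Contiguity-based or distance-decay weights are equally valid specifications.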

Spatio-temporal Data Analysis Using Gaussian Processes

Yusuke Tanaka

(NTT Communication Science Laboratories)

Naonori Ueda

(RIKEN Center for Advanced Intelligence Project)

A Gaussian process (GP) is a probabilistic model for estimating a function from data and has long been widely used to model spatio-temporal data. This paper addresses two significant problems in spatio-temporal data analysis with GP models. The first is the modeling of aggregate data. For privacy and administrative reasons, spatio-temporal data collected in cities (e.g., poverty rates) are often associated not with points but with regions. We introduce a way to handle such aggregate data by integrating the GP over regions. The second is the modeling of dynamical systems defined by ordinary differential equations. The dynamics of such a system are defined by the time derivative of its state and are represented as a vector field in phase space. Although vector fields can be modeled naively with multi-output GPs, vanilla GPs do not always satisfy physical laws. We introduce a method for modeling vector fields that obey energy conservation or dissipation laws by combining Hamiltonian mechanics with GPs.

Key words: Gaussian processes, spatio-temporal data, aggregate data, dynamical systems, Hamiltonian mechanics.
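Integrating a GP over regions keeps the model Gaussian. Suppose Y_A = (1/|A|) ∫_A f(s) ds for a GP f with kernel k. Then Cov(Y_A, Y_B) = (1/(|A||B|)) ∫_A ∫_B k(s, s') ds ds'. The following minimal sketch approximates this double integral with a midpoint rule for one-dimensional intervals and assumes a squared-exponential kernel. The kernel, the interval-shaped regions, and the function names are illustrative assumptions rather than the authors' implementation.

```python
import math


def rbf(s, t, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(s, t) on the real line."""
    return variance * math.exp(-0.5 * ((s - t) / lengthscale) ** 2)


def midpoints(a, b, n):
    """n equally spaced midpoints of the interval [a, b]."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)]


def region_cov(region_a, region_b, n=100, **kernel_args):
    """Approximate (1/(|A||B|)) * double integral of k over A x B by
    averaging the kernel over a midpoint grid in each region."""
    ss = midpoints(region_a[0], region_a[1], n)
    ts = midpoints(region_b[0], region_b[1], n)
    total = sum(rbf(s, t, **kernel_args) for s in ss for t in ts)
    return total / (n * n)


# Hypothetical regions: two adjacent unit intervals and one distant interval.
regions = [(0.0, 1.0), (1.0, 2.0), (5.0, 6.0)]
K = [[region_cov(a, b) for b in regions] for a in regions]
for row in K:
    print([round(v, 4) for v in row])
```

In two dimensions, the same averaging can be done over sample points inside each polygon. The resulting matrix can then serve as the covariance of region-level observations.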

Bayesian Model Synthesis for Spatial Prediction

Shonosuke Sugasawa

(Faculty of Economics, Keio University)

Estimating a model from spatial data observed at a finite number of locations and predicting values at unobserved locations are crucial tasks in spatial data analysis. Recently, various models have become available, ranging from classical geostatistics and spatial econometrics to machine learning approaches. Selecting appropriate analysis methods for the data at hand has therefore become a significant issue in spatial data analysis. This paper reviews the methodology of Bayesian model synthesis for spatial prediction when multiple predictive models are available. Specifically, we discuss the differences between the recently proposed Bayesian spatial predictive synthesis and classical synthesis methods, and we explain concrete estimation algorithms.

Key words: Bayesian model averaging, stacking, Gaussian process, Bayesian predictive synthesis.
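The following sketch makes the two classical synthesis methods concrete. It computes two sets of weights:

(i) Bayesian model averaging (BMA) weights from log marginal likelihoods, assuming equal prior model probabilities.
(ii) Stacking weights that maximize the sum of log predictive densities at held-out points, found with an EM-style fixed-point update.

All input numbers are hypothetical. Bayesian spatial predictive synthesis, which lets the combination vary over space, is not implemented here.

```python
import math


def bma_weights(log_marglik):
    """Posterior model probabilities under equal prior weights:
    w_k is proportional to exp(log p(y | M_k)), computed stably."""
    m = max(log_marglik)
    unnorm = [math.exp(l - m) for l in log_marglik]
    s = sum(unnorm)
    return [u / s for u in unnorm]


def stacking_weights(dens, iters=500):
    """Weights on the simplex maximizing sum_i log(sum_k w_k * dens[i][k]),
    where dens[i][k] is the held-out predictive density of point i under
    model k. Uses the EM fixed-point update for mixture weights."""
    n, n_models = len(dens), len(dens[0])
    w = [1.0 / n_models] * n_models
    for _ in range(iters):
        resp_sum = [0.0] * n_models
        for row in dens:
            mix = sum(wk * p for wk, p in zip(w, row))
            for k in range(n_models):
                resp_sum[k] += w[k] * row[k] / mix  # responsibility of model k
        w = [r / n for r in resp_sum]
    return w


# Hypothetical held-out predictive densities: 4 locations x 2 models.
dens = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.3], [0.2, 0.9]]
w_bma = bma_weights([-10.0, -12.0])
w_stack = stacking_weights(dens)
print(w_bma, w_stack)
```

BMA concentrates weight on the model with the highest marginal likelihood. Stacking instead targets predictive performance and can keep several complementary models, as it does for these densities.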

Deep Learning Extensions of Cox Regression and Their Applications to Rental Property Market in Tokyo

Yuji Komiyama

(Graduate School of Economics and Management, Tohoku University)

Yasumasa Matsuda

(Graduate School of Economics and Management, Tohoku University)

This paper applies Cox regression models to a dataset from the Tokyo rental property market collected from March 2019 to March 2021. We try a deep learning extension of Cox regression models that lets the liquidity and price elasticity of rental properties depend nonlinearly on space and time, and we analyze the effects of the COVID-19 pandemic on the rental market. Neural networks are used only to express liquidity and elasticity, so the model retains its interpretability. Applying the model, we find that the pandemic led to tendencies of decreasing liquidity and increasing price elasticity throughout the 23 wards of Tokyo.

Key words: Cox regression, hazard function, liquidity, neural networks, price elasticity, spatio-temporal models.
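In the Cox model, the hazard of unit i is h_i(t) = h_0(t) exp(eta_i), and eta is estimated from the partial likelihood without specifying the baseline hazard h_0. The following sketch computes that partial log-likelihood for a single covariate and maximizes it by golden-section search. It assumes a linear predictor eta_i = beta * x_i and no tied event times. In the paper's extension, neural networks express liquidity and elasticity instead, and that part is not reproduced here. The toy durations are hypothetical. For example, they could be days until a listing is leased, with event indicator 0 marking listings still vacant when observation ended.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied event
    times. events[i] is 1 if the event was observed and 0 if censored."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored units enter only through the risk sets
        # Risk set: units still under observation at time t_i.
        risk = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        ll += beta * x[i] - math.log(risk)
    return ll


def fit_beta(times, events, x, lo=-5.0, hi=5.0, tol=1e-8):
    """Maximize the concave partial log-likelihood by golden-section search."""
    def f(beta):
        return cox_partial_loglik(beta, times, events, x)

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c  # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2.0


# Hypothetical data: durations, event indicators, and a binary covariate.
times = [2.0, 3.0, 5.0, 7.0, 8.0, 10.0]
events = [1, 1, 1, 0, 1, 1]
x = [1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
beta_hat = fit_beta(times, events, x)
print(beta_hat, math.exp(beta_hat))  # coefficient and hazard ratio
```

Golden-section search suffices here because the partial log-likelihood is concave in beta. With many covariates or network outputs, gradient-based optimization would be used instead.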

Spatial Clustering Based on Topological Hierarchical Structures for Continuous Values

Fumio Ishioka

(Graduate School of Environmental and Life Science, Okayama University)

One concern in spatial data analysis is determining whether the subjects of interest are concentrated in a certain region (i.e., whether spatial clustering exists). In recent years, spatial scan statistics have been widely used in various fields. In these methods, candidate regions of the spatial data are scanned according to specific rules and evaluated for spatial clustering by their likelihood. However, most previous studies have focused on count data (discrete values), and the shapes of the detected clusters have also been limited. In this study, we use echelon analysis to answer the question, “Can we evaluate arbitrarily shaped spatial clustering for spatial data that do not take discrete values?” Echelon analysis classifies spatial data into groups of regions (echelons) of the same topological phase, based on the univariate value of each region and the neighborhood information between regions. It then represents these groups as a graph with a hierarchical structure. Building a spatial cluster detection method on echelon analysis makes it possible to examine spatial concentration in data that are often obtained as continuous values, such as estimates from a prediction model. Numerical experiments verify the effectiveness of the proposed method. They show that it achieves higher and more stable likelihood values than the conventional method and can detect spatial clusters with flexible shapes. When applied to kriging predictions, the proposed method revealed spatial clustering trends that the conventional method could not capture. Furthermore, when applied to Bayesian estimates of the total fertility rate, it identified both hot-spot and cold-spot clusters in geospatial data presented on a map.

Key words: Spatial clustering, spatial scan statistic, echelon analysis, echelon scan method.
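A spatial scan statistic compares a model with a separate mean inside a candidate zone against a single-mean null model. For continuous values, one common choice, assumed here, is a normal model with a common variance. Its log-likelihood ratio is (N/2) log(s0^2 / s1^2), where s0^2 and s1^2 are the maximum-likelihood variances under the null and the alternative. The following sketch scans only contiguous windows on a hypothetical 1-D chain of regions. It is not the echelon-based scan, which instead takes its candidate zones from the echelon hierarchy.

```python
import math


def _sum_sq(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)


def normal_llr(values, zone):
    """Log-likelihood ratio of a normal model with separate means inside and
    outside `zone` (common variance) against a single-mean null model."""
    inside = [values[i] for i in zone]
    outside = [v for i, v in enumerate(values) if i not in zone]
    n = len(values)
    var0 = _sum_sq(values) / n                       # MLE variance under H0
    var1 = (_sum_sq(inside) + _sum_sq(outside)) / n  # MLE variance under H1
    if var1 == 0.0:
        return math.inf
    return 0.5 * n * math.log(var0 / var1)


def scan_line(values, max_size):
    """Scan contiguous windows on a 1-D chain of regions and return the
    hot-spot window (inside mean > outside mean) with the largest LLR."""
    best_llr, best_zone = 0.0, None
    n = len(values)
    for start in range(n):
        for size in range(1, min(max_size, n - 1) + 1):  # keep outside nonempty
            if start + size > n:
                break
            zone = set(range(start, start + size))
            inside = [values[i] for i in zone]
            outside = [v for i, v in enumerate(values) if i not in zone]
            if sum(inside) / len(inside) <= sum(outside) / len(outside):
                continue  # consider hot-spot candidates only
            llr = normal_llr(values, zone)
            if llr > best_llr:
                best_llr, best_zone = llr, zone
    return best_zone, best_llr


# Hypothetical continuous values (e.g., model predictions) along a chain.
values = [1.0, 1.2, 0.9, 3.1, 3.4, 2.9, 1.1, 0.8, 1.0, 1.3]
zone, llr = scan_line(values, max_size=4)
print(sorted(zone), llr)
```

In practice, the significance of the maximum LLR is assessed by Monte Carlo replication under random permutations of the values. That step is omitted here.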

Evaluation of Interactions among Tourist Spots in Nagasaki Using Location Data

Yu Ichifuji

(Graduate School of Integrated Science and Technology, Nagasaki University)

Daisuke Murakami

(The Institute of Statistical Mathematics)

Tourism is a major industry in Japan and plays a crucial role in regional revitalization, especially in local cities. Although it was heavily affected by the novel coronavirus pandemic, tourism is now recovering, with visitor numbers returning to or even exceeding pre-pandemic levels. Strategies and measures are needed both to attract more tourists and to manage overtourism. However, the increase in individual travel makes it difficult to analyze overall trends and movements between tourist destinations. To address this, we propose a method for evaluating relationships between tourist destinations using location registration data from mobile carriers. Through this quantitative assessment, we aim to provide baseline data that support efforts to attract tourists. The proposed method evaluates the influence between tourist destinations using three inputs: the number of unique users at each destination, the number of movements between destinations, and users' residential information. It is based on a time series model that assumes the influence of an increase in visitors at each location diffuses in proportion to the number of movements estimated from the location registration data. When applied to tourist destinations in Nagasaki Prefecture, this evaluation revealed differences in behavior by residence and age group. The findings suggest that tailoring advertising to residence and age group could attract tourists more efficiently.

Key words: Tourism, location data, personal attribute information, cross-correlation, boosting.
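The diffusion assumption can be illustrated with a simplified, hypothetical propagation rule. Movement counts between destinations are row-normalized into shares P. In each period, a fraction rho of the current visitor increment passes along P. The damping factor rho, the horizon, and the movement counts are assumptions for illustration. The paper's time series model, its use of residence information, and its estimation procedure are not reproduced.

```python
def transition_shares(moves):
    """Row-normalize movement counts: P[i][j] = moves[i][j] / sum_j moves[i][j]."""
    P = []
    for row in moves:
        total = sum(row)
        P.append([m / total if total else 0.0 for m in row])
    return P


def propagate(P, shock, rho=0.5, horizon=10):
    """Cumulative effect at each destination of an initial visitor increase
    `shock` (one entry per destination), assuming that in each period a
    fraction rho of the current increment moves along the shares in P."""
    n = len(P)
    current = list(shock)
    total = list(shock)
    for _ in range(horizon):
        nxt = [rho * sum(current[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = [t + v for t, v in zip(total, nxt)]
        current = nxt
    return total


# Hypothetical movement counts among three tourist spots (row = origin).
moves = [[0, 80, 20],
         [30, 0, 70],
         [10, 10, 0]]
P = transition_shares(moves)
effect = propagate(P, shock=[100.0, 0.0, 0.0])
print([round(e, 2) for e in effect])
```

Under this rule, a spot that receives many trips from the shocked spot gains the most. The per-period damping keeps the cumulative effect finite.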

Relation between Nonresponse Error and the Return Timing of Questionnaires: Unit Nonresponse and Item Nonresponse in the Takatsuki Citizen Mail Survey by Takatsuki City and Kansai University

Wataru Matsumoto

(Faculty of Informatics, Kansai University)

Nonresponse includes unit nonresponse and item nonresponse. Although eliminating nonresponse error is important, it has been suggested that forcibly reducing unit nonresponse at the cost of increased item nonresponse does not reduce overall error. This study examined the relation between the return timing of questionnaires and unit or item nonresponse using the “Takatsuki Citizen Mail Survey by Takatsuki City and Kansai University 2011–2022.” At the aggregate level, the results show a negative correlation between the nonresponse error for average age and the response rate, confirming that an increase in unit nonresponse affects the nonresponse error. An analysis of the raw data showed that questionnaires with item nonresponse tended to take more days to be returned, on average, than completed questionnaires. We therefore examined the relation between item nonresponse and the number of return days and found a possible reciprocal relationship between them. On the other hand, the number of item nonresponses tended to increase when the response rate was high. In a mail survey, a high-quality introduction leads to quick returns and a higher response rate; however, item nonresponse may increase because of reluctant respondents. The results support the possibility that unit nonresponse represents the extreme limit of increasing item nonresponse. Further, given the reciprocal relationship between item nonresponse and the number of return days, unit nonresponse can be viewed as an extension of a prolonged return period.

Key words: Mail survey, nonresponse error, unit nonresponse, item nonresponse, return days, response rate.
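A standard decomposition helps explain why nonresponse error falls as the response rate rises. Suppose a fraction r of the sample responds. The full-sample mean is then ybar = r * ybar_r + (1 - r) * ybar_m, so ybar_r - ybar = (1 - r) * (ybar_r - ybar_m). The following sketch evaluates this identity for hypothetical average ages. It assumes the respondent-nonrespondent gap stays fixed as r changes, which need not hold in practice. It is not the paper's procedure for computing nonresponse error.

```python
def nonresponse_error(mean_resp, mean_nonresp, response_rate):
    """Nonresponse error of the respondent mean, ybar_r - ybar, where the
    full-sample mean is ybar = r * ybar_r + (1 - r) * ybar_m."""
    full_mean = response_rate * mean_resp + (1.0 - response_rate) * mean_nonresp
    return mean_resp - full_mean


# Hypothetical example: respondents are older on average (60) than
# nonrespondents (45), and the gap is held fixed while r varies.
rates = (0.3, 0.5, 0.7)
errors = [nonresponse_error(60.0, 45.0, r) for r in rates]
for r, err in zip(rates, errors):
    # Identity: err equals (1 - r) * (60 - 45).
    print(r, round(err, 6))
```

Under a fixed gap, the error shrinks linearly in the response rate. The paper's findings on item nonresponse show why raising r alone may not reduce total error.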