Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 3-16 (2013)

Interior-point Algorithms, Information Geometry and Optimization Modeling

Takashi Tsuchiya
(National Graduate Institute for Policy Studies)

This paper deals with three topics related to the algorithmic, mathematical and modeling aspects of convex optimization. The first is the information-geometric framework of interior-point algorithms, in which the iteration complexity of an interior-point algorithm is expressed as an information-geometric integral. Through experiments on fairly large instances, the number of iterations of the interior-point algorithm is shown to agree well with a differential-geometric integral over the central trajectory, both for classical linear programs and for semidefinite programs. The second topic is an application of the interior-point method to the estimation of a large-scale normal distribution in data assimilation. The size of the matrix is up to $34300\times 34300$ without any block structure, and the number of parameters in the models compared by AIC or BIC is up to 150381. In the last topic, we estimate the population of a village in ancient Mesopotamia called Nuzi using convex quadratic programming. The estimation is based on personal records written in cuneiform on clay tablets excavated from the ruins.

Key words: Interior-point algorithm, information geometry, graphical model, data assimilation, Mesopotamia, Nuzi, population estimation.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 17-46 (2013)

Estimation of Large-scale Graphical Models for Data Assimilation

Genta Ueno
(The Institute of Statistical Mathematics; Japan Science and Technology Agency, PRESTO)

Covariance matrices are often introduced when we construct statistical models that have multiple variables. In the present article, I review covariance matrices used in large-scale time-series analysis, specifically data assimilation. In data assimilation, covariance matrices are introduced in order to prescribe the weights of the initial state, model dynamics, and observations, and suitable specification of the covariances is known to be essential for obtaining sensible state estimates. The covariance matrices are specified by sample covariances and converted according to an assumed covariance structure. Modeling of the covariance structure consists of the regularization of a sample covariance and the constraint of a dynamic relationship. Regularization is required for converting the singular sample covariance into a non-singular matrix, for removing spurious correlations between variables at distant points, and for reducing the number of parameters required to specify the covariances. In previous studies, regularization of sample covariances has been carried out in physical (grid) space, spectral space, and wavelet space. I herein review a method for covariance regularization in inverse space, in which we use the covariance selection model (the Gaussian graphical model). For each variable, we assume a set of neighboring variables; that is, a targeted variable is directly related to its neighbors and is conditionally independent of non-neighboring variables. Conditional independence is expressed by specifying zero elements in the inverse covariance matrix. The non-zero elements are estimated numerically by maximum likelihood using Newton's method. Appropriate neighbors can be selected with the AIC or BIC information criterion. I also address some implementation techniques for the case in which the covariance matrix has a large dimension.
I present an illustrative example using a simple $3\times 3$ matrix and an application to a sample covariance obtained from sea surface height (SSH) observations.
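The key fact used above, that a zero element in the inverse covariance matrix expresses conditional independence, can be checked numerically. The following sketch (a toy example of my own, not the estimation code described in the article) builds a $3\times 3$ precision matrix with a zero in position (1, 3) and verifies that the partial covariance of variables 1 and 3 given variable 2 vanishes, even though their marginal covariance does not.

```python
# Toy illustration: a zero in the precision (inverse covariance) matrix
# encodes conditional independence between the corresponding variables.

def inv3(m):
    """Invert a 3x3 matrix via the adjugate formula."""
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = [
        [e*i - f*h, c*h - b*i, b*f - c*e],
        [f*g - d*i, a*i - c*g, c*d - a*f],
        [d*h - e*g, b*g - a*h, a*e - b*d],
    ]
    return [[x / det for x in row] for row in adj]

# Precision matrix K with K[0][2] = 0: variables 1 and 3 are assumed
# conditionally independent given variable 2.
K = [[ 2.0, -1.0,  0.0],
     [-1.0,  2.0, -1.0],
     [ 0.0, -1.0,  2.0]]

Sigma = inv3(K)  # the covariance matrix; note Sigma[0][2] is NOT zero

# Partial covariance of variables 1 and 3 given variable 2:
partial = Sigma[0][2] - Sigma[0][1] * Sigma[1][2] / Sigma[1][1]
print(round(Sigma[0][2], 4), round(partial, 10))  # → 0.25 0.0
```

In the estimation method the article reviews, this logic runs in the opposite direction: a zero pattern is imposed on the inverse covariance according to the assumed neighbors, and the remaining non-zero elements are fitted by maximum likelihood.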

Key words: Gaussian graphical model, covariance selection, Newton's method, parallel computing, data assimilation.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 47-78 (2013)

Parallelizing the Constraint Integer Programming Solver SCIP

Yuji Shinano
(Zuse Institute Berlin; JST CREST)
Tobias Achterberg
(ILOG, IBM Deutschland GmbH)
Timo Berthold
(Zuse Institute Berlin)
Stefan Heinz
(Zuse Institute Berlin)
Thorsten Koch
(Zuse Institute Berlin)
Stefan Vigerske
(GAMS Software GmbH)
Michael Winkler
(Zuse Institute Berlin)

The paradigm of constraint integer programming (CIP) combines modeling and solving techniques from the fields of constraint programming (CP), mixed-integer programming (MIP) and satisfiability (SAT) solving. This paradigm allows us to address a wide range of optimization problems. \textsf{SCIP} is an implementation of the idea of CIP and is being continuously extended by a group of researchers centered at Zuse Institute Berlin (ZIB). This paper introduces two parallel extensions of \textsf{SCIP}: \textsf{ParaSCIP}, which is intended to run in a large-scale distributed-memory computing environment, and \textsf{FiberSCIP}, intended to run in a shared-memory computing environment. \textsf{ParaSCIP} has been run successfully on the HLRN II supercomputer, utilizing up to 7,168 cores to solve a single difficult MIP. It has also been tested on an ISM supercomputer (a Fujitsu PRIMERGY RX200S5 using up to 512 cores). The previously unsolved instance dg012142 from MIPLIB2010 was solved using the ISM supercomputer.

Key words: Mixed integer programming, constraint integer programming, SCIP, parallelization.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 79-95 (2013)

A Network Representation of Feasible Solution Space for a Subproblem of Nurse Scheduling

Hiroki Akita
(Faculty of Science and Technology, Seikei University)
Atsuko Ikegami
(Faculty of Science and Technology, Seikei University)

This paper deals with the nurse scheduling problem. When making a roster for nurses in a hospital, a scheduler has to assign the required number of nurses with specified skill levels to each shift while balancing the workload among the nurses. This problem is known to be difficult to solve by computer, because it is impossible to describe all of the relevant considerations explicitly. This paper describes a system for enumerating good solutions and a tool for clarifying the space of feasible solutions in order to support the scheduler's decision-making. First, we focus on the subproblem of finding the optimal or feasible schedules for a single nurse. We then construct a network in which any path from the source node to the sink node represents a feasible schedule, and every feasible schedule is represented by such a path. Given that the other nurses' schedules are fixed, we can obtain the optimal and other good schedules for each nurse efficiently by finding the shortest path and the $k$-shortest paths on the corresponding network. We also show how to construct the network for a nurse and describe features of the networks generated for all nurses. Although further work is needed to reduce the size of the networks in terms of the numbers of nodes and arcs, we show that they can be of great help in scheduling nurses efficiently.
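The shortest-path computation on such a network is a simple dynamic program once the nodes are in topological order. The sketch below uses a hypothetical miniature network whose node names, arcs, and costs are purely illustrative; it is not the authors' system.

```python
# Minimal sketch (hypothetical data): each source-to-sink path in a DAG
# encodes one feasible schedule; the best schedule is a shortest path,
# found by dynamic programming over a topological order of the nodes.

def shortest_path(arcs, source, sink, order):
    """arcs: {node: [(successor, cost), ...]}; order: topological order."""
    dist = {n: float("inf") for n in order}
    pred = {}
    dist[source] = 0.0
    for n in order:
        for succ, cost in arcs.get(n, []):
            if dist[n] + cost < dist[succ]:
                dist[succ] = dist[n] + cost
                pred[succ] = n
    # Recover the optimal path by walking predecessors back from the sink.
    path, n = [sink], sink
    while n != source:
        n = pred[n]
        path.append(n)
    return dist[sink], path[::-1]

# Tiny example: nodes are (day, shift) states; arc costs penalize
# undesirable shift transitions (all numbers are illustrative).
arcs = {
    "s":        [("d1:day", 0), ("d1:night", 2)],
    "d1:day":   [("d2:day", 1), ("d2:night", 3)],
    "d1:night": [("d2:off", 0)],
    "d2:day":   [("t", 0)],
    "d2:night": [("t", 1)],
    "d2:off":   [("t", 0)],
}
order = ["s", "d1:day", "d1:night", "d2:day", "d2:night", "d2:off", "t"]
cost, path = shortest_path(arcs, "s", "t", order)
print(cost, path)  # → 1.0 ['s', 'd1:day', 'd2:day', 't']
```

The $k$-shortest paths mentioned in the abstract would be obtained by a deviation-path algorithm on the same network; the dynamic program above is the building block.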

Key words: Nurse scheduling, search space, visualization, network representation, dynamic programming.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 97-109 (2013)

Optimization Model for Forest Resource Management and Its Application
—Harvest Scheduling Problems—

Masashi Konoshima
(Faculty of Agriculture, University of the Ryukyus)
Atsushi Yoshimoto
(The Institute of Statistical Mathematics)

Optimization modeling approaches have long been applied in forest resource management to support decision-making in the forest industry. Optimization model formulations have evolved and been modified to address various socio-economic challenges and issues faced by communities. Recently developed spatial optimization models feature a mechanism for evaluating various spatial allocation patterns of management activities and have the potential to be widely used for forest “ecosystem services” management problems. This paper reviews applications of optimization models in forest resource management, and proposes a model for resolving recent aggregation issues in forest management over multiple periods. It also summarizes what is needed for successful application to newly emerging, complex resource management problems.
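One common form of the spatial aggregation issue is the unit restriction constraint, under which adjacent stands may not be harvested in the same period. The following toy sketch (hypothetical stands, profits, and adjacencies, solved by brute-force enumeration rather than the integer programming used in practice) illustrates the structure of such a model.

```python
# Illustrative sketch (hypothetical data): a tiny harvest scheduling
# problem under the unit restriction spatial constraint that no two
# adjacent stands may be harvested in the same period.  Real instances
# are solved with integer programming; here we simply enumerate.
from itertools import product

stands = ["A", "B", "C", "D"]
adjacent = {("A", "B"), ("B", "C"), ("C", "D")}  # pairs sharing a border
profit = {  # profit of harvesting each stand in period 1 or period 2
    "A": (5, 4), "B": (6, 7), "C": (4, 4), "D": (8, 5),
}
periods = (0, 1, None)  # None = do not harvest the stand at all

def feasible(plan):
    """Adjacent stands must not be harvested in the same period."""
    assign = dict(zip(stands, plan))
    return all(assign[a] is None or assign[a] != assign[b]
               for a, b in adjacent)

def total(plan):
    return sum(profit[s][t] for s, t in zip(stands, plan) if t is not None)

best = max((p for p in product(periods, repeat=len(stands)) if feasible(p)),
           key=total)
value = total(best)
print(best, value)  # → (1, 0, 1, 0) 22
```

Note how the spatial constraint changes the answer: harvesting every stand in its individually best period would violate adjacency, so the optimal plan alternates periods across the chain of stands.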

Key words: Optimization model, forest resource management, linear programming, spatial structures, aggregation.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 111-122 (2013)

Convex Optimization in Measure Spaces: Discreteness and Continuity in Infinite Dimension

Satoshi Ito
(The Institute of Statistical Mathematics)

Most, if not all, optimization problems in infinite dimension can be regarded as optimization problems over measures. With respect to Lebesgue measure, a measure generally has an absolutely continuous component and a discrete component. We often observe that many optimization problems in measure spaces have discrete optimal solutions. A question then arises: under what conditions does the solution to a given class of optimization problems become discrete or absolutely continuous? As an answer to this question, this paper is concerned with convex optimization over measures defined on compact topological spaces. This paper is a revised version of a manuscript in the Proceedings of the 24th RAMP (Research Association of Mathematical Programming) Symposium of the Operations Research Society of Japan.

Key words: Probability measure, infinite programming, optimal control, semi-infinite programming, generalized moment problem, communication channel capacity.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 123-134 (2013)

A Review of a Decoding Algorithm for LDPC Codes Based on Numerical Optimization Techniques

Tadashi Wadayama
(Nagoya Institute of Technology)

This paper reviews a decoding algorithm for low-density parity-check (LDPC) codes based on convex optimization, following Wadayama (2010). The decoding algorithm, called interior point decoding, is designed for linear vector channels, which include many practically important channels such as inter-symbol interference channels and partial response channels. It is shown that the maximum likelihood decoding (MLD) rule for a linear vector channel can be relaxed to a convex optimization problem called the relaxed MLD problem. The decoding algorithm is based on the primal path-following interior point method with a barrier function. Approximate variants of gradient descent and Newton's method are used to solve the convex optimization problem.
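The mechanics of a primal path-following barrier method can be illustrated on a one-dimensional toy problem (a generic sketch of my own, not the decoder itself): minimize $(x-2)^2$ subject to $0 \le x \le 1$ by applying Newton's method to the barrier-augmented objective while shrinking the barrier weight $\mu$.

```python
# Toy barrier method: minimize f(x) = (x - 2)^2 subject to 0 <= x <= 1.
# We minimize f(x) - mu*(log(x) + log(1 - x)) by Newton's method and
# shrink the barrier weight mu, tracing the central path toward the
# constrained optimum at x = 1 (the unconstrained optimum x = 2 is
# infeasible, so the upper bound is active).

def newton_step(x, mu):
    # First and second derivatives of the barrier-augmented objective.
    g = 2.0 * (x - 2.0) - mu / x + mu / (1.0 - x)
    h = 2.0 + mu / x**2 + mu / (1.0 - x) ** 2
    return x - g / h

x, mu = 0.5, 1.0
while mu > 1e-10:
    for _ in range(50):  # inner Newton iterations for the current mu
        # Keep the iterate strictly inside the feasible interval.
        x_new = min(max(newton_step(x, mu), 1e-12), 1.0 - 1e-12)
        if abs(x_new - x) < 1e-14:
            x = x_new
            break
        x = x_new
    mu *= 0.5            # follow the central path toward the optimum
print(round(x, 6))  # → 1.0
```

Interior point decoding applies this same template in high dimension, with the relaxed MLD objective in place of the toy quadratic and approximate (rather than exact) gradient and Newton computations for efficiency.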

Key words: Interior point method, convex optimization, low-density parity-check codes.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 135-146 (2013)

Molecular Electron Density Estimation with X-ray Free Electron Laser

Shiro Ikeda
(The Institute of Statistical Mathematics)
Hidetoshi Kono
(Japan Atomic Energy Agency)

New X-ray free electron laser (XFEL) facilities have been built in the United States, Europe, Japan, and Korea. A variety of new experiments in physics, chemistry, and biology are being carried out with this new X-ray laser, and new kinds of data will be collected. We explain one such experiment, electron density estimation from a single biological particle, and show our recent results on phase retrieval, which recovers the two-dimensional electron density from a two-dimensional X-ray diffraction pattern.

Key words: Phase retrieval, X-ray diffraction imaging, MAP estimation, sparsity.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 147-166 (2013)

Introduction to Eigenfactor

Naoki Masuda
(Graduate School of Information Science and Technology, The University of Tokyo)

A widely known index called the Impact Factor quantifies the value of individual academic journals. Many people working in academia refer to the Impact Factor, although its plausibility is often disputed. In 2007, Eigenfactor$^{\rm TM}$, which takes advantage of the structure of the citation network among journals, was proposed. Since 2009, Eigenfactor$^{\rm TM}$ has been included in the Journal Citation Reports published by Thomson Reuters, which also issues the Impact Factor. Those who refer to these indices should be aware of their correct meanings and implications. In this article, I explain how Eigenfactor$^{\rm TM}$ works, in particular in comparison with the Impact Factor. In fact, Eigenfactor$^{\rm TM}$ is essentially the same as PageRank, which Google uses in its search engine to rank websites. I therefore start with an extended explanation of PageRank, which is presumably more widely known than Eigenfactor$^{\rm TM}$, to guide readers' understanding of the Impact Factor and Eigenfactor$^{\rm TM}$. I also discuss some possible misunderstandings regarding these indices.
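A minimal version of the PageRank computation that Eigenfactor$^{\rm TM}$ builds on can be written as a power iteration over a toy citation graph (the journals and links below are hypothetical).

```python
# Minimal PageRank sketch: the score vector is the stationary
# distribution of a random surfer who follows links with probability d
# and teleports to a uniformly random node otherwise.

def pagerank(links, d=0.85, iters=100):
    """links: {node: [cited nodes]}. Returns {node: score}, summing to 1."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}  # teleportation mass
        for v in nodes:
            out = links[v]
            if out:  # distribute this node's mass along its links
                share = d * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:    # dangling node: spread its mass uniformly
                for w in nodes:
                    new[w] += d * rank[v] / n
        rank = new
    return rank

# Hypothetical journals: A cites B and C, B cites C, and so on.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
print(max(rank, key=rank.get))  # → C (cited by A, B, and D)
```

The actual Eigenfactor$^{\rm TM}$ computation differs in details (self-citations are excluded, the teleportation distribution is weighted by article counts), but the eigenvector idea is the same.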

Key words: Directed graph, network, PageRank, Impact Factor, centrality, bibliometrics.

Proceedings of the Institute of Statistical Mathematics Vol. 61, No. 1, 167-176 (2013)

Calculation of Neighboring Information Using Spatial Database and Its Application to Spatial Clustering of Suicide Data

Takafumi Kubota
(The Institute of Statistical Mathematics)
Tomokazu Fujino
(The International College of Arts and Sciences, Fukuoka Women's University)
Makoto Tomita
(Clinical Research Center, Tokyo Medical and Dental University Hospital of Medicine)
Fumio Ishioka
(School of Law, Okayama University)
Toshiharu Fujita

Neighboring information is necessary for analyzing lattice data, and computing it for irregular lattice data is a challenging task. The authors therefore proposed using a spatial database to calculate neighboring information for irregular lattice data. Using this approach, the authors applied neighboring information to spatial clustering of small-area data on suicide in Japan. One advantage of using a spatial database is that it can handle different area sizes. The authors therefore calculated two kinds of neighboring information based on different area units: municipalities and secondary medical care zones. Finally, both were applied to spatial clustering to examine the effect of changing the area size, using suicide data for the Kanto region of Japan.
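The adjacency computation that a spatial database performs can be sketched in miniature. In the toy example below (hypothetical areas built from unit grid cells; a real system would evaluate a spatial predicate on municipal polygons), two areas are neighbors exactly when their boundaries share an edge segment, so areas touching only at a corner do not count as neighbors.

```python
# Toy sketch of neighboring-information computation: areas are unions
# of unit grid cells, and two areas are neighbors when their boundaries
# share at least one edge segment (corner contact is not enough).

def cell_edges(x, y):
    """The four boundary edges of the unit cell with lower-left corner (x, y)."""
    return {((x, y), (x + 1, y)), ((x, y), (x, y + 1)),
            ((x + 1, y), (x + 1, y + 1)), ((x, y + 1), (x + 1, y + 1))}

def boundary(cells):
    """Edges belonging to exactly one cell form the area's outer boundary."""
    seen = {}
    for c in cells:
        for e in cell_edges(*c):
            seen[e] = seen.get(e, 0) + 1
    return {e for e, k in seen.items() if k == 1}

areas = {  # each area is a set of unit cells (illustrative shapes)
    "north": {(0, 1), (1, 1)},
    "south": {(0, 0), (1, 0)},
    "east":  {(2, 0)},
}
neighbors = {
    (a, b)
    for a in areas for b in areas
    if a < b and boundary(areas[a]) & boundary(areas[b])
}
print(sorted(neighbors))  # → [('east', 'south'), ('north', 'south')]
```

Here "north" and "east" touch only at the corner point (2, 1), so they are not reported as neighbors; this corresponds to the edge-sharing (rook-type) adjacency commonly used for lattice data.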

Key words: Spatial database, neighboring information, spatial clustering, suicide data.