Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 3-27(1999)

Hidetoshi Shimodaira

(The Institute of Statistical Mathematics)

Data analysis based on stochastic models has proved useful in many fields of application. However, it is often difficult to specify a single good model from prior knowledge, so we need a methodology for selecting models from data. Akaike proposed an information criterion that evaluates a model in terms of prediction, and he advocated the importance of modeling in data analysis. Several kinds of information criteria have since been proposed in the literature, and we have to choose an appropriate one according to our purposes and situation. In this article, we discuss the derivations of information criteria for several inference schemes. We also comment on the consistency of model selection. Consistency concerns the limit of large sample size, but the sample size is finite in actual applications. Thus, it is important to consider the sampling error of the information criterion in order to evaluate the reliability (or uncertainty) of model selection. Methods for assessing this reliability, such as the bootstrap selection probability, the model selection test, and multiple comparisons of models, are discussed. Further, we give a graphical method for visualizing the relative locations of predictive densities in exploratory model building. Illuminating examples from variable selection in multiple regression, as well as practical examples from evolutionary tree reconstruction, illustrate the methodology.

**Key words: Information criterion, AIC, predictive density, variable selection, Bayes model, multiple comparisons.**
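
As a rough illustration of variable selection by an information criterion, the following sketch computes AIC for every subset of three candidate regressors in a Gaussian linear model. It is not taken from the article: the simulated data, the helper names `solve` and `aic_linear`, and the form AIC = n log(RSS/n) + 2k (additive constants dropped, k counting the coefficients plus the error variance) are assumptions made for this example only.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def aic_linear(rows, y, subset):
    """AIC of the least-squares fit using an intercept and the columns in `subset`."""
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, yi in zip(design, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * (p + 1)  # +1 counts the error variance


# Hypothetical data: y depends on columns 0 and 1 only; column 2 is pure noise.
random.seed(0)
rows = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1.0 + 2.0 * r[0] - 1.5 * r[1] + random.gauss(0, 1) for r in rows]

subsets = [s for k in range(4) for s in itertools.combinations(range(3), k)]
scores = {s: aic_linear(rows, y, s) for s in subsets}
best = min(scores, key=scores.get)
print("subset with minimum AIC:", best)
```

Because the data are finite, the selected subset may or may not include the irrelevant column; this is the kind of sampling variability in model selection that the abstract proposes to assess.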

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 29-48(1999)

—Towards Flexible Modeling—

Shinto Eguchi

(The Institute of Statistical Mathematics)

This paper introduces near-parametric inference, which extends the working range of the usual likelihood method so that it performs well under slight departures, in possible directions, from the assumptions of a parametric model. A variety of semiparametric approaches have been established to bridge the gap between parametric and nonparametric methods. Following this semiparametric line, the key idea is to enlarge a parametric model to a tubular neighborhood, which relaxes the rigid relation between the parametric model and the likelihood function.

Three typical applications of near-parametric inference are given. (1) Density estimation by the local likelihood method is discussed, where a given model is enlarged according to the data point at which the density is to be estimated. In effect, a structure of incomplete observation is imposed by the kernel function, and this structure vanishes as the bandwidth tends to infinity. Large-bandwidth asymptotics are discussed in the near-parametric situation, where the underlying distribution reduces asymptotically to the parametric one. (2) In a neural computation algorithm, we introduce a self-organizing rule into the likelihood method by considering a latent variable that indicates whether each observation conforms to the assumptions of the parametric setting. In particular, we present a special application to principal component analysis. The proposed algorithm is of EM type: the E step imputes, for each observation, the conditional probability given that observation that it is well controlled, and the M step calculates the principal component vector of the sample covariance matrix weighted by these conditional probabilities. (3) We introduce a sensitivity approach to observational bias by modeling a selectivity parameter. The key point is that the selectivity parameter is not estimated; rather, its influence is assessed with respect to possible observational bias that deviates from the pure randomness assumption under missing or allocation sampling. A selectivity index that is invariant under the parametrization of selectivity gives a reasonable assessment of whether the observational assumption has broken down.

A common advantage across these applications is that near-parametric inference retains reasonably the same efficiency as parametric inference while performing well under departures from the parametric setting.

**Key words: Local likelihood, near parametrics, observational bias, principal component analysis, selectivity parameter.**

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 49-61(1999)

Jinfang Wang

(The Institute of Statistical Mathematics)

This paper concerns nonlinear regression models based on estimating functions. A general estimating function, *g*(\theta), is typically nonconservative; that is, *g*(\theta) is not the gradient of any scalar function. In such cases, neither the quasi-likelihood nor the quasi-likelihood ratio can be uniquely defined. In this paper we study the problem of nonconservative estimating functions and the associated difficulties in general linear regression. We propose a semi-parametric inference approach based on artificial likelihood functions derived from a vector field decomposition associated with the estimating functions. Further properties of the Helmholtz-type quasi-likelihood proposed by Wang (1999) are studied. In particular, we propose a method for root selection based on the bootstrap quasi-likelihood ratio. The method is applied to a logistic regression model with measurement error.

**Key words: Bootstrap, estimating function, generalized linear model, logistic regression with measurement error, multiple roots, vector field.**

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 63-69(1999)

Hidehiko Kamiya

(The Institute of Statistical Mathematics)

The theory of statistical inference from the viewpoint of invariance is based on the assumption that the underlying distributions belong to so-called invariant probability models. This paper deals with the problem of characterizing invariant probability models. In the general setting where neither the sample space nor the parameter space is isomorphic to the acting group, a characterization is given in terms of the functional form of the densities.

**Key words: Invariant probability model, group action, orbit, global cross section, orbital decomposition, maximal invariant.**

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 71-79(1999)

—In Terms of Ancillarity and Sufficiency

in the Presence of a Nuisance Parameter—

Sakutaro Yamada and Toshihide Kitakado

(Department of Fisheries Resource Management, Tokyo University of Fisheries)

The problem of estimating the size of a fish population from a tag experiment with incomplete reports is considered. In previous work, the two parameters in this problem, the reporting rate and the population size, were estimated from the conditional and marginal distributions, respectively. In this paper, these estimation methods are justified through notions of ancillarity and sufficiency in the presence of a nuisance parameter.

**Key words: Ancillarity, sufficiency, nuisance parameter, incomplete observation, tag experiment.**

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 81-90(1999)

under a Vague Prior Distribution

Takemi Yanagimoto

(The Institute of Statistical Mathematics)

One critique of the statistical test procedure arises in connection with Lindley's paradox. In the test procedure, the null hypothesis is rejected with probability \alpha, a prespecified value called the significance level. The critique asserts that this probability should tend to zero as the sample size tends to infinity. This requirement is called consistency of a test. The same controversy is also found in the model selection problem.

In this article we make clear that the problem appears when the amount of information in the data at hand is large while that in the prior distribution is relatively small. The variance of an estimator then becomes much smaller than that of the prior distribution. Such a prior distribution is called vague. We emphasize that 1) such a prior distribution is not realistic in practical scientific reasoning, 2) the sample size is not given careful consideration, and 3) difficulties arise in interpreting the posterior distribution. We conclude that the Bayesian test will not be useful in scientific reasoning.

**Key words: Confidence interval, consistency of test, model selection, posterior distribution, statistical test.**
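
The tension described in this abstract can be seen numerically in a textbook setting. The sketch below is an illustration only and is not taken from the article: it assumes *n* observations from a normal distribution with known variance, a point null hypothesis \theta = 0, and a N(0, \tau^{2}) prior under the alternative. The function name `bayes_factor_01` and all parameter values are hypothetical.

```python
import math


def bayes_factor_01(z, n, tau2, sigma2=1.0):
    """Bayes factor of H0: theta = 0 against H1: theta ~ N(0, tau2).

    z is the usual test statistic sqrt(n) * xbar / sigma for n observations
    from N(theta, sigma2). Values above 1 favor H0.
    """
    # r compares the prior variance with the sampling variance of the mean.
    r = n * tau2 / sigma2
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z * z * r / (1.0 + r))


# Hold the test statistic at the two-sided 5% boundary and let n grow.
for n in (10, 1000, 100000):
    print(n, bayes_factor_01(1.96, n, tau2=1.0))
```

With the test statistic held fixed, the Bayes factor moves toward the null hypothesis as the information in the data grows relative to that in the prior, which is the vague-prior situation the abstract analyzes.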

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 91-104(1999)

in a Higher-order Markov Chain

Masayuki Uchida

(The Institute of Statistical Mathematics)

Let *X*_{-m+1}, *X*_{-m+2},..., *X*_{0}, *X*_{1}, *X*_{2},... be a time-homogeneous {0, 1}-valued *m*-th order Markov chain. The distribution of the number of trials until the first success, i.e., the geometric distribution, in the sequence *X*_{1}, *X*_{2},... is studied. The geometric distribution of order *k* in the sequence *X*_{1}, *X*_{2},... is also obtained. The probability distribution of the number of "1"s, i.e., the binomial distribution, in the sequence *X*_{1}, *X*_{2},..., *X*_{n} is studied. The probability distributions of number of runs of "1" of exact length

**Key words: Geometric distribution, binomial distribution, geometric distribution of order k, binomial distribution of order k, probability generating function, Markov chain.**
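
For intuition about the waiting time until the first success in a dependent sequence, here is a minimal sketch for the simplest case, a first-order chain (*m* = 1). It is not the paper's derivation; the function name `first_success_pmf` and the transition probabilities `p01` and `p11` are hypothetical and chosen only for illustration.

```python
def first_success_pmf(n_max, p01, p11, x0):
    """Return [P(T = 1), ..., P(T = n_max)], where T is the first n with X_n = 1.

    p01 = P(X_t = 1 | X_{t-1} = 0), p11 = P(X_t = 1 | X_{t-1} = 1),
    and x0 is the initial state X_0.
    """
    pmf = []
    survive = 1.0      # probability that X_1, ..., X_{n-1} are all 0
    prev = x0
    for _ in range(n_max):
        p_one = p11 if prev == 1 else p01
        pmf.append(survive * p_one)
        survive *= 1.0 - p_one
        prev = 0       # surviving means the latest trial was a 0
    return pmf


# With p01 == p11 the trials are independent and T is ordinary geometric.
print(first_success_pmf(4, 0.3, 0.3, x0=0))
# Starting from X_0 = 1, only the first term differs from the geometric pattern.
print(first_success_pmf(4, 0.2, 0.6, x0=1))
```

When the two transition probabilities coincide, the result reduces to the ordinary geometric distribution, which gives a quick sanity check on the recursion.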

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 105-118(1999)

for Distribution Theory of Runs

Katuomi Hirano

(The Institute of Statistical Mathematics)

Sigeo Aki

(Department of Informatics and Mathematical Science, Osaka University)

Let *X*_{1}, *X*_{2},... be a sequence of independent and identically distributed {0, 1}-valued random variables, a {0, 1}-valued Markov chain or a {0, 1}-valued higher order Markov chain. Let *E*_{0} be the event that a run of "0" of length *r* occurs and let *E*_{1} be the event that a run of "1" of length *k* occurs in the sequence *X*_{1}, *X*_{2},... .

In the case of Markov dependent trials, discrete distributions related to the events *E*_{0} and *E*_{1} are studied. The probability generating functions of the distributions of the waiting times of the sooner and later occurring events are given. To obtain the probability generating function of the distribution of the number of occurrences of *E*_{1} in *X*_{1}, *X*_{2},..., *X*_{n}, we use the

The distributions of the numbers of overlapping and non-overlapping occurrences of success-runs of length *l*, and the distributions of the numbers of occurrences of success-runs of exact length *l* and of length *l* or more, until the first occurrence of a success-run of length *k* in *m*-th order (*m* \leq *l* < *k*) Markov dependent trials are studied. The distribution of the number of overlapping occurrences is the geometric distribution of order (*k* - *l* + 1), and the distributions for exact length *l* and for length *l* or more are both the geometric distribution of order 1. To show these results, we describe how to solve the problem by the method of conditional probability generating functions. A finite-state Markov chain imbedding technique is also illustrated.

**Key words: Probability generating function, distribution theory of run, Markov dependent trials, waiting time problems.**
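
The idea behind finite-state Markov chain imbedding can be sketched in the simplest setting. The code below is an illustration under assumptions that are narrower than the paper's: it treats independent Bernoulli trials rather than Markov dependent ones, and the function name `prob_run_by` is hypothetical. The imbedded chain tracks the length of the current trailing run of "1", with an absorbing state once a run of length *k* has occurred.

```python
def prob_run_by(n, k, p):
    """P(a run of k consecutive 1s occurs within the first n i.i.d. Bernoulli(p) trials)."""
    # state[j]: probability that no run of length k has occurred yet and the
    # current trailing run of 1s has length j (0 <= j < k).
    state = [1.0] + [0.0] * (k - 1)
    done = 0.0                        # absorbing state: the run has occurred
    for _ in range(n):
        nxt = [0.0] * k
        for j, mass in enumerate(state):
            nxt[0] += mass * (1.0 - p)    # a 0 resets the trailing run
            if j + 1 == k:
                done += mass * p          # a 1 completes a run of length k
            else:
                nxt[j + 1] += mass * p    # a 1 extends the trailing run
        state = nxt
    return done


# Waiting-time distribution for the first run of length k:
# P(T = n) = prob_run_by(n, k, p) - prob_run_by(n - 1, k, p).
print(prob_run_by(3, 2, 0.5))
```

For Markov dependent trials the same bookkeeping would also need to carry the previous outcomes in the state, which this sketch deliberately omits.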

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 119-142(1999)

Gamma Function Ratios

Tadashi Matsunawa

(The Institute of Statistical Mathematics)

Tomohiro Takei

(Graduate School of Science and Engineering, Chuo University)

Several approximations for estimating the incomplete gamma function ratio \gamma(*p*, *x*) with parameter *p* > 0 and variable *x* > 0 are presented for some situations. The approximations are realized by double-sided inequalities of the form \underline{\gamma}(*p*, *x*) \leq \gamma(*p*, *x*) \leq \bar{\gamma}(*p*, *x*). The resulting bounds are expected to be useful for approximation problems in statistics and related mathematical sciences. Our approaches to obtaining the bounds are fairly different in the two cases (a) when *p* is a positive integer and (b) when *p* is a general positive real number. Numerical and graphical results on the approximations are also presented.

**Key words: Incomplete gamma function ratio, approximation, double-sided estimating inequality, Maclaurin's formula, inverse factorial series, absolute convergent series, Ramanujan's conjecture.**
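
To make the quantity being bounded concrete, the following sketch evaluates the incomplete gamma function ratio in the two cases distinguished in the abstract. It does not reproduce the paper's inequalities. It uses two standard facts instead: the finite-sum closed form for positive integer *p*, and a convergent power series for general real *p* > 0. The function names `gamma_ratio_integer` and `gamma_ratio_series` are hypothetical.

```python
import math


def gamma_ratio_integer(p, x):
    """Closed form for positive integer p: 1 - exp(-x) * sum_{k=0}^{p-1} x^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, p):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total


def gamma_ratio_series(p, x, tol=1e-15, max_terms=10000):
    """Series for real p > 0 and x > 0.

    ratio = exp(-x) * x^p / Gamma(p + 1) * sum_{n>=0} x^n / ((p+1)(p+2)...(p+n))
    """
    term, total = 1.0, 1.0
    for n in range(1, max_terms):
        term *= x / (p + n)
        total += term
        if term < tol * total:
            break
    # Work on the log scale to avoid overflow in x^p / Gamma(p + 1).
    return math.exp(-x + p * math.log(x) - math.lgamma(p + 1.0)) * total


print(gamma_ratio_integer(3, 2.5), gamma_ratio_series(3, 2.5))
```

The two routines should agree at integer *p*, and for *p* = 1/2 the ratio equals erf(sqrt(*x*)), which gives an independent check against `math.erf`.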

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 143-156(1999)

by Regular Variation (II)

Takaaki Shimura

(The Institute of Statistical Mathematics)

The Mellin-Stieltjes convolution (MS-convolution) and the related decomposition of distributions in some classes characterized by regular variation are investigated. Maller shows that if *X* and *Y* are independent non-negative random variables with distributions \mu and \nu, respectively, and both \mu and \nu are in **D**_{2}, the domain of attraction of the Gaussian distribution, then the distribution of the product *XY* (that is, the MS-convolution \mu \circ \nu of \mu and \nu) also belongs to it. He also shows that if the distribution of the product of two independent random variables belongs to **D**_{2} and one of them has finite variance, then the other is in **D**_{2}. He conjectures that, conversely, if \mu \circ \nu belongs to **D**_{2}, then both \mu and \nu (factors of \mu \circ \nu) are in it. The first purpose of this paper is to deal with this problem in detail. It is well known that **D**_{2} is identical with the class of distributions whose truncated variance \int_{|t| < x} *t*^{2} \mu(*dt*) is slowly varying. We deal with the following class, which is an extension of **D**_{2}: the class of distributions \mu on [0, \infty) with slowly varying \alpha-th truncated moments \int^{x}_{0} *t*^{\alpha} \mu(*dt*). Some subclasses of **M**(\alpha) are given with the property that if \mu \circ \nu belongs to it, then \mu and \nu are in

The second purpose is to consider the same problem for *D*(\alpha) (the class of distributions \mu on [0, \infty) with regularly varying tails \mu(

**Key words: Regularly varying function, Mellin-Stieltjes convolution, tail of distribution, truncated moment, decomposition of distribution.**

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 157-174(1999)

by Martingales and Their Statistical Applications

Yoichi Nishiyama

(The Institute of Statistical Mathematics)

The purpose of this study is to develop entropy methods, which were first introduced for empirical processes of I.I.D. data, in order to handle some martingales with applications to statistical inference for stochastic processes.

The motivation is as follows. Following the prominent work of Dudley in 1978, entropy methods were studied in the 1980s to establish laws of large numbers and central limit theorems for empirical processes indexed by classes of sets or functions. Furthermore, some recent works have shown that the methods are useful not only for those limit theorems but also for other problems in statistics. The 1996 book by van der Vaart and Wellner gives a nice exposition of the methods as well as many applications, with emphasis on I.I.D. data. However, although some parts of the methods have good potential for application to non-I.I.D. data as well, no systematic study has been done in the framework of martingales, which are known to be important for analyzing a rich class of statistical models. We intend to take a step toward filling this gap in the literature.

Section 1 contains an intuitive explanation of a generalization of Ossiander's central limit theorem. For simplicity, the rest of the paper is devoted only to continuous local martingales and applications to the Gaussian white noise model. The highlight is Section 3, which gives a weak convergence theorem based on the maximal inequalities derived in Section 2. Using these results, we derive the asymptotic behavior of local random fields of kernel estimators, the rate of convergence of some parametric and non-parametric *M*-estimators, and the asymptotic normality of integral type estimators.

**Key words: Martingale, maximal inequality, central limit theorem, kernel estimator, change point, maximum likelihood estimator.**

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 175-199(1999)

Yuji Sakamoto

(Nagoya University)

Nakahiro Yoshida

(University of Tokyo)

When we derive asymptotic expansions of random variables, the smoothness of their distributions becomes a subject of discussion. In the case where the random variables are functionals of continuous-time stochastic processes, we need an infinite dimensional analysis to study their analytic properties, and the Malliavin calculus provides the key to the problem of the smoothness of their distributions. In the Malliavin calculus, the integration-by-parts formula plays an important role. We first derive it in the finite dimensional case from a well-known identity, and illustrate the significance of the smoothness of the distribution for deriving asymptotic expansions on a finite dimensional space, in relation to the integration-by-parts formula. Next, we introduce the foundations of the Malliavin calculus and explain the theory of asymptotic expansions for generalized Wiener functionals and its applications to statistics. Moreover, it is shown that expansion formulas for shrinkage estimators also follow from this general theory.

**Key words: Stein's identity, integration-by-parts formula, Sobolev space, generalized Wiener functional, diffusion process, shrinkage estimator.**

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 201-221(1999)

Tube Method and Euler Characteristic Method

Satoshi Kuriki

(The Institute of Statistical Mathematics)

Akimichi Takemura

(Faculty of Economics, University of Tokyo)

Let *X*(*t*), *t* \in *I*, be a Gaussian random field with mean 0 and variance 1. Assume that *X*(*t*) has a representation *X*(*t*) = \sum^{p}_{i = 1} \phi^{i}(*t*) *z*_{i}, where

**Key words: Asymptotic expansion, Gauss-Bonnet theorem, integral geometry, Karhunen-Loeve expansion, tail probability, tube formula.**

Proceedings of the Institute of Statistical Mathematics Vol.47, No.1, 223-241(1999)

from Other Earthquake Clusters

Yosihiko Ogata and Tokuji Utsu

(The Institute of Statistical Mathematics)

This paper reviews our papers (Ogata et al., 1995, *Geophysical Journal International*, **121**, 233-254; 1996, *Geophysical Journal International*, **127**, 17-30). When earthquake activity begins at some place, it may be a foreshock sequence of a larger earthquake, or it may be a swarm or a simple mainshock-aftershock sequence. This paper is concerned with the conditional probability that it will be foreshock activity of a later larger earthquake, depending on the occurrence pattern of some early events in the sequence. The earthquake catalogue of the Japan Meteorological Agency (1926-1993, *M*_{J}

**Key words: Magnitude differences, multi-element prediction formula, epicentre separations, logit models, origin time spans, probability forecasts.**