## Epidemiologic Studies with Location Information and Their Statistical Methods

In recent years, spatial data, that is, observations tagged with location information, have been used increasingly to assess the geographical variation of events and to compare health risks across regions. In epidemiology, attention has turned to spatial epidemiological studies that describe geographical disparities in disease incidence while accounting for the geographical variation of various factors, to spatial modelling for health risk estimation, and to the related statistical methods. This paper introduces some examples of epidemiological studies using spatial data, and outlines how to proceed with such research and how to apply the statistical methods.

Key words: Spatial data, spatial epidemiology, disease mapping, disease clustering, disaster, radiation.

## Theory and Methods for the Case-cohort Studies

Clinical and epidemiological cohort studies often require large numbers of subjects and long-term follow-up, and assembling covariate histories on all cohort members can be prohibitively expensive. Case-cohort designs are a common means of reducing the cost of covariate measurement in large cohort studies. This article provides a comprehensive review of theory and statistical methods for case-cohort studies. It reviews conventional methods for estimating measures of exposure effect, namely the risk ratio and the odds ratio, and modified partial likelihood methods for the Cox regression model. In addition, it reviews recently developed estimation methods that use auxiliary information available on all cohort members. Simulation studies using data taken from Wilms' tumor studies are provided.

Key words: Epidemiology, case-cohort design, two-phase design, weighted estimating equation, semiparametric inference, incomplete data.

## Probabilities of Causation and Their Evaluations

This paper reviews the basic framework of "probabilities of causation" defined by Pearl (1999), namely, the probabilities that one observed event was a necessary cause, a sufficient cause, or both a necessary and sufficient cause of another. In particular, we discuss the identification problems of probabilities of causation, and formulate their bounds when the causal effect is estimable from experimental or observational studies.

Key words: Back door criterion, linear programming method, monotonicity, strongly ignorable treatment assignment (SITA).
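
As a small illustration of the kind of bounds mentioned in this abstract, the sketch below computes widely used bounds on the probability of necessity and sufficiency (PNS) when the two interventional probabilities are known, for example from a randomized experiment. It is not taken from the paper itself: the function name `pns_bounds` and its argument names are assumptions made for this example, and the bounds shown are max{0, P(y_x) - P(y_x')} ≤ PNS ≤ min{P(y_x), P(y'_x')}.

```python
from typing import Tuple


def pns_bounds(p_y_do_x: float, p_y_do_x0: float) -> Tuple[float, float]:
    """Bounds on PNS from two interventional probabilities.

    p_y_do_x  : P(Y = 1 | do(X = 1)), the outcome probability under exposure
    p_y_do_x0 : P(Y = 1 | do(X = 0)), the outcome probability without exposure
    (both names are hypothetical, chosen for this illustration)
    """
    for p in (p_y_do_x, p_y_do_x0):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Lower bound: the causal risk difference, truncated at zero.
    lower = max(0.0, p_y_do_x - p_y_do_x0)
    # Upper bound: the smaller of P(y_x) and P(y'_x') = 1 - P(y_x').
    upper = min(p_y_do_x, 1.0 - p_y_do_x0)
    return lower, upper
```

Under the additional assumption of monotonicity the two ends are replaced by a point value, since PNS then equals the causal risk difference P(y_x) - P(y_x').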

## Estimation and Sensitivity Analysis of Direct and Indirect Effects

Estimation of the causal effect of an exposure on disease occurrence is an overarching goal of epidemiological studies. Once it is established that an exposure affects disease risk, questions often arise as to the relative importance of the different possible pathways for the effect. These questions can be addressed by decomposing the total effect of an exposure into a direct and an indirect effect. This article reviews statistical methods for direct and indirect effects based on a potential outcome model. It discusses the definitions, assumptions, and identification formulas for direct and indirect effects. It also discusses a sensitivity analysis method for these effects that is applicable when the assumption of no unmeasured confounding is violated. We apply these methods to the 2003 US birth certificate and infant death data provided by the National Center for Health Statistics.

Key words: Causal inference, sensitivity analysis, effect decomposition, direct and indirect effects, unmeasured confounders.

## Sensitivity Analysis for Biases in Observational Studies

Most researchers recognize that conventional statistical analysis of observational data requires assumptions such as no selection bias, no information bias, no unmeasured confounding, and data missing at random. It is almost impossible to verify from the study data that these assumptions are met. If they are not met, the results of conventional analysis are uncertain and biased. In this article, we review bias analysis, focusing on probabilistic sensitivity analysis.

Key words: Observational study, epidemiology, uncertainty, bias, probabilistic sensitivity analysis.

## Statistical Analysis Using Pairwise Conditional Likelihood Methods

The Mantel-Haenszel estimator is a well-known estimator of the common odds ratio in stratified contingency tables. It is not only simple but also highly efficient. One reason it is preferable is that it can be constructed from pairwise conditional estimating functions. In this paper we describe the usefulness of pairwise conditional likelihood methods. We present the relationship between the Mantel-Haenszel estimator and the pairwise conditional method. We also show two examples to which pairwise conditional likelihood methods can be applied.

Key words: Conditional estimation, odds ratio, Mantel-Haenszel estimator, nested case-control study, counter-matching.
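
For readers unfamiliar with the estimator named in this abstract, the following minimal sketch computes the standard Mantel-Haenszel common odds ratio, the sum over strata of a·d/n divided by the sum over strata of b·c/n, from a list of 2×2 tables. The function name `mantel_haenszel_or` and the `(a, b, c, d)` cell layout are assumptions made for this illustration, and the sketch does not reproduce the paper's pairwise conditional derivation.

```python
from typing import Iterable, Tuple

# One stratum as (a, b, c, d):
#   a = exposed cases,    b = unexposed cases,
#   c = exposed controls, d = unexposed controls
Table = Tuple[float, float, float, float]


def mantel_haenszel_or(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel estimate of the common odds ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0.0:
        raise ValueError("estimate undefined: sum of b*c/n is zero")
    return numerator / denominator
```

With a single stratum the estimate reduces to the usual cross-product ratio a·d/(b·c).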

## Semiparametric Statistical Methods for Missing Data

In observational studies such as cohort studies in epidemiology, the results of statistical analysis can be biased for various reasons, including missing data. There are two kinds of statistical methods for missing data. One is the parametric approach, such as the maximum likelihood method and multiple imputation. The other is the semiparametric approach, such as inverse probability weighting and doubly robust estimation. This paper focuses on the latter, which has developed rapidly in recent years, especially in biostatistics. Although there have been few applications of the semiparametric approach to missing data in practice, it is expected to be a powerful tool, and demand for it should increase in the future. Semiparametric statistical methods are widely used for various problems in biostatistics. The argument in this paper is based on the general theory of semiparametric inference, in order to aid theoretical understanding of other settings as well as missing data problems.

Key words: MAR, influence functions, estimating functions, nuisance tangent spaces, asymptotic efficiency.
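
To make the idea of inverse probability weighting concrete, here is a minimal sketch of the simplest such estimator for a population mean when the outcome is missing at random: each observed outcome is weighted by the reciprocal of its probability of being observed, and the weighted sum is divided by the full sample size. The function name `ipw_mean` and its arguments are hypothetical, and the observation probabilities are assumed to be known or already estimated; this is an illustration of the general technique, not the paper's estimator.

```python
from typing import Optional, Sequence


def ipw_mean(
    y: Sequence[Optional[float]],
    observed: Sequence[bool],
    prob_observed: Sequence[float],
) -> float:
    """Inverse-probability-weighted estimate of the mean of y.

    y[i] may be None when observed[i] is False; prob_observed[i] is the
    (assumed known or previously estimated) probability that y[i] is observed.
    """
    n = len(y)
    if n == 0 or len(observed) != n or len(prob_observed) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for yi, ri, pi in zip(y, observed, prob_observed):
        if not ri:
            continue  # missing outcomes contribute zero to the weighted sum
        if yi is None:
            raise ValueError("an outcome marked as observed has no value")
        if not 0.0 < pi <= 1.0:
            raise ValueError("observation probabilities must lie in (0, 1]")
        total += yi / pi
    return total / n
```

When every observation probability equals one and nothing is missing, the estimate is the ordinary sample mean.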

## Integration of Tumor Molecular Features into Epidemiologic Studies for Assessing Etiologic Heterogeneity

Cancer epidemiologic research typically investigates the associations between exposures and the risk of a disease, treating the disease of interest as a single outcome. However, most cancers, including colon cancer, breast cancer and lung cancer, comprise a range of heterogeneous molecular and pathologic processes, likely reflecting the influences of diverse exposures. An approach that incorporates data on the molecular and pathologic features of a disease directly into epidemiologic studies has been increasingly recognized as a way to better identify causal factors and to better understand how potential etiologic factors influence disease development. This paper introduces the conceptual framework and methodological developments for investigating etiologic heterogeneity.

Key words: Etiologic heterogeneity, subtypes, competing risks, clustering, molecular epidemiology, cancer genomics.

## Note of Risk Caused by Shortage of Margin in Commodity Futures Listed on Tokyo Commodity Exchange under the Global Financial Crisis

We propose a method of evaluating risk during the period when prices of commodity futures listed on the Tokyo Commodity Exchange (TOCOM) fluctuated widely before and after the global financial crisis in late 2008. An investor must deposit additional margin when the loss on the investor's assets exceeds half the margin. We define this as "Margin Call Risk". If the investor wants to exit trading without depositing the additional margin, the broker must liquidate the investor's assets the next day or later. Even if the investor's loss is greater than the margin, the broker must temporarily cover the excess loss, which the investor must later repay to the broker. We define the shortfall of margin relative to the loss as "Investors Default Risk". At that time, in order to prevent rapid price fluctuations, TOCOM applied a daily price limit system to commodity futures trading. We define two prices, the true price and the observed price, in order to analyze price fluctuation under the daily limit. When a price limit is hit, the true price on that day cannot be observed. Hence, using the Markov chain Monte Carlo method, we impute the censored data simultaneously with estimating the parameters. Analyzing the impact of the daily limit on the two types of risk through an empirical study of commodity futures price data, we obtain the following results. Because the daily limit system dampens price fluctuations, it slightly suppresses margin calls during periods of heavy price fluctuation. However, because opportunities for liquidation are lost, the default risk increases.

Key words: Commodity trading, price limit, margin call, financial crisis.
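
The two risk events defined in this abstract can be written down directly. The sketch below follows only the verbal definitions given above (a margin call when the loss exceeds half the margin, and a default shortfall equal to the amount by which the loss exceeds the margin); the function name `classify_loss` and its arguments are hypothetical, and the sketch does not cover the paper's price model or its MCMC treatment of censored prices.

```python
from typing import Tuple


def classify_loss(loss: float, margin: float) -> Tuple[bool, float]:
    """Return (margin_call, default_shortfall) for a given loss and margin.

    margin_call       : True when the loss exceeds half the deposited margin
    default_shortfall : amount by which the loss exceeds the margin (0 if none)
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    margin_call = loss > 0.5 * margin
    default_shortfall = max(0.0, loss - margin)
    return margin_call, default_shortfall
```

For instance, with a margin of 100, a loss of 60 triggers a margin call without any shortfall, while a loss of 130 leaves a shortfall of 30 that the broker would temporarily cover.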