## Current Status and Future Direction of Cancer Statistics in Japan

Accurate cancer statistics are essential for an evidence-based cancer control program. The major cancer statistics are mortality, incidence and survival. Vital statistics and population-based cancer registries are the only systems for collecting mortality and incidence data, respectively, whereas survival data can be collected by population-based, hospital-based or site-specific cancer registries. As of 2010, 38 prefectures and 1 city operate population-based cancer registries, but their completeness is not yet sufficient for estimating national cancer incidence; only 10–15 registries have been used for this purpose. Since the enactment of the Cancer Control Act, however, the number of prefectures actively operating population-based cancer registries has been increasing. Their completeness is expected to improve as designated cancer care hospitals, which have started hospital-based cancer registries, submit more reports. We need to promote collaboration among population-based, hospital-based and site-specific cancer registries and the use of their data, especially for official statistics and mathematical modeling.

Key words: Cancer control, cancer registry, mortality, incidence, survival, prevalence.

## Overview of the Mathematical Models Used for Cancer Statistics in the United States of America

Cancer statistics are an essential component in developing and evaluating a national cancer control plan. In Japan, the latest cancer mortality data are released after a delay of only about one year, whereas the latest cancer incidence data become available only after about five years. Keeping cancer incidence statistics up to date has therefore been a significant challenge. As of 2010, the U.S. uses a state-space model for three-year-ahead prediction of cancer mortality. It uses a combination of a spatial model, a spatio-temporal model and a Joinpoint regression model for four-year-ahead prediction of cancer incidence, so that current-year estimates of both are available. Japan likewise needs to produce timely cancer incidence estimates by developing short-term prediction methods, and to build an information infrastructure for evidence-based cancer control.
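The sketch below illustrates a state-space projection. It filters a local linear trend model with a Kalman filter and extrapolates a few years ahead, for example on a series of annual log mortality rates. It is not the specification used in the U.S. The noise variances `q_level`, `q_slope` and `r` are hypothetical fixed values rather than estimates, and the function name is illustrative.

```python
import numpy as np

def local_linear_trend_forecast(y, horizon=3, q_level=1e-4, q_slope=1e-5, r=1e-2):
    """Kalman filter for a local linear trend model, followed by h-step forecasts.

    State x_t = (level_t, slope_t); observation y_t = level_t + noise.
    The variances q_level, q_slope and r are assumed, not estimated.
    """
    y = np.asarray(y, dtype=float)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # level_{t+1} = level_t + slope_t
    H = np.array([[1.0, 0.0]])               # only the level is observed
    Q = np.diag([q_level, q_slope])          # state noise covariance
    x = np.array([y[0], 0.0])                # start at the first value with zero slope
    P = np.eye(2) * 1e6                      # diffuse prior covariance
    for t, obs in enumerate(y):
        if t > 0:                            # prediction step
            x = F @ x
            P = F @ P @ F.T + Q
        S = H @ P @ H.T + r                  # innovation variance (1 x 1)
        K = P @ H.T / S                      # Kalman gain (2 x 1)
        x = x + (K * (obs - H @ x)).ravel()  # update step
        P = P - K @ H @ P
    forecasts = []
    for _ in range(horizon):                 # propagate the filtered state forward
        x = F @ x
        forecasts.append(x[0])
    return np.array(forecasts)

# Example on an exactly linear series: 10.0, 10.5, ..., 19.5
print(local_linear_trend_forecast(10 + 0.5 * np.arange(20)))
```

On this exactly linear series, the three forecasts are close to 20.0, 20.5 and 21.0. With real data, the variances would be estimated, for example by maximizing the Kalman-filter likelihood.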

Key words: Cancer, incidence, mortality, prediction, state-space model, spatio-temporal model.

## Application of Transition Models to Cancer Mortality Data

Public institutions often publish cancer mortality data as aggregated statistics, such as age-by-period tables. In this study, we focus on the analysis and prediction of cancer mortality using data collected by the MHLW (Ministry of Health, Labour and Welfare). An age-period-cohort model is generally used to analyze such data, but this model is non-identifiable because it is over-parameterized. Applying a transition model to these data avoids this problem and gives a good fit. We predict cancer mortality with a state-space model, as used in time-series analysis. We then compare this prediction method with the classical prediction method based on the age-period-cohort model. We estimate the state-space model with a particle filter and predict death counts from the predictive distribution.
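The following is a minimal sketch of a bootstrap particle filter. It assumes a single age group whose log mortality rate follows a Gaussian random walk, with the yearly death count Poisson-distributed given the population at risk. This is a simplified stand-in, not the authors' age-period transition model. The step size `sigma`, the number of particles, the initial spread and the function name are assumptions.

```python
import numpy as np

def particle_filter_predict(deaths, population, next_population,
                            n_particles=2000, sigma=0.05, seed=0):
    """Bootstrap particle filter for a random-walk log mortality rate (sketch).

    State:       lam_t = lam_{t-1} + N(0, sigma^2)   (log mortality rate)
    Observation: D_t ~ Poisson(N_t * exp(lam_t))
    Returns draws from the one-step-ahead predictive distribution of deaths.
    """
    rng = np.random.default_rng(seed)
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)
    # Initial particles spread around the first year's crude log rate (assumed prior).
    lam = np.log(deaths[0] / population[0]) + rng.normal(0.0, 0.5, n_particles)
    for t in range(len(deaths)):
        if t > 0:
            lam = lam + rng.normal(0.0, sigma, n_particles)  # propagate particles
        mu = population[t] * np.exp(lam)
        log_w = deaths[t] * np.log(mu) - mu                  # Poisson log-likelihood up to a constant
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        lam = lam[rng.choice(n_particles, size=n_particles, p=w)]  # multinomial resampling
    lam_next = lam + rng.normal(0.0, sigma, n_particles)     # one step ahead
    return rng.poisson(next_population * np.exp(lam_next))

deaths = [1020, 990, 1005, 1010, 980, 1000]   # hypothetical yearly death counts
population = [1_000_000] * 6
draws = particle_filter_predict(deaths, population, next_population=1_000_000)
print(draws.mean(), np.percentile(draws, [2.5, 97.5]))
```

The sample mean and percentiles of `draws` summarize the predictive distribution of next year's death count.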

Key words: Transition model, age-period-cohort model, particle filter.

## Trends in Cancer Incidence in Japan from 1975–2005

A Joinpoint regression model was used to analyze long-term trends in cancer incidence rates, based on estimates from the Research Group for Population-based Cancer Registration in Japan. Overall cancer incidence rates increased in the most recent period for both men (1.7% per year from 2000 to 2005) and women (2.8% per year from 1999 to 2005). In the most recent period, increasing trends were also found for lung cancer and lymphoma in both sexes, bladder cancer in men, and pancreatic, breast, uterine, ovarian and kidney cancer in women. In contrast, cancers of the liver and gallbladder in both sexes, the large intestine in men and the stomach in women showed decreasing trends. Esophageal cancer and leukemia in both sexes, cancers of the stomach, pancreas, prostate and kidney in men, and colorectal and bladder cancer in women leveled off. An effective cancer control program is therefore expected to reduce cancer incidence rates, particularly for sites with high incidence rates or increasing trends. Such a program includes primary prevention, such as smoking cessation and dietary modification, and early detection through mass screening.
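Joinpoint analyses report the annual percent change (APC) for a segment whose slope on the log-rate scale is `b` as `100 * (exp(b) - 1)`. The sketch below fits a continuous log-linear model with a single joinpoint, using a grid search over candidate years and ordinary least squares. The Joinpoint software also selects the number of joinpoints with permutation tests, which is not reproduced here. The minimum of three points per segment and the function names are assumptions.

```python
import numpy as np

def apc(slope):
    """Annual percent change implied by a slope on the log-rate scale."""
    return 100.0 * (np.exp(slope) - 1.0)

def fit_one_joinpoint(years, rates):
    """Continuous log-linear fit with one joinpoint chosen by grid search (sketch).

    Model: log(rate) = b0 + b1 * (year - year_0) + b2 * max(year - k, 0)
    Returns (joinpoint k, APC before k, APC after k).
    """
    years = np.asarray(years, dtype=float)
    y = np.log(np.asarray(rates, dtype=float))
    best = None
    # Candidate joinpoints leave at least three observations per segment (assumption).
    for k in years[2:-2]:
        X = np.column_stack([np.ones_like(years), years - years[0],
                             np.maximum(years - k, 0.0)])  # hinge term allows a slope change at k
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sse = float(resid @ resid)
        if best is None or sse < best[0]:
            best = (sse, k, beta)
    _, k, beta = best
    return k, apc(beta[1]), apc(beta[1] + beta[2])

# Synthetic rates per 100,000 whose log slope changes from 0.01 to 0.03 in 1990
years = np.arange(1975, 2006)
rates = 300 * np.exp(0.01 * (years - 1975) + 0.02 * np.maximum(years - 1990, 0))
print(fit_one_joinpoint(years, rates))
```

On these noise-free synthetic rates, the grid search recovers 1990 with APCs of about 1.0% and 3.0% per year.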

Key words: Cancer, incidence rates, trend, Joinpoint analysis.

## Statistical Analysis of Time Trend in Cancer Mortality Using Time Varying Coefficient

Disease risk, usually quantified by incidence or mortality, may vary by region and over time. This variation can arise from factors such as the level of exposure to environmental hazards or the lifestyle habits of residents. Using longitudinal cancer mortality data, we aim to assess how regional variations in cancer mortality change over time and which characteristics influence these changes. In this paper, we develop a statistical method for longitudinal cancer mortality data using an observation-driven model with a time-varying coefficient. As an illustration with real data, we apply the proposed method to prefecture-specific longitudinal data on large bowel cancer mortality among males in Japan.
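The following is a minimal GLARMA-style sketch of an observation-driven model: the log-likelihood of a Poisson model whose log mean combines a time-varying intercept with a moving-average term in the previous Pearson residual. The linear form `a + b * t` for the varying coefficient and the single MA(1) term are assumptions for illustration, not the specification used in the paper. Maximizing this function numerically would give parameter estimates.

```python
import math
import numpy as np

def glarma_ma1_loglik(y, a, b, theta):
    """Poisson observation-driven log-likelihood (sketch).

    log mu_t = (a + b * t) + theta * e_{t-1},   e_t = (y_t - mu_t) / sqrt(mu_t)
    a + b * t is an assumed linear time-varying intercept; e_t is the Pearson residual.
    Returns (log-likelihood, fitted means).
    """
    y = np.asarray(y, dtype=float)
    mu = np.empty_like(y)
    e_prev = 0.0          # no residual is available before the first observation
    loglik = 0.0
    for t, yt in enumerate(y):
        mu[t] = math.exp(a + b * t + theta * e_prev)
        loglik += yt * math.log(mu[t]) - mu[t] - math.lgamma(yt + 1.0)
        e_prev = (yt - mu[t]) / math.sqrt(mu[t])   # past observations drive the next mean
    return loglik, mu

counts = [3, 5, 4, 6, 8, 7]   # hypothetical yearly death counts for one prefecture
print(glarma_ma1_loglik(counts, a=1.2, b=0.1, theta=0.3))
```

With `theta = 0`, the model reduces to a Poisson regression with mean `exp(a + b * t)`. This case makes a convenient check on any implementation.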

Key words: Observation-driven model, generalized linear ARMA, varying coefficient, test of uniformity.

## Cancer Mortality Risk Visualization on Age-period Plane

Cancer information obtained by a suitable method is the basis for planning an effective cancer control program. Our purpose in this paper is to visualize cancer mortality risk as a curved surface on the age-period plane. To this end, a geographically weighted generalized linear model (GW model) may be suitable if age and period are regarded as geographical coordinates on a two-dimensional plane. We also consider a parametric age-by-period interaction model as an alternative to the non-parametric GW model. Moreover, we consider the age-period-cohort model, which is widely used in the analysis and prediction of longitudinal cancer data. These models are provided as R scripts, together with results for lung cancer mortality in Denmark, so that readers can reproduce them.
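The following is a minimal sketch of the GW idea on the age-period plane. A Poisson GLM with a log population offset is fitted by iteratively reweighted least squares. Each cell is weighted by a Gaussian kernel of its distance from a focal (age, period) point, and the local intercept gives the smoothed rate there. The kernel, the bandwidth and the equal distance scale for one year of age and one calendar year are assumptions. Repeating the fit over a grid of focal points traces out the risk surface. The function name is hypothetical, and this is Python rather than the R scripts that accompany the paper.

```python
import numpy as np

def gw_poisson_rate(ages, periods, deaths, pop, focal_age, focal_period,
                    bandwidth=10.0, n_iter=100, tol=1e-10):
    """Locally weighted Poisson regression at one focal (age, period) point (sketch)."""
    ages = np.asarray(ages, dtype=float)
    periods = np.asarray(periods, dtype=float)
    y = np.asarray(deaths, dtype=float)
    offset = np.log(np.asarray(pop, dtype=float))
    # Gaussian kernel weights from Euclidean distance on the age-period plane.
    dist2 = (ages - focal_age) ** 2 + (periods - focal_period) ** 2
    w = np.exp(-0.5 * dist2 / bandwidth ** 2)
    # Design centred at the focal point, so exp(intercept) is the local rate there.
    X = np.column_stack([np.ones_like(ages), ages - focal_age, periods - focal_period])
    beta = np.array([np.log(y.sum() / np.exp(offset).sum()), 0.0, 0.0])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu       # IRLS working response
        W = w * mu                             # kernel weight times Poisson IRLS weight
        XtW = X.T * W
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return float(np.exp(beta[0]))

# Hypothetical 5-year age groups 40-80 and periods 1960-2000 with log-linear rates
A, P = np.meshgrid(np.arange(40, 81, 5), np.arange(1960, 2001, 5), indexing="ij")
ages, periods = A.ravel(), P.ravel()
pop = np.full(ages.shape, 1e5)
deaths = pop * np.exp(-9 + 0.05 * (ages - 40) + 0.01 * (periods - 1960))
print(gw_poisson_rate(ages, periods, deaths, pop, focal_age=60, focal_period=1980))
```

Because the synthetic rates are exactly log-linear, the local fit reproduces the true rate `exp(-7.8)` at (60, 1980). On real data, the kernel lets the fitted surface bend where the data do.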

To check whether our objective of visualizing cancer mortality risk is achieved, we also apply these methods to liver cancer mortality data in Japan. Liver cancer is reported to show a cohort effect in which the cohort born around 1935 has a high risk. We check whether this effect can be visualized.

Finally, we discuss not only the statistical aspects but also future problems for our research objective. Each model has its merits, so it may be effective to use several methods jointly to estimate the properties of cancer mortality risk.

Key words: Cancer, visualization, risk curved surface, geographically weighted generalized linear model, interaction model, age-period-cohort model, cohort effect.

## Evaluating Socio-economic Inequalities in Cancer Mortality by Using Areal Statistics in Japan: A Note on the Relation between Municipal Cancer Mortality and Areal Deprivation Index

Using spatial statistical modelling for disease mapping, we visualise the geographic variations in municipal cancer mortality in Japan during 2003–2007. We measure their socio-economic inequalities using a newly proposed areal deprivation index. The weights given to census variables in constructing the index are derived from a micro-data analysis of poverty. Bayesian hierarchical Poisson regression models with a spatially structured random effect are applied to the cancer mortality data set. These models visualise the spatially smoothed SMR distributions and yield socio-economic inequality indices of mortality graded by areal deprivation. Significant socio-economic inequalities in mortality are identified for most of the major cancer sites. Relatively large inequalities, with 20–24% excess mortality in the most deprived areas compared with the least deprived ones, are found for colorectal and liver cancer in men, and for lung and cervical cancer in women. We also discuss possible biases in inequality measurements caused by ordinary Poisson regression models without a spatially structured random effect term. We confirm empirical differences between the inequality indices measured by models with and without this term.
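The SMR mapped here is the ratio of observed to expected deaths. Expected deaths are obtained by indirect standardization: each area's age-specific population is multiplied by reference age-specific rates and summed. The sketch below computes crude SMRs only; it does not show the Bayesian hierarchical model with a spatially structured random effect, which smooths them. Array names, shapes and the example numbers are assumptions.

```python
import numpy as np

def smr(observed, area_pop, reference_rates):
    """Crude SMRs by indirect standardization (sketch).

    observed:        deaths per area, shape (n_areas,)
    area_pop:        person-years per area and age group, shape (n_areas, n_age)
    reference_rates: reference (e.g. national) age-specific rates, shape (n_age,)
    Returns (SMR per area, expected deaths per area).
    """
    expected = np.asarray(area_pop, dtype=float) @ np.asarray(reference_rates, dtype=float)
    return np.asarray(observed, dtype=float) / expected, expected

# Two hypothetical municipalities and two age groups
ratios, expected = smr(observed=[2.0, 11.0],
                       area_pop=[[1000, 100], [500, 500]],
                       reference_rates=[0.001, 0.01])
print(ratios, expected)
```

Here the expected deaths are about 2.0 and 5.5, giving SMRs of about 1.0 and 2.0. Crude SMRs for small municipalities are noisy, which is why the hierarchical model smooths them.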

Key words: Hierarchical Bayesian Poisson regression model, disease mapping, relative index of inequality, areal deprivation index, cancer, SMR.

## Measuring Reduction in Cancer Risk Following Successful Control of an Infectious Disease

Many cancers are caused by infection with carcinogenic microorganisms. Once an effective preventive or therapeutic measure against an infectious disease is developed and put into practice, the associated cancer risks also decline following successful control of the infection. Two goals motivate this work: evaluating countermeasures against infectious diseases from statistical records of cancer and infectious diseases, and predicting the likely impact of infectious disease control on future cancer risks. Both require explicitly clarifying the epidemiological dynamics of the infection and carcinogenesis processes. The present study analyzes a mathematical model to discuss several epidemiological measures of the effect of disease control efforts, clarifying the empirical data required for the relevant assessments. Because the risks of both infection and cancer tend to depend on calendar time over long periods, the evaluation of countermeasures should account for both period and cohort measures of risk. Moreover, bridging the gap between infectious disease control and the statistical assessment of cancer risks requires longitudinal data on infection dynamics. Existing studies have examined the causal association between infection and cancer. Beyond this, the natural history of the infection-carcinogenesis process needs to be systematically quantified, most notably the time required for infected and non-infected individuals to develop cancer and the age dependency of cancer mortality.
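One simple measure of the cancer burden that infection control could remove is Levin's population attributable fraction, `p * (RR - 1) / (1 + p * (RR - 1))`. It can be computed separately for each birth cohort from that cohort's infection prevalence `p` and the relative risk `RR` of cancer among infected people. This static sketch rests on strong assumptions: a constant relative risk, no latency and no competing risks. It is not the dynamic model analysed in the paper, and the prevalence and relative-risk values below are hypothetical.

```python
import numpy as np

def attributable_fraction(prevalence, relative_risk):
    """Levin's population attributable fraction (sketch).

    PAF = p * (RR - 1) / (1 + p * (RR - 1)), evaluated element-wise over prevalences.
    """
    excess = np.asarray(prevalence, dtype=float) * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical infection prevalence in successive birth cohorts after control began
cohort_prevalence = [0.6, 0.4, 0.2, 0.05]
print(attributable_fraction(cohort_prevalence, relative_risk=10.0))
```

As prevalence falls in younger cohorts, the fraction of their cancer risk attributable to the infection falls with it. The paper's emphasis on cohort as well as period measures reflects this pattern.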

Key words: Infectious diseases, cancer, epidemiology, effectiveness, model, immunization.

## Estimation of ‘Cure’ Fraction for Cancer Patients Using Population-based Cancer Registry Data

Five-year survival has been used as a measure of the prognosis of cancer patients. Because prognosis varies among cancer sites, it is unclear whether five-year follow-up is appropriate for all sites. It is therefore important to estimate the proportion of ‘cured’ patients through statistical modelling. A cure fraction model was originally proposed by Boag (1949). Its application to population-based cancer registry data, however, did not become common until the late 1990s, when the concept of relative survival was incorporated into the cure model. Recently, the method has been widely applied to population-based cancer registry data in the US and European countries, because methodology and statistical software for estimating the ‘cure’ fraction have been developed. We introduce this method and illustrate it with data from the Osaka Cancer Registry. We analysed stomach cancer patients diagnosed in 1975–2000 and monitored the time trends in the cure fraction and in the median survival time of fatal (uncured) patients over 25 years. The proportion cured of stomach cancer increased by around 20% over those 25 years, and an estimated 50% of patients were cured. The increase in the cure fraction was partly (around 20%) explained by a change in the stage distribution due to earlier diagnosis. A remarkable increase in the stage-specific cure fraction demonstrated the improvement in treatment for stomach cancer.
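A mixture cure model on the relative survival scale gives relative survival `R(t) = c + (1 - c) * S_u(t)`. Here `c` is the cure fraction and `S_u` is the survival function of uncured (fatal) patients. Observed survival is the expected survival of the general population multiplied by `R(t)`. The sketch below assumes a Weibull `S_u`, for which the median survival time of uncured patients has a closed form. The parameter values are hypothetical, and fitting the model to registry data (for example by maximum likelihood) is not shown.

```python
import math

def mixture_cure_relative_survival(t, cure, shape, scale):
    """Relative survival under a mixture cure model with Weibull uncured survival (sketch).

    R(t) = cure + (1 - cure) * exp(-(t / scale) ** shape)
    """
    s_uncured = math.exp(-((t / scale) ** shape))
    return cure + (1.0 - cure) * s_uncured

def median_survival_uncured(shape, scale):
    """Median of the Weibull survival time of uncured patients: S_u(m) = 0.5."""
    return scale * math.log(2.0) ** (1.0 / shape)

# Hypothetical parameters: 50% cured, Weibull shape 1.2 and scale 3 years for uncured patients
for years in (1, 5, 10, 20):
    print(years, round(mixture_cure_relative_survival(years, 0.5, 1.2, 3.0), 3))
print("median survival of uncured patients:", median_survival_uncured(1.2, 3.0))
```

Relative survival levels off at the cure fraction as time grows. This plateau is what lets the model separate cured patients from uncured ones.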

Key words: Cure model, population-based cancer registry, relative survival, stomach cancer, gastric cancer.

## Effects of Learning Mathematics on Grades in Economics: A Control Function Approach for Estimation and Pretesting

This paper estimates the average treatment effects (ATE) of learning mathematics on grades in introductory micro- and macroeconomics, using grade data of economics students at a certain university in 2005 and 2006. In this economics department, introductory micro- and macroeconomics are required and math classes are optional. Because of self-selection, one cannot simply assume that learning math is exogenous, but must consider the possibility of endogeneity. For instance, able students may be more likely both to take math classes and to perform better in economics. This paper takes a *control function approach*, using variables reflecting ability and attitude as control variables. As a pretest for the controllability of endogeneity, that is, weak exogeneity of the treatments, this paper proposes a score (LM) test. The test assumes a simultaneous binary probit model for the outcome and treatments. The findings are twofold. (1) For introductory microeconomics in both years and macroeconomics in 2005, the ability and attitude variables (entrance exam scores, grades in seminar classes, etc.) can control for the endogeneity of learning math. (2) Learning math increases the probability of passing introductory micro- and macroeconomics by 9–15%.
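The sketch below illustrates the control function idea on simulated data. A probit model for passing includes a math-class dummy together with an ability control. The ATE is the sample average of the difference in predicted pass probabilities with and without the treatment. The data-generating process, variable names and Fisher-scoring fit are assumptions for illustration; the score (LM) pretest is not shown.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))

def norm_pdf(z):
    z = np.asarray(z)
    return np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fit_probit(X, y, n_iter=100, tol=1e-10):
    """Probit maximum likelihood by Fisher scoring."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = np.clip(norm_cdf(eta), 1e-12, 1.0 - 1e-12)
        phi = norm_pdf(eta)
        v = p * (1.0 - p)
        score = X.T @ (phi * (y - p) / v)
        info = (X.T * (phi ** 2 / v)) @ X     # expected information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def ate_from_probit(beta, controls):
    """Sample mean of P(pass | math) - P(pass | no math).

    Assumes beta[0] is the treatment coefficient and beta[1:] go with `controls`.
    """
    base = controls @ beta[1:]
    return float(np.mean(norm_cdf(base + beta[0]) - norm_cdf(base)))

# Simulated data: able students select into math, and ability also raises pass rates.
rng = np.random.default_rng(1)
n = 5000
ability = rng.normal(size=n)                                     # hypothetical control variable
took_math = (0.8 * ability + rng.normal(size=n) > 0).astype(float)
passed = (0.3 + 0.4 * took_math + 0.7 * ability + rng.normal(size=n) > 0).astype(float)
controls = np.column_stack([np.ones(n), ability])
beta_hat = fit_probit(np.column_stack([took_math, controls]), passed)
print(beta_hat, ate_from_probit(beta_hat, controls))
```

In this simulation, treatment is exogenous once ability is controlled for, so the estimated ATE is close to the true value. Without the ability column, the treatment coefficient would absorb part of the ability effect.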

Key words: Treatment effect, endogeneity, probit, score (LM) test.

## Upper Bounds on Total Variation Distance by DeRobertis Separation

The DeRobertis separation is a function of two probability distributions that measures the difference between them. Because of some of its properties, the DeRobertis separation works well for Bayesian posterior distributions and for probability distributions whose normalizing constant (partition function) is hard to compute. Given a pair of probability density functions, the DeRobertis separation is known to be an upper bound on the total variation distance, but the tightness of this bound has not been studied. In this letter, a tighter upper bound is derived and proved to be optimal. Furthermore, another upper bound, which is essentially tighter than the known upper bounds, is proved using the local DeRobertis separation.
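For distributions on a common finite support, the DeRobertis separation can be written as the range of the log density ratio, `max log(f/g) - min log(f/g)`. Rescaling `f` or `g` by a constant therefore leaves it unchanged, which is why unnormalized posteriors suffice. The sketch below, with illustrative function names, computes it alongside the total variation distance and checks the known bound on one example. It does not implement the tighter bounds derived in the letter.

```python
import numpy as np

def derobertis_separation(f, g):
    """sup_{x,y} log[f(x) g(y) / (f(y) g(x))] on a common finite support (sketch).

    Equals the range of log(f/g); f and g may be unnormalized but must be positive.
    """
    log_ratio = np.log(np.asarray(f, dtype=float)) - np.log(np.asarray(g, dtype=float))
    return float(log_ratio.max() - log_ratio.min())

def total_variation(f, g):
    """Total variation distance after normalizing f and g to probability vectors."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    f, g = f / f.sum(), g / g.sum()
    return 0.5 * float(np.abs(f - g).sum())

f = np.array([1.0, 2.0, 3.0, 4.0])   # unnormalized
g = np.array([4.0, 3.0, 2.0, 1.0])
d = derobertis_separation(f, g)
print(d, derobertis_separation(7 * f, 0.2 * g), total_variation(f, g))
```

For `f` proportional to (1, 2, 3, 4) and `g` proportional to (4, 3, 2, 1), the separation is log 16 (about 2.77) with or without rescaling. The total variation distance is 0.4, well below the separation.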

Key words: Total variation distance, robust statistics, Bayesian posterior distribution, divergence, DeRobertis separation, local DeRobertis separation.