Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 201-222(2007)

## Behavior Landscape of Palaeo-human Ecology and Its Visualization —Analysis of Lithic Distribution Using A-index and MDS—

(Faculty of Culture and Information Science, Doshisha University)

The purpose of this study is to reconstruct palaeo-human ecology and behavior based on a structural understanding of Paleolithic culture. Concentrations of lithic artifacts from the Paleolithic age are frequently unearthed and are presumed to be traces of lithic toolmaking. To introduce the quantitative method into the traditional archaeological approach, which is often metaphysically or ideologically based, A-index and multidimensional scaling were applied to the assessment of concentrations of lithic artifacts.

A-index application is one of the most useful approaches to assessing the distribution of lithic artifacts, since it is based not on a spatial scale but on the mutual relationships among distances. The A-index can also be used to calculate a similarity matrix of spatial functions. Using this similarity matrix, multidimensional scaling can be applied to reconstruct and visualize the multidimensional structure of the lithic distribution, which indicates spatial and functional similarities. These approaches were applied to the Onbara 2 prehistoric site.
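The scaling step can be sketched as follows. This is a minimal illustration of classical (Torgerson) multidimensional scaling in plain Python, not the authors' implementation: the function name `classical_mds`, the power-iteration eigen-solver, and the rectangle test data are all assumptions made for the example. It also assumes the similarity matrix has already been converted to an approximately Euclidean dissimilarity matrix, a step the abstract does not describe.

```python
import math
import random


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def classical_mds(dist, dims=2, iters=300, seed=0):
    """Embed n items in `dims` dimensions from an n x n dissimilarity matrix."""
    n = len(dist)
    # Double-center the squared dissimilarities: B = -1/2 * J * D^2 * J.
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]  # D is symmetric, so row means = column means
    grand = sum(row) / n
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant remaining eigenpair of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = mat_vec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, mat_vec(b, v)))
        scale = math.sqrt(max(lam, 0.0))  # clip negative (non-Euclidean) eigenvalues
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Hypothetical data: four points on a 3 x 1 rectangle, given only by distances.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(dist)
```

For exactly planar input such as this, the recovered configuration reproduces the input distances up to rotation and reflection; for real similarity data the two-dimensional map is only an approximation.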

As a result, some important points became clear. Direct visual interpretation of the concentrations of lithic artifacts is a perilous approach, for it cannot discriminate between the size of groups and the length of occupation, both of which may influence the distribution. However, applying the A-index and then multidimensional scaling could read the context that lies behind the distribution, for example “lithic making”, “trifling everyday working”, and “resources burying as a depot”. The most important result is that these approaches could reconstruct a palaeo-human behavior landscape along a time line.

Key words: Palaeo-humans, A-index, multidimensional scaling, visualization, behavior landscape, concentrations of lithic artifacts.

Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 223-233(2007)

## A Quantitative Analysis of Portraits of Kabuki Actors

(Faculty of Culture and Information Science, Doshisha University)
In Kabuki performance, all actors are male. An actor who plays female parts is called an Oyama, and an actor who plays male parts is called a Tachiyaku. In this study, we analyzed portraits of Oyama and Tachiyaku drawn by three representative Japanese painters named Toshusai Sharaku, Utagawa Toyokuni and Utagawa Kuniyoshi.

The results of a principal component analysis of the angles formed by facial parts such as the eyes, nose, mouth and eyebrows show that these three painters made conscious efforts to exaggerate the difference between Oyama and Tachiyaku by drawing an Oyama's face as oval and a Tachiyaku's face as round.

It also became clear that the most remarkable difference was seen in portraits by Sharaku, who is said to have tried to draw the true face of an actor.

Moreover, we carried out a similar analysis on how to draw faces of males and females in the Ukiyo-e of Kitagawa Utamaro to compare the difference between drawing an Oyama's face and a Tachiyaku's face in portraits of Kabuki actors. We found that Utamaro also has a tendency to draw a female's face as oval and male's face as round, but in his portraits it was not as clear as in portraits of Kabuki actors drawn by Sharaku, Toyokuni and Kuniyoshi.

These analyses show that painters who drew Kabuki actors especially emphasized the femininity of an Oyama's face in their drawings.

Key words: Ukiyo-e, Kabuki actors, Oyama, Tachiyaku, drawing techniques for faces, principal component analysis, Toshusai Sharaku, Utagawa Toyokuni, Utagawa Kuniyoshi, Kitagawa Utamaro.

Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 235-254(2007)

## Analysis of Motions for Multiple Roles in Nihon Buyo —Quantitative Analysis of Leg Movement in “Hokushu”—

(Faculty of Culture and Information Science, Doshisha University)
(College of Art, Nihon University)
(College of Information Science and Engineering, Ritsumeikan University)
(College of Information Science and Engineering, Ritsumeikan University)

This study is designed to clarify quantitatively how the eight character roles (Yukyaku, Tayu, Houkan, Bushi, Mago, Shonin, Yujo, Enja) are differentiated in terms of dancing techniques in the nihon buyo entitled “Hokushu”. The movements of two dancers trained in nihon buyo were measured by means of motion capture to compare and analyze the basic movement common to the eight character roles, i.e., walking (movements of the lower half of the body). In order to analyze the physical movements, we calculated the speed of the hips and the tips of the right and left feet (time quality), the angle of the knees and the height of the hips (spatial quality), and the acceleration of the hips (dynamic quality). The principal component analysis of the feature quantities of the movements revealed that the dancers were clearly differentiating gender, social class and scene in their depiction of the character roles. It was also revealed that the dancers were faithfully observing the traditionally preserved basic technical patterns, while at the same time expressing their “individual interpretations”, “expressivities” and “proficiency” within the allowable limits.
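The three kinds of feature quantity named above can be computed from marker trajectories roughly as follows. This is a sketch under stated assumptions rather than the authors' pipeline: positions are taken to be 3-D coordinates sampled at a fixed interval `dt`, derivatives are approximated by finite differences, and the function names and sample values are hypothetical.

```python
import math


def speeds(positions, dt):
    """Speed between consecutive frames (first-order finite difference)."""
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]


def accelerations(positions, dt):
    """Acceleration magnitude from a second-order central difference."""
    result = []
    for a, b, c in zip(positions, positions[1:], positions[2:]):
        acc = [(x0 - 2 * x1 + x2) / dt ** 2 for x0, x1, x2 in zip(a, b, c)]
        result.append(math.hypot(*acc))
    return result


def knee_angle(hip, knee, ankle):
    """Angle at the knee, in degrees, between the thigh and shank segments."""
    thigh = [h - k for h, k in zip(hip, knee)]
    shank = [a - k for a, k in zip(ankle, knee)]
    cosine = sum(t * s for t, s in zip(thigh, shank)) / (
        math.hypot(*thigh) * math.hypot(*shank)
    )
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


# Hypothetical hip trajectory sampled every 0.5 s, moving along the x axis.
hip_track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
hip_speed = speeds(hip_track, dt=0.5)
hip_accel = accelerations(hip_track, dt=0.5)
```

Hip height is simply the vertical coordinate of the hip marker in each frame. Per-role summaries of these series (for example means or ranges) would then form the rows fed to a principal component analysis.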

Key words: Nihon Buyo, leg movement, motion capture, motion analysis.

Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 255-268(2007)

## Authorship Identification Using Random Forests

(Faculty of Culture and Information, Doshisha University)
This paper proposes the use of Random Forests (RF) for authorship identification. It also reports a comparative study between RF and the following classifiers: k Nearest Neighbor, Support Vector Machines, Learning Vector Quantization, Bagging, and Boosting (AdaBoost). We focused on the relationship between the performance of the classifiers in authorship identification and the size of the training data. In this study, the following three different styles of text were used: 200 novels written by 10 great writers, 110 compositions written by 11 undergraduates, and 60 diaries written by 6 non-eminent writers. It is shown that the Random Forests algorithm is more effective and stable than the other classifiers.

Key words: Authorship identification, text classification, stylometrics, Random Forests.

Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 269-284(2007)

## Teaching the Reading and Writing of Technical Papers in Japanese: A Study of Selected Conjunctive Words and Particle-phrases in Expository Writings

(Center for Japanese Studies, Keio University)

The ultimate goal for most students learning Japanese who intend to perform research in their own fields is to gain the ability to read and write technical papers. In order to achieve this, it is necessary to understand the contextual development of the text in a technical paper in Japanese. Conjunctive words and particle-phrases appearing at the surface level of expression in a text might be considered as providing important clues for understanding the structure of the text. This study aims to shed light on conjunctive words and particle-phrases that are commonly used in academic papers from different fields. As a step toward achieving this goal, 370 text samples from seven different sources were chosen as data. Those seven sources were: (1) a pedagogical economics textbook; (2) papers from Economics; (3) papers from Science and Technology; (4) papers from the Journal of the Physical Society of Japan; (5) papers from Japanese Literature; (6) editorial articles; and (7) modern and contemporary novels. The frequencies of occurrence (per sentence) of the 65 selected conjunctive words and particle-phrases in each sample were analysed in the following two steps:

(a) An examination of the univariate distribution of the 65 conjunctive words and particle-phrases followed by a canonical discriminant analysis of 370 text samples.

(b) The same method repeated for the four genres of papers (2) (3) (4) and (5).

From these results, it was found that the various texts could be classified into their genres correctly at a high discriminant rate (84.6%) by using 19 conjunctive words and particle-phrases. The results also highlighted (i) the important conjunctive words and particle-phrases through which the texts could be discriminated into their genres; and (ii) the significant conjunctive words and particle-phrases that are commonly used in expository writings. These results revealed that conjunctive words and particle-phrases are important indicators for distinguishing different textual genres. This study is expected to contribute to the improvement of teaching methods for Japanese for specific purposes, as it will enable us to provide a set of more objective basic data for the development of teaching materials.
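The per-sentence frequency used as the input variable can be illustrated with a short sketch. The three markers and the sample text below are hypothetical stand-ins, not items confirmed to be among the 65 studied, and splitting sentences on 。, ！ and ？ is a simplifying assumption.

```python
import re


def per_sentence_frequencies(text, markers):
    """Occurrences of each marker divided by the number of sentences."""
    sentences = [s for s in re.split(r"[。！？]", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    return {m: text.count(m) / len(sentences) for m in markers}


# Hypothetical sample of four sentences and three candidate conjunctive words.
sample = "雨が降った。しかし、試合は続いた。したがって、観客は残った。しかし寒かった。"
markers = ["しかし", "したがって", "また"]
freqs = per_sentence_frequencies(sample, markers)
```

Each text sample then becomes one row of such frequencies, and a discriminant analysis is run on the resulting samples-by-markers table. Plain substring counting will over-count markers that are contained in longer words, so a real study would more likely rely on morphological analysis.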

Key words: Technical Japanese education, conjunctive words, particle-phrases, textual genre, canonical discriminant analysis, expository writings.

Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 285-310(2007)

## On the Stability of Public Opinion Data of Chinese Value Survey with Respect to Sampling Methods —A Note for the Development of Cultural Manifold Analysis (CULMAN)—

(National Institute of Science and Technology Policy,Ministry of Education, Culture, Sports, Science and Technology)
(The Institute of Statistical Mathematics)
(Research Institute for Humanity and Nature)

The main objective of this paper is to present one aspect of our research methodology for cross-national surveys, Cultural Manifold Analysis, through an investigation of the stability of the response data of the China 2002 Survey carried out by the cross-national survey committee of the Institute of Statistical Mathematics. The main focus is on the reliability of the response data from Beijing and Hong Kong. We assumed those data to have been collected by three-stage random sampling: sampling of survey points proportionally to the ratios of population in the first stage, sampling of households at each of the selected points in the second stage, and sampling of a respondent at each of the selected households by a sort of birthday rule or the Kish method in the third stage. The outcomes, however, suggested that each of the final samples might have been made proportional to the ratios of population as if the selected sampling points were the total population, i.e., the selection probability of the numbers of households at each selected point might have been double.

We investigated the impact of probability sampling by comparing the originally collected data with modified data that we made by re-sampling the same number of respondents at each of the selected points (which we may assume to be closer to the correct probability sample). The result confirmed the stability of those data, i.e., there was no significant difference between the two sets of data in either univariate tabulation or multivariate analysis (Hayashi's quantification method III). Finally, some comments are provided for the future development of practical sampling theory in public opinion polls.
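One reading of the sampling issue described above is that population size entered the design twice: once when points were selected with probability proportional to size, and again when the number of households drawn per point was made proportional to size. The sketch below works through the resulting inclusion probabilities with exact fractions. The point sizes, the sample sizes and the function name are hypothetical, and the first-stage probability `n_points * size / total` is the usual approximation for selection proportional to size.

```python
from fractions import Fraction


def household_inclusion_probs(sizes, n_points, take):
    """Inclusion probability of a household in each survey point.

    sizes:    households per survey point
    n_points: number of points drawn with probability proportional to size
    take:     households drawn within each point, if that point is selected
    """
    total = sum(sizes.values())
    probs = {}
    for point, size in sizes.items():
        p_point = Fraction(n_points * size, total)   # stage 1: proportional to size
        p_within = Fraction(take[point], size)       # stage 2: simple random sample
        probs[point] = p_point * p_within
    return probs


# Hypothetical frame of four survey points.
sizes = {"A": 1000, "B": 2000, "C": 3000, "D": 4000}

# Same number of households at every selected point: equal probabilities.
fixed = household_inclusion_probs(sizes, 2, {p: 10 for p in sizes})

# Households per point also proportional to size: size is counted twice.
proportional = household_inclusion_probs(
    sizes, 2, {p: s // 100 for p, s in sizes.items()}
)
```

With a fixed take every household has probability 1/500, whereas with the proportional take a household in point D is four times as likely to be selected as one in point A. Re-sampling the same number of respondents at each selected point corresponds to moving from the second design back toward the first. A third-stage factor of one over the number of eligible household members would multiply either result.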

Key words: Chinese Value Survey, cultural manifold analysis (CULMAN), East Asia Value Survey, national character, nationwide statistical sampling, Science of Data.

Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 311-326(2007)

## Developing an Area Sample Based on Street Maps for Social Survey without Frames —A Case Study of Consciousness Survey Conducted in Tokyo—

(Research Institute for Humanity and Nature, National Institutes for the Humanities)

Restrictions on access to the basic resident register have caused problems for the standard Japanese-style sample survey, making it necessary to develop new sampling techniques that are adaptable to social change. This paper proposes an area sampling method based on street maps instead of the traditional sample frame, and verifies its efficiency and reliability through two surveys based on a two-stage sample and an area sample. The findings indicate that there is no significant difference between the traditional sample and the area sample in the marginal distribution of responses, but the structure of data collected by area sample is biased toward attribute variables such as education, occupation and household income. In short, although the new area sample is useful in conducting a survey, more attention must be paid to the operational procedures for sampling households and individuals, and to interviewing.

Key words: Random sample, probability sample, area sample, non-sampling error, social research.

Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 327-336(2007)

## A Statistical Consideration of Two-step Optimization in Parameter Design

(The Institute of Statistical Mathematics)
(Yokohama College of Pharmacy)

The Taguchi method is widely used as a parameter design that is robust to disturbances or variations due to the conditions under which products are manufactured. This paper first proposes a generalized population SN ratio based on average loss by employing the average K loss and average log quadratic loss defined in the positive region as performance measures for evaluating a “variation” relating to the scale parameter for data with positive values. We then construct sample SN ratios for estimating the proposed population SN ratios and give statistical considerations relating to the optimality of the two-step procedure in the Taguchi method.
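For orientation, the two-step procedure can be sketched with the classical nominal-the-best SN ratio, $10\log_{10}(\bar y^2/s^2)$. This is the textbook ratio, not the generalized ratio proposed in the paper, and the run data, target value and function names are hypothetical.

```python
import math
import statistics


def sn_ratio_nominal_best(ys):
    """Classical nominal-the-best SN ratio in decibels: 10*log10(mean^2 / variance)."""
    mean = statistics.fmean(ys)
    var = statistics.variance(ys)  # unbiased sample variance
    return 10.0 * math.log10(mean ** 2 / var)


def two_step(runs, target):
    """Step 1: pick the setting with the largest SN ratio.
    Step 2: return the scale factor that moves its mean onto the target."""
    best = max(runs, key=lambda name: sn_ratio_nominal_best(runs[name]))
    scale = target / statistics.fmean(runs[best])
    return best, scale


# Hypothetical positive-valued responses under three control-factor settings.
runs = {
    "A": [9.0, 10.0, 11.0],
    "B": [4.0, 5.0, 6.0],
    "C": [19.0, 20.0, 21.0],
}
best_setting, adjustment = two_step(runs, target=10.0)
```

The ratio is unchanged when all responses are multiplied by a constant, which is why rescaling the mean in the second step does not undo the choice made in the first. That invariance is the link to a scale parameter for positive-valued data.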

Key words: Inverse Gaussian distribution, log-normal distribution, loss function, parameter design, SN ratios, Taguchi method, two-step procedures.

Proceedings of the Institute of Statistical Mathematics Vol. 55, No. 2, 337-348(2007)

## On the Transition of Household-size Distributions

(The Institute of Statistical Mathematics)
(College of Education, University of the Ryukyus)

Suppose that each household consists of parents and their children. We assume the following simple “Marriage-delivery” model: At time $t$, a boy and a girl are chosen at random from all children as a couple, and at time $t+1$, they marry, have $k$ new children with probability $a_k$, and make a new household of size $r=k+2$, where $k=0,1,2,3,4$. Let $\bar n_{r}(t)$ denote the mean number of households of size $r\ (\ge 2)$ at time $t$. Furthermore, suppose that the birth rates of males and females are equal, the expected value of $k$ is equal to $2+\delta\ (\delta\ge 0)$, and no child dies and no parents die until their children are all married. Under these assumptions, the household-size distribution is said to be in steady state if the ratio $\bar n_3(t):\bar n_4(t):\bar n_5(t):\bar n_6(t)$ is approximately constant. We estimate each $\bar n_r(t)$ in steady state, in terms of $a_k\ (k=0, 1, 2, 3, 4)$, $\delta$ and $t$.
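The model can be explored with a small Monte Carlo sketch. It follows one possible reading of the description: one couple is formed per time step, the two partners leave their parents' households, and each newborn is a boy or a girl with equal probability. The initial population, the probabilities $a_k$ used in the example and all function names are hypothetical, and the simulation makes no claim to reproduce the paper's estimates.

```python
import random


def expected_children(a):
    """Mean number of children per couple, sum over k of k * a_k (equals 2 + delta)."""
    return sum(k * p for k, p in enumerate(a))


def simulate(a, steps, n_initial=50, seed=0):
    """Simulate the marriage-delivery model for at most `steps` marriages.

    Each household is [boys, girls]: the unmarried children living with
    their two parents, so its size is boys + girls + 2.
    """
    rng = random.Random(seed)
    households = [[1, 1] for _ in range(n_initial)]
    for _ in range(steps):
        boys = [i for i, h in enumerate(households) for _ in range(h[0])]
        girls = [i for i, h in enumerate(households) for _ in range(h[1])]
        if not boys or not girls:
            break  # no couple can be formed
        households[rng.choice(boys)][0] -= 1    # groom leaves his household
        households[rng.choice(girls)][1] -= 1   # bride leaves her household
        k = rng.choices(range(len(a)), weights=a)[0]
        new_boys = sum(rng.random() < 0.5 for _ in range(k))
        households.append([new_boys, k - new_boys])
    return households


def size_counts(households):
    """Number of households of each size r."""
    counts = {}
    for boys, girls in households:
        r = boys + girls + 2
        counts[r] = counts.get(r, 0) + 1
    return counts


# Hypothetical probabilities a_0..a_4 with mean 2.1, i.e. delta = 0.1.
a = [0.1, 0.2, 0.3, 0.3, 0.1]
households = simulate(a, steps=200)
counts = size_counts(households)
```

Averaging `size_counts` over many seeds gives an empirical counterpart of $\bar n_r(t)$, from which the ratio $\bar n_3:\bar n_4:\bar n_5:\bar n_6$ can be tracked over time.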

Key words: Household-size distribution, marriage-delivery model.