Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 3-25(2007)
We propose graphical representations on the Web using XML technology: SVG for 2D graphics and X3D for 3D graphics.
Both SVG and X3D graphics can be displayed within Web browsers such as Internet Explorer. This makes it possible to realize interactive graphics on a Web-based interactive textbook or a Web-based statistical analysis package. This paper describes the potential applications of XML, SVG and X3D and then introduces graphical representations of SVG and X3D graphics. It also introduces authoring tools for producing interactive SVG and X3D graphics. To demonstrate the advantages of the XML-based graphics format, we propose development of a Web-based textbook with interactive graphics.
Key words: SVG, X3D, Web-based graphics.
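As an illustration of why an XML-based format suits Web graphics, the following minimal sketch builds a small SVG scatter plot with Python's standard XML library. It is not taken from the paper: the function name `scatter_svg`, the default sizes and the use of a `<title>` child as a tooltip are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def scatter_svg(points, width=200, height=200, radius=3):
    """Return an SVG document (as a string) plotting points given in [0, 1] x [0, 1].

    Hypothetical helper for illustration; not part of any published tool.
    """
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": str(width),
        "height": str(height),
    })
    for x, y in points:
        circle = ET.SubElement(svg, "circle", {
            "cx": f"{x * width:.1f}",
            "cy": f"{(1 - y) * height:.1f}",  # the SVG y axis points downward
            "r": str(radius),
        })
        # A <title> child is typically rendered as a tooltip by browsers.
        ET.SubElement(circle, "title").text = f"({x}, {y})"
    return ET.tostring(svg, encoding="unicode")
```

Because the result is ordinary XML text, it can be generated on a server, embedded in a Web page, and manipulated by client-side scripts for interaction.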
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 27-45(2007)
This paper describes the design and the implementation of a Java library for statistical graphs. First, we consider the required functions for a modern interactive statistical graph library. Then we explain our statistical graph library, called Jasplot (Java statistical plot), as an example. Jasplot has functions for interactive operations on graphs, such as brushing and linked views. We can build new statistical graphs by combining several basic graphs and/or components such as an axis and data input functions. We can easily use Jasplot from other statistical analysis software products such as Jasp and R. We adopt “design pattern” software technology to realize such functions effectively.
Key words: Java, linked views, interactive operation, statistical graph library.
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 47-68(2007)
The textile plot proposed by Kumasaka and Shibata (2006a, b) is a powerful tool for visualising high dimensional data. It is a modified parallel coordinate plot, where the locations and scales of the axes are simultaneously chosen so that all connecting lines, each of which signifies an observation, are aligned as horizontally as possible. The main theme of this paper is how to design an ideal environment for working with data through the textile plot. To meet the various needs of working with data, the environment has to be as flexible as possible. A reference model for achieving this goal consists of a sequence of four objects: the data, the parallel coordinate, the visual analogue and the textile plot objects. A data object is transformed into a parallel coordinate object, which is a set of coordinate vectors. The visual analogue is an abstract representation of the textile plot produced. The textile plot object is a textile plot constructed without real-world restrictions such as display size or resolution. The user can view this object through various interfaces, for example by zooming or resizing. Visual instructions given by the user are sent to one of the objects according to their nature. A by-product of the design is that it enables us to keep a systematic log of the user's visual instructions. The log can be used not only for auditing but also for helping the user to construct an appropriate model for the data.
Key words: Parallel coordinate plot, choice of locations and scales, data and attributes, visual operation, information visualisation.
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 69-83(2007)
The parallel coordinate plot (PCP) is a standard 2-dimensional (2D) graphical tool for visualizing multivariate data at a glance. We often need to use interactive operations such as highlighting and brushed highlighting to identify each observation in a 2D PCP. These operations, however, are not very useful for understanding interrelations among variables at the same time.
In this paper, we propose to extend the 2D PCP to one in 3-dimensional (3D) space to show relationships among variables intuitively. Our basic idea is to use the third orthogonal spatial axis to express the results of brushed highlighting of the 2D PCP. We locate the line segments that represent observations in 3D space according to the values of a selected reference variable. We illustrate that rearranging the order and directions of the axes is useful for clearly seeing piece-wise linear relationships between the reference variable and other variables. We also propose to divide the observations into several groups according to the values of selected conditioning variables and to draw a 3D PCP for each group separately. Such 3D PCPs are useful for showing non-linear interactions involving the conditioning variables. We also show the usefulness of this technique by applying it to artificial and real data.
Key words: 3-dimensional data visualization, conditional graphics, interactive operation, statistical graphics.
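One possible reading of the 3D PCP construction described in this abstract can be sketched as follows. Each observation becomes a polyline whose x coordinate is the axis index, whose y coordinate is the min-max scaled value on that axis, and whose depth z is the scaled value of the reference variable. The function name `pcp3d_polylines`, the min-max scaling and the choice of z as depth are assumptions for illustration, not the authors' implementation.

```python
def pcp3d_polylines(rows, ref):
    """Return one list of (x, y, z) vertices per observation.

    rows: list of equal-length lists of numbers (one list per observation).
    ref:  column index of the reference variable used for the depth axis.
    Hypothetical helper; scaling choices are assumptions of this sketch.
    """
    n_vars = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_vars)]
    highs = [max(r[j] for r in rows) for j in range(n_vars)]

    def scale(value, j):
        # Min-max scale to [0, 1]; a constant column maps to the midpoint.
        span = highs[j] - lows[j]
        return 0.5 if span == 0 else (value - lows[j]) / span

    polylines = []
    for r in rows:
        z = scale(r[ref], ref)  # depth is fixed per observation
        polylines.append([(float(j), scale(r[j], j), z) for j in range(n_vars)])
    return polylines
```

Since every vertex of an observation shares the same z, each polyline lies in a plane parallel to the ordinary 2D PCP, and observations are ordered in depth by the reference variable.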
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 85-100(2007)
Improvements in information infrastructure have made it easy to obtain small area statistics from official statistics, including the national population census. These statistics can yield insight, but not by simple inspection; it emerges only after statistical analysis using a desktop GIS. However, a desktop GIS with advanced spatial analysis functions is too expensive for average users, and learning to use the software takes a lot of time. To solve this problem, it would be effective to make an online analysis system for spatial data (WebGIS) publicly available on the Internet together with the raw data. Compared with an ordinary statistical analysis system on the Web, a WebGIS requires many more graphics features, for example, generating graphics such as maps and statistical graphs dynamically on the server side and implementing interactive functions for the graphics on the client side. This paper introduces a WebGIS for small area statistics of the Japanese national population census, which adopts SVG as the graphics format of the system. SVG has advantages over existing graphics formats with respect to development costs and user operability.
For the statistical analysis of small area statistics, the representative points of the small areas can be regarded as sample data from a two-dimensional continuous random field, so prediction theory based on kriging can be applied. To visualize how singular each small area is, we propose using the prediction error, that is, the difference between the observed value and the value predicted by kriging, as a singularity index for each small area and drawing its choropleth map. We have implemented this function in our WebGIS.
Key words: Spatial statistics, small regional data, Internet, GIS, SVG.
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 101-112(2007)
When we perform a geostatistical analysis, it is very important to be able to visualize the geostatistical data. With “gstat” or “geoR”, which are R packages, geostatistical data can be analyzed in an exploratory manner in the R environment. However, when we publish the output of a geostatistical analysis, we want to handle it interactively and to draw it on a map. We therefore use the Google Maps API and Scalable Vector Graphics (SVG) to display plots of the observed data, the spatial dependence, and the predicted data.
Key words: GoogleMaps, geostatistics, AJAX, SVG.
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 113-124(2007)
Multivariate geographical data have become widely available and are being used for various purposes because of the development of information technologies. As geographical data include locations of regions, map-based graphics such as dot maps and choropleth maps are suitable for expressing them. Map-based graphics, however, are not suitable for expressing high dimensional data.
This paper proposes the use of linked statistical graphics, especially conditioned choropleth maps (CCmaps) and parallel coordinate plots (PCPs), for visualizing and analyzing multivariate geographical data.
CCmaps extend the choropleth map by arranging several choropleth maps according to conditions defined by two other variables. The PCP is appropriate for describing multivariate data in a single graph, but is not good at showing location information.
We illustrate that linked views and interactive operations on these graphics are effective for geographical data analysis. The proposed functions are implemented in a statistical analysis system Jasp.
Key words: Choropleth map, conditioned choropleth map, parallel coordinate plot, Jasp.
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 125-142(2007)
A multivariate probit model is used to analyze multiple sets of mutually correlated binary data. The purpose of this research is to study the applicability of the multivariate probit model to the utilization of home long-term care services, based on questionnaire data on long-term care in the comprehensive survey of living condition of the people on health and welfare. This is a national survey in Japan carried out by the Ministry of Health, Labour and Welfare, and respondents (sample) are selected nationwide. The analysis results showed that home-visit care services were often used by the elderly who were in a poor state of health and lived in small households. In-facility care services were preferred by the elderly who were in a middle state of health (between poor and good) and lived in large households, and short-stay services were preferred by those in large households in every health state except good. As for the combined use of services, home-visit care services and in-facility care services were not often used together, because the needs for these two kinds of long-term care services are different. Since these analysis results made sense, we conclude that the multivariate probit model is well suited to analyzing long-term care utilization in the national survey data.
Key words: Comprehensive survey of living condition of the people on health and welfare, home long-term care utilization, long-term care insurance system, multivariate binary data, multivariate probit model.
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 143-157(2007)
This paper compares two methods of selecting a respondent within a household for random digit dialing telephone interviewing. One is a probability method, in which respondents are randomly selected according to age-order in households. The other is a non-probability method, in which respondents are selected arbitrarily. The non-probability method had the advantage of a higher cooperation rate. However, it suffered more refusals in the middle of the questionnaire. The non-probability method also tended to elicit more “other” or “don't know” responses. There seemed to be no substantive difference in the attitudinal and demographic variables except household size. This is partly because about 80 percent of respondents were those who first answered the phone, even with the probability method. The weighting adjustment for non-response bias via a calibration technique enlarged the difference between the two selection methods.
Key words: Telephone interviewing, random digit dialing, probability sampling, non-probability sampling, calibration.
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 159-175(2007)
When a survey questionnaire asks about sensitive topics, it cannot be assumed that all respondents will give truthful answers. The item count technique, one of the most practical indirect questioning methods, asks respondents only for the number of listed items that apply to them, which keeps their individual answers secret. An estimate for the target key-item is derived from the difference between the mean item count responses of two homogeneous samples. This paper attempts to estimate the percentages of four key-items, including shoplifting, by both the item count and direct questioning techniques. The survey was conducted via face-to-face personal interviewing. The results suggest that the item count estimates can be unstable because of a context effect that disturbs the homogeneity of the two samples.
Key words: Indirect questioning technique, privacy, shoplifting.
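The difference-of-means estimator mentioned in this abstract can be sketched in a few lines. One sample receives a list that includes the key-item and the other receives the same list without it; the estimated proportion for the key-item is the difference between the two mean counts. The function name `item_count_estimate` and the standard error formula for two independent samples are assumptions of this sketch, and any numbers passed to it here would be made up for illustration rather than taken from the survey.

```python
from math import sqrt
from statistics import mean, variance


def item_count_estimate(with_key, without_key):
    """Estimate the key-item proportion from two lists of item counts.

    with_key:    counts from the sample whose list includes the key-item.
    without_key: counts from the sample whose list omits the key-item.
    Returns (estimate, standard_error), assuming independent samples.
    """
    estimate = mean(with_key) - mean(without_key)
    standard_error = sqrt(
        variance(with_key) / len(with_key)
        + variance(without_key) / len(without_key)
    )
    return estimate, standard_error
```

The estimator is unbiased only if the two samples respond to the non-key items in the same way, which is why a context effect that disturbs their homogeneity can make the estimates unstable.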
Proceedings of the Institute of Statistical Mathematics Vol.55, No.1, 177-195(2007)
This paper proposes a method of assimilation and estimation of packet arrival intervals and packet length. Based on the estimation method, we also propose a new traffic shaping algorithm, which enables reduction of packet loss at the traffic shaper, and maintains throughput. We adopt a multidimensional AR model with time varying coefficients to model the network traffic. In our model, the unknown variable consists of packet arrival intervals and length of arrival packets. A Kalman filter is applied to estimate the next packet's arrival time and length. Compared to the token bucket algorithm, the proposed method of traffic shaping reduces packet loss and maintains throughput by flexibly changing the token generation rate based on the estimated values. Using numerical simulations, we verify the effectiveness of the proposed method.
Key words: Time variant AR model, Kalman filter, traffic shaping.
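For reference, the classical token bucket that the abstract uses as its baseline can be sketched as below. This is the textbook algorithm, not the proposed shaper: the class name `TokenBucket`, the byte-based token unit and the explicit `now` argument are assumptions of this example. The `rate` attribute is left mutable to indicate where an estimate-driven controller could adjust the token generation rate.

```python
class TokenBucket:
    """Textbook token bucket; tokens are counted in bytes (an assumption)."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second; may be changed later
        self.capacity = capacity  # maximum number of stored tokens (burst size)
        self.tokens = capacity    # start with a full bucket
        self.last = 0.0           # time of the previous update, in seconds

    def allow(self, now, length):
        """Return True if a packet of `length` bytes arriving at `now` conforms."""
        elapsed = now - self.last
        # Refill according to the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if length <= self.tokens:
            self.tokens -= length
            return True
        return False  # non-conforming packet: dropped or delayed by the caller
```

With a fixed rate, a burst that exceeds the stored tokens is rejected; raising `rate` before an expected burst lets more packets through, which is the kind of adjustment the abstract attributes to the estimated arrival intervals and packet lengths.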