ISM Research Memorandum
No. 1014
Title:
An invariant and metric-free proximity measure and its applications to
classification
Author(s):
Iacus, Stefano (University of Milan - Italy)
Porro, Giuseppe (University of Trieste - Italy)
Key words:
classification; recursive partitioning; average treatment effect
estimation; metric-free methods
Abstract:
In this paper we present a new algorithm to
construct a proximity measure that is invariant under monotonic
transformations of the data, handles missing data seamlessly and, more
importantly, is free from any notion of distance. The algorithm, called
Random Recursive Partitioning (RRP), is based on an innovative use of regression
trees and relies on nothing more than the notion of ordering. RRP is a Monte Carlo
method on the space of all possible non-empty and recursive partitions of the
space embedding the data. In each random partition k, a Dirac delta pi_{ij}^k
is set to one if observations i and j lie in the same cell. The final proximity
is obtained by averaging the values pi_{ij}^k over all the random partitions.
No formal properties of the method are known at present; Monte Carlo
experiments are therefore provided to explore its performance in
classification applications. Companion software is freely available as a
package for the R statistical environment.
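
The averaging step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the API of the companion R package. As a simplifying assumption, the regression trees are replaced by random axis-aligned splits at observed values, so only order comparisons are used. Missing data are not handled. The names random_partition_cells, rrp_proximity, n_partitions and min_size are hypothetical and chosen for illustration only.

```python
import random


def random_partition_cells(data, min_size, rng):
    """Return the cells (lists of row indices) of one random recursive partition.

    Assumption: each split picks a random coordinate and a random observed
    value as threshold, standing in for the regression trees of the paper.
    """
    cells = []
    stack = [list(range(len(data)))]
    while stack:
        idx = stack.pop()
        if len(idx) <= min_size:
            cells.append(idx)
            continue
        dim = rng.randrange(len(data[0]))
        values = sorted({data[i][dim] for i in idx})
        if len(values) < 2:
            # No split is possible on this coordinate: keep the cell as is.
            cells.append(idx)
            continue
        # The threshold is an observed value below the maximum, so both
        # sides are non-empty; only order comparisons are involved.
        cut = rng.choice(values[:-1])
        left = [i for i in idx if data[i][dim] <= cut]
        right = [i for i in idx if data[i][dim] > cut]
        stack.extend([left, right])
    return cells


def rrp_proximity(data, n_partitions=200, min_size=2, seed=0):
    """Average, over random partitions, the indicator that i and j share a cell."""
    rng = random.Random(seed)
    n = len(data)
    prox = [[0.0] * n for _ in range(n)]
    for _ in range(n_partitions):
        for cell in random_partition_cells(data, min_size, rng):
            for i in cell:
                for j in cell:
                    prox[i][j] += 1.0  # pi_{ij}^k = 1 for this partition k
    return [[v / n_partitions for v in row] for row in prox]


if __name__ == "__main__":
    points = [(1.0, 5.0), (1.2, 5.1), (9.0, 0.5), (9.3, 0.4), (4.0, 2.0)]
    for row in rrp_proximity(points, n_partitions=100, seed=1):
        print(["%.2f" % v for v in row])
```

Because the sketch compares values only by order, applying a strictly increasing transformation to every coordinate and reusing the same seed yields the same partitions, and hence the same proximity matrix. This mirrors the invariance property claimed in the abstract, although only for this simplified splitting rule.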