Saturday, October 15, 2016

Desirable Properties for Data Depth Formulations

In a previous blog post, I described data depth as a way to quantify how central, or deep, a member is within a distribution. While this description makes intuitive sense when we think of distributions of points in a Euclidean space, data depth formulations often deal with more abstract objects, where it is not clear what should count as "deep within a distribution". Having a clear set of desirable properties helps in such cases to evaluate the utility of a depth formulation. In fact, such properties, in addition to being used to characterize existing formulations of data depth, can also act as an aid for developing new ones.

Zuo and Serfling proposed the following basic properties desired in any depth function. Typically, depth formulations are shown to satisfy these properties under certain assumptions, such as the distribution being continuous and angularly symmetric. Loosely, angular symmetry about a center means that every direction away from the center is as likely as its opposite. The simpler condition Prob[x] = Prob[-x] is central symmetry about the origin, a special case of angular symmetry, and in that case the center of the distribution sits at x = 0.
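
To pin down the symmetry assumption a little more formally, here is a sketch in LaTeX. The random vector X and the center c are my own notation, not necessarily the paper's. The first line is central symmetry about c, which reduces to the Prob[x] = Prob[-x] condition when c = 0. The second line is angular symmetry about c, which only constrains the direction of X - c and is implied by the first.

```latex
% Central symmetry about c: X - c and its reflection have the same distribution.
X - c \;\overset{d}{=}\; -(X - c)

% Angular symmetry about c: only the direction of X - c is required to be symmetric.
\frac{X - c}{\lVert X - c \rVert} \;\overset{d}{=}\; -\,\frac{X - c}{\lVert X - c \rVert}
```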

Zuo and Serfling's properties for depth formulations:

1. Null at infinity: Depth of a member falls to zero as its distance from the center of the distribution tends to infinity.
2. Maximum at center: Depth is maximum at the center of angular symmetry of the distribution.
3. Monotonicity: The depth falls off monotonically along any ray pointing outward from the center.
4. Affine invariance: The depth of a member is unchanged when the same affine transformation is applied to every member of the population.
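
To make the four properties concrete, here is a minimal numerical sketch. It uses the Mahalanobis depth, D(x) = 1 / (1 + d²(x)) with d² the squared Mahalanobis distance from the sample mean, purely as an illustration I picked; the 2D Gaussian sample, the helper names, and the particular affine map are all assumptions of this sketch. It spot-checks each property on one sample, which is of course not a proof.

```python
import random

def mean_and_cov(sample):
    """Sample mean and covariance entries (sxx, sxy, syy) of a list of 2D points."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in sample) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_depth(point, sample):
    """Mahalanobis depth of a 2D point with respect to a 2D sample."""
    (mx, my), (sxx, sxy, syy) = mean_and_cov(sample)
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy * sxy
    # Squared Mahalanobis distance, using the closed-form 2x2 matrix inverse.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)

def affine(point, A, b):
    """Apply the affine map p -> A p + b to a 2D point."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y + b[0],
            A[1][0] * x + A[1][1] * y + b[1])

rng = random.Random(0)
sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
center, _ = mean_and_cov(sample)

# 1. Null at infinity: a point very far from the center has depth close to zero.
depth_far = mahalanobis_depth((center[0] + 1e6, center[1]), sample)

# 2. Maximum at center: the depth at the sample mean is the largest possible value, 1.
depth_center = mahalanobis_depth(center, sample)

# 3. Monotonicity: depth does not increase while walking outward along a ray.
ray = [(center[0] + 0.6 * t / 10, center[1] + 0.8 * t / 10) for t in range(51)]
depth_ray = [mahalanobis_depth(p, sample) for p in ray]
is_monotone = all(a >= b for a, b in zip(depth_ray, depth_ray[1:]))

# 4. Affine invariance: transforming the query and the sample together keeps the depth.
A = [[2.0, 1.0], [0.5, 3.0]]  # hypothetical invertible matrix
b = (-4.0, 7.0)               # hypothetical shift
query = (0.3, -1.2)
depth_before = mahalanobis_depth(query, sample)
depth_after = mahalanobis_depth(affine(query, A, b),
                                [affine(p, A, b) for p in sample])
```

Comparing `depth_before` with `depth_after`, and inspecting `depth_far`, `depth_center` and `is_monotone`, gives a quick sanity check of a candidate depth function before attempting a formal argument.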

There are also other properties associated with depth functions beyond those mentioned above, such as upper semicontinuity, which means that the upper level sets of the depth function (the regions where the depth is at least some threshold) are closed, and quasi-concavity, which means that those regions are convex. However, I don't think these are as critical as the ones above. For example, even the popular simplicial depth formulation does not guarantee convex depth regions. Furthermore, the absence of such a property is not necessarily a drawback. In fact, depending on the application, it can enable the simplicial depth to capture the structure of the distribution better than formulations whose depth regions are always convex!
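
Since simplicial depth comes up here without a definition, a small sketch may help. In its sample version for 2D data, the simplicial depth of a point is the fraction of triangles, formed by triples of sample points, that contain it. The brute-force code below is only an illustration under that definition; the function names and the four-point toy sample are my own, and closed triangles (boundary included) are assumed.

```python
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_triangle(p, a, b, c):
    """True if p lies in the closed triangle abc."""
    d1, d2, d3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when it is not strictly on opposite sides of two edges.
    return not (has_neg and has_pos)

def simplicial_depth(p, sample):
    """Fraction of sample triangles that contain p (brute force over all triples)."""
    triples = list(combinations(sample, 3))
    hits = sum(in_triangle(p, a, b, c) for a, b, c in triples)
    return hits / len(triples)

# Toy sample: the four corners of a square.
square = [(0, 0), (4, 0), (0, 4), (4, 4)]
depth_inside = simplicial_depth((1, 2), square)       # inside the square
depth_outside = simplicial_depth((100, 100), square)  # far outside the square
```

The brute-force loop over all triples is cubic in the sample size, so this is only practical for small samples.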

Finally, for the formal descriptions of these properties, see Zuo and Serfling's paper: Zuo, Yijun; Serfling, Robert. General notions of statistical depth function. Ann. Statist. 28 (2000)