Saturday, July 29, 2017

Weighted Mean Formulations of Data Depth

Data depth is a family of nonparametric methods that provide a measure of centrality by which multivariate data can be ordered. My previous post on data depth was an overview of distance-based formulations. Another type of data depth method is based on weighted mean (WM) regions [1]. Weighted mean regions are nested convex regions centered around the geometric center of a distribution. These regions are composed of weighted means of the data members, with a general set of restrictions on the weights that ensures their nested arrangement. This arrangement of nested convex WM regions is then used to determine the data depth value of each data member. Different strategies for assigning the weights lead to different notions of weighted mean depth. One example, the zonoid depth [2], can be stated as follows.


Let $x, x_1, \ldots , x_n \in \mathbb{R}^d$. Then the zonoid depth of $x$ with respect to $x_1, \ldots , x_n$ is:

$$D_{\textrm{zonoid}}(x|X) = \sup \{ \alpha : x \in D_{\alpha}(x_1, \ldots, x_n) \}$$
where
$$D_{\alpha}(x_1, \ldots, x_n) = \bigg\{ \sum_{i=1}^n \lambda_i x_i: \sum_{i=1}^n \lambda_i=1, 0\leq\lambda_i, \alpha\lambda_i \leq \frac{1}{n} \; \textrm{for all } i\bigg\}$$
Here $D_{\alpha}(\cdot)$ denotes the WM region containing all points with depth at least $\alpha$, also known as the $\alpha$-trimmed region. Note that when $\alpha=1$, the WM region collapses to the mean of the data, while $\alpha \leq \frac{1}{n}$ leads to a WM region that is the convex hull of the data. Other examples of weighted mean depths include the expected convex hull depth and the geometrical depth.
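The definition above translates directly into a linear program: $x \in D_{\alpha}$ holds exactly when $x$ is a convex combination of the data with every weight $\lambda_i \leq \frac{1}{n\alpha}$, so the largest feasible $\alpha$ is $1/(n \max_i \lambda_i)$, minimized over valid weight vectors. As a minimal sketch (the function name `zonoid_depth` and the use of SciPy's `linprog` solver are my own choices, not from the references), this can be computed as follows:

```python
import numpy as np
from scipy.optimize import linprog

def zonoid_depth(x, X):
    """Zonoid depth of point x with respect to the rows of X (n x d).

    Solves the LP:  minimize t  subject to
        sum_i lambda_i * X_i = x,  sum_i lambda_i = 1,  0 <= lambda_i <= t,
    then returns 1 / (n * t).  If the LP is infeasible, x lies outside
    the convex hull of the data and the depth is 0.
    """
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    n, d = X.shape
    # Decision variables: lambda_1, ..., lambda_n, t; the objective minimizes t.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    # Equality constraints: X^T lambda = x and sum(lambda) = 1.
    A_eq = np.zeros((d + 1, n + 1))
    A_eq[:d, :n] = X.T
    A_eq[d, :n] = 1.0
    b_eq = np.concatenate([x, [1.0]])
    # Inequality constraints: lambda_i - t <= 0 for all i.
    A_ub = np.hstack([np.eye(n), -np.ones((n, 1))])
    b_ub = np.zeros(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    if not res.success:  # x is outside the convex hull of the data
        return 0.0
    return 1.0 / (n * res.x[-1])
```

For the four corners of the unit square, the center $(0.5, 0.5)$ attains the maximal depth of 1 (it is the mean, where $D_{\alpha}$ collapses as $\alpha \to 1$), each corner has depth $\frac{1}{4}$, and any point outside the square has depth 0.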


Compared to distance-based formulations, weighted mean formulations of depth are more effective at capturing the shape of the distribution. However, they are more susceptible to outliers: the shape of the WM regions, and consequently the data depth, can be strongly influenced by pathological outliers. They are also more computationally expensive, as evaluating them typically involves solving an optimization problem.

References:

[1] Mosler, Karl. "Depth statistics." Robustness and Complex Data Structures. Springer Berlin Heidelberg, 2013.

[2] Dyckerhoff, Rainer, Karl Mosler, and Gleb Koshevoy. "Zonoid data depth: Theory and computation." COMPSTAT. Physica-Verlag HD, 1996.
