Saturday, March 14, 2020

Why only 2 Weighted Medians at most?


It is intuitive to observe that in a distinct ordering of numbers, there are at most two medians.If the number of elements is odd, i.e. $2n-1$, then $n^{th}$ element is the median.
Whereas, if the number of elements is even, i.e $2n$, then $n^{th}$ and $(n+1)^{th}$ elements form the median.

Apart from median, there is also a concept called "weighted median".

The Definition 

In a given distinct ordering of pairs $\{(i_1 ,w_1) , (i_2, w_2) .. , (i_n, w_n)\}$:
------> Let  $i$ represent any arbitrary identity of a given element; $w$ represents the weight of that element in the sequence; and $W$ the sum of weights of all the elements
------> Then, an element $i_m$ is a weighted median if $\displaystyle\sum_{1 \le j \lt m} w_j \le \frac{W}{2}$ and $\displaystyle\sum_{m \lt j \le n} w_j \le \frac{W}{2}$

The Interpretation

One good use of weighted median is to compute the median of all the values when there are multiples/repetitions of same values in the sampled data.
So if you have $n$ values with average repetition $w_{avg}$ unordered, the median could be computed at $O(n.w_{avg})$.
But if you organize or have the data as distinct values and their repetitions, the complexity to find the median could come down to $O(n)$

Let's see why this is so:

After expanding an ordering of the form: (Lets refer to this as abridged ordering)
 $\{(i_1 ,w_1) , (i_2, w_2) .. , (i_n, w_n)\}$ .....
to an ordering of the form: (lets call this expanded ordering)
 $\{i_1, i_1,...(w_1times), i_2, i_2, ....(w_2 times), ...., i_n, i_n, ... (w_n times) \}$
The total number of elements in the ordering would be:
$\displaystyle\sum_{1\le j \le n} w_j = W$.

Now we can easily say that the median of such an ordering would be $(\frac{W+1}{2})^{th}$ element if $W$ turns out to be odd.
And if its even, the median comprises of $(\frac{W}{2})^{th}$and $(\frac{W}{2} + 1)^{th}$elements.
Let a given repetition of $i_m$ be the median in the above expanded ordering, and lets also consider $W$ to be odd.
Then, the number of elements in the expanded ordering before and after the selected instance of $i_m$ must exactly be $W/2$.
Note that, for any element other than median, this property will not hold good.
Now if we extract out the copies of $i_m$ on its left, and right, if any, we can establish that the number of  non $i_m$ elements before and after $i_m$ is at most $W/2$.
Hence, if we calculate the weighted median of abridged ordering, it would be same as the median of the expanded ordering.
Similar argument can be made for when $W$ is even, and also the special case when such ordering has two distinct medians $i_m$ and $i_{m+1}$
From this analysis on using weights as repetitions, it is easy to see, that like median, the weighted medians can also be at most two.

The observation

But what if we focus on just the plain simple definition of weighted median without interpreting the weights as repetitions or such, and were to "observe" that at most, there can just be two weighted medians.... How would we establish that fact? Well, its not too hard.
Let $i_m$ be a weighted median in a given abridged ordering, then,

$\displaystyle\sum_{1 \le j \lt m} w_j + w_m + \displaystyle\sum_{m \lt j \le n} w_j = W$

Let $A = \displaystyle\sum_{1 \le j \lt m} w_j$, and $B = \displaystyle\sum_{m \lt j \le n} w_j$

  Then $A + w_m + B = W$ and by definition of weighted median, $A \le W/2$ and $B \le W/2$

Now, if we assume that: $w_m + B \le W/2$,  then we can claim that $w_{m-1}$ is also a weighted median.

This is because: $A \le W/2 \implies A - w_{m-1} \le W/2$ and of course by our assumption, we have $w_m + B \le W/2$

We have three relations to satisfy here: $1: A + w_m + B = W; 2: w_m + B \le W/2;$ and $3: A \le W/2$. 

All the three can be satisfied if and only if $4: A = W/2$ and $5: w_m + B = W/2$

Given that we already have weighted medians $w_{m-1}$ and $w_m$, the requirements in 4: and 5: above make it impossible to have any other weighted median.
Similar argument can also be made assuming that $A + w_m \le W/2$

For me this understanding proved to be more fun than actually writing the R-Select code to find out the weighted median itself, and hence come these notes 😀Though, i actually ended up describing the algorithm here 😛




No comments: