Friday, May 14, 2021

A Hare and Tortoise tale of computation

It's one of those usual days when I solved a little practice problem and, on a second look, felt like I had touched my nose from around the back of my head..

The problem is this: "Given an array of positive integers, calculate the sum of all possible odd-length sub-arrays."
Below are 2 implementations that solve the problem..
Now place your bets on which one will be faster on a general-purpose RISC machine..
It's really an ingenious vs. ingenuous face-off, and who in the sane world would ever identify themselves with a super-brainy victor? 😉

The ingenious way: total = ∑ count[i]*val[i], where count[i] is the number of odd-length sub-arrays that contain val[i]. For indices beginning at 0, count[i] = ceil( (i+1)*(n-i) / 2 )

The smarter code
int sumOddLengthSubarrays(vector<int>& arr) {
    int total = 0;
    int n = arr.size();
    for (int i = 0; i < n; i++) {
        int count = (i + 1) * (n - i);          // all sub-arrays containing arr[i]
        int count_odd = count / 2 + count % 2;  // ceil(count / 2)
        total += count_odd * arr[i];
    }
    return total;
}

The ingenuous way: total = ∑ val[i] + ∑ combi_sum[i], where combi_sum[i] is defined as the sum of all the odd-length sub-arrays that end at position i, other than the single-element sub-array [val[i]], and is computed as below:
combi_sum[0] = combi_sum[1] = 0
combi_sum[2] = val[0] + val[1] + val[2]
For all other i:
combi_sum[i] = num_odd_arrays_ending_at[i-2] * (val[i-1] + val[i])
             + combi_sum[i-2]
             + val[i-2] + val[i-1] + val[i]
Wondering what nasty cocktail that summation above is?
Just go with the flow; there is an explanation to follow..
The round-about code
int sumOddLengthSubarrays(vector<int>& arr) {
    vector<int> running_sum;
    running_sum.reserve(arr.size());
    vector<int> combination_sum;
    combination_sum.reserve(arr.size());
    int total = 0;
    int i = 0;
    // First two elements: only single-element odd sub-arrays.
    int k = arr.size() > 1 ? 2 : arr.size();
    for (i = 0; i < k; i++) {
        total += arr[i];
        running_sum.push_back(total);
        combination_sum.push_back(0);
    }
    // Third element: the first multi-element odd sub-array, [0..2].
    if (i < arr.size()) {
        total += arr[i];
        running_sum.push_back(total);
        combination_sum.push_back(total);
        total += total;
        i++;
    }
    for (; i < arr.size(); i++) {
        total += arr[i];
        running_sum.push_back(running_sum[i-1] + arr[i]);
        combination_sum.push_back(
            (((i/2) - 1) * (arr[i-1] + arr[i])) +   // extend the odd arrays ending at i-2
            combination_sum[i-2] +
            running_sum[i] - running_sum[i-3]);     // val[i-2] + val[i-1] + val[i]
        total += combination_sum[i];
    }
    return total;
}
Now, which version do you think is faster on a usual computer?
I for sure thought the smarter, concise version was also the faster one... but I was in for a surprise when I cheerfully went to test its performance.
And that too, right after recovering from the rather laborious and clumsy-looking round-about version.
Let me lay it out for you: the fact is that the 'round-about way' beats the 'smarter way'..
And how is it so? It seems to come from the cost of the multiplications... not only their number, but also the magnitude of the multiplicands.
So, let's draw a table tracing through the calculations of each approach.
Let's start with the smarter, concise way first.
index                         0     1     2     3     4     5     6     7     8     9    10
example data                  6     5     8     7     1     2     4     3     9    11    10
cnt(all) = (i+1)*(n-i)       11    20    27    32    35    36    35    32    27    20    11
cnt(odd) = ceil(cnt(all)/2)   6    10    14    16    18    18    18    16    14    10     6
cnt(odd) * val               36    50   112   112    18    36    72    48   126   110    60

Total = ∑(cnt(odd)[i]*val[i]) = 780

So, for each element, we have two multiplications, one division, and one modulo. That much was obvious before we drew the table.
But what we see from the table is that for an array of size n, the multiplicands range:
a) from 1 to n when calculating cnt(all),
b) and from ceil(n/2) up to roughly n*n/8 when calculating cnt(odd)*val.
So, for an array of size 100, the minimum multiplicand is 50, and the maximum is 1275 (just above 100*100/8 = 1250).
The n*n part indicates that the multiplicands grow faster than the array size.

Now, let us look at the round-about way.
Let's recap the definitions of the calculations used:
1) combi_sum[0] = combi_sum[1] = 0
2) combi_sum[2] = val[0] + val[1] + val[2]
3) For all other i:
   combi_sum[i] = num_odd_arrays_ending_at[i-2] * (val[i-1] + val[i])
                + combi_sum[i-2]
                + val[i-2] + val[i-1] + val[i]
A bit of explanation:
At position i, assume that the sum of all the odd-length arrays ending at (i-2) has already been computed.
Now (i), together with (i-1), extends each of those sub-arrays.
Apart from that, (i) also forms a new odd-length array with (i-2), (i-1), and itself.
In the example down below, let's consider position 6. Position 6 can end an odd-length array in the following combinations: a) 0 to 6, b) 2 to 6, c) 4 to 6.
The sums of those odd-length arrays a), b), and c) are ∑val[0..6], ∑val[2..6], and ∑val[4..6] respectively.
But we already calculated ∑val[0..4] = 27 plus ∑val[2..4] = 16 and stored it as combi_sum[4] = 27 + 16 = 43.
a) ∑val[0..6] = ∑val[0..4] + ∑val[5..6]
b) ∑val[2..6] = ∑val[2..4] + ∑val[5..6]
c) ∑val[4..6]
a) + b) + c) = (∑val[0..4] + ∑val[5..6]) + (∑val[2..4] + ∑val[5..6]) + ∑val[4..6]
Re-arranging, we get:
a) + b) + c) = ∑val[0..4] + ∑val[2..4] + 2 * ∑val[5..6] + ∑val[4..6]
But we know ∑val[0..4] + ∑val[2..4] = combi_sum[4], so
a) + b) + c) = combi_sum[4] + 2 * ∑val[5..6] + ∑val[4..6]
q.e.d 😁
4) num_odd_arrays_ending_at[i-2] = (i-2)/2
5) We don't strictly need to, but we also maintain a running sum to save an addition: instead of computing val[i-2] + val[i-1] + val[i], we do running_sum[i] - running_sum[i-3].
6) total = ∑ val[i] + ∑ combi_sum[i]


index           0    1    2    3    4    5    6    7    8    9   10
example data    6    5    8    7    1    2    4    3    9   11   10
running sum     6   11   19   26   27   29   33   36   45   56   66
combi sum       0    0   19   20   43   33   62   56  114  139  228

combi sum, worked out per the recurrence:
combi_sum[3]  =  20 = 0*15  +   0 + (26 -  6)
combi_sum[4]  =  43 = 1*8   +  19 + (27 - 11)
combi_sum[5]  =  33 = 1*3   +  20 + (29 - 19)
combi_sum[6]  =  62 = 2*6   +  43 + (33 - 26)
combi_sum[7]  =  56 = 2*7   +  33 + (36 - 27)
combi_sum[8]  = 114 = 3*12  +  62 + (45 - 29)
combi_sum[9]  = 139 = 3*20  +  56 + (56 - 33)
combi_sum[10] = 228 = 4*21  + 114 + (66 - 36)

Total = running_sum[10] + ∑(combination_sum[i]) = 780

Here, in the round-about way, we surely spend additional memory to buffer up combi_sum and also running_sum, but we have only one multiplication per element, though a few more additions.

The highest multiplicand is about (n-2)/2 for an array of size n... compare that with roughly n*n/8 in the case of the smarter solution.

And hence, sometimes the operations we assume won't have any noticeable performance impact can indeed have one. Or can they not? Especially since a multiplication within word size is expected to cost roughly the same, asymptotically, as an addition..

While the hare was swiftly hopping from element to element without looking back much, it was kept busy multiplying..
The tortoise was looking back at least 3 elements, but as a result kept its multiplications simpler..
And in the end it went past the finish line without losing much of its breath.

But on a third look, it's not the multiplication that's the problem with the smarter way; it's just the calculation of odd_count as (count/2 + count%2) or ceil(count/2.0).

Do it as odd_count = (count+1)/2 instead. Now both the hare and the tortoise are happy; they crossed the line at the same time, though they took different routes...

Time to think: what's the cost of modulo? 😱
