Sunday, May 16, 2021

Gist vs Details

All of us have heard that 'the devil is in the details', a phrase intended to mean that we should pay attention to the details.

But should we really?

Let's look at a counter-example, one that would rather re-interpret 'the devil is in the details' as: don't get into the details, or you will awaken the devil 😈

It is a major efficiency issue in many 'structured' organizations: several people focusing on details, the details going ping-pong between them, and finally something one could implement in a day taking months.

The end result is a robust, bulletproof, museum-worthy product crafted by hundreds. Why museum-worthy? Oh, its market is already taken by a product quickly patched together by a dozen.

Just to clear the air: this is metaphoric writing, and 'details' is quite overloaded in its meaning.

Here is the counter-example's problem statement:

"We are hosting a tournament with 'n' number of participants.
If 'n' is even, we conduct n/2 matches, and the n/2 winners advance to the next round.
If 'n' is odd, we give a free pass to a random participant, and conduct (n-1)/2 matches with the remaining (n-1) participants. So the random lucky participant and the (n-1)/2 winners advance to the next round.
The tournament thus proceeds with the remaining participants till a champion is decided.
Calculate the total number of matches that need to be held in the tournament.
The Gist way of solving this problem:
We need a champion. How do we get a champion? After n-1 participants lose. How do they lose? By playing a match. So if we play n-1 matches, we eliminate n-1 participants, and we have a champion.
``But wait... what about odd, even and all that?``
What about it? Does it contradict the statement that every match conducted eliminates exactly one participant? ``Hmm, no...``
Then, hey, it's (n-1).
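In code, the Gist solution is the one-liner below (a minimal sketch; 'n' is the participant count, and the function name is just for illustration):

int totalMatches(int n) {
    // every match eliminates exactly one participant, and a champion
    // remains after n-1 eliminations
    return n - 1;
}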

The Details way of solving this same problem:
``What about odd, even and all that?``
We change course here and say, ``No no, let's do some detailed analysis, we want to be sure we covered the odd/even part of the requirements``

Hang on tight as we get into the "detailed analysis :)"

Detailed Case Analysis:

Part 1: Formulate the basic recursion:
Let's define the number of matches required for 'n' participants as m(n).
"2n participants (even) play n matches, and the n winners advance" translates to
m(2n) = n + m(n) ------> Eq.1
"2n+1 participants (odd) play n matches, and the n winners plus the lucky one, i.e. n+1 participants, advance" translates to
m(2n+1) = n + m(n+1) ------> Eq.2
Subtracting Eq.1 from Eq.2:
m(2n+1) - m(2n) = m(n+1) - m(n) ------> Eq.3
We have the base cases: m(1) = 0 and m(2) = 1.
Eq.1 and Eq.2 are good, but they are discontinuous, in the sense that we have two ways to move ahead based on whether the number of participants is odd or even.
Eq.3 is better: the step from m(2n) to m(2n+1) is the same as the step from m(n) to m(n+1). But it is still a one-way relationship: it relates an even number to the next odd, while we have no way yet to go from an odd number to the next even.

Part 2: Derive a continuous recursion
So, let's try to show that m(n+1) = m(n) + 1 holds uniformly for all 'n'.
Let's consider m(2n+2)...
m(2n+2) = m(2(n+1)) = n + 1 + m(n+1) ------> Eq.4

Part 2.1: The step from odd to even:
Subtracting Eq.2 from Eq.4:
m(2n+2) - m(2n+1) = (n + 1 + m(n+1)) - (n + m(n+1)) = 1
So, for all n, the step from an odd number to the next even number is exactly 1:
m(2n+2) = m(2n+1) + 1 ------> Eq.5

Part 2.2: The step from even to odd:
Eq.3 says the step from an even number 2n to the next odd number 2n+1 equals the step from n to n+1.
If n is odd, that smaller step is 1 by Eq.5, and we are done.
If n is even, apply Eq.3 again: the step at n equals the step at n/2. Keep halving; we either land on an odd number (where the step is 1 by Eq.5) or bottom out at the base step m(2) - m(1) = 1 - 0 = 1.
Either way, for all even n:
m(n+1) = m(n) + 1 ------> Eq.6
So, we have proved that:
when n is odd, m(n+1) = m(n) + 1 (Eq.5)
when n is even, m(n+1) = m(n) + 1 (Eq.6)
And hence,
for all n, m(n+1) = m(n) + 1
And thus we arrive at a continuous recursion: m(n) = m(n-1) + 1.
 
Part 3: Arrive at the closed form from the continuous recursion
From the base case m(1) = 0, expanding the continuous recursion:
m(n) = m(n-1) + 1 = m(n-2) + 2 = ... = m(1) + (n-1)
m(n) = 0 + (n-1) = n - 1
m(n) = n - 1 ------> Eq.7
Q.E.D
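For the skeptical, here is a minimal sketch (function name and tested range are just for illustration) that pits the original recursion against the closed form:

#include <cassert>

// m(n) per the tournament rules: an even field of n plays n/2 matches and n/2
// advance; an odd field plays (n-1)/2 matches, and the (n-1)/2 winners plus
// the lucky bye holder advance.
int m(int n) {
    if (n == 1) return 0;
    if (n % 2 == 0) return n / 2 + m(n / 2);
    return (n - 1) / 2 + m((n - 1) / 2 + 1);
}

int main() {
    for (int n = 1; n <= 1000; n++)
        assert(m(n) == n - 1);  // the Gist answer, every single time
    return 0;
}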
 
And here we see just the tips of the devil's horns.
The claws, the thorny mace, and the other gory tools of torture lie buried in the reviewing, the standardizing, the platforms, and the ways of working :-)

Friday, May 14, 2021

A Hare and Tortoise tale of computation

It's one of those usual days: I solved a little practice problem and, on a second look, felt like I had touched my nose from around the back of my head..

The problem is this: "Given an array of positive integers, calculate the sum of all possible odd-length sub-arrays."
Below are two implementations that solve the problem..
Now place your bets on which one will be faster on a general-purpose RISC machine..
It's really an Ingenious vs Ingenuous face-off, and who in the sane world would ever identify themselves with a super-brainy victor? 😉

The ingenious way: total = ∑ count[i]*val[i], where count[i] is the number of odd-length sub-arrays that include val[i]. For indices beginning at 0, count[i] = ceil( (i+1)*(n-i) / 2 ).

The smarter code
int sumOddLengthSubarrays(vector<int>& arr) {
    int total = 0;
    int n = arr.size();
    for (int i = 0; i < n; i++) {
        int count = (i + 1) * (n - i);          // sub-arrays containing index i
        int count_odd = count / 2 + count % 2;  // ceil(count/2) of them are odd-length
        total += count_odd * arr[i];
    }
    return total;
}

The ingenuous way: total = ∑val[i] + ∑combi_sum[i], where combi_sum[i] is defined as the sum of all the odd-length sub-arrays that end at position i, other than the single-element sub-array [val[i]], and is computed as below:
combi_sum[0] = combi_sum[1] = 0
combi_sum[2] = val[0] + val[1] + val[2]
For all other i:
combi_sum[i] = num_odd_arrays_ending_at[i-2] * (val[i-1] + val[i])
             + combi_sum[i-2]
             + val[i-2] + val[i-1] + val[i]
Wondering what nasty cocktail that summation above is?
Just go with the flow, there is an explanation to follow..
The roundabout code
int sumOddLengthSubarrays(vector<int>& arr) {
    vector<int> running_sum;
    running_sum.reserve(arr.size());
    vector<int> combination_sum;
    combination_sum.reserve(arr.size());
    int total = 0;
    int i = 0;
    // positions 0 and 1 can only form single-element odd-length sub-arrays
    int k = arr.size() > 1 ? 2 : arr.size();
    for (i = 0; i < k; i++) {
        total += arr[i];
        running_sum.push_back(total);
        combination_sum.push_back(0);
    }
    // position 2: the first length-3 sub-array appears
    if (i < arr.size()) {
        total += arr[i];
        running_sum.push_back(total);
        combination_sum.push_back(total);  // combi_sum[2] = val[0] + val[1] + val[2]
        total += total;
        i++;
    }

    for (; i < arr.size(); i++) {
        total += arr[i];  // the ∑val[i] part of the total
        running_sum.push_back(running_sum[i - 1] + arr[i]);
        combination_sum.push_back(
            (((i / 2) - 1) * (arr[i - 1] + arr[i])) +  // extend the arrays ending at i-2
            combination_sum[i - 2] +
            running_sum[i] - running_sum[i - 3]        // val[i-2] + val[i-1] + val[i]
        );
        total += combination_sum[i];  // the ∑combi_sum[i] part of the total
    }
    return total;
}
Now, which version do you think is faster on a usual computer?
I for sure thought the smarter, concise version was also the faster one... but was in for a surprise when I cheerfully went to test its performance.
And that too, right after recovering from writing the rather laborious and clumsy-looking roundabout version.
Let me lay it out for you: the fact is that the 'roundabout way' beats the 'smarter way'..
And how is that so? It seems to come from the cost of the multiplications... not only the number of them, but also the magnitude of the multiplicands.
So, let's draw a table tracing through the calculations of each approach.
Let's start with the smarter, concise way first.
index          :    0     1     2     3     4     5     6     7     8     9    10
example data   :    6     5     8     7     1     2     4     3     9    11    10
cnt(all)       :   11    20    27    32    35    36    35    32    27    20    11     [cnt(all)[i] = (i+1)*(n-i): 1*11, 2*10, 3*9, ...]
cnt(odd)       :    6    10    14    16    18    18    18    16    14    10     6     [cnt(odd) = ceil(cnt(all)/2)]
cnt(odd)*val   :   36    50   112   112    18    36    72    48   126   110    60

Total = ∑(cnt(odd)[i]*val[i]) = 780

So, for each element, we have two multiplications and one division. That was obvious even before we drew the table.
But what we see from the table is that, for an array of size 'n', the multiplicands range:
a) from 1 to n when calculating cnt(odd) (the factors (i+1) and (n-i)),
b) and from ceil(n/2) to roughly n*n/8 when calculating cnt(odd)*val.
So, for an array of size 100, the minimum multiplicand is 50, and the maximum is about 1250.
The n*n part indicates that the multiplicands grow faster than the array size.

Now, let us look at the roundabout way.
Let's recap the definitions of the calculations used:
1) combi_sum[0] = combi_sum[1] = 0
2) combi_sum[2] = val[0] + val[1] + val[2]
3) For all other i:
combi_sum[i] = num_odd_arrays_ending_at[i-2] * (val[i-1] + val[i])
             + combi_sum[i-2]
             + val[i-2] + val[i-1] + val[i]
A bit of explanation:
At position i, assume that the sum of all the odd-length arrays ending at (i-2) has already been computed.
Now val[i], together with val[i-1], extends each of those sub-arrays.
Apart from that, position i also forms a new odd-length array with positions (i-2), (i-1), and itself.
In the example down below, let's consider position 6. Position 6 ends an odd-length array in the following combinations: a) 0 to 6, b) 2 to 6, c) 4 to 6.
The sums of those odd-length arrays a), b), and c) are ∑val[0..6], ∑val[2..6], and ∑val[4..6] respectively.
But we already calculated ∑val[0..4] = 27 plus ∑val[2..4] = 16 and stored it as combi_sum[4] = 27 + 16 = 43.
a) ∑val[0..6] = ∑val[0..4] + ∑val[5..6]
b) ∑val[2..6] = ∑val[2..4] + ∑val[5..6]
c) ∑val[4..6]
a) + b) + c) = (∑val[0..4] + ∑val[5..6]) + (∑val[2..4] + ∑val[5..6]) + ∑val[4..6]
Re-arranging, we get:
a) + b) + c) = ∑val[0..4] + ∑val[2..4] + 2 * ∑val[5..6] + ∑val[4..6]
But we know ∑val[0..4] + ∑val[2..4] = combi_sum[4], so
a) + b) + c) = combi_sum[4] + 2 * ∑val[5..6] + ∑val[4..6]
q.e.d 😁
4) num_odd_arrays_ending_at[i-2] = (i-2)/2 (integer division)
5) We really don't need to, but we also maintain a running sum to skip an addition:
instead of computing val[i-2] + val[i-1] + val[i], we compute running_sum[i] - running_sum[i-3].
6) total = ∑val[i] + ∑combi_sum[i]


index          :    0     1     2     3     4     5     6     7     8     9    10
example data   :    6     5     8     7     1     2     4     3     9    11    10
running sum    :    6    11    19    26    27    29    33    36    45    56    66
combi sum      :    0     0    19    20    43    33    62    56   114   139   228

How each combi sum cell is computed (num_odd_arrays * (last two values) + combi_sum[i-2] + last three values):
combi_sum[3]  = 0*15 + 0   + (26 - 6)  = 20
combi_sum[4]  = 1*8  + 19  + (27 - 11) = 43
combi_sum[5]  = 1*3  + 20  + (29 - 19) = 33
combi_sum[6]  = 2*6  + 43  + (33 - 26) = 62
combi_sum[7]  = 2*7  + 33  + (36 - 27) = 56
combi_sum[8]  = 3*12 + 62  + (45 - 29) = 114
combi_sum[9]  = 3*20 + 56  + (56 - 33) = 139
combi_sum[10] = 4*21 + 114 + (66 - 36) = 228

Total = running_sum[10] + ∑(combination_sum[i]) = 66 + 714 = 780

Here, in the roundabout way, we surely use additional memory to buffer up combi_sum and running_sum, but we have only one multiplication per element, though a few more additions.

The highest multiplicand is (n-2)/2 for an array of size n... compare that to roughly n*n/8 in the case of the smarter solution.

And hence, sometimes, the operations we assume won't have any noticeable performance impact can indeed have one. Or can they? Especially since a multiplication within word size is generally expected to have the same asymptotic cost as an addition..

While the hare was hopping from element to element swiftly, without looking back much, it was kept busy multiplying..
The tortoise was looking back at least 3 elements each step, but as a result kept its multiplications smaller..
And in the end it went past the finish line without losing much of its breath.

But on a third look, it's not the multiplication that's the problem with the smarter way; it's just the calculation of odd_count as (count/2 + count%2), or ceil(count/2.0).

Do it as odd_count = (count+1)/2. Now both hare and tortoise are happy; they cross the line at the same time, though they took different routes...
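For completeness, a minimal sketch of the smarter loop with that one change (everything else as before; only the odd-count line differs):

int sumOddLengthSubarrays(vector<int>& arr) {
    int total = 0;
    int n = arr.size();
    for (int i = 0; i < n; i++) {
        int count = (i + 1) * (n - i);
        int count_odd = (count + 1) / 2;  // ceil(count/2), no remainder computation needed
        total += count_odd * arr[i];
    }
    return total;
}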

Time to think: what's the cost of a modulo? 😱

Thursday, May 06, 2021

Obsession Outlook Originality = Output

Yeah right, the three Os of Obsession, Outlook, and Originality together read OOO and mean that you are Out Of Office for Others 😉

It can be about anything, but when one leads to the other in that order, it can be good; and when the output feeds back into the cycle, it can probably get better with each iteration.

Journaling here one such cycle from working on a LeetCode problem.

The Obsession:
Write good code. That got translated into getting that "faster than 100.00%" report on submission :-)
The problem was very simple: shuffle a std::string 's' based on a vector<int> 'indices'.
ex: for input -- std::string s("rat"); vector<int> indices{1,0,2}; -- the output is std::string t = "art"
The easiest solution for this is:
string t(s.length(), 0);
for (int i = 0; i < s.length(); i++) {
    t[indices[i]] = s[i];
}
return t;

But here is where the obsession bit, because of the 'faster than 95.12%' feedback... and then what crazy things I tried to push that execution speed up 🙈..
I read the constraints again, tried to take advantage of the fact that 100 is the maximum size of the input string... switched to a cycle-sort-style in-place shuffle, even tried to pre-allocate memory for the output string and somehow hack std::string to wrap around it.. unfortunately, or should I say fortunately, none of that worked..

The Outlook:
Then I took a pause and asked myself: what am I doing here? What does 100% faster mean? 100% faster had turned into an obsession because it stands for well-written code. I chose 'faster than 100%' as a parameter to measure the quality of code, but here I was, obviously destroying the quality for the sake of the number... So I stepped back and re-asserted that no, the operation is not a success if the patient is dead... Then I thought, OK, let's shake the obsession away and think slowly, case by case, through the various approaches and possible areas of improvement..

The Originality:
The simple approach is good and only goes through the string once, but because we construct an output std::string, and there is no interface that lets it take ownership of pre-created memory, we end up initializing the output string once anyway; hence it takes 2n steps of assignment and O(n) extra memory..
The cycle-sort-style in-place shuffle (sketched after the case analysis below) is good too, but it can also take 2n steps of comparisons, though it does only n assignments.
So, what if we can cut the number of comparisons?
Time to do some case analysis of the cycle-sort-style shuffle..
1) Worst case input order:
    The worst case occurs when the permutation pairs the elements up, so we need to swap n/2 pairs individually with each other..
    In this case, there are n/2 cycles, and after each cycle we step back to the starting point of that cycle and move past it... so it takes 2n comparisons no matter what.
2) Best case input order:
    The best case is when we have one perfect cycle: we start at the beginning, and all the elements are chained together neatly, ending back at the starting point...
    This is a case that can be exploited to avoid the additional n comparisons.
3) Average case input order:
   It may have some split cycles, but the last cycle usually ends somewhere in the middle of the string.
   Even this case can be exploited to avoid at least n/2 of the additional comparisons..
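For reference, a minimal sketch of the cycle-sort-style in-place shuffle mentioned above (the function name is illustrative; it consumes its own copy of 'indices' as scratch space):

#include <string>
#include <vector>
using namespace std;

string shuffleInPlace(string s, vector<int> indices) {
    for (int i = 0; i < (int)s.size(); i++) {
        // follow the cycle through position i until i holds its final character
        while (indices[i] != i) {
            int j = indices[i];
            swap(s[i], s[j]);              // s[i] belongs at j; settle position j
            swap(indices[i], indices[j]);  // j is now final, i takes over j's target
        }
    }
    return s;
}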
And that kind of thinking felt like the purpose of practicing code being met. And why not don the grin if it also fetches a 'faster than 100%' on the side 😉 while not forgetting to ask 'can we do better?'
 
The Output: