Thursday, April 15, 2021

little joy of sorting

What good is sorting unless you really need to sort? Take a look at this problem:
/**
 * Return the number of specially colored things.
 * Each thing has only one color on it.
 * 'thing' s are comparable based on their color 
 *   with a 'color' or another 'thing'.
 * 'color' s can be compared to each other as well.
 */
int find(vector special_colors, vector things)

Now we can implement it in two ways
1) The so mixed up way  :-)
int find(vector<color> special_colors, vector <thing> things) {
    int special_things = 0;
    for (auto thing: things) {
        for (auto special_color: special_colors) {
            if (thing == special_color) {
                special_things++;
            }
        }
    }
    return special_things;
}

It gets the job done, but errr, has a time complexity of O(m*n) where m = special_colors.size() and n = things.size() 

(OR)

 2) The all nice and sorted way ;-)

 int find(vector special_colors, vector things) {
     sort(special_colors.begin(), special_colors.end());
     sort(things.begin(), things.end());
     auto thing = things.begin();
     auto special_color = special_colors.begin();
     int special_things = 0;
     while (thing != things.end() && special_color != special_colors.end()) {
         if ( *thing == *special_color) {
             special_things++;
             thing++;
         } else if ( *thing < *special_color) {
             thing++;
         } else {
             special_color++;
         }
     }
     return special_things;
 } 
Now, this has an *average* time complexity of O(m*logm + n*logn) for sorting, and O(m+n) for counting (i.e., everything we do after sorting both the vectors), which is much better than O(m*n) 
 
Goes on to say, "Hey, better be sorted and save yourself some free time" ;-)

Quiz: For what kind of input, will the counting complexity be the most?

*average* - because, most often a hybrid algorithm to achieve average optimal performance is used to implement sort()