Broad Network


Counting Elements for app.codility.com-programmers Explained in C Plus Plus

Foreword: Category of Article - Technology, Computers and Software, Algorithm

By: Chrysanthus Date Published: 24 Jan 2024

Introduction

Consider an unsorted vector of integers such as,

1670713704

There are eight elements (values) here. The highest value here is 16. The smallest value is 0. These are all positive integers. 0 is normally considered as a positive number. Each element here can be mapped to the index of another vector as follows:

value0 4 7 13 16
index012345678910111213141516

In this two-row table, the first row has the elements of the first (given) vector, but in ascending order. This table represents the second vector. Since the highest value in the first vector is 16, there are 17 cells in the second vector, indexed (numbered) from 0 to 16. The first extra cell in front in this second vector, is to account for a possible value of 0, in the first vector. The second row in this second table shows the zero based indexes for the second vector (index counting normally begins from 0 and not 1). In this second table, each value from the first vector (table), is put just above its index (in the second vector table).

Notice that there are some empty cells in the first row of this second table. Theses empty cells are for the numbers that are not found in the first vector; as the second vector has locations for possible first vector values from 0 to 16, inclusive. If there were no 0 in the first vector, its cell would still have been created in the second vector, but left empty.

The second vector is usually longer than the first vector, accounting for the values that would have been in the second vector, but not found in the first vector. The highest value in the first vector is 16, and so the second vector needs to have 17 cells, indexed from 0 to 16. In general, if the highest integer in the first vector is m, then the second vector needs to have m+1 cells, indexed from 0 to m.

The topic for this lesson (tutorial) is Counting Elements. The elements can be characters or strings, or some other similar entities, that are countable. It is the number of occurrence of each element in the first vector, that is to be counted.

Notice that in the first vector, there are 3 sevens, 2 zeros, 1 sixteen, 1 thirteen, and 1 four, giving 8 elements in total. This particular second table above is not really helpful, so far as counting the number of occurrences of each value in the first vector, is concerned. It would have been better, if in the second table above, the number of occurrences of each number was put just above its index for the second vector.

That is, instead of putting 7 above the index 7, 3 should have been put as 7 occurs three times in the first vector. Instead of putting 0 above index 0, 2 should have been put since 0 occurs two times, in the first vector. Instead of putting 16 above index 16, 1 should have been put since 16 occurs one time. Instead of putting 13 above index 13, 1 should have been put since 13 occurs one time. Instead of putting 4 above index 4, 1 should have been put since 4 occurs one time. The following table or vector, should be used instead of the above second table:

value20001003000001001
index012345678910111213141516

This third table can be arrived at without passing through the second table (or second vector) above. There are no empty cells this time. Instead of an empty cell, 0 is put to indicate that the expected value from the first table, occurred 0 time. In other words, the new vector is still given a length of m+1, indexed from 0 to m, and each cell has the number of occurrence of its index in the first vector, as value. The index that did not occur in the first vector as value, is given the cell value of 0, to mean zero occurrence, in this third (new) vector.

Counting Number of Occurrences of each Element in a Vector

A function can be written to return a (new) vector that has the count of each element in a given vector (without passing through the second vector above). This return vector is the new or third (count) vector in the above section. Let the given vector be

1673713734

and be called vector A. There are 8 elements in the given vector. Let the vector to be returned be called, count. The maximum integer in the given vector is 16. So count needs to have 16+1 = 17 elements, to account for possible 0\92s in the given vector; though this given vector has no 0. The return vector, count for this case, would be:

value00021003000001001
index012345678910111213141516

The number of occurrences of each number in a given vector, can be counted in two ways: the blunt (brute force) way, or by incrementing from zero, the value in the count vector, each time its index is encountered in the given vector, as the given vector is scanned, once.

It can be cumbersome to produce the count vector by counting the blunt or brute force way. So only the smart way (incrementing each integer value in the count vector appropriately, from 0) is explained as follows:

First, the count vector is created with all the cells initialized to 0. Next the given vector is scanned from its beginning. For each value (integer) read in the given vector, the index in the count vector that is the same as the value read, has its own cell value (number of occurrences) incremented by 1.

At the end, the count vector is read out (returned). For a value that does not exist in the given vector, its count in the count vector remains at zero. For any value that exists in the given vector, its count in the count vector, is the number of places (occurrences) in which it is found in the given vector.

It takes m = 16+1 approximate operations to create and initialize the count vector with zeros. It takes n = 8 main operations to scan the giving vector, doing all the necessary increments (adding 1 each time) to the values of the count vector, according to whose indexes are encountered in the given vector. This gives a time complexity for the function, as O(n+m).

The program is:

    #include <iostream>
    #include <vector>
    using namespace std;
    
    vector<int> counting(vector<int> A, int m) {
        int N = A.size();      
        vector<int> count (m+1, 0);  //time complexity: m operations
        
        for (int i=0; i<N; i++)  //time complexity: n=N operations
            count[A[i]] = count[A[i]] + 1;  //add 1 to whatever is already in the count cell
        
        return count;
    }

    int main(int argc, char **argv)
    {
        vector<int> A = {16, 7, 3, 7, 13, 7, 3, 4};
        int N = A.size();
        int m = 0;  //maximum number (value) in given array
        for (int i = 0; i<N; i++)  //looking for actual maximum number in A
            if (A[i] > m)
                m = A[i];
                
        vector<int> ctrV = counting(A, m);
        for (int i = 0; i < ctrV.size(); i++)
            cout << ctrV[i] << ", ";
        cout << endl;

        return 0;
    }

The output is:

    0, 0, 0, 2, 1, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 1,

as expected.

The function of interest is the counting() function and not the C++ main() function. The statement, \93int N = A.size();\94 in the counting() function is an operation, but it is ignored in the determination of the time complexity. The n and m numbers of operations for time complexity are indicated, in the code. In the statement,

    count[A[i]] = count[A[i]] + 1;

A[i] is the ith  element in the given vector. It is an integer. This integer becomes an index in the count vector.

The maximum value for the given vector is determined in the C++ main() function; and that sequence of operations is not included in the time complexity.

Issue with Negative Numbers

Notice that in the given vector above, there is no negative number. The count vector (or any vector) does not have a negative index; and so when the given vector has negative integers, there is a problem. This problem can be resolved in two ways: 1) adding a big number to all the values in the given vector, so that all the resulting numbers are greater or equal to 0, ending up with a mapped vector; or 2) separating the given vector into one vector for negative numbers and one vector for positive numbers (zero being positive, and with the vector for positive numbers).

Adding a Big Value to have all Numbers Positive
Assume that the list (vector) with negative numbers, for which multiple occurrences of each number is to be counted, is:

16-1167-20-27137-2-6074-6

This given vector will have to be scanned to get both the lowest negative value and the highest positive value. Both the lowest and highest negative values can be obtained in one for-loop. That takes O(n) time. In the above vector, the lowest value (negative) is -6, and the highest (positive) is +16.

So if 6 is added to each integer, the -6 will be translated to 0, -2 will be translated to 4, -1 will be translated to 5, 0 will be translated to 6, 4 will be translated to 10, and so on. 16, the highest value will be translated to 22. The translation process needs an additional O(n) time. After the whole translation, the following vector results:

225221346413191340613100

With the highest value of 22, there should be 22 + 1 = 23 cells in the count (new) vector. In O(m) time, where m = 23, the count vector becomes:

20003120001004000001002
012345678910111213141516171819202122

In order to return or report the count vector, the 6 added to each value has to be removed. This takes an additional O(m) time. The return count vector, becomes:

20003120001004000001002
-6-5-4-3-2-1012345678910111213141516

This should be returned as a two-dimensional vector, instead of a one-dimensional vector. And so, one way of the counting of multiple occurrences, with negative values, is done!

Separating Given Vector into Negative and Positive Vectors
Let the given vector be:

16-1167-20-27137-2-6074-6

Consider zero as a positive number. In separating this vector into one with negative values and another with positive values, the minimum negative value, the maximum negative value, the minimum positive value, and the maximum positive value, can also be found. All that can be done in one scan. The number of elements in either resulting vectors can also be determined in this same scan. When there are many such operations in one step (body of for-loop), the time for the one scan, becomes O(n.log2(n)) approximately, and not just O(n).

Imagine that there was only one operation in one step (the for-loop body), then in one scan of n elements, the time would be O(n), equivalent to one scan. Assume that n = 8. If there are two operations in one step, the time would be O(8) x 2, equivalent to two scans. If there are 3 operations in one step, the time would be O(8) x 3, equivalent to 3 scans, and generalized as O(n.log2(n)). Remember that log2(8) = 3.

Since the O(n.log2(n)) time for the problem at hand, is not even part of the solution, it should be done (coded) in the C++ main() function. The resulting vectors for the given vector above will be:

-1-2-2-2-6-6

and

1616707137074

The arrangement of negative values in descending order here, is coincidental.

There will be two count vectors: one for the negative values and one for the positive values. The number of elements in either vector should be known.

The smallest value in the vector with negative values above, is -6. The biggest value in this vector is -1. In the corresponding count vector, where the indexes are all positive, -6 corresponds to index 0. The count vector is:

-6-5-4-3-2-1
count200031
index012345

The number of elements in this count vector is the maximum negative value minus the minimum negative value, plus 1. Zero index is there.

The count vector for the positive numbers is done as in the first (normal) case above (where m+1 = 16+1). It is:


012345678910111213141516
count20001004000001002
index012345012345012345

The two count vectors are returned separately.

Exercise

Problem: You are given an integer m (1 <= m <= 1,000,000) and two non-empty, zero-indexed
vectors A and B of n integers, a0 , a1 , . . . , an-1 and b0 , b1 , . . . , bn-12 respectively (0 <= ai , bi <= m).

The goal is to check whether there is a swap operation, which can be performed on these
vectors in such a way that the sum of elements in vector A equals the sum of elements in
vector B, after the swap. By swap operation we mean picking one element from vector A and
one element from vector B and exchanging them.

Example
Let A = [9, 1, 4, 5, 4 ]
Let B = [8, 2, 4, 6, 7]

Current sum of all elements in A is 23.
Current sum of all elements in B is 27.

Notes
- The highest number in either vector is 9, so m = 9. It is given that all the numbers in A and B are positive.

- The function should return true, when the first possible swap is detected. If no swap is possible, to make both vector sums equal, then the function should return false.
Note: only one pair of elements have to be swapped to cause the change (equal sums for both vectors). The pair must not necessarily have the same index in either vector. So, not more than one pair has to be swapped. From inspection of the above given vectors (lists), the first two numbers that can be swapped to cause equality, of both vector sums, are 4 from vector A and 6 from vector B; moving from left to right, in the vectors. The new sum in either vector would be twenty-five. 5 and 7 can also be swapped to cause the same effect. However, when the first possible swap is detected, the function should return true.

O(n3) Time Solution
For this time complexity, every pair of elements are swapped, and for each swap, the totals (sums) are calculated. The totals are compared before the next swapping. After each swap, if the totals are not the same, the reverse swapping has to be made, for the new totals to be correctly done. The program is:

    #include <iostream>
    #include <vector>
    using namespace std;
    
    bool solution(vector<int> A, vector<int> B, int m) {
        int n = A.size();   
        int sumA = 0;    
        int sumB = 0;
        for (int i=0; i<n; i++) {
            for (int j=0; j<n; j++) {
                //swapping
                int temp = A[i];
                A[i] = B[j];
                B[j] = temp;
                for (int k=0; k<n; k++) {
                    sumA = sumA + A[k];
                    sumB = sumB + B[k];  
                }
                if (sumA == sumB)
                    return true;
                else {//swap back
                    int temp = A[i];   
                    A[i] = B[j];
                    B[j] = temp;
                }
                sumA = 0; sumB = 0;  //resetting sums
            }
        }
        
        return false;
    }

    int main(int argc, char **argv)
    {
        vector<int> A = {9, 1, 4, 5, 4};
        vector<int> B = {8, 2, 4, 6, 7};
        bool bl = solution(A, B, 9);
        cout << bl << endl;

        return 0;
    }

The output is 1, for true. False would be returned (last statement in function, solution() ), if true is not returned. The swapping indexes for which the totals became equal, were not asked for. However, they are the zero-based index 2 for vector A (value of 4), and the zero based index 3, for vector B (value of 6).

There are three for-loops. The second one is nested in the first one and the third one is nested in the second one. The third one calculates the totals (sums) after each swap. The equality of the sums is checked just after each scan of the third for-loop. If after swapping, the sums are not equal, then the swapped values are swapped back. The swapping code for forward and backward swapping, is the same, that is:

                    int temp = A[i];
                    A[i] = B[j];
                    B[j] = temp;

If the time consumed by the two swapping code segments are ignored, then the time complexity is O(n3) . If the operations from the swapping code segments are not ignored, then the time complexity is higher than O(n3). Well, when quoting the time complexity in practice, the swapping code segments are ignored.

Note that the parameter, m was not used in the above solution() function; and the strategy for counting the number of occurrences for each element, was also not used.

O(n2) Time Solution
The vectors and their totals are as follows:

Let A = [9, 1, 4, 5, 4 ];    sumA = 23

Let B = [8, 2, 4, 6, 7];    sumB = 27

That is, for vector A, 9 + 1 + 4 + 5 + 4 = 23. For vector B, 8 + 2 +4 + 6 + 7 = 27.  The difference between the two totals (sums) is 27 \96 23 = 4. When the first value of 4 from A, was swapped with 6 from B, to make both sums equal, 2 was essentially subtracted from 27, and added to 23 to make both sums 25.

6 \96 4 = 2. The difference between both sums (totals) is 4. 2 is half (\BD) of 4. This means that for one swap to cause both sums to be equal, for any given pair of vectors, the difference between both sums must be an even number (e.g. 4 above); and the difference between both swapped values must be half the difference of both sums (both totals), e.g. 2 above. The halved number must not necessarily be an even number. So, the correct swapped pair, sends half the difference of the smaller sum from the bigger sum, to the smaller sum vector.

The function coding will be similar to the above solution() function, but the inner for-loop will be eliminated. The summing of the two vectors, is done before the two for-loops (with one nested) are engaged. The time taken to determine these two totals is ignored. And so the resulting time complexity is considered as O(n2), based on the two for-loops (one nested). Remember that both vectors are of the same length, n.

The preceding summing of A and B are done in two simple for-loops. The solution function is:
    
    bool solution(vector<int> A, vector<int> B, int m) {
        int n = A.size();   
        int sumA = 0;    
        for (int i=0; i<n; i++) sumA = sumA + A[i];
        int sumB = 0;
        for (int i=0; i<n; i++) sumB = sumB + B[i];
        bool sumBBiggerThanSumA = false;
        int diff = 0;
        if (sumB > sumA) {
            diff = sumB - sumA;
            sumBBiggerThanSumA = true;
        }
        else {
            diff = sumA - sumB;
            sumBBiggerThanSumA = false;
        }

        for (int i=0; i<n; i++) {
            for (int j=0; j<n; j++) {
                //check possible swapping without swapping
                int change = 0;
                if (sumBBiggerThanSumA == true)
                    change = B[j] - A[i];
                else
                    change = A[i] - B[j];
              
                if (change > 0 && change == 0.5 * diff)
                    return true;
            }
        }
        
        return false;
    }

Note that the parameter, m has still not been used here; and the strategy for counting the number of occurrences for each element, has also not been used.

O(n+m) Time Solution
The above algorithm can be seen as follows: obtain the difference, d of the total sums of both vectors. Obtain half the difference, h. Then, for each element in B, subtract h and see if the answer is found in A, by checking all the elements in A. This leads to O(n2) time.

O(n+m) time is usually less than O(n2) time. Scanning each element in B takes n time. A count vector for vector A, can be made in m time, where m is the largest element in A. Let each element in B be B[i], where i is an index for a B element. In order to know if B[i] - h exits as an element in A, use the count vector, where B[i] - h is an index. If the value (count) of the index in the count vector is greater than 0, then B[i] - h exists in vector A, and the correct swapping can take place. Remember that the index for the count vector has as value, the number of occurrences of one value (an index in count), in a given vector. Of course, B[i] - h should not be less than 0, and should not be greater than the maximum index, m of the count vector. The time complexity will be O(n+m) time.

The durations to determine the two vector totals (sums), are ignored when quoting the time complexity. The solution function is:

    #include <iostream>
    #include <vector>
    using namespace std;
    
    bool solution(vector<int> A, vector<int> B, int m) {
        int N = A.size();   
        int sumA = 0;    
        for (int i=0; i<N; i++) sumA = sumA + A[i];
        int sumB = 0;
        for (int i=0; i<N; i++) sumB = sumB + B[i];
        int d = sumB - sumA;
        if (d % 2 == 1 || d % 2 == -1 ) return false;  //d cannot be odd integer
        int h = d/2;

        vector<int> count (m+1, 0);
        for (int i=0; i<N; i++)  //time complexity: m operations
            count[A[i]] = count[A[i]] + 1;

        for (int i=0; i<N; i++)  //time complexity: n operations
            if (0 <= B[i] - h && B[i] - h <= m && count[B[i] - h] > 0)  //still valid for -ve h
                return true;

        return false;
    }

The output is 1, for true. \93if (count[B[i] - h] > 0)\94 means that B[i] - h occurs at least once in vector A.

Conclusion

The smart way to count the number of occurrences of each value in a given vector is as follows:

First, the count vector is created with all the cells initialized to 0. Next the given vector is scanned from its beginning. For each value (integer) read in the given vector, the index in the count vector that is the same as the value read, has its own cell value (number of occurrences) incremented (by 1). At the end, the value of each index in the count vector, is the number of occurrences of the index as element in the given vector.

The count vector is finally read out (returned). For a value that does not exist in the given vector, its count in the count vector remains at zero. For any value that exists in the given vector, its count in the count vector, is the number of places (occurrences) in which it is found in the given vector.

Counting the number of negative integers can be done in two ways. The first method is to add some big number to each value: so that, all values would be greater than or equal to zero. That is, we shift the representation of zero by some amount to accommodate all the negative numbers we need. In the second method, we simply create (separate) a second vector for counting negative numbers.

Note: all the iterations for O(n) or O(n2) or O(n3), etc. do not have to take place, in order to have the result. With time complexity, it is the maximum number of iterations (main operations) that is quoted.  


Related Links

More Related Links

Cousins

NEXT

Comments