Broad Network


Counting Elements for app.codility.com/programmers Explained in C

Category of Article : Technology | Computers and Software | Algorithm

By: Chrysanthus Date Published: 12 May 2025

Consider an unsorted array of integers such as,

1670713704

There are eight elements (values) here. The highest value here is 16. The smallest value is 0. These are all positive integers. 0 is normally considered as a positive number. Each element here can be mapped to the index of another array as follows:

value0471316
index012345678910111213141516

In this two-row table, the first row has the elements of the first (given) array, but in ascending order. This table represents the second array. Since the highest value in the first array is 16, there are 17 cells in the second array, indexed (numbered) from 0 to 16. The first extra cell in front in this second array, is to account for a possible value of 0, in the first array. The second row in this second table shows the zero based indexes for the second array (index counting normally begins from 0 and not 1). In this second table, each value from the first array (table), is put just above its index (in the second array table).

Notice that there are some empty cells in the first row of this second table. Theses empty cells are for the numbers that are not found in the first array; as the second array has locations for possible first array values from 0 to 16, inclusive. If there were no 0 in the first array, its cell would still have been created in the second array, but left empty.

The second array is usually longer than the first array, accounting for the values that would have been in the second array, but not found in the first array. The highest value in the first array is 16, and so the second array needs to have 17 cells, indexed from 0 to 16. In general, if the highest integer in the first array is m, then the second array needs to have m+1 cells, indexed from 0 to m.

The topic for this lesson (tutorial) is Counting Elements. The elements can be characters or strings, or some other similar entities, that are countable. It is the number of occurrence of each element in the first array, that is to be counted.

Notice that in the first array, there are 3 sevens, 2 zeros, 1 sixteen, 1 thirteen, and 1 four, giving 8 elements in total. This particular second table above is not really helpful, so far as counting the number of occurrences of each value in the first array, is concerned. It would have been better, if in the second table above, the number of occurrences of each number was put just above its index for the second array.

That is, instead of putting 7 above the index 7, 3 should have been put as 7 occurs three times in the first array. Instead of putting 0 above index 0, 2 should have been put since 0 occurs two times, in the first array. Instead of putting 16 above index 16, 1 should have been put since 16 occurs one time. Instead of putting 13 above index 13, 1 should have been put since 13 occurs one time. Instead of putting 4 above index 4, 1 should have been put since 4 occurs one time. The following table or array, should be used instead of the above second table:

value20001003000001001
index012345678910111213141516

This third table can be arrived at without passing through the second table (or second array) above. There are no empty cells this time. Instead of an empty cell, 0 is put to indicate that the expected value from the first table, occurred 0 time. In other words, the new array is still given a length of m+1, indexed from 0 to m, and each cell has the number of occurrence of its index in the first array, as value. The index that did not occur in the first array as value, is given the cell value of 0, to mean zero occurrence, in this third (new) array.

Counting Number of Occurrences of each Element in an Array

A function can be written to return a (new) array that has the count of each element in a given array (without passing through the second array above). This return array is the new or third (count) array in the above section. Let the given array be

1673713734

and be called array A. There are 8 elements in the given array. Let the array to be returned be called, count. The maximum integer in the given array is 16. So count needs to have 16+1 = 17 elements, to account for possible 0's in the given array; though this given array has no 0. The return array, count for this case, would be:

value20021003000001001
index012345678910111213141516

The number of occurrences of each number in a given array, can be counted in two ways: the blunt (brute force) way, or by incrementing from zero, the value in the count array, each time its index is encountered in the given array, as the given array is scanned, once.

It can be cumbersome to produce the count array by counting the blunt or brute force way. So only the smart way (incrementing each integer value in the count array appropriately, from 0) is explained as follows:

First, the count array is created with all the cells initialized to 0. Next the given array is scanned from its beginning. For each value (integer) read in the given array, the index in the count array that is the same as the value read, has its own cell value (number of occurrences) incremented by 1.

At the end, the count array is read out (returned). For a value that does not exist in the given array, its count in the count array remains at zero. For any value that exists in the given array, its count in the count array, is the number of places (occurrences) in which it is found in the given array.

It takes m = 16+1 approximate operations to create and initialize the count array with zeros. It takes n = 8 main operations to scan the giving array, doing all the necessary increments (adding 1 each time) to the values of the count array, according to whose indexes are encountered in the given array. This gives a time complexity for the function, as O(n+m).

The program is:

    #include <stdio.h>

    struct Result {
        int * count;
        int num;     // Length still to be dertermined
    } stroct;
    
    struct Result counting(int A[], int m, int N, int ctr[]) { 
        stroct.count = &ctr[0];    //stroct.count and ctr[] now point to the same address
        stroct.num = m+1;    //length of count is determined here

        //initialize the count sequence
        for (int i=0; i<(m+1); i++)    //time complexity: m operations
            stroct.count[i] = 0; 
        
        for (int i=0; i<N; i++)  //time complexity: n=N operations
            stroct.count[A[i]] = stroct.count[A[i]] + 1;    //add 1 to whatever is already in the count cell

        return stroct;
    } 

    int main(int argc, char **argv)
    {
        int A[] = {16, 7, 3, 7, 13, 7, 3, 4};
        int N = sizeof(A) / sizeof(A[0]);    //divide size of array by size of one (first) element
        int m = 0;  //maximum number (value) in given array
        for (int i = 0; i<N; i++)  //looking for actual maximum number in A
            if (A[i] > m)
                m = A[i];
        
        int ctr[m+1];

        struct Result cnt = counting(A, m, N, ctr);
        for (int i = 0; i < cnt.num; i++) 
            printf("%i, ", cnt.count[i]);
        printf("\n");

        return 0;
    }

Note: since a function in C cannot return an array, a struct with an integer sequence called count, is returned. Also, the array whose start reference (reference of first cell) has to be equated to the start reference of the count sequence, has to be sent as an argument. Read through the code, if you have not already done so. The output is:

    0, 0, 0, 2, 1, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 1, 

as expected.

The function of interest is the counting() function and not the C main() function. The n and m numbers of operations for time complexity are indicated, in the code. In the statement,

    stroct.count[A[i]] = stroct.count[A[i]] + 1;

A[i] is the ith element in the given array. It is an integer. This integer becomes an index in the count array.

The maximum value for the given array is determined in the C main() function; and that sequence of operations is not included in the time complexity.

Issue with Negative Numbers

Notice that in the given array above, there is no negative number. The count array (or any array) does not have a negative index; and so when the given array has negative integers, there is a problem. This problem can be resolved in two ways: 1) adding a big number to all the values in the given array, so that all the resulting numbers are greater or equal to 0, ending up with a mapped array; or 2) separating the given array into one array for negative numbers and one array for positive numbers (zero being positive, and with the array for positive numbers).

Adding a Big Value to have all Numbers Positive

Assume that the list (array) with negative numbers, for which multiple occurrences of each number is to be counted, is:

16-1167-20-27137-2-6074-6

This given array will have to be scanned to get both the lowest negative value and the highest positive value. Both the lowest and highest negative values can be obtained in one for-loop. That takes O(n) time. In the above array, the lowest value (negative) is -6, and the highest (positive) is +16.

So if 6 is added to each integer, the -6 will be translated to 0, -2 will be translated to 4, -1 will be translated to 5, 0 will be translated to 6, 4 will be translated to 10, and so on. 16, the highest value will be translated to 22. The translation process needs an additional O(n) time. After the translation, the following array results:

2252213464131913406131012

With the highest value of 22, there should be 22 + 1 = 23 cells in the count (new) array. In O(m) time, where m = 23, the count array becomes:

20003120001004000001002
012345678910111213141516171819202122

In order to return or report the count array, the 6 added to each value has to be removed. This takes an additional O(m) time. The return count array, becomes:

20003120001004000001002
-6-5-4-3-2-1012345678910111213141516

This should be returned as a two-dimensional array, instead of a one-dimensional array. And so, one way of the counting of multiple occurrences, with negative values, is done!

Separating Given Array into Negative and Positive Arrays

Let the given array be:

16-1167-20-27137-2-6074-6

Consider zero as a positive number. In separating this array into one with negative values and another with positive values, the minimum negative value, the maximum negative value, the minimum positive value, and the maximum positive value, can also be found. All that can be done in one scan. The number of elements in either resulting arrays can also be determined in this same pass. When there are many such operations in one step (body of for-loop), the time for the one scan, becomes O(n.log2(n)) approximately, and not just O(n).

Imagine that there was only one operation in one step (the for-loop body), then in one scan of n elements, the time would be O(n), equivalent to one scan. Assume that n = 8. If there are two operations in one step, the time would be O(8) x 2, equivalent to two scans. If there are 3 operations in one step, the time would be O(8) x 3, equivalent to 3 scans, and generalized as O(n.log2(n)). Remember that log2(8) = 3.

Since the O(n.log2(n)) time for the problem at hand, is not even part of the solution, it should be done (coded) in the C main() function. The resulting arrays for the given array above will be:

-1-2-2-2-6-6

and

1616707137074

The arrangement of negative values in descending order here, is coincidental.

There will be two count arrays: one for the negative values and one for the positive values. The number of elements in either array should be known.

The smallest value in the array with negative values above, is -6. The biggest value in this array is -1. In the corresponding count array, where the indexes are all positive, -6 corresponds to index 0. The count array is:

-6-5-4-3-2-1
count200031
index012345

The number of elements in this count array is the maximum negative value minus the minimum negative value, plus 1. Zero index is there.

The count array for the positive numbers is done as in the first (normal) case above (where m+1 = 16+1). It is:

012345678910111213141516
count20001004000001002
index012345678910111213141516

The two count arrays are returned separately.

Exercise

Problem: You are given an integer m (1 <= m <= 1,000,000) and two non-empty, zero-indexed arrays A and B of n integers, a0 , a1 , . . . , an−1 and b0 , b1 , . . . , bn−1 respectively (0 <= ai , bi <= m).

The goal is to check whether there is a swap operation, which can be performed on these arrays in such a way that the sum of elements in array A equals the sum of elements in array B, after the swap. By swap operation we mean picking one element from array A and one element from array B and exchanging them.

Example
Let A = [9, 1, 4, 5, 4 ]
Let B = [8, 2, 4, 6, 7]

Current sum of all elements in A is 23.
Current sum of all elements in B is 27.

Notes

Note: only one pair of elements have to be swapped to cause the change (equal sums for both arrays). The pair must not necessarily have the same index in either array. So, not more than one pair has to be swapped. From inspection of the above given arrays (lists), the first two numbers that can be swapped to cause equality, of both array sums, are 4 from array A and 6 from array B; moving from left to right, in the arrays. The new sum in either array would be twenty-five. 5 and 7 can also be swapped to cause the same effect. However, when the first possible swap is detected, the function should return true.

O(n3) Time Solution

For this time complexity, every pair of elements are swapped, and for each swap, the totals (sums) are calculated. The totals are compared before the next swapping. After each swap, if the totals are not the same, the reverse swapping has to be made, for the new totals to be correctly done. The program is:

    #include <stdio.h>
    
    _Bool solution(int A[], int B[], int N) {
        int n = N;
        int sumA = 0;    
        int sumB = 0;
        for (int i=0; i<n; i++) {
            for (int j=0; j<n; j++) {
                //swapping
                int temp = A[i];
                A[i] = B[j];
                B[j] = temp;
                for (int k=0; k<n; k++) {
                    sumA = sumA + A[k];
                    sumB = sumB + B[k];  
                }
                if (sumA == sumB) 
                    return 1;
                else {//swap back
                    int temp = A[i];   
                    A[i] = B[j];
                    B[j] = temp;
                }
                sumA = 0; sumB = 0;  //resetting sums
            }
        }
        
        return 0;
    } 

    int main(int argc, char **argv)
    {
        int A[] = {9, 1, 4, 5, 4};
        int B[] = {8, 2, 4, 6, 7};
        int N = sizeof(A) / sizeof(A[0]);    //divide size of array by size of one (first) element
        _Bool bl = solution(A, B, N); 
        printf("%i\n", bl);

        return 0;
    }

The output is 1, for true. false (0) would be returned (last statement in function, solution() ), if true is not returned. The swapping indexes for which the totals became equal, were not asked for. However, they are the zero-based index 2 for array A (value of 4), and the zero based index 3, for array B (value of 6).

There are three for-loops. The second one is nested in the first one and the third one is nested in the second one. The third one calculates the totals (sums) after each swap. The equality of the sums is checked just after each scan of the third for-loop. If after swapping, the sums are not equal, then the swapped values are swapped back. The swapping code for forward and backward swapping, is the same, that is:

                    int temp = A[i];
                    A[i] = B[j];
                    B[j] = temp;

If the time consumed by the two swapping code segments are ignored, then the time complexity is O(n3) . If the operations from the swapping code segments are not ignored, then the time complexity is higher than O(n3). Well, when quoting the time complexity in practice, the swapping code segments are ignored.

Note that the parameter, m was not used in the above solution() function; and the strategy for counting the number of occurrences for each element, was also not used.

O(n2) Time Solution

The arrays and their totals are as follows:

Let A = [9, 1, 4, 5, 4 ];    sumA = 23

Let B = [8, 2, 4, 6, 7];    sumB = 27

That is, for array A, 9 + 1 + 4 + 5 + 4 = 23. For array B, 8 + 2 +4 + 6 + 7 = 27. The difference between the two totals (sums) is 27 – 23 = 4. When the first value of 4 from A, was swapped with 6 from B, to make both sums equal, 2 was essentially subtracted from 27, and added to 23 to make both sums 25.

6 – 4 = 2. The difference between both sums (totals) is 4. 2 is half (½) of 4. This means that for one swap to cause both sums to be equal, for any given pair of arrays, the difference between both sums must be an even number (e.g. 4 above); and the difference between both swapped values must be half the difference of both sums (both totals), e.g. 2 above. The halved number must not necessarily be an even number. So, the correct swapped pair, sends half the difference of the smaller sum from the bigger sum, to the smaller sum array.

The function coding will be similar to the above solution() function, but the inner for-loop will be eliminated. The summing of the two arrays, is done before the two for-loops (with one nested) are engaged. The time taken to determine these two totals is ignored. And so the resulting time complexity is considered as O(n2), based on the two for-loops (one nested). Remember that both arrays are of the same length, n.

The preceding summing of A and B are done in two simple for-loops. The solution function is:

    _Bool solution(int A[], int B[], int N) {
        int n = N;
        int sumA = 0;    
        for (int i=0; i<n; i++) sumA = sumA + A[i];
        int sumB = 0;
        for (int i=0; i<n; i++) sumB = sumB + B[i];
        _Bool sumBBiggerThanSumA = 0;
        int diff = 0;
        if (sumB > sumA) {
            diff = sumB - sumA;
            sumBBiggerThanSumA = 1;
        }
        else {
            diff = sumA - sumB;
            sumBBiggerThanSumA = 0;
        }

        for (int i=0; i<n; i++) {
            for (int j=0; j<n; j++) {
                //check possible swapping without swapping
                int change = 0;
                if (sumBBiggerThanSumA == 1)
                    change = B[j] - A[i];
                else
                    change = A[i] - B[j];
              
                if (change > 0 && change == 0.5 * diff) 
                    return 1; 
            }
        }
        
        return 0;
    } 

For example, with 4 from array A and 6 from array B, the variable, change is 6 - 4 = 2, which is equal to half (0.5) of the sum difference, 4 = 27 - 23. Read through the code, if that is not already done. Note that the parameter, m has still not been used here; and the strategy for counting the number of occurrences for each element, has also not been used.

O(n+m) Time Solution

The above algorithm can be seen as follows: obtain the difference, d of the total sums of both arrays. If d is an even number, obtain half the difference, h. Then, for each element in B, subtract h and see if the answer is found in A, by checking all the elements in A; assuming that sumB is greater than sumA. This leads to O(n2) time.

The reader must remember that the difference between the swapped values must be h. Do not confuse between the difference, d between the sums and the difference, h between two pairs of numbers that correspond, from the different arrays.

The arrays and their totals are as follows:

Let A = [9, 1, 4, 5, 4 ];    sumA = 23

Let B = [8, 2, 4, 6, 7];    sumB = 27

The difference between the sums of the two arrays is sumB - sumA = d = 4. And half the difference, h = d/2 = 4/2 = 2. h is the difference between two pairs of numbers that correspond, from the different arrays.

When swapping, the bigger element must come from the array with the bigger sum, so that the bigger sum reduces and the smaller sum increases; it cannot be the other way round.

From the question, a1 , . . . , an−1 AND b0 , b1 , . . . , bn−1 mean that both arrays are of the same length, while (0 <= ai , bi <= m) means that m is the biggest value in either array A or array B.

The reader should already know that the possible pairs of swapping values are: 6 from array B and 4 from array A, or 7 from array B and 5 from array A, with (6, 4) coming first, when scanning from the left.

m is the biggest index in the count array. The biggest value in both arrays is 9. Taking m as 9, the count array for array A is:

count0100210001
index0123456789

So, 1 occurs in A, 1 time; 4 occurs in A, 2 times; 5 occurs in A, 1 time; and 9 occurs in A, 1 time. For the rest of the indexes that are not members of array A, of course, each of them occurs, zero time in the count array.

The pairs from array A to array B are (6, 4) and (7, 5). Observed from the count array, that if h = 2 is subtracted from the index 6, which is a number in array B, then the index 4 which is a number in array A, will be obtained; and the number of times 4 occurs in array A is greater than 0 (actually two). Observed again from the count array, that if h = 2 is subtracted from the index 7, which is a number in array B, then the index 5 which is a number in array A, will be obtained; and the number of times 5 occurs in array A is greater than 0 (actually one).

Notice that all the possible values of array A and B have been combined into one index range in the count array. So, write a program, that will first produce the count array in m time (from the array with the largest m). After that, in n time, for each element in B (the other array), h will be subtracted, and if the result occurs in A at least once (count > 0), then the right swapping is possible. The function should return, for the first possible occurrence.

If no result is found or if d is not even, then the function should return false. The code is:

    #include <stdio.h>
    
    _Bool solution(int A[], int B[], int N) {
        int sumA = 0;    
        int max = 0;
        for (int i=0; i<N; i++) {
            sumA = sumA + A[i];
            if (A[i] > max)
                max = A[i];
        }
        int sumB = 0;
        for (int i=0; i<N; i++) {
            sumB = sumB + B[i];
            if (B[i] > max)
                max = A[i];
        }
        
        int d = sumB - sumA;
        if (d < 0) {
            d = -1 * d;
            //exchange arrays (references)
            int *tempV = &B[N];
            int *A = &B[N];
            int *B = &tempV[N];
        }
            
        if (d % 2 == 1) return 0;  //d not even, return false
        int h = d/2;

        int count[max+1]; 
        //initialize all count cells to 0
        for (int i=0; i<=max; i++)  //not normally included in time complexity
            count[i] = 0;

        for (int i=0; i < N; i++)  //time complexity: m operations
            count[A[i]] = count[A[i]] + 1;

        for (int i=0; i<N; i++)  //time complexity: n operations
            if (count[B[i] - h] > 0)  //found first correspondence
                return 1;

        return 0;    //result not found
    } 
    
    int main(int argc, char **argv)
    {
        int A[] = {9, 1, 4, 5, 4};
        int B[] = {8, 2, 4, 6, 7};
        int N = sizeof(A) / sizeof(A[0]);    //divide size of array by size of one (first) element
        _Bool bl = solution(A, B, N); 
        printf("%i\n", bl);

        return 0;
    }

Read through the code if that is not already done. The output is 1, for true. The time complexity is given as O(n+m), though for m, there are actually m+1 operations and not m.

Conclusion

The smart way to count the number of occurrences of each value in a given array is as follows:

First, the count array is created with all the cells initialized to 0. Next the given array is scanned from its beginning. For each value (integer) read in the given array, the index in the count array that is the same as the value read, has its own cell value (number of occurrences) incremented (by 1). At the end, the value of each index in the count array, is the number of occurrences of the index as element in the given array.

The count array is finally read out (returned). For a value that does not exist in the given array, its count in the count array remains at zero. For any value that exists in the given array, its count in the count array, is the number of places (occurrences) in which it is found in the given array.

Counting the number of negative integers can be done in two ways. The first method is to add some big number to each value: so that, all values would be greater than or equal to zero. That is, we shift the representation of zero by some amount to accommodate all the negative numbers we need. In the second method, we simply create (separate) a second array for counting negative numbers.

Note: all the iterations for O(n) or O(n2) or O(n3), etc. do not have to take place, in order to have the result. With time complexity, it is the maximum number of iterations (main operations) that is quoted.

The rest of the 17 lessons, with explanation of lessons, explanation of tasks, and explanation of solutions, can be found at (click): Explanations in C for the 17 Algorithm Lessons at https://app.codility.com/programmers with Problems and Solutions





Related Links

More Related Links

Cousins

NEXT

Comments