MissingInteger at app.codility.com-programmers in C Plus Plus Explained

Find the smallest positive integer that does not occur in a given sequence.

Foreword: Category of Article - Technology, Computers and Software, Algorithm

By: Chrysanthus Date Published: 26 Jan 2024

Task

Task score : 100% -  Correctness : 100% ; Performance : 100%
Detected time complexity : O(N) or O(N.log(N))
Lesson 4 at app.codility.com/programmers
The solution and its explanation are given below.

Problem
This is a demo task.

Write a function:

    int solution(vector<int> &A);

that, given a vector A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.

For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.

Given A = [1, 2, 3], the function should return 4.

Given A = [-1, -3], the function should return 1.

Write an efficient algorithm for the following assumptions:

        N is an integer within the range [1..100,000];
        each element of vector A is an integer within the range [-1,000,000..1,000,000].

Note:
One of the given vectors above is:

    [1, 3, 6, 4, 1, 2]

If this vector is sorted, it would be:

    [1, 1, 2, 3, 4, 6]

It can clearly be seen that the missing positive integer greater than 0 is 5. If 5 were included, then the missing integer would be 7 = 6+1, according to the above criteria.

After a possible sort, if there are negative integers, then the search has to begin from 1, excluding 0.

The lesson for this problem is Counting Elements. Counting elements is also a form of sorting. This should give the candidate the idea, that a count vector should be used, and the missing integer, searched from count index 1, which is just after count index 0. The missing integer (count index), should have a value (number of occurrences) of 0.

If the missing integer is not found, then the missing integer should be taken as the maximum number in the vector, A, plus 1; according to the above criteria.

Illustration with given Example Data
When the given vector, A is

Value:	1	3	6	4	1	2
Index:	0	1	2	3	4	5

then the count vector is:

No. of Occurrences:	0	2	1	1	1	0	1
Index:	0	1	2	3	4	5	6

The number of occurrences of the value 5 in A is 0 in the count vector. 5 is the missing integer in A. The number of elements in A here, is approximately the number of elements in the count vector. This gives a time complexity of O(N+N). Remember that time complexity is maximum possible number of operations.

When the given vector, A is:

Value:	1	2	3
Index:	0	1	2

then the count vector is:

No. of Occurrences:	0	1	1	1
Index:	0	1	2	3

Remember that the indexes of the count vector are the values of the given vector. There may be a convenience 0, which is not found in the given vector, as in this case. There is no index with number of occurrences of 0 in this case. Whatever is the situation, the number of occurrences of 0, in this case is 0, though zero as value is not found in the given vector. This is the convenience 0.

The indexes to consider in the count vector, are in the range, 1 to 3 (inclusive). The maximum integer in A is 3. According to the above criteria, the answer has to be 4 = 3 + 1. In the count vector, 4 is the next index after the highest index of 3.

The time complexity for this algorithm is O(N+N): N time to scan the given vector, A and produce the count vector; and N time to "read" through the count vector.

When the given vector, A is [-1, -3], the smallest positive integer is taken (considered) as +1, because all the elements in A are negative. If the count vector is to be produced from this given vector, based on possible positive integers in the given vector, then the count vector will be:

No. of Occurrences:	0
Index:	0

where the index of 0 here is the convenience 0.

Since there is no index from 1 upwards, in the count vector, with 0 as number of occurrences, then there is no missing smallest positive integer in A, in the range, 1 to infinity (inclusive). 1 is the next index after the highest index of 0 here, in the count vector. So the answer is taken (considered) as 1 = 0 + 1.

The number of elements (length of given vector) in the given vector, [-1, -3] is 2. The algorithm takes approximately 2 = N operations to scan vector, A to produce the count vector (of 1 element). The algorithm then takes 1 operation to "read" through the count vector.

The number of operations here, is O(2+1), better written as O(N+1). Remember that in this case, all the values (integers) in A are negative.

The Code

The algorithm scans the given vector, A to produce the count vector. After that it "reads" through the count vector, looking for any index from 1 to m (inclusive) that has 0 as number of occurrence. m is the maximum number in the given vector, A. The first index found, beginning from the left of the count vector, is the missing smallest positive integer. If no such index is found, then the answer is m+1. The code is as follows (read through it and the comments):

#include <iostream>
#include <vector>
using namespace std;

int solution(vector<int> &A) {
    int N = A.size();
    vector<int> count(N+1,0);

    for (int i=0;i<N;i++){
        if ( A[i] <= N && A[i] >=0) {  //considering only elements in A, from 0 to m (inclusive)
            count[A[i]] += 1;
        }
    }

    for (int i=1; i<=N; i++){  //missing integer is less than m
        if (count[i]==0)
            return i;
    }

    return N+1;  //if A consists of elements from 1 to m
}

int main()
{
    vector<int> P{1, 3, 6, 4, 1, 2};
    int ret1 = solution(P);
    cout << ret1 << endl;
    vector<int> Q{1, 2, 3};
    int ret2 = solution(Q);
    cout << ret2 << endl;
    vector<int> R{-1, -3};
    int ret3 = solution(R);
    cout << ret3 << endl;

    vector<int> S{2};
    int ret4 = solution(S);
    cout << ret4 << endl;

    return 0;
}

The output is:

    5
    4
    1
    1

The very last return statement, "return N+1;" will be executed, if all the numbers in A are from 1 to m (inclusive). This means that duplicates are allowed. If the missing smallest positive integer is within the range, 1 to m (inclusive), then the previous return statement, "return i;" will return the index from the count vector. There are only two return statements in the solution() function above.

The result from app.codility.com/programmers is:

    Task score : 100% -  Correctness : 100% ; Performance : 100%
    Detected time complexity : O(N) or O(N.log(N))

This does not include the time to initialize all the elements of the count vector to 0, just after creation. In the author's opinion, the time complexity should be O(N+N).

Conclusion

The efficient algorithm scans the given vector, A in N time, to produce the count vector. After that it "reads" through the count vector, in a maximum of an additional N time, looking for any index from 1 to m (inclusive) that has 0 as number of occurrence. m is the maximum number in the given vector, A. The first index found, beginning from the left of the count vector, is the missing smallest positive integer. If no such index is found, then the answer is m+1.

This efficient algorithm is applicable to the case, where vector, A has both positive and negative integers; as long as the count vector is produced beginning from index 0. The 0 may still be a convenience 0.

Cousins

BACK

Broad Network

Series Articles