Broad Network


Prefix Sums for app.codility.com-programmers Explained in C plus Plus

Foreword: Category of Article - Technology, Computers and Software, Algorithm

By: Chrysanthus Date Published: 28 Jan 2024

Introduction

Consider the following array of 8 elements:

4030201020706050

The non-smart way to sum all these elements is:

40 + 30 = 70
40 + 30 + 20 = 90
40 + 30 + 20  + 10 = 100
40 + 30 + 20  + 10 + 20 = 120
40 + 30 + 20  + 10 + 20 + 70 = 190
40 + 30 + 20  + 10 + 20 + 70 + 60 = 250
40 + 30 + 20  + 10 + 20 + 70 + 60 + 50 = 300

There are 28 operations here. A time complexity of O(1) (i.e. constant time), is for 1 operation. A time complexity of O(n) for 8 elements, has 8 operations. A time complexity of O(n2) for 8 elements, has 82 = 64 operations. 28 operations is approximately 24 = 8 x 3 = 8 x log2(8) operations, and is within the limits of 8 and 64. 8 = 23 => log2(8) = 3. The time complexity for the above addition can be given approximately as n.log2(n).

Prefix sum means the sum from the first element to the previous element, as the array (vector) is scanned, from the left. This is the sum of all the previous elements, excluding the current element of the given array. With prefix sums, all the above elements are summed in linear time, O(n) time complexity. The following table shows the prefix sums for the above array (vector):

Given Vector4030201020706050
Prefix Sums00+40=4040+30=7070+20=9090+10=100100+20=120120+70=190190+60=250250+50=300

The first prefix sum is 0, when nothing has been added. 0 is added to the first element. That result is added to the second element, whose result is added to the third element, and so on; to form all the prefix sums. The number of prefix sums is N + 1, where N is the number of elements in the given array (vector). So the number of prefix sums is one more than the number of given elements. In general, the prefix sums are:

p0 = 0, p1 = p0+a0, p2 = p1+a1, p3 = p2+a2, p4 = p3+a3, - - - pi = pi-1 + ai-1 - - - pn = pn-1 + an-1

where ai is the current given element.

Though prefix sums have been given here for all the elements, prefix sums is usually appreciated for the first k elements, that is: pk = pk-1 + ak-1 .

Prefix Sums Code obtained in O(n) Time
A coding example is:

    #include <iostream>
    #include <vector>
    using namespace std;
    
    vector<int> prefix_sums(vector<int> A) {
        int N = A.size();      
        vector<int> P (N+1, 0);
        
        for (int k=1; k<N+1; k++)
            P[k] = P[k - 1] + A[k - 1];
        
        return P;
    }

    int main(int argc, char **argv)
    {
        vector<int> A = {40, 30, 20, 10, 20 ,70, 60, 50};
                
        vector<int> retV = prefix_sums(A);
        for (int i = 0; i < retV.size(); i++)
            cout << retV[i] << ", ";
        cout << endl;

        return 0;
    }

The output is:

    0, 40, 70, 90, 100, 120, 190, 250, 300,

as expected.

Suffix Sums - O(n)

Consider the given array of 8 elements, again:

4030201020706050

A non-smart way to sum all these elements, from the right end is:

60 + 50 = 110
70 + 60 + 50 = 180
20 + 70 + 60 + 50 = 200
10 + 20 + 70 + 60 + 50 = 210
20 + 10 + 20 + 70 + 60 + 50 = 230
30 + 20 + 10 + 20 + 70 + 60 + 50 = 260
40 + 30 + 20 + 10 + 20 + 70 + 60 + 50 = 300

There are 28 operations here. A time complexity of O(1) (i.e. constant time), is for 1 operation. A time complexity of O(n) for 8 elements, has 8 operations. A time complexity of O(n_sp]2) for 8 elements, has 8_sp]2 = 64 operations. 28 operations is approximately 24 = 8 x 3 = 8 x log2(8) operations, and is within the limits of 8 and 64. 8 = 23 => log2(8) = 3. The time complexity for the above addition can be given approximately as n.log2(n).

Suffix sum means the sum from the last element to the element, just after the current element; summing from the right end. This is the sum of all the elements ahead, from the right end, excluding the current element, of the given array. With suffix sums, all the above elements are summed in linear time, O(n) time complexity, just like with prefix sum. The following table shows the suffix sums for the above array (vector):

Given Vector4030201020706050
Suffix Sums260+40=300230+30=260210+20=230200+10=210180+20=200110+70=18050+60=1100+50= 500

The last suffix sum is 0, when nothing has been added. 0 is added to the last given element. That result is added to the last-but-one element, whose result is added to the preceding element, and so on; to form all the suffix sums. The number of suffix sums is N + 1, where N is the number of elements in the given array (vector). So the number of suffix sums is one more than the number of given elements. In general, the suffix sums are (from the right end):

sn = 0, sn-1 = sn + an-1, sn-2 = sn-1 + an-2, sn-3 = sn-2 + an-3, - - - si=1 = si + ai-1 - - - s0 = s1 + a0

where ai is the current given element, whose index increases from the left end. Note that the index for the suffix sums end at n+1 (of array length, n+1), while the index for the given array ends at n. The index for the suffix sums begins at 0 (zero based indexing).

Though suffix sums have been given here for all the elements, suffix sum is usually appreciated for the last k elements, that is: sk-1 = sk + ak-1, where k increases from the left end.

Suffix Sums Code obtained in O(n) Time
A coding example is:

    #include <iostream>
    #include <vector>
    using namespace std;
    
    vector<int> suffix_sums(vector<int> A) {
        int N = A.size();    
        vector<int> S (N+1, 0);
        
        for (int k=N; k>0; k--)
            S[k-1] = S[k] + A[k-1];
        
        return S;
    }

    int main(int argc, char **argv)
    {
        vector<int> A = {40, 30, 20, 10, 20 ,70, 60, 50};
                
        vector<int> retV = suffix_sums(A);
        for (int i = 0; i < retV.size(); i++)
            cout << retV[i] << ", ";
        cout << endl;

        return 0;
    }

The output is:

    300, 260, 230, 210, 200, 180, 110, 50, 0,

as expected.

Sum of Slice of an Array with Prefix Sums

The time complexity for the whole array prefix summing, is O(n). After the prefix sums for all the elements of the array have been obtained in linear time, i.e. O(n), the total of the elements of a slice within the given array, can be obtained in constant time, O(1). This is done by just subtracting the prefix sum, at the beginning of the slice (section), from the prefix sum, just after the end of the slice. The same is true for suffix sums, but the subtraction is done in reverse order.

The following code gives the sum of the slice of numbers from index 3 to index 5 (inclusive), using prefix sums:

    #include <iostream>
    #include <vector>
    using namespace std;
    
    int difference;
    
    vector<int> prefix_sums(vector<int> A) {
        int N = A.size();       
        vector<int> P (N+1, 0);
        
        for (int k=1; k<N+1; k++)
            P[k] = P[k - 1] + A[k - 1];

        difference = P[6] - P[3];
        
        return P;
    }

    int main(int argc, char **argv)
    {
        vector<int> A = {40, 30, 20, 10, 20 ,70, 60, 50};
        for (int i = 0; i < A.size(); i++)
            cout << A[i] << ", ";
        cout << endl;
                
        vector<int> retV = prefix_sums(A);
        for (int i = 0; i < retV.size(); i++)
            cout << retV[i] << ", ";
        cout << endl;
        cout << difference << endl;

        return 0;
    }

The output is:

    40, 30, 20, 10, 20,   70,   60,   50,
    0,   40, 70, 90, 100, 120, 190, 250, 300,
    100

The difference is 100 and correct. It is obtained after all the prefix sums have been developed. The code line for this is:

        difference = P[6] - P[3];

Remember that the first prefix sum is 0. The current prefix sum is the sum of all the previous elements of the given array, plus 0, excluding the current element in the given array. So, difference is P[6] - P[3] . As a formula, that is written as:

    P[y + 1] - P[x]

where y is 5 and x is 3, for the above given array (vector).

Exercise

Problem: You are given a non-empty, zero-indexed array A of n (1 <= n <= 100 000) integers
a0 , a1 , . . . , an-1 (0 <= ai <= 1 000). This array represents number of mushrooms growing on consecutive spots along a road. You are also given integers k and m (0 <= k, m < n).

A mushroom picker is at spot number k on the road and should perform m moves. In
one move she moves to an adjacent spot. She collects all the mushrooms growing on spots
she visits. The goal is to calculate the maximum number of mushrooms that the mushroom
picker can collect in m moves.

For example, consider array A such that:

Mushrooms per spot23751394
Indexes (spots)01234567

The mushroom picker starts at spot k = 4 and should perform m = 6 moves. She might
move to spots 3, 2, 3, 4, 5, 6 and thereby collect 1 + 5 + 7 + 3 + 9 = 25 mushrooms. This is the
maximal number of mushrooms she can collect. End of Problem!

Comments: For maximum, she starts from index (position) 4, picking 1 mushroom at index 4, makes two moves to the left picking mushrooms, then two moves to the right without picking anything, giving a total of 4 moves so far; and finally, two more moves to the right, picking mushrooms, for the total of 6 moves. The two moves to the right from index 2 to index 4, are wasted moves.

What if she had to go from index 4 down to index 0, and then makes two wasted moves to the right to have 6 moves. For this, she would pick, 1 + 5 + 7 + 3 + 2 + 0 + 0 = 18, which is less than 25, still with two wasted moves. Note that she first has to pick all the mushrooms on the spot, where she starts.

What if she had to go from index 4 up to index 7, and then makes three wasted moves to the left to have 6 moves. For this, she would pick, 1 + 3 + 9 + 4 + 0 + 0 + 0 = 17, which is less than 25, but this time, with three wasted moves.

What if she had to go from index 4 to index 5, and then makes one wasted move back to index 4, then one move to the left at 3, then a wasted move back to index 4, then a wasted move to index 3 again, then a move to index 2, making a total of 6 moves. For this, she would have picked, 1 + 3 + 0 + 5 + 0 + 0 + 7 = 16, which is less than 25, still with three wasted moves. 16 is the smallest number of total possible mushrooms picked, so far.

The conclusion to draw here, is that in whatever to-and-fro moves the mushroom picker makes, there will be a sequence of positions (spots) of mushroom picking. The above sequence for maximum mushrooms picked, is from index 2 to index 6, (inclusive). For maximum number of mushrooms picked, wasted moves have to be minimized. However, it is possible to have more than one sequence with the same number of wasted moves but with different totals of mushrooms picked. Note that as the mushroom picker moves, she cannot skip any spot (position).

So, the best strategy is to move in one direction, optionally followed by some moves in the opposite direction. In other words, the mushroom picker should not change direction more than once, in order to minimize wasted moves, and so end up with the optimum sequence of numbered (indexed) consecutive spots.

So, the maximum slice sum from the prefix sums array, is required, which must include m number of moves, and must include spot number k.

Solution

Slice Ranges;
There are two scenarios in order to know the different slice ranges from which to chose the maximum sum. The maximum sum is chosen from the maximum sum of both scenarios. One scenario involves left complete initial movement, and the other scenario involves right complete initial movement.

Left Complete Initial Movement
With left complete initial movement, the first range is obtained by moving m places to the left. This consists of m+1 consecutive array cells, assuming that the cell for index 0 is not crossed. Remember that the mushroom picker has to pick at the kth position, where she starts. In this case, the sequence of array cells is from k-m to k (inclusive).

For the next range, the mushroom picker moves one place to the right and then m-1 places to the left, with 1 wasted move. The range here is from k-m+2 to k+1 (inclusive). For the next range, the mushroom picker moves 2 places to the right and then m-2 places to the left, with 2 wasted moves. The range here is from k-m+4 to k+2 (inclusive). For the next range, the mushroom picker moves 3 places to the right and then m-3 places to the left, with 3 wasted moves. The range here is from k-m+6 to k+3 (inclusive).

The ranges are obtained in this way, skipping 1 cell from the left end, until the start of the next range is k-2 or k-1. It would be k-2 if m is even and k-1 if m is odd. Whether m is even or odd, there is skipping of one cell for the start of a range, in this scenario. The highest index for the start of a range in this scenario is k, which is not reached. These last ranges will be arrived at, if the cell for index n-1 of the given array, is not crossed, for the end of a range.  

Limits for Range of Left Complete Initial Movement
The left and right limits for the above ranges in order are:

    k-m to k    
    k-m+2 to k+1
    k-m+4 to k+2
    k-m+6 to k+3
        |
        |
        |
    k-2 to k+i or k-1 to k+i (see below for meaning of i)

Among all these ranges, the maximum sum for the scenario has to be chosen.

Let the integer variable, i represent a range, in the above list of possible ranges (from top to bottom), beginning from 0. Then the left slice limit is given by k - m + 2*i and the right slice limit is given by k+i. These limits are based on the assumption that the left slice limit does not cross below 0 of the given array, and the right slice limit does not cross above n-1 of the given array. Since for some lists this is not possible, for the lower limit, the program has to choose the maximum value between 0 and  k - m + 2*i; and for the upper limit, the program has to choose the minimum value between k+i and n-1 (zero based indexing).

Since there is skipping of one cell for the start of the next range, some possible ranges are omitted in this scenario. This is compensated for, by the next scenario.

Right Complete Initial Movement
With right complete initial movement, the first range is obtained by moving m places to the right. This consists of m+1 consecutive array cells, assuming that the cell for index n-1 is not crossed. Remember that the mushroom picker has to pick at the kth position, where she starts. In this case, the sequence of array cells is from k to k+m (inclusive).

For the next range, the mushroom picker moves one place to the left and then m-1 places to the right, with 1 wasted move. The range here is from k-1 to k+m-2 (inclusive). For the next range, the mushroom picker moves 2 places to the left and then m-2 places to the right, with 2 wasted moves. The range here is from k-2 to k+m-4 (inclusive). For the next range, the mushroom picker moves 3 places to the left and then m-3 places to the right, with 3 wasted moves. The range here is from k-3 to k+m-6 (inclusive).

The ranges are obtained in this way, skipping 1 cell from the right end, until the end of the next range is k+2 or k+1. It would be k+2 if m is even and k+1 if m is odd. Whether m is even or odd, there is skipping of one cell for the end of a range, going backwards, in this scenario. The lowest index for the end of a range in this scenario is k, which is not reached. These last ranges will be arrived at, if the cell for index 0 of the given array, is not crossed, for the start of a range.  

Limits for Range of Right Complete Initial Movement                      
The left and right limits for the above ranges in order are:

    k to k+m
    k-1 to k+m-2
    k-2 to k+m-4
    k-3 to k+m-6
        |
        |
        |
    k-i to k+2 or k-i to k+1 (see below for meaning of i)

Among all these ranges, the maximum sum for the scenario has to be chosen.

There are two scenarios and each has its maximum sum. The higher of either maximum sum is the required maximum sum.

Let the integer variable, i represent a range, in the above list of possible ranges (from top to bottom), beginning from 0. Then the left slice limit is given by k-i and the right slice limit is given by k + m - 2*i. These limits are based on the assumption that the left slice limit does not cross below 0 of the given array, and the right slice limit does not cross above n-1 of the given array. Since for some lists this is not possible, for the lower limit, the program has to choose the maximum value between 0 and  k-i; and for the upper limit, the program has to choose the minimum value between k + m - 2*i and n-1 (zero based indexing).

A range must include m number of moves, and must include spot number k. In either scenario, ranges are skipped, by one cell. Each scenario compensates for the other for this skipping of ranges. When the ranges of both scenarios are put together, no range is skipped.

O(n2) Solution
O(n2) Solution takes into account the above scheme but does not use prefix sums. The code for this has not been provided in this document.

O(n+m) - Solution

The O(n+m) solution uses the above scheme with prefix sums. The prefix sums are obtained in O(m) time and maximum slice sum is obtained in O(n) time, giving a total of O(n+m) time. The program (code) is:

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

    int mushrooms (vector<int> &A, int k, int m) {
        int N = A.size();      
        vector<int> P (N+1, 0);    //prefix sums vector
        for (int k=1; k<N+1; k++)
            P[k] = P[k - 1] + A[k - 1];
            
        int finalMax = 0;    //final maximum
        int tempMax = 0;    //temporary maximum
    
        //Left Complete Initial Movement
        for (int i=0; (k - m + 2*i)<k; i++) {  //number of alternate ranges
            int leftSliceLimt = max(0, k - m + 2*i);
            int rightSliceLimt = min(k+i, N-1);
            tempMax = P[rightSliceLimt + 1] - P[leftSliceLimt];
            finalMax = max(tempMax, finalMax);    //compare max of slice with final max at once
        }
        
        //Right Complete Initial Movement
        for (int i=0; (k + m - 2*i)>k; i++) {  //number of alternate ranges compensating above
            int leftSliceLimt = max(0, k-i);
            int rightSliceLimt = min(k + m - 2*i, N-1);
            tempMax = P[rightSliceLimt+1] - P[leftSliceLimt];
            finalMax = max(tempMax, finalMax);    //compare max of slice with final max at once
        }
        
        return finalMax;
    }

int main(int argc, char *argv[])
{
    vector<int> A = {2, 3, 7, 5, 1, 3, 9, 4};
    int ret = mushrooms(A, 4, 6);
    cout << ret << endl;

    return 0;
}

The output is 25. The algorithm library had to be included, for the max() and min() functions.

Conclusion

A prefix sum at an index, for a given array, is the sum of all the preceding elements of the array. A prefix sums array can be produced in O(n+1) time, were n is the number of elements in the given array. The number of elements in the prefix sums array, is 1 more than the number of elements in the given array. The first prefix sum is 0. The sum of a slice in the given array, is obtained from the corresponding prefix sums array, in constant time, O(1). This is done by just subtracting the prefix sum, at the beginning of the slice (section), from the prefix sum, just after the end of the slice.




Related Links

More Related Links

Cousins

NEXT

Comments