Season-Success is an algorithmic problem that does not only deal with probabilistics but also benefits from dynamic programming. In order to fully understand what going on, you should be familiar with the fundamentals of probability and dynamic programming.

The problem goes as follows:

Assume that you are a farmer and your new years resolution was to invest in your farm. In order get a loan from the bank you have to compute the probability of a successful harvest this year. To do that you manage to get a list of n days and, for each day, the probability of rain on this day p_i . How can you compute the probability that the season is a success, which it is if there are at least k rainy days among those n days.

  • Input
    • The input starts with two integers n,k where n is the total number of days of the season and k is the minimum number of days it must rain for the season to be a success.
    • The next n lines each contain one real number 0 \leq p_i \leq 1 that denote the probability of rain on day i
  • Output
    • You should output the probability (0 \leq p \leq 1) of the season beeing successful

The first step here is to understand what we have to calculate. We subsequently assume that we have already handled the input and that we have

  • An integer n
  • An integer k
  • An array of probabilities prob[] = (p_0, p_1, \hdots, p_{n-1})

We can generalize the problem above:

Given a set of n events and a function Pr[\cdot]: \{0,1,\hdots, n-1\} \rightarrow [0,1] that assigns each of those events a probability of occuring, calculate the probability that at least k of those n events will occur.

Now let X_i be some indicator variable that is defined as follows:

    \begin{align*}X_i = \begin{cases} 1, & \text{ event $i$ occurs}\\ 0, & \text{ else} \end{cases}\end{align*}


We can clearly define the random variable X that denotes the number of occuring events like this

    \begin{align*}X = \sum_{0\leq i < n} X_i * Pr[i]\end{align*}


What we need is the probability of at least k events occuring which is

    \begin{align*}Pr[X \geq k] = \left(\sum_{k \leq i \leq n} Pr[X = i]\right)\end{align*}

A trivial solution would therefore be to compute Pr[X \geq k] by calculating all probabilities Pr[X=i] for k \leq i \leq n and taking their sum. This however, can quickly become quite nasty in terms of runtime. Even if we take relatively low values like n=20 and k = 8 we have

  • \binom{20}{8} = 125970 outcomes for Pr[X=k]
  • \binom{20}{9} = 167960 outcomes for Pr[X=k+1]
  • \hdots

So going through all of them becomes extremely expensive for bigger n,k

It should be quite intuitive that computing Pr[X = i] for some i involves computing Pr[X = i-1] (because the probability of i events occuring equals the probability of i-1 events occuring times the probability of another event occuring). This means that we have some sort of overlapping subproblems and this should ring a bell. We can use dynamic programming to significantly lessen the number of computations needed. Let’s look at how an implementation could look like:


Let cache[][] be some two-dimensional DP-table that is defined as follows:

  • Table structure
    • The cache is of size n \times (n+1)
    • cache[i][j] stores the probability that j events up to event i occur (note that counting i starts at 0)
  • Computation of an entry

        \begin{align*} cache[i][j] = \begin{cases}1.0 - Pr[0], & \text{if } i=j=0\\Pr[0] & \text{if } i = 0, j = 1\\cache[i-1][0] * (1.0 -Pr[i]) & \text{if } i \neq j = 0\\cache[i-1][j-1] * Pr[i] + cache[i-1][j]*(1.0 - Pr[i]) & \text{else}\\\end{cases}\end{align*}


    Explanation of the cases:
    • cache[0][0] because the probability that no events occur up to the first event is exactly the probability that the first event does not occur
    • cache[0][1] because the probability that one event occurs up to the first event is exactly the probability that the first event occurs
    • cache[i][0] because probability that no events occur up to event i equals the probability that no events occur up to event i-1 times the probability of event i not occuring
    • cache[i][j] because the probability that j events occur up to event i equals the probability that either j events occur up to i-1 or that j-1 events occur up to i and that i does occur
  • Computation order
    We compute the entries of the table from left to right, top to bottom
  • Retrieving the solution
    By the design of our table,

        \begin{align*}cache[n-1][j] &= Pr[X = j]\\\Rightarrow Pr[X \geq k] &= \left(\sum_{k \leq j \leq n} Pr[X = j]\right) = \left(\sum_{k \leq j \leq n} cache[n-1][j]\right)\end{align*}


    So adding up cache[n-1][j] for all k \leq j \leq n will give us our solution

Great. Now that we know how to solve this problem using dynamic programming, lets give a java implementation. First we need a method that takes an empty table and the array of probabilities for each event, and fills the table as we have specified above

//Fills a dynamic programming table where cache[i][j] contains the probability of
//j events occuring up to event i
static double[][] fillProbCache(double[] probs, double[][] cache) {
      
   //Base cases
   cache[0][0] = 1.0 - probs[0];
   cache[0][1] = probs[0];
   for(int i = 1; i<cache.length; i++) {
     cache[i][0] = cache[i-1][0] * (1.0 -probs[i]);
   } 
      
   //Fill cache left-to-right and top-to-bottom
   for(int i = 1; i<cache.length; i++) {
     for(int j = 1; j<cache[0].length; j++) {
       cache[i][j] = cache[i-1][j-1] * (probs[i]) + cache[i-1][j]*(1.0 - probs[i]);
     }   
   }
   
   return cache;
}

Now we have to instantiate the table and compute the final solution

//Given an array of length n with probabilities for n events, compute the probability that at least k of these events occur
public static double solve(double[] probs, int k) {

   //Create dp-cache.
   //cache[i][j] stores the probability that at least i days of snow occured
   //at the first j days of the season
   int n = probs.length;
   double[][] cache = fillProbCache(probs, new double[n][n+1]);
        
   //Get solution
   double out = 0.0;
   for(int i = k; i < n+1; i++) {
     out += cache[n-1][i];
   }
        
   //output
   return out;
}

And thats it.


Runtime

Analyzing runtime is, as with most dp-algorithms, fairly easy. Our total runtime is dominated by filling the table. Each of the n*(n+1) = n^2 + n table entries can be computed in constant time, so our total runtime is quadratic with

    \begin{align*}n^2+n * O(1) \leq O(n^2)\end{align*}