Analytical versus Monte Carlo
In the past, when mainframe computers had only a few kilobytes of RAM, statisticians and mathematicians did their utmost to find analytical solutions to stochastic problems. Now, many problems of combining distributions, or of simulating stochastic processes, can be handled by Monte Carlo (MC) methods without great cost in computer time. The results are never exact, but good enough for the purpose. One of the advantages of MC methods is that the mechanism is easier to explain than a rather complicated mathematical solution. MC goes repeatedly through the same process, but each "cycle" uses different random choices from the input variables. The result is a vector of thousands of equally likely outcomes. This column vector is sorted in ascending order and can then be used as an expectation curve.
Pseudo random numbers
MC procedures use random numbers generated by a computer program. Strictly speaking, these are only pseudo-random numbers; truly random numbers would have to be generated by natural processes, such as radioactive decay. In the computer, random numbers between 0 and 1 are derived from whole numbers produced by a linear congruential generator such as:

Xn+1 = (a × Xn + b) mod m
The modulo means that when the result of a cycle exceeds m, we take the remainder and carry on with that. There is a whole science behind the choice of the constants a and b. An unlucky choice can give a sequence that at first sight appears to be random, but fails statistical tests for randomness. Here is an example that already cycles after 50 random numbers, only half of the maximum period of m = 100:
The constants are:
X0 = 0
a = 11
b = 17
m = 100
Random number X    Result before modulo m
      0                    17
     17                   204
      4                    61
     61                   688
     88                   985
     85                   952
     52                   589
     89                   996
     96                  1073
     73                   820
     20                   237
     37                   424
     24                   281
     81                   908
      8                   105
      5                    72
     72                   809
(and so on)
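The generator above can be sketched in a few lines of Python; the constants are those of the example, and the output reproduces the table:

```python
# Linear congruential generator, a minimal sketch using the
# example constants: X0 = 0, a = 11, b = 17, m = 100.
def lcg(x0, a, b, m, n):
    """Return the first n pseudo-random whole numbers X1..Xn."""
    xs, x = [], x0
    for _ in range(n):
        x = (a * x + b) % m        # keep only the remainder
        xs.append(x)
    return xs

draws = lcg(0, 11, 17, 100, 60)
print(draws[:10])   # [17, 4, 61, 88, 85, 52, 89, 96, 73, 20]
# With these constants the sequence repeats after 50 numbers,
# only half of the maximum period m = 100:
print(draws[50:] == draws[:10])   # True
```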
For a random number between 0 and 1 we divide the whole numbers above by m = 100. In the (old, 32-bit) computer program I used m = 2,147,483,647; hence over two billion random numbers before exactly the same sequence recycles. Sufficient for the rough work we do in prospect appraisal.
Generating analytical distributions used in MC
While the MC method of random draws from an expectation curve as shown above will work for any kind of distribution represented by a sorted vector of values, it may also be possible to make draws from a distribution that is analytically defined.
Convenient distributions in MC simulation are the normal, the lognormal, triangular, double triangular and also the simple rectangular (uniform) distribution between 0 and 1. The latter is useful in simulating "yes/no" situations.
Binomial Simulation. To start with the yes/no situation: to simulate a probability of success P (as a fraction), a random number between 0 and 1 is drawn from the rectangular distribution. If it is less than P the result is set to 1, otherwise to 0. The generated vector of 1 and 0 values can be considered a "Yes/No" vector.
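As a sketch, the yes/no simulation for a single prospect could look as follows (P = 0.3 is an assumed value for illustration):

```python
import random

def yes_no_vector(p, n_cycles, seed=1):
    """Simulate n_cycles binomial trials with success probability p."""
    rng = random.Random(seed)
    # success (1) when the uniform draw falls below p, failure (0) otherwise
    return [1 if rng.random() < p else 0 for _ in range(n_cycles)]

v = yes_no_vector(0.3, 10_000)
print(sum(v) / len(v))   # close to 0.3
```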
For the other distributions mentioned above we need the inverse cumulative distribution formula. The random number represents a cumulative probability; inserted in the inverse formula, it generates the required random draw x from the distribution. For the details see the pages on the distributions mentioned above.
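For instance, a draw from a triangular distribution with minimum a, maximum b and mode c can be obtained from a uniform number u through the standard inverse cumulative formula (a textbook result, not specific to this text; the bounds 10, 100, 30 are made-up illustrations):

```python
import random
from math import sqrt

def triangular_draw(u, a, b, c):
    """Inverse cumulative distribution of the triangular(a, c, b)."""
    f = (c - a) / (b - a)          # cumulative probability at the mode
    if u < f:
        return a + sqrt(u * (b - a) * (c - a))
    return b - sqrt((1 - u) * (b - a) * (b - c))

rng = random.Random(7)
draws = [triangular_draw(rng.random(), 10, 100, 30) for _ in range(50_000)]
print(min(draws) >= 10 and max(draws) <= 100)   # True
# theoretical mean of the triangular = (a + b + c) / 3, here about 46.7
print(sum(draws) / len(draws))
```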
Latin Hypercube Sampling (LHS)
The generating algorithms practically always start with uniform random numbers between 0 and 1. Depending on the number of Monte Carlo cycles, some freak results may occur: it could be that far more values fall in the 0 to 0.5 interval than above 0.5. To avoid misleading results, the sampling can be split up into intervals and the results for the intervals combined. Instead of taking the 0 to 1 interval in one go, it is split up into e.g. 20 intervals of 0.05 each, and equal numbers of samples are drawn from each interval. The following simulation of a triangular distribution shows that some stability is obtained by LHS sampling with 20 intervals (the black bars), as well as less error in the mean and variance.
In the LHS case the resulting random draws do not come in a random order, because the sampling intervals are dealt with in sequence. For many follow-up calculations in the MC model the vector of random draws must be "shaken" into a random order by a shuffling ("sort and carry") routine.
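A sketch of LHS with 20 intervals, followed by the shuffle that restores a random order (here Python's random.shuffle plays the role of the "sort and carry" routine):

```python
import random

def lhs_uniform(n_intervals, per_interval, seed=3):
    """Stratified uniform draws: equal samples from each subinterval of (0, 1)."""
    rng = random.Random(seed)
    width = 1.0 / n_intervals
    draws = [(i + rng.random()) * width      # one draw inside interval i
             for i in range(n_intervals)
             for _ in range(per_interval)]
    rng.shuffle(draws)   # "shake" the vector into a random order
    return draws

u = lhs_uniform(20, 50)   # 1000 draws, 50 from each 0.05-wide interval
print(sum(1 for x in u if x < 0.5))   # exactly 500 fall below 0.5
```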
Prediction based on a regression equation
If a calibration study found a useful relationship between a variable Y and a variable X, then the regression equation can be used in an MC simulation to predict the value of Y given a value for X. The regression equation provides an estimate of the conditional mean of Y, given the X value. At this point a normal distribution is assumed, with this mean and with a standard deviation equal to the standard error of estimate from the regression, but with some adjustment. The formulas for this are explained under correlation.
For many purposes this method is good enough. However, when a graph is drawn of the correlated vectors, i.e. Y versus X, a quite unnatural picture is obtained: the "Saturn effect". The correlated elements form an almost perfect straight line, while the unsorted parts form an almost circular cloud of points around it. It looks like the planet Saturn seen from its side.
For a negative correlation, the sorted parts of the vectors have to be put into reverse order by a "flip routine".
The Saturn effect can be avoided by using a more sophisticated method as follows.
Assume we wish to simulate a correlation of r = 0.5 between two variables, X and Y, represented by vectors of random draws from their respective distributions. The algorithm is basically a combination of statistics and a sorting procedure.
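The exact routine is not reproduced here, but one common way to combine statistics and sorting for this purpose (a sketch in the spirit of the Iman and Conover reordering method; the lognormal inputs are made-up illustrations) is to draw pairs of normal scores with the target correlation and then give the sorted data vectors the same rank order as those scores:

```python
import random
from math import sqrt

def correlate(xs, ys, r, seed=11):
    """Reorder xs and ys so that they show roughly rank correlation r."""
    rng = random.Random(seed)
    n = len(xs)
    # correlated standard normal scores: corr(zx, zy) = r
    zx = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zy = [r * z + sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for z in zx]

    def reorder(data, scores):
        data_sorted = sorted(data)
        order = sorted(range(n), key=lambda i: scores[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = data_sorted[rank]   # smallest value at smallest score
        return out

    return reorder(xs, zx), reorder(ys, zy)

rng = random.Random(5)
xs = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
ys = [rng.lognormvariate(1.0, 0.8) for _ in range(2000)]
cx, cy = correlate(xs, ys, 0.5)
# cx is a permutation of xs and cy of ys; a scatter plot of cy versus cx
# now shows a diffuse elliptical cloud instead of the Saturn effect
```

A negative r reverses the rank order of one vector automatically, which is exactly what the "flip routine" mentioned below achieves.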
In the context of aggregation of reserve estimates we may have to deal with more than two vectors. In those cases the vectors that have the same dependence are treated in the above way. Other vectors (prospects) with a different interdependence are separately summed, after which the sums of the groups are summed to the grand total. Within a group of 4 prospects with degree of dependence r the correlation matrix would look as follows:

1  r  r  r
r  1  r  r
r  r  1  r
r  r  r  1
If, for some reason, a correlation matrix with several different intercorrelations is required, a more sophisticated statistical method is needed (Haugh, M., 2004).
When the prospects have full dependence, the POS of the sum (POSdep) will be equal to the maximum POS in the prospect set. In case of complete independence of the prospects, the POS of the sum reaches its maximum (POSind). In between these two extremes we can interpolate linearly between these values, using the correlation coefficient r. Note that POSind >= POSdep in all cases.
For 100% dependent prospects (r = 1.0) the "POSdep" of the sum becomes:

POSdep = max(POS1, POS2, ..., POSn)
For the independent prospects (r = 0.0) the "POSind" of the sum becomes:

POSind = 1 - (1 - POS1) × (1 - POS2) × ... × (1 - POSn)
For any r the formula is:

POS = r × POSdep + (1 - r) × POSind
That the difference between the dependent and the independent case is proportional to r can be seen in the following Monte Carlo result, in which three prospects were summed with the individual POS's as indicated below the graph, of which 0.90 is the maximum:
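The linear interpolation can be sketched numerically. The three POS values 0.90, 0.50 and 0.30 are assumptions for illustration; the text only states that 0.90 is the maximum of the set:

```python
def pos_of_sum(pos_list, r):
    """POS of at least one success, interpolated linearly in r."""
    pos_dep = max(pos_list)          # fully dependent case, r = 1
    fail = 1.0
    for p in pos_list:
        fail *= (1.0 - p)            # probability that all fail
    pos_ind = 1.0 - fail             # fully independent case, r = 0
    return r * pos_dep + (1.0 - r) * pos_ind

pos = [0.90, 0.50, 0.30]
print(round(pos_of_sum(pos, 0.0), 4))   # 0.965  (independent)
print(round(pos_of_sum(pos, 1.0), 4))   # 0.9    (fully dependent)
print(round(pos_of_sum(pos, 0.5), 4))   # 0.9325 (halfway)
```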
Unrisked Sum. Although the above calculation can be done by hand, the distribution of the "unrisked" sum of prospects requires a Monte Carlo addition, including the correlation, whereby in each cycle a prospect resource is added to the total, but only if the binomial simulation of its POS is 1. For the Mean Success Volume again a hand calculation might suffice, provided we have the means of the unrisked volumes of the prospects available: each mean is multiplied by its POS and the products are added to form the sum.
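A sketch of such a risked Monte Carlo addition, assuming independent prospects for brevity (the POS values and lognormal volume parameters are made-up illustrations, and the correlation handling described above is left out):

```python
import random
from math import exp

def risked_sum(prospects, n_cycles, seed=9):
    """Each cycle: add a draw from a prospect's volume distribution
    only when its binomial success test yields 1 (independence assumed)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_cycles):
        total = 0.0
        for pos, mu, sigma in prospects:
            if rng.random() < pos:                     # binomial simulation of POS
                total += rng.lognormvariate(mu, sigma) # success volume draw
        totals.append(total)
    return sorted(totals)    # sorted vector = expectation curve

# hypothetical prospects: (POS, lognormal mu, lognormal sigma)
prospects = [(0.9, 3.0, 0.5), (0.5, 3.5, 0.7), (0.3, 2.5, 0.6)]
totals = risked_sum(prospects, 10_000)
mean_total = sum(totals) / len(totals)
# hand check: sum of POS_i * mean_i, with lognormal mean = exp(mu + sigma^2 / 2)
hand = sum(p * exp(m + s * s / 2) for p, m, s in prospects)
print(round(mean_total, 1), round(hand, 1))   # the two should be close
```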
The standard error of a mean estimated by simulation is s / sqrt(n), where s is the standard deviation of the simulated values. Here n is the number of Monte Carlo cycles.
Similar rules are used for probabilities (POS), using the binomial distribution. If not too skewed, the binomial can be approximated by the normal distribution, especially in this context with many MC cycles (= large samples). For an estimate of POS derived from n Monte Carlo cycles, its standard deviation s is given by:

s = sqrt( POS × (1 - POS) / n )
Here is an example of confidence ranges from the prospect appraisal program Gaeapas with 1000 MC cycles.
If the number of MC cycles is small in comparison with the uncertainties from the analysis, the calculated confidence limits at the lower end may become less than zero; in that case the lower confidence limit must be set to zero. At the upper end, the 95% limit for POS may exceed 1.00; there the limit has to be set to 1.00.
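The standard deviation of a simulated POS and the clamping of its confidence limits can be sketched together (the POS values and cycle count are made-up illustrations):

```python
from math import sqrt

def pos_confidence(pos, n_cycles, z=1.96):
    """Approximate 95% confidence limits for a POS estimated from
    n_cycles, using the normal approximation to the binomial and
    clamping the limits to the interval [0, 1]."""
    s = sqrt(pos * (1.0 - pos) / n_cycles)
    lower = max(0.0, pos - z * s)
    upper = min(1.0, pos + z * s)
    return lower, upper

print(pos_confidence(0.002, 1000))   # lower limit clamped to 0.0
print(pos_confidence(0.998, 1000))   # upper limit clamped to 1.0
```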