"Success has many fathers, but failure is an orphan"
Old proverb
Many examples from exploration have shown how important a detailed analysis of a disappointing wildcat can be. An interesting story is that of the Fahud field in Oman (Tschopp, 1967), where a very disappointing well turned into a giant field. But successes, too, may have been systematically over- or underestimated.
Hindsight analysis (also called "dry hole analysis" or, more negatively, "Post Mortem") is here the statistical investigation carried out after prospects have been drilled, comparing results with predictions (Nederlof, 1994). Because of the probabilistic nature of the estimates before drilling, such a comparison is far from straightforward. Basically a single number, the outcome, has to be compared to the expectation curve before drilling. More often, and more meaningfully, a number of prospect outcomes are compared with their corresponding expectation curves.
Why would we bother to make such comparisons? There are a few good reasons:
If the outcome of a single prospect has to be compared to the expectation curve before drilling, the possibilities for testing are very limited. Obviously, if the outcome falls completely outside the range of the expectation curve, there must be something wrong. But generally the situation is less clear-cut. The simplest approach is to use the cumulative distribution and pick the percentile where the actual outcome is situated. If it is far in the right tail of the distribution, say at the 5% level or less, the deviation can be regarded as significant, indicating underestimation. However, if the outcome is zero, and the expectation curve contains many zeros (POS << 100%), we cannot conclude much, although overestimation may be indicated. Considerably more can be learned if a number of expectation curves can be compared with the corresponding drilling results. The analysis will be fourfold:
(1) Probability of success
(2) The volume in the case of success
(3) The total sum of volumes discovered and the sum of expectations
(4) The ranking ability.
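For the single-prospect case described before this list, the percentile check can be sketched as follows. This is a minimal illustration, not the procedure of any particular appraisal system: the expectation curve is represented by Monte Carlo samples, and the function names, the lognormal volume model and all parameter values (POS of 0.3, the outcome of 400) are assumptions chosen only for the example.

```python
import random

def exceedance_probability(samples, outcome):
    """Fraction of pre-drill samples that are at least as large as the outcome."""
    return sum(1 for s in samples if s >= outcome) / len(samples)

def simulate_expectation_curve(pos, mu, sigma, n=10_000, seed=0):
    """Hypothetical expectation curve: zero with probability 1 - pos,
    otherwise a lognormal volume (mu, sigma describe the log of the volume)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) if rng.random() < pos else 0.0
            for _ in range(n)]

# Assumed prospect and assumed actual outcome, for illustration only.
samples = simulate_expectation_curve(pos=0.3, mu=3.0, sigma=1.0)
p = exceedance_probability(samples, outcome=400.0)

# An outcome at or beyond the 5% point of the right tail hints at underestimation.
verdict = "possible underestimation" if p <= 0.05 else "no significant deviation"
print(f"P(volume >= outcome) = {p:.4f}: {verdict}")
```

Note that for a dry hole (outcome zero) the exceedance probability is always 1, which is why a single dry well says little when the curve itself contains many zeros.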

| 0.23 | 0.35 | 0.12 | 0.38 | 0.03 | 0.57 | 0.14 | 0.41 |

The total sum of volumes discovered and the sum of expectations
I once checked the outcome of 20 ventures that had been estimated by various prospect appraisal methods. I had the expectations and the actual results. The sum of the actuals was about seven times smaller than the sum of the expectations, a rather typical sign of over-optimism. Now the evaluations in this example were far from thorough, and date from a time when not much attention was paid to getting it right. A later example, based on more sophisticated appraisal, is from Oman.
The procedure is:
At this stage we can compare a single number (the total actual) with the expectation curve (MC summation). The results for the 16 prospects evaluated and drilled in Oman were:
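The comparison itself, a single total set against a Monte Carlo sum of the individual expectation curves, can be sketched as below. This is only an illustration under stated assumptions: prospects are treated as independent, each is described by a hypothetical triple (POS, and two lognormal parameters for the success-case volume), and the portfolio and the actual total are invented numbers, not the Oman data.

```python
import random

def monte_carlo_sum(prospects, n_trials=20_000, seed=1):
    """Sample the total volume over all prospects, assuming independence.
    Each prospect is (pos, mu, sigma): chance of success and the lognormal
    parameters of the success-case volume (an assumed parametrisation)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        total = 0.0
        for pos, mu, sigma in prospects:
            if rng.random() < pos:          # success in this trial
                total += rng.lognormvariate(mu, sigma)
        totals.append(total)
    return totals

def percentile_of_actual(totals, actual_total):
    """Fraction of simulated totals that are >= the actual total."""
    return sum(1 for t in totals if t >= actual_total) / len(totals)

# Hypothetical portfolio and hypothetical drilling result.
prospects = [(0.3, 3.0, 1.0), (0.5, 2.0, 0.8), (0.2, 4.0, 1.2), (0.4, 2.5, 0.9)]
totals = monte_carlo_sum(prospects)
expected_sum = sum(totals) / len(totals)
p = percentile_of_actual(totals, actual_total=15.0)
print(f"sum of expectations ~ {expected_sum:.1f}, P(total >= actual) = {p:.3f}")
```

A probability close to 1 would mean that almost every simulated total exceeds the actual one, which points to over-optimism; a probability close to 0 points to underestimation.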

Ranking ability
Even if the POS, the KS and the Sum tests have found everything in order, we still do not know whether the appraisal system can rank prospects in an optimal sequence. This property is "ranking ability" and is quite independent of the other tests. The test devised for this purpose is the CAT test.
Most CAT curves will be found between the two extremes: the best possible (only practically possible if you have a sixth sense) and the worst possible (suggested by the Devil, and detrimental to the Present Value of the discoveries). In order to test a sequence of prospects we need a test statistic that measures the deviation from the "random drilling" line. This is:

We can also calculate the cumulative area under the observed curve and divide this by the total cumulative discovery. This we denote as the CAT-r, a kind of correlation coefficient, that can vary from -1 to +1. It would be zero for the (average) random drilling line. For the significance test, however, we use the above CAT statistic. It can be proven that this statistic has a mean of zero and a variance of

Here x is the set of n actual values. The variance of the x's is calculated on the basis that the n values form a population in the randomization, the total number of permutations being n!. Even for small n the sampling distribution of the n! possible CAT values tends to normality. The figure below shows a case where only 9 prospects with expectations ranging from 0 to 50 mb are permuted 20 times. The resulting distribution of that sample follows the normal distribution rather nicely.
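The randomization argument can be checked numerically. The sketch below assumes one plausible form of the statistic, namely the sum over wells of the deviation of the cumulative discovered volume from the straight "random drilling" line; this form, the function names and the five volumes are assumptions for illustration and may differ from the statistic used in the source. For this assumed form, the mean over all n! orderings is zero and the variance is n²(n + 1)/12 times the population variance of the x's, which the code verifies by full enumeration.

```python
import itertools
import statistics

def cat_statistic(actuals_in_drilling_order):
    """Assumed form: sum of deviations of the cumulative discovered volume
    from the straight 'random drilling' line."""
    x = list(actuals_in_drilling_order)
    n, total = len(x), sum(x)
    cumulative, stat = 0.0, 0.0
    for i, value in enumerate(x, start=1):
        cumulative += value
        stat += cumulative - total * i / n
    return stat

def permutation_variance(x):
    """Variance of the assumed statistic over all n! orderings:
    n^2 (n + 1) / 12 times the population variance of x."""
    n = len(x)
    return n * n * (n + 1) / 12 * statistics.pvariance(x)

# Hypothetical actual volumes, listed in the order of the pre-drill ranking.
actuals = [50.0, 0.0, 20.0, 0.0, 5.0]

# Enumerate all 5! = 120 orderings to obtain the exact sampling distribution.
values = [cat_statistic(p) for p in itertools.permutations(actuals)]
print("mean over permutations:    ", statistics.mean(values))
print("variance over permutations:", statistics.pvariance(values))
print("variance from formula:     ", permutation_variance(actuals))

# Normal approximation: standardised score of the observed sequence.
z = cat_statistic(actuals) / permutation_variance(actuals) ** 0.5
print(f"z-score of observed ranking: {z:.2f}")
```

Full enumeration is feasible only for small n; for larger samples a random subset of permutations, or the normal approximation with the variance formula, would be used instead.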

A real case was published by Sluyk & Parker (1986). In a period of 10 years 165 prospects had been appraised by the Shell prospect system (Sluyk & Nederlof, 1984) and also had been conclusively drilled. The result of the CAT test was:

For smaller sample sizes, the CAT test for ranking ability is supposedly better than a simple rank correlation (Spearman or Kendall) because CAT uses more information and does not require a correction for tied observations. For the above large sample the significance level is about the same.
An interesting aspect is the indication that including the geochemical/geological data is useful, over and above the practically pure structural information that seismic methods provided at the time of the above analysis.