Prior distributions for geological variables


General observations

A prior distribution ("prior") contains information about what to expect when your personal experience is limited. Priors form an important part of the calibrated prospect appraisal approach. Expert systems usually contain some background experience data, assembled from many sources or individuals, which can therefore be more effective than the judgment of a single individual. This "canned" experience may take the form of a simple histogram, or of a functional relationship between two or more geological variables. This page discusses a number of useful priors. The word "prior" in this context refers to a prior distribution of the mean value and variance of the data that form our experience, usually world-wide experience. Some of these distributions are used as default distributions in the Gaea50 prospect appraisal program.

In a volumetric estimate a number of factors are multiplied, for instance gross reservoir thickness times a NetToGross fraction times porosity, and so on. The numbers used for these factors are supposed to be the mean values for the reservoir rock. For this reason we are interested in the mean value of a given factor and, in the case of a Monte Carlo simulation, in the variance of this mean. We work here at a second level of uncertainty: we have a distribution of observations with a mean and a variance, but now we are concerned with the uncertainty of these parameters themselves. The parameters of the distribution describing the uncertainty of the mean, or of the variance, are the parameters of the prior distribution. The term "hyperparameter" is used to distinguish these parameters from the parameters of the data distribution itself.

The basic idea in Bayesian estimation is to start with a prior distribution and use pertinent local data to update the prior to a posterior distribution. The local data are "analogs" of the prospect to be evaluated. Priors are usually better than a state of complete ignorance. A prior which is close to ignorance is called a weak prior. "Ignorance" about porosity would be expressed as "porosity can vary between 0 and 100%". Of course, we know better than that, as is described below. Even so, we should be careful about limiting porosity to the range of 0 to 50%: in the Amposta field, offshore Spain, the drill bit unexpectedly fell some 20 feet in a cave, that is, through 100% porosity.
Hence, the distribution we use is either the prior of the mean when no analogs are available, or the posterior of the mean when analogs are at hand. Note that the analog values of factors are themselves mean values of a reservoir. Therefore our prior or posterior mean is actually a "mean of means".
The prior distribution of, for example, the mean of a process is not easily estimated subjectively in practice. However, the collection of world-wide observations is a frequentist sampling process, which allows simple rules to be applied to get the mean and the variance of the prior distribution of the mean. At the same time we have the variance of the world-wide observations, which is the "process variance". These statistics are required in a Bayesian updating process. This update uses a sample of relevant "local" observations as new information which, together with the prior, gives the parameters of the posterior distribution. The latter is usually a more realistic input to a Monte Carlo simulation than an often too small sample of locally relevant data. The drawback of using only the few local data at hand is that the true uncertainty is underestimated.
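
The update described above can be sketched with the standard conjugate formulas for a normal mean. This is a minimal illustration, not the Gaeapas routine: it assumes a normal prior for the mean and treats the process variance as known, and the function name and all numbers are hypothetical.

```python
from statistics import fmean


def update_mean_prior(prior_mean, prior_var_of_mean, process_var, local_values):
    """Conjugate normal update of the prior of the mean.

    Assumption: the process variance (scatter of the world-wide
    observations) is treated as known and also valid locally.
    Returns (posterior_mean, posterior_variance_of_the_mean).
    """
    n = len(local_values)
    if n == 0:
        # No analogs available: the prior is used unchanged.
        return prior_mean, prior_var_of_mean
    local_mean = fmean(local_values)
    prior_precision = 1.0 / prior_var_of_mean
    data_precision = n / process_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * local_mean)
    return post_mean, post_var


# Hypothetical numbers: prior mean porosity 20%, variance of that mean 4,
# process variance 36, and three local analog reservoirs.
post_mean, post_var = update_mean_prior(20.0, 4.0, 36.0, [24.0, 26.0, 25.0])
print(post_mean, post_var)
```

With only three analogs the posterior mean moves part of the way from the prior towards the local average, and the posterior variance is smaller than the prior variance but larger than what the three local values alone would suggest if the prior were ignored.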

Residuals as priors

Many variables relevant to appraisal are functions of depth: temperature, pressure, oil gravity, porosity, etc. To predict such variables it is not sufficient to make a map of observed values and interpolate the value at an unknown location (see kriging). Although the location of analog data is important, the 2-D picture can be misleading if the depth information is not used. Therefore all available data are used to make a regression of the variable on depth. The residuals of this regression provide prior distributions for any depth. For an unknown location, the value is then estimated with the regression equation by inserting the depth of the particular prospect. Most of the time it can be assumed that the residuals are normally distributed. Sometimes it is better to work with the log-transformed data.
A quantitative petroleum system model (PSA model), such as Gaeapas, contains a number of "calibration results". These are various constants obtained from multivariate analysis. The analysis results are in the form of regression coefficients and distribution parameters of the residuals. A prior distribution in such a case is the distribution of the mean estimate, given a number of X-variables, together with the variance of this mean estimate (the "noise" variance about the regression line, divided by the sample size). In a Monte Carlo simulation the regression with its full uncertainty is reproduced in a prediction routine, which involves more than just the "standard error of estimate".
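
As an illustration of the difference between the uncertainty of the mean estimate and a full prediction, here is a minimal sketch for a single predictor (depth). It is not the Gaeapas prediction routine; the function names and the data are hypothetical, and a normal draw is used where a strict treatment would use a t-distribution.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares of y on x, keeping what a prediction needs."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual ("noise") variance
    return {"n": n, "xbar": xbar, "sxx": sxx, "slope": slope,
            "intercept": intercept, "s2": s2}


def mean_estimate_sd(fit, x0):
    """Standard deviation of the regression line itself at x0."""
    return math.sqrt(fit["s2"] * (1.0 / fit["n"]
                                  + (x0 - fit["xbar"]) ** 2 / fit["sxx"]))


def draw_prediction(fit, x0, rng):
    """One Monte Carlo draw: uncertainty of the line plus residual scatter."""
    mean = fit["intercept"] + fit["slope"] * x0
    sd = math.sqrt(fit["s2"] * (1.0 + 1.0 / fit["n"]
                                + (x0 - fit["xbar"]) ** 2 / fit["sxx"]))
    return rng.gauss(mean, sd)


# Hypothetical depth (m) versus porosity (%) observations.
depth = [1000.0, 2000.0, 3000.0, 4000.0]
porosity = [30.0, 26.0, 20.0, 16.0]
fit = fit_line(depth, porosity)
rng = random.Random(1)
draws = [draw_prediction(fit, 2500.0, rng) for _ in range(5)]
print(fit["slope"], fit["intercept"], mean_estimate_sd(fit, 2500.0), draws)
```

At the mean depth of the data the variance of the mean estimate reduces to the noise variance divided by the sample size, as stated above; away from the mean depth the extra term grows, which is why extrapolated prospects carry more uncertainty.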

Sampling problems

Data gathering for a prior should not become biased by the availability of data. The data one usually encounters are clustered and not necessarily independent, so care has to be taken to spread the sampling. The variance will then be larger, but the distribution is more representative. An example of this is porosity, where data from developed fields greatly outnumber the data from wildcats in the area, with the danger that the oilfield data bias the prior distribution for the area.
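
One simple way to spread the sampling is to let every field or wildcat contribute a single value, for instance its average. The sketch below shows the effect; it is only one possible declustering scheme, and the identifiers and porosity values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, variance


def decluster_by_group(records):
    """Reduce (group_id, value) records to one mean value per group."""
    groups = defaultdict(list)
    for group_id, value in records:
        groups[group_id].append(value)
    return {group_id: fmean(values) for group_id, values in groups.items()}


# Hypothetical porosity data (%): one well-sampled field and two wildcats.
records = [
    ("field_A", 28.0), ("field_A", 30.0), ("field_A", 29.0), ("field_A", 29.0),
    ("wildcat_1", 12.0), ("wildcat_2", 15.0),
]

raw_values = [value for _, value in records]
declustered = list(decluster_by_group(records).values())

print(fmean(raw_values), variance(raw_values))    # dominated by field_A
print(fmean(declustered), variance(declustered))  # one vote per location
```

In this made-up case the raw mean is pulled towards the developed field, while the declustered sample has a lower mean and a larger variance.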

Probability of hydrocarbon charge

This probability ("P[HC]") is of major importance in the Gaeapas material balance model. Charge is defined as the arrival of a significant amount of oil and gas at a trap. In the exercise I did, an amount of 100,000 barrels of oil in place was taken as the minimum accumulation. However, an oil seep may also be a good indication of generation/migration, although quantification of amounts is usually impossible. A good HC show in a wildcat also counts. For this prior, data were assembled from some 300 explored sedimentary basins/provinces, some 33,000 exploration wells in all, plus data on seeps. The data were represented as "Yes/No", hence we assigned 1 to a positive HC charge case and 0 to the others. A kriging analysis produced a significant variogram showing a range of about 50 km. So, on average, charge at location A cannot be predicted on the basis of a well B that is more than 50 km away. The conclusion is that an oil province has, on average, a radius of 50 km. Geologically speaking, this is of course a "Mickey Mouse" approach, but it has proven to be helpful.
Obviously, only one well is counted from an existing discovery.
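
For readers unfamiliar with variograms of Yes/No data, the following sketch computes an empirical indicator semivariogram from well locations. It only illustrates the general technique, not the analysis actually performed; the coordinates, lag width and function name are hypothetical.

```python
import math
from collections import defaultdict


def indicator_semivariogram(points, lag_width, n_lags):
    """Empirical semivariogram of 0/1 data.

    points: list of (x_km, y_km, indicator), with indicator 1 for charge
    and 0 for no charge. Returns {lag_index: (mean_distance,
    semivariance, number_of_pairs)} for lags that contain pairs.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dist = math.hypot(xi - xj, yi - yj)
            lag = int(dist // lag_width)
            if lag >= n_lags:
                continue
            acc = sums[lag]
            acc[0] += dist
            acc[1] += 0.5 * (zi - zj) ** 2
            acc[2] += 1
    return {lag: (acc[0] / acc[2], acc[1] / acc[2], acc[2])
            for lag, acc in sorted(sums.items())}


# Hypothetical wells along a line, 10 km apart: two charged, two dry.
wells = [(0.0, 0.0, 1), (10.0, 0.0, 1), (20.0, 0.0, 0), (30.0, 0.0, 0)]
for lag, (dist, gamma, pairs) in indicator_semivariogram(wells, 15.0, 3).items():
    print(lag, dist, gamma, pairs)
```

The range is the distance at which the semivariance levels off; beyond it, a well tells us no more about charge at the prospect than the overall average does.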

The next step was to count cases in the 300+ "provinces". The counts gave an estimate of P[HC] for each province. The distribution of this global sample of P[HC] values was approximated with a Beta distribution. The parameters of this distribution form the prior information for a sedimentary basin. This prior suggests that in a sedimentary basin where no drilling or seep data are available, we may assume three hypothetical wells, one of which has HC shows, making P[HC] about 0.3. The Beta distribution can easily be updated to obtain a local "posterior" P[HC] on the basis of locally observed wells with and without HC shows.
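
The Beta update is simple enough to show in a few lines. In this sketch the "three hypothetical wells of which one has HC shows" are read as Beta parameters alpha = 1 and beta = 2; that reading, the function names and the local well counts are assumptions made for illustration.

```python
import random

# Assumed reading of the prior: 1 hypothetical success, 2 hypothetical failures.
PRIOR_ALPHA = 1.0
PRIOR_BETA = 2.0


def update_beta(alpha, beta, wells_with_shows, wells_without_shows):
    """Conjugate Beta update: add the observed counts to the parameters."""
    return alpha + wells_with_shows, beta + wells_without_shows


def beta_mean(alpha, beta):
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    total = alpha + beta
    return alpha * beta / (total * total * (total + 1.0))


# Hypothetical local evidence: 7 wells, 4 of them with HC shows.
post_alpha, post_beta = update_beta(PRIOR_ALPHA, PRIOR_BETA, 4, 3)
print(beta_mean(PRIOR_ALPHA, PRIOR_BETA), beta_mean(post_alpha, post_beta))

# Drawing P[HC] values for a Monte Carlo run from the posterior.
rng = random.Random(7)
p_hc_draws = [rng.betavariate(post_alpha, post_beta) for _ in range(5)]
print(p_hc_draws)
```

Because the prior is worth only three wells, a modest amount of local drilling quickly dominates the posterior.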


Porosity

Many studies are available in the literature that analyze the relationship of porosity to geological variables, such as age, depth, lithology, pressure, overpressure, maturity, quartz content and so on.
I used the correlation of porosity with depth and lithology given by Ehrenberg & Nadeau (2005) as a prior in the Gaeapas program. These data are based on a world-wide sampling of 30,122 siliciclastic and 10,481 carbonate reservoirs. Both the mean and the variance change with depth. Any local porosity-depth pairs, for the correct lithology, can be used to update the prior at a given depth to, hopefully, a narrower posterior porosity distribution.

An earlier study by Schmoker (1988) relates porosity to maturity, and a paper of 1982 concentrates on carbonates. These are good examples of studies that provide prior porosity distributions. See also Scherer (1987) and Gluyas et al. (1997) for estimates on a physics basis.

Top seal

Seal capacity is usually estimated as the ability to hold a certain differential pressure. Oil and gas in situ being less dense than the formation water, a pressure differential ("Pd") is created at the base of the seal. This variable was measured in some 160 well-documented cases of unfaulted top seals (Nederlof & Mohler, 1981). The analysis had to handle some censored data, as the Pd was in certain cases a minimum or maximum observation. The geological factors for Pd were, amongst others, thickness, depth and lithology. Although capillary pressure is one of the most important factors, it was mainly represented by lithology as a proxy. Thickness should, at first sight, not have an influence, but it proved to be important. The conclusion was that the thickness of the top seal is a proxy for the difficulty of leakage through small faults and fractures, which could not be observed in the "unfaulted caprocks" in our sample. Another process explaining thickness as a factor is diffusion. For gas this might cause considerable loss, as modeling by Montel et al. (1993) showed.

The results of this study are incorporated in Gaeapas as regression constants and the prior distributions of the residuals. In practice the thickness, lithology and the degree of faulting and fracturing are used as input variables to get an estimate of Pd. In addition, the reservoir engineering data for calculating in-situ densities of the HC and water are used.
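
The link between Pd and the in-situ densities is the usual buoyancy relation, Pd = (water density - HC density) x g x column height. The sketch below uses that relation in SI units to convert between a column height and the pressure differential at the base of the seal; the function names and the density values are hypothetical, and the Gaeapas regression for Pd itself is not reproduced.

```python
G = 9.80665  # standard gravity, m/s2


def buoyancy_pressure(column_height_m, rho_water, rho_hc):
    """Pressure differential (Pa) exerted on the seal by an HC column.

    Densities are in-situ values in kg/m3.
    """
    return (rho_water - rho_hc) * G * column_height_m


def max_column_height(pd_pa, rho_water, rho_hc):
    """HC column (m) that a seal holding pd_pa can retain."""
    density_contrast = rho_water - rho_hc
    if density_contrast <= 0.0:
        raise ValueError("HC must be less dense than the formation water")
    return pd_pa / (density_contrast * G)


# Hypothetical in-situ densities: formation water 1050, oil 750 kg/m3.
pd = buoyancy_pressure(100.0, 1050.0, 750.0)
print(pd / 1.0e5, "bar for a 100 m oil column")
print(max_column_height(pd, 1050.0, 200.0), "m of gas at 200 kg/m3")
```

The same seal capacity retains a much shorter gas column than oil column, because the density contrast with the formation water is larger for gas.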

Oil density (API)

Any serious quantitative appraisal program will contain a number of formulas for calculating the PVT conditions in the reservoir and the Formation Volume Factor (FVF) of the oil, or the Expansion Factor (EF) for gas. The oil density occurs in many places in this process. API gravity data are fortunately widely published. Using world-wide data, I made a plot of API versus depth, in which the depths are sub seafloor:

"Prediction" of API on the basis of depth would be an exaggerated term, but the idea is to use the trend and the considerable scatter to provide a prior distribution that should be a reasonable guide for a particular depth. The following plot of the residuals shows that the prior follows a normal distribution with a standard deviation of 7.5 degree API:

The great scatter in the API values has a number of causes. It is possible that maturity plays a significant role. In that case some "mature" or light oil (high API) migrates vertically up to levels where a lower API would be expected. At shallow levels, where temperatures permit bacterial action on originally light oil, a heavy residue remains after the light "goodies" have been eaten.
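
Using the API prior in a Monte Carlo simulation amounts to drawing from a normal distribution centred on the trend value at the prospect depth. In the sketch below only the standard deviation of 7.5 degrees API comes from the text; the linear trend coefficients are placeholders, not the fitted values, and the conversion to specific gravity is the standard API definition.

```python
import random

API_RESIDUAL_SD = 7.5  # degrees API, from the residual plot described above


def trend_api(depth_m, intercept=20.0, slope_per_km=5.0):
    """Hypothetical linear API-depth trend; the coefficients are placeholders."""
    return intercept + slope_per_km * depth_m / 1000.0


def sample_api(depth_m, rng, n):
    """Draw n API values from the prior at a given sub-seafloor depth."""
    mean = trend_api(depth_m)
    return [rng.gauss(mean, API_RESIDUAL_SD) for _ in range(n)]


def api_to_specific_gravity(api):
    """Standard definition: SG (water = 1) = 141.5 / (API + 131.5)."""
    return 141.5 / (api + 131.5)


rng = random.Random(42)
draws = sample_api(2000.0, rng, 5)
print([(round(a, 1), round(api_to_specific_gravity(a), 3)) for a in draws])
```

A production version would probably truncate the draws to a physically sensible API range, since a normal distribution has unbounded tails.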

Oil field size

Although not often applied, a prior distribution of field sizes could help in estimating the sizes of discoveries. This is discussed elsewhere on this website. The shape of the prior is lognormal, or a related skew distribution.

Condensate ratio (richness)

Condensates are usually light oils that are dissolved in a free gas accumulation (Fan et al., 2006). They can have different origins, such as changes of oil in the migration path, decomposition of normal, black oil in an accumulation, etc. A prior distribution of the condensate richness ("CGR") could be based on a detailed knowledge of the geochemistry of the petroleum system and the history of the accumulation and PVT conditions, but such research has not been done, or I have not seen it.

Instead, I have gathered some data from 73 known condensate accumulations for which the pressure, temperature and depth, as well as the condensate ratio, were available. A multivariate regression of CGR on pressure (P) and temperature (T) explains about 46% of the variance. The contribution of T is negligible, so with only P as a predictor we still obtain an R-square of about 46%. Therefore the estimation procedure would be to (1) estimate the probability that there is condensate, given free gas, and (2) estimate the CGR based on P. The first step is important because the most common situation is one with no condensate at all, hence "dry gas", under any kind of PVT condition. In the condensate cases the regression can then be used to predict the CGR.
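
The two-step procedure maps directly onto a Monte Carlo draw. The sketch below shows the structure only: the probability of condensate, the regression coefficients, the residual standard deviation and the units are all hypothetical placeholders, not the values from the 73-accumulation data set.

```python
import random


def sample_cgr(pressure, p_condensate, intercept, slope, residual_sd, rng):
    """One Monte Carlo draw of condensate richness for a free gas case.

    Step 1: decide whether there is condensate at all.
    Step 2: if so, predict CGR from pressure, adding residual scatter.
    """
    if rng.random() >= p_condensate:
        return 0.0  # dry gas
    cgr = intercept + slope * pressure + rng.gauss(0.0, residual_sd)
    return max(cgr, 0.0)  # a negative richness has no meaning


# Hypothetical inputs: 30% chance of condensate, pressure 4000 psi,
# CGR (bbl/MMscf) = 10 + 0.02 * P, residual sd 30.
rng = random.Random(3)
draws = [sample_cgr(4000.0, 0.3, 10.0, 0.02, 30.0, rng) for _ in range(10000)]
fraction_wet = sum(1 for d in draws if d > 0.0) / len(draws)
print(fraction_wet, max(draws))
```

The resulting CGR distribution has a spike at zero for the dry gas cases plus a continuous part for the condensate cases, which a single regression without the first step could not reproduce.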

Recovery efficiency

A world-wide survey of primary and secondary recovery efficiency should, in principle, lead to all sorts of regression equations, using depth, API, drive mechanism, permeability, etc. as significant factors. However, the data generally available did not show a correlation that holds world-wide.
Studies of local areas or plays are more useful. If those are not available, or thought not to be applicable, a simple histogram of cases observed world-wide gives a reasonable prior. Here is the prior for primary oil recovery efficiency used in the Gaeapas program:

In Gaeapas the user can enter a number of locally observed recovery efficiencies. The Bayesian update of the above prior is then performed.
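
A histogram prior can be updated numerically by treating each class midpoint as a candidate value for the local mean recovery efficiency and weighting it by the likelihood of the local observations. The sketch below shows one such scheme; it is an assumption about how an update could be done, not a description of the Gaeapas code, and the histogram, the process standard deviation and the observations are hypothetical.

```python
import math


def update_histogram_prior(midpoints, prior_weights, observations, process_sd):
    """Posterior class weights for the mean, given local observations.

    Assumption: observations scatter normally around the true local mean
    with a known standard deviation (process_sd).
    """
    posterior = []
    for midpoint, weight in zip(midpoints, prior_weights):
        log_likelihood = sum(-0.5 * ((x - midpoint) / process_sd) ** 2
                             for x in observations)
        posterior.append(weight * math.exp(log_likelihood))
    total = sum(posterior)
    return [p / total for p in posterior]


def weighted_mean(midpoints, weights):
    return sum(m * w for m, w in zip(midpoints, weights)) / sum(weights)


# Hypothetical prior histogram of primary recovery efficiency (%).
midpoints = [5.0, 15.0, 25.0, 35.0, 45.0, 55.0]
prior = [0.05, 0.20, 0.30, 0.25, 0.15, 0.05]
local = [38.0, 42.0, 35.0]  # hypothetical local recovery efficiencies

posterior = update_histogram_prior(midpoints, prior, local, 10.0)
print(weighted_mean(midpoints, prior), weighted_mean(midpoints, posterior))
```

With no local observations the function returns the prior unchanged, so the same routine serves both the "no analogs" and the "analogs at hand" case.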

Initial well productivity

Well productivity, in barrels per day or in terms of the productivity index, can be studied on the basis of a "proxy" model that simplifies the engineering formula. A colleague at Shell, Dr. Leine, made a world-wide study and formulated a simple model on the basis of the formula for a producing well:

(Pe - pwf) = drawdown (a pressure differential)
k = permeability
h = thickness of pay zone
divided by:
μ = viscosity
The complicated part between brackets has to do with
the diameter of the drainage zone around the borehole and
the formation damage (skin factor).
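
The terms listed above match the familiar steady-state radial inflow (Darcy) equation, so a sketch of that equation may help the reader. This is an assumption about the form of the formula referred to, not a reproduction of it: the oilfield-unit constant 0.00708, the formation volume factor argument and all example values are supplied here for illustration.

```python
import math


def radial_inflow_rate(k_md, h_ft, pe_psi, pwf_psi, mu_cp,
                       re_ft, rw_ft, skin=0.0, fvf=1.0):
    """Steady-state Darcy radial inflow in oilfield units (stock-tank bbl/day).

    k_md: permeability (mD), h_ft: pay thickness (ft),
    pe_psi - pwf_psi: drawdown (psi), mu_cp: viscosity (cP),
    re_ft / rw_ft: drainage and wellbore radius (ft),
    skin: formation damage factor, fvf: formation volume factor (assumed 1
    unless given).
    """
    drawdown = pe_psi - pwf_psi
    bracket = math.log(re_ft / rw_ft) + skin  # drainage geometry plus skin
    return 0.00708 * k_md * h_ft * drawdown / (mu_cp * fvf * bracket)


# Hypothetical well: 100 mD, 50 ft pay, 500 psi drawdown, 1 cP oil.
q_clean = radial_inflow_rate(100.0, 50.0, 3000.0, 2500.0, 1.0, 1000.0, 0.5)
q_damaged = radial_inflow_rate(100.0, 50.0, 3000.0, 2500.0, 1.0, 1000.0, 0.5,
                               skin=5.0)
print(q_clean, q_damaged)
```

Permeability, thickness and drawdown enter linearly, while drainage radius and skin enter only through the logarithmic bracket, which is why the bracket can be treated as a slowly varying term in a proxy model.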

In order to translate this formula into "geological terms" the reasoning was as follows:

The result is a simple proxy formula, not to make precise estimates, but to provide prior distributions for a prospect:

Z = depth
φ = porosity
h = net pay
API = oil gravity

This approach worked quite well, especially when the data were separated into onshore and offshore cases and into clastic and carbonate reservoirs. A remarkable effect was also found when plotting the residuals of the above regression against year: a typical logistic curve over the period 1940 to 1980 became apparent. This effect reflects the improvement of technology over those years, such as the move to larger tubing diameters, better control of skin, etc.
Therefore the older data are not a reliable guide to a prior, and new research should be undertaken to arrive at valid priors for today.

Reservoir engineering parameters, such as PVT data

Many factors are involved in estimating amounts of oil and gas in place: gas/oil ratio, expansion factor, water saturation, residual water saturation, etc. The problem for a geologist appraising an undrilled structure is the lack of data. The pertinent factors have to be estimated from analogs. To get a grip on the uncertainty of the estimates, geological proxies have to be translated into the required variables.

Gaeapas uses approximations for the density of the formation water, subsurface pressure, GOR, gas gravity, Z-factor, Formation Volume Factor (FVF or boi), oil API. The uncertainty is generated by the uncertainty of the input variables. It would be better if the uncertainty of the published regressions could be used, but most publications do not specify the margin of error sufficiently. Fortunately, the impact on prospect appraisal results is minor, compared to other uncertainties, such as HC charge.