Is there a need to audit CVM applications to the environment?

It is well known to economists that the contingent valuation method (CVM) fills an important gap in valuation technology with respect to managing public environmental goods and services. Currently acceptable CVM practice requires many challenging steps to be followed. One of these important steps is that of assessing the theoretical validity of the household willingness to pay (WTP) finding, but it is far from being a sufficient basis for reaching conclusions as to the credibility of the predicted community willingness to pay for environmental services. This paper reviews the step of testing for theoretical validity and challenges its importance relative to other more fundamental assessments of the credibility of the predicted household and societal WTP. This paper then deduces that an external "audit" assessment may be necessary, in addition to an internal one, for these values to attain credibility in the determination of public choices.


Introduction
There are two main ways of valuing environmental goods with strong public good characteristics: by analysis of values revealed in actual markets and by analysis of values elicited in constructed markets (Folmer et al., 1995). The former technique observes values of the environmental benefits and costs associated with the goods or services as indirectly reflected in the markets. The latter technique generates values in the form of willingness to pay (WTP) responses elicited from people who are placed in hypothetical valuation positions. The people selected for interviewing are asked what a specific change in the current status of an environmental service is worth to them (Turner et al., 1990). The people surveyed are those expected to have a demand for the goods/service. As this technique is based on verbal (expected) responses, it is a stated preference (SP) technique.
The most widely used SP technique is the contingent valuation method (CVM). The empirical roots of CVM may be traced back to a study by the US National Park Service in the Delaware River basin area (Hanemann, 1992). This basin is home to about 8 million people, includes the world's largest freshwater port (Philadelphia) and houses the second largest refining petro-chemical industry in the United States, as well as many other major industries. The basin is a habitat for large numbers of plants and animals and is a massive recreation asset (Partnership for the Delaware Estuary, 2006). It was inevitable that there would be tradeoffs between environmental goods and services, recreational demand and development. The CVM was used as an important method of measuring these tradeoffs.
CVM rapidly gained popularity during the 1970s (Wattage, 2001: 5). By the 1980s it was being widely applied in North America, Europe and many other parts of the world and has continued to be widely applied. One of the reasons for CVM's rise in popularity is its capacity to measure both passive use and existence values of environmental goods (Breedlove, 1999: 5). These values are not as easily captured using other prominent environmental goods/service valuation methods such as the hedonic pricing method (HPM) and the travel cost method (TCM). In addition, it is often difficult to identify the value of the specific feature in which one is interested using these other methods because of the strong public good characteristics and composite nature of many environmental goods/services (Field, 1994; Kahn, 1995).
The rise in the popularity of applying CVM has occurred despite the technique being very sensitive to many discretionary elements and its validity being challenged on the grounds of implausible results (Bateman & Wallis, 1999: 4). Part of the reason for CVM's ongoing popularity is that there have been many advances made in reducing the discretionary elements that affect it. This has been achieved through adherence to procedural guidelines and the conduct of recommended tests for validity (Arrow et al., 1993; Bateman & Wallis, 1999; Carson et al., 2001).
This paper addresses the question of whether or not the predicted values are credible when these guidelines are adhered to. The paper is organised as follows: the stages of applying CVM that are typically most prominent in reports are described, some of the many other important aspects relating to the credibility of an application of CVM are identified and a case is advanced for incorporating an external "audit" into the accreditation process.

Bid functions and theoretical validity of a CVM
The application of CVM can be broken down into various steps or stages, such as: questionnaire design; administration of survey; data capture and screening; bid function estimation and generating predicted bids; aggregation across society; and credibility assessment (Hanley & Spash, 1993). Arguably, many studies applying CVM devote most of their attention to reporting the steps of bid function estimation and the theoretical validation components of credibility assessment, e.g., Whittington et al. (1990), Lauria et al. (1999) and Hosking and du Preez (2004). The reason for this is that, other than the questionnaire itself, these steps are the easiest to report on.
There can be no denying, however, that bid function estimation and tests for theoretical validity are important.
Bid functions (curves) predict individual WTP from a selection of determinants (Lauria et al., 1999). The relevant explanatory variables are measured from responses made by a sample of selected respondents. Estimating bid curves can be a one stage or a two stage process. The most important task (first stage) is estimating a bid function in which all the relevant explanatory variables for WTP for which data have been collected are included, that is, the estimation of the complete model. Following an analysis of the significance of the coefficients in the complete model, another model may be estimated in which only those variables whose coefficients were sufficiently significant in the complete model are incorporated. The significance of the coefficients and overall explanatory power of this reduced model may provide further insight into the relationships being explored. The reduced model would be the preferred one by which to predict WTP in the event of there being an interest in how changes to the explanatory variables would affect WTP. Typically, complete model bid functions take the following form:
\[ WTP_i = f(S_i, C_i, O_i) \]
where:
$WTP_i$ = willingness to pay of respondent $i$
$S_i$ = socio-economic characteristics of the respondent
$C_i$ = characteristics of the environmental goods
$O_i$ = other relevant characteristics of the respondent.
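The second stage of this procedure, selecting the variables to carry into the reduced model, can be illustrated with a minimal Python sketch. The coefficient estimates, standard errors and the function name `significant_variables` are hypothetical and serve only as an illustration; the default cut-off `critical=1.96` assumes a large-sample two-sided test at the 5 per cent level, which is one common convention rather than a requirement of the method.

```python
def significant_variables(coefs, std_errs, critical=1.96):
    """Return names of variables whose absolute t-ratio meets the cut-off."""
    kept = []
    for name, b in coefs.items():
        t_ratio = b / std_errs[name]  # coefficient divided by its standard error
        if abs(t_ratio) >= critical:
            kept.append(name)
    return kept


# Hypothetical complete-model estimates (illustrative values only)
coefs = {"income": 0.8, "distance": -0.3, "age": 0.05}
std_errs = {"income": 0.2, "distance": 0.25, "age": 0.04}

# Variables that would be retained when estimating the reduced model
reduced_model_vars = significant_variables(coefs, std_errs)
```

In this illustration only the income variable would be retained; the reduced model would then be re-estimated on that variable alone rather than reusing the complete-model coefficient.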
These are estimated using one or more of the Ordinary Least Squares (OLS), Tobit, Logit and Probit models. The OLS and Tobit models explain the variation of the WTP amount in monetary terms (Lauria et al., 1999) while the Logit and Probit models explain the WTP probabilities (Buckland et al., 1999). The OLS model defines the best fitting linear relationship between a dependent variable and several independent variables by minimising the sum of squares of the residuals (SSR) (Bowerman & O'Connell, 1990). The standard equation takes the form:
\[ y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i \]
where:
$y_i$ = the ith observation of the dependent variable
$x_{ij}$ = the ith observation of the jth independent variable for j = 1, 2, …, p
$\beta_0$ = the intercept
$\beta_j$ = the parameter of the jth independent variable
$\varepsilon_i$ = the ith residual observed.
The main criticism of the OLS model applied to the WTP case is that some of the predicted WTP values are negative, which, of course, is illogical, because WTP cannot be negative. In addition, OLS models become awkward with respect to the zero WTP responses at the specification stage. These zero WTP responses pose a dilemma: should they be treated like the positive observations, or should they be left out altogether (Hill et al., 2001)? The Tobit model avoids these problems and, as a result, is preferred to the OLS model from a statistical point of view.
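The problem of negative predictions can be seen in a small worked example. The sketch below fits a one-regressor OLS line by the closed-form least squares formulas; the income and WTP figures are invented for illustration and `ols_fit` is a hypothetical helper name, not taken from any of the cited studies.

```python
def ols_fit(x, y):
    """Closed-form simple OLS: returns (intercept, slope) minimising the SSR."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sample: household income and stated WTP, including zero bids
income = [2, 4, 6, 8, 10, 12]
wtp = [0, 0, 10, 20, 40, 50]

b0, b1 = ols_fit(income, wtp)
predicted = [b0 + b1 * xi for xi in income]

# The fitted line has a negative intercept, so the lowest-income
# household receives a negative predicted WTP.
negative_predictions = [p for p in predicted if p < 0]
```

With these figures the fitted intercept is negative, so the prediction for the lowest income household falls below zero even though no respondent stated a negative bid.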
The Tobit model was first proposed by Tobin in 1958 and is also known as a limited dependent variable regression model. It employs the same classical regression model framework as the OLS model, but divides the data into two groups (Gujarati, 2003). The first group includes all the data that have values of the dependent variable above the censoring value. The second group consists of the censored observations, i.e., those observations that fall at or below the censoring value. The general form of the Tobit model is shown below (Green, 2003):
\[ y_i^* = x_i'\beta + \varepsilon_i, \qquad y_i = \begin{cases} y_i^* & \text{if } y_i^* > a \\ a & \text{if } y_i^* \le a \end{cases} \]
where:
$y_i^*$ = the latent (uncensored) value of the dependent variable and $y_i$ = its observed value
$a$ = censoring point (value)
$x_i'$ = a row vector whose first element is 1 followed by p independent variables, $x_j$, j = 1, 2, …, p
$\beta$ = a column vector of parameters $\beta_j$, j = 0, 1, 2, …, p
$\varepsilon_i$ = the ith disturbance term.
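The parameters for this model are obtained by maximising the log-likelihood function. The log of the likelihood function (ln L) in the censored Tobit regression model takes the following form (Green, 2003: 767):
\[ \ln L = \sum_{y_i > a} -\frac{1}{2}\left[ \ln(2\pi) + \ln\sigma^2 + \frac{(y_i - x_i'\beta)^2}{\sigma^2} \right] + \sum_{y_i = a} \ln \Phi\!\left( \frac{a - x_i'\beta}{\sigma} \right) \]
where $\sigma$ is the standard deviation of the disturbance term, $\Phi$ is the standard normal c.d.f. and $a$ is the censoring point. The first part of the equation corresponds to the linear regression model for the unlimited (uncensored) observations, estimated by the maximum likelihood estimator (MLE) procedure. The second part of the equation indicates the relevant probabilities for the limited (censored) observations (Green, 2003). As a result, the estimators (b) have several desirable asymptotic distributional properties, i.e., in large samples they are normally distributed, unbiased and have minimum variance (Green, 2003).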
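The two-part structure of the censored log-likelihood, a normal log-density for each uncensored observation and a log-probability for each censored one, can be made concrete with a short Python sketch. It assumes normally distributed disturbances with standard deviation `sigma` and left-censoring at `a`; the function names are illustrative, and the sketch only evaluates the function for given parameters rather than maximising it.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y, a=0.0):
    """Log-likelihood of a Tobit model left-censored at a.

    Each row of X starts with 1 (the intercept term), followed by the
    independent variables; beta is ordered to match.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi > a:
            # Uncensored observation: log of the normal density
            total += -0.5 * (
                math.log(2.0 * math.pi)
                + math.log(sigma ** 2)
                + ((yi - xb) / sigma) ** 2
            )
        else:
            # Censored observation: log-probability of falling at or below a
            total += math.log(norm_cdf((a - xb) / sigma))
    return total


# Hypothetical data: intercept plus one regressor, two zero (censored) bids
X = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0], [1.0, 8.0]]
y = [0.0, 0.0, 10.0, 20.0]
ll = tobit_loglik([-18.0, 5.0], 8.0, X, y)
```

In practice this function would be passed to a numerical optimiser to obtain the estimates; that step is omitted here.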
The Logit model is an appropriate estimation model when a dependent variable is measured by discrete choices. In this case the dependent variable takes the value of either 0 or 1 (absence or presence of an attribute) and the model predicts the outcome in terms of probability of occurrence (Hanemann & Kanninen, 1999). Due to the nature of the dependent variable in this model, the relationship between the dependent variable and independent variables it describes is always nonlinear.
The Logit cumulative distribution function (c.d.f.) is expressed as follows (Gujarati, 2003):
\[ P = F(Z) = \frac{1}{1 + e^{-Z}} \]
where: Z = a continuous random variable with a logistic distribution.
In the application of the dichotomous dependent variable estimation, the Logit (based on the logistic cumulative function) is not the only suitable c.d.f. An alternative, and in some cases preferred, model is the Probit model (Gujarati, 2003).
The Probit c.d.f. is expressed as follows (Hill et al., 2001):
\[ P = \Phi(Z) = \int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt \]
where Z = a continuous random variable with a standard normal probability distribution.
Both the Logit and Probit functions have the characteristics of an S-shaped c.d.f., whose probabilities approach zero when the independent variables are at low levels, and one when the independent variables are at high levels. All predicted probabilities fall between zero and one.
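The S-shaped behaviour of the two functions can be checked numerically. The following Python sketch evaluates the logistic and standard normal c.d.f.s at a few values of the index Z; the function names `logit_cdf` and `probit_cdf` are illustrative.

```python
import math


def logit_cdf(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Evaluate both functions from low to high values of the index Z
z_values = [-5.0, -1.0, 0.0, 1.0, 5.0]
logit_probs = [logit_cdf(z) for z in z_values]
probit_probs = [probit_cdf(z) for z in z_values]
```

Both lists rise monotonically from near zero to near one and pass through one half at Z = 0, with the Probit approaching its limits more quickly because the normal distribution has thinner tails than the logistic.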
In both the OLS and Tobit models the coefficients measure the value change of the dependent variable for a unit change in the value of an independent variable, where other independent variables are held constant (Gujarati, 2003). The partial coefficients of the Logit and Probit models have no direct interpretations (Verbeek, 2000). A meaningful interpretation of the estimated coefficient of an independent variable requires determining the slope relation between $\hat{P}$ (the estimated probability of WTP) and $X$ (the independent variable), $\partial\hat{P}/\partial X$ (Mirer, 1995). Because this slope varies with the level of $\hat{Z}$, it is determined at the point where $\partial\hat{P}/\partial X$ is at its maximum. For both Logit and Probit models, this occurs at the point where $\hat{Z} = 0$, that is, where the probability density functions (p.d.f.) of the logit and probit, denoted g(Z) and f(Z) respectively, are maximised. At this point g(Z) = 0,25 and f(Z) = 0,399.
It can be shown that in the Logit model:
\[ \frac{\partial\hat{P}}{\partial X} = 0{,}25\,\hat{\beta} \]
and in the Probit model:
\[ \frac{\partial\hat{P}}{\partial X} = 0{,}399\,\hat{\beta} \]
where $\hat{\beta}$ is the estimated coefficient of $X$. In regression cases, both Logit and Probit models can be extended to a multiple regression framework (Mirer, 1995).
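The maximum slope values of 0,25 and 0,399 follow from evaluating the logistic and standard normal p.d.f.s at Z = 0 and multiplying by the coefficient. A minimal Python sketch, with a hypothetical coefficient value and illustrative function names, is:

```python
import math


def logit_pdf(z):
    """Logistic probability density function, g(Z) = P(1 - P)."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)


def probit_pdf(z):
    """Standard normal probability density function, f(Z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_effect(pdf, beta_j, z=0.0):
    """Slope of the predicted probability with respect to X at index z."""
    return pdf(z) * beta_j


beta_hat = 0.8  # hypothetical estimated coefficient
logit_slope = marginal_effect(logit_pdf, beta_hat)
probit_slope = marginal_effect(probit_pdf, beta_hat)
```

Because the two densities peak at different heights, Logit and Probit coefficients estimated on the same data are not directly comparable; it is the implied slopes that should be compared.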
There is no theoretical reason to prefer the Logit over the Probit model, or vice-versa. The choice made is a matter of convenience (Gujarati, 2003; Hill et al., 2001) and of how well the model fits the data.
The contribution of individual parameters to explaining WTP is typically tested with either a t-test or a z-test (the latter for the models estimated by the method of Maximum Likelihood Estimation). To compare the predictive qualities of the complete and reduced models, provided one is nested within the other, an F-test may be used for the OLS model and a log-likelihood ratio statistic may be used for the Tobit, Logit and Probit models. The null hypothesis tested is that the coefficients of the variables excluded from the reduced model are jointly equal to zero, i.e., that none of the excluded variables is significant.
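For the maximum likelihood models, the log-likelihood ratio statistic is twice the difference between the maximised log-likelihoods of the complete and reduced models, and it is compared with a chi-square critical value whose degrees of freedom equal the number of excluded variables. The Python sketch below uses hypothetical log-likelihood values; the table holds the standard 5 per cent chi-square critical values for one to three degrees of freedom, and the function names are illustrative.

```python
# 5 per cent chi-square critical values, keyed by degrees of freedom
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}


def likelihood_ratio_stat(loglik_complete, loglik_reduced):
    """LR statistic for a reduced model nested within the complete model."""
    return 2.0 * (loglik_complete - loglik_reduced)


def reject_exclusions(loglik_complete, loglik_reduced, n_excluded):
    """True if the null (excluded coefficients are jointly zero) is rejected."""
    lr = likelihood_ratio_stat(loglik_complete, loglik_reduced)
    return lr > CHI2_CRIT_5PCT[n_excluded]


# Hypothetical maximised log-likelihoods; two variables were dropped
lr_stat = likelihood_ratio_stat(-120.4, -125.9)
rejected = reject_exclusions(-120.4, -125.9, n_excluded=2)
```

A rejection indicates that the excluded variables jointly add explanatory power, so the reduced model would not be an adequate simplification of the complete one.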
The overall fit of the regression models can be measured by $R^2$ (the multiple coefficient of determination) or adjusted $R^2$ (the adjusted multiple coefficient of determination) (Mendenhall & Sincich, 1996). The applicable measures for the binary response models are the McFadden $R^2$ and the count $R^2$ (Gujarati, 2003: 605).
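These four measures can be written compactly. The Python sketch below uses the standard textbook definitions; the function names, the 0.5 classification cut-off for the count measure and all numerical inputs are assumptions made for illustration.

```python
def r_squared(y, fitted):
    """Share of the variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ssr / sst


def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for n observations and k regressors (excluding intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden R^2: one minus the ratio of model to intercept-only log-likelihood."""
    return 1.0 - loglik_model / loglik_null


def count_r2(actual, probs, cutoff=0.5):
    """Share of binary outcomes correctly predicted at the given cut-off."""
    correct = sum(1 for a, p in zip(actual, probs) if (p >= cutoff) == (a == 1))
    return correct / len(actual)


# Hypothetical illustrations
r2 = r_squared([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
adj = adjusted_r_squared(0.2, n=101, k=4)
mcf = mcfadden_r2(-80.0, -100.0)
cnt = count_r2([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.7])
```

The adjusted measure falls below the unadjusted one whenever regressors are added, which is why it is the more conservative figure to report for bid functions with many explanatory variables.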
The models best suited to predicting both mean and median willingness to pay are the OLS and Tobit ones. Two different types of means and medians may be identified. The sample mean and median include all observations in their calculation and the predicted mean and median include only those observations used to estimate the WTP (bid) functions. Only the latter are used in contingent valuations (CVs).
Bid curves are not only used to predict the WTP variations expected as a result of changing the independent variables, but also to examine for coherence and consistency in the relationship predicted between the non-WTP and WTP responses. The signs and significance of the coefficients are then examined for plausibility and consistency with economic theory, an examination known as the test for theoretical validity (Hanley & Spash, 1993).
Typically the adjusted $R^2$ values found for bid functions are quite low. Based on a survey of CVs, Hanley and Spash (1993) argue that an adjusted $R^2$ value of the estimated model of at least 0,15 (15 per cent) is acceptable. Some studies have utilised this argument as the benchmark for the assessment of construct validity of the CV (Hosking et al., 2004).
Arguably, the overall explanatory power of the bid function is the less important of the expectation validation tests. Much more important is the consistency, or lack of it, found between the coefficient signs and values of the bid function and what one would expect from economic theory and experience. Typical theoretical principles are, other things being equal, that people pay more if they use more of a goods item/service, earn more, or are wealthier, and pay less the more abundant and efficient the item/service substitutes are and the more difficult it is for them to access the amenity. Findings that conflict with these theoretical expectations call into question the survey results and therefore the validity of the household WTP finding.
In addition to the bid function expectation test, it is desirable to build other validation checks into the questionnaire, for example, checks aimed at testing for part-whole (or embedding) bias problems in the responses and at testing the plausibility of the share of budget allocated to the goods item in the form of the bid.

The questionnaire should provide a sound basis for eliciting the required information
The CVM questionnaire must state its purpose clearly so that the respondents understand the context (hypothetical scenario) accurately and fully, and the layout should facilitate easy understanding and completion of the form. The interviewers must be well trained in order to secure the respondents' cooperation and allow the respondents to participate in an informed manner. The context of the questionnaire should be realistic and encourage truthful responses.
Focus groups and a pilot study should be used before administering the questionnaire in order to discover and rectify inadequacies and improve the accuracy of the responses.
Care should be exercised to value a defined set of goods and services rather than a moral position. For instance, it should not be the health of the environment that is being valued (a moral position) but a specified change in the services yielded by the environment that is being valued (a specific service package). When moral positions, rather than packages of goods, are valued, the CV is said to incorporate a category valuation error (Keat, 2002).
In order to minimise what is known as respondent fatigue, the amount of information provided to the respondents and the number of questions asked should not be excessive. Respondent fatigue causes the respondent to provide responses aimed first and foremost at bringing the interview to an end.
The way in which the payment for the item or service is to be made (the payment vehicle) must be credible, realistic, relevant and acceptable to the respondents. In some cases the appropriate payment vehicle may be a national tax. In others, it may be a local tax or a user fee, perhaps linked to other payments for public services or access. It may also be in the form of a price increase. Given the free-rider problems associated with public goods provision, the least appealing would be a payment in the form of a voluntary contribution or donation. Ideally, the way the payment is to be made, in the event of charges really being introduced, should be decided by the responsible collecting authority prior to the survey being undertaken.
The types of WTP question formats most frequently used to elicit responses from respondents are open-ended, closed-ended, bidding game and payment card ones (Hanley & Spash, 1993).

The elicitation process should be accurate and appropriate
In addition to the survey instrument, the administration of the survey and capture of the information are important. Due to the many complexities involved in the CV questionnaire, the questionnaires are typically administered by face-to-face interviews. As this is expensive, budgets frequently constrain the sample to inadequate sizes. Too small a sample undermines the power of statistical significance tests (Hair et al., 1998). This is not, however, the most fundamental challenge of sample design. That challenge is selecting a sample that represents the target population. If the sample design is improperly determined this will lead to biased estimates (Wattage, 2001). The role of the person conducting the interview is crucial in the pursuit of authentic responses. Inter alia, this person must encourage the respondents to keep their budget constraints and the availability of item/service substitutes in mind and also to concentrate only on the goods and services being valued. One of the main problems encountered in the elicitation process is that respondents bid on a wider scope of goods and services than they are being asked to, which is referred to as the part-whole bias or embedding problem.

The data capture and screening processes should be scientific and checked
If the information collected is not accurately captured, all the benefits of following proper elicitation procedures are lost. For this reason, the accuracy of the data captured should be checked.
The data cleaning process is where some responses are discarded. The data that are discarded include unrealistically high bids (outliers), zero bids without reasons (protest bids), refusals to participate and data where critical explanatory information has not been given, e.g. annual income. If a mean bid is used, outliers must be excluded (Jones-Lee et al., 1985). The norm is to exclude bids that lie more than three standard deviations from the mean. The problem of deciding on exclusions does not occur if the median bid is used because outliers do not distort the central tendency of the median.
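The effect of the three standard deviation rule on the mean, and the robustness of the median, can be shown with a small Python sketch. The bids are invented for illustration, `screen_outliers` is a hypothetical helper, and the use of the sample standard deviation (rather than the population one) is an assumption.

```python
import statistics


def screen_outliers(bids, k=3.0):
    """Keep bids lying within k sample standard deviations of the mean."""
    mean = statistics.mean(bids)
    sd = statistics.stdev(bids)
    return [b for b in bids if abs(b - mean) <= k * sd]


# Hypothetical bids: nineteen modest bids and one unrealistically high bid
bids = [10] * 6 + [15] * 7 + [20] * 6 + [1000]

kept = screen_outliers(bids)

mean_before = statistics.mean(bids)
mean_after = statistics.mean(kept)
median_before = statistics.median(bids)
median_after = statistics.median(kept)
```

In this illustration the single extreme bid is removed and the mean falls sharply, while the median is the same before and after screening. Note that in very small samples a single extreme bid inflates the standard deviation so much that it may not be flagged by this rule at all.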
Successfully meeting the challenges outlined in Sections 3.1-3.3 above lays a good foundation for the prediction of WTP.

The user population should be accurately estimated
The predicted societal (total) WTP is the product of the predicted household mean or median WTP and the predicted number of households estimated to have a demand for the relevant environmental goods/services (N). The accuracy of the prediction of N is of vital importance. It is, however, frequently problematic because many environmental service user populations are transient and there are no records kept of them. If the estimate (prediction) of N is incorrect, the societal WTP will be incorrect. If N is very wrong the resultant societal WTP is likely to be absurd, a situation where it would probably be better not to generate any value at all, because what is generated serves more to mislead than to inform (Diamond & Hausman, 1994).
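Because the societal WTP is a simple product, any proportional error in N carries through one-for-one. The Python sketch below illustrates this with a hypothetical household WTP and three hypothetical estimates of the user population; none of the figures come from an actual study.

```python
def societal_wtp(household_wtp, n_households):
    """Aggregate WTP: predicted household WTP times the user population N."""
    return household_wtp * n_households


household_wtp = 15.0  # hypothetical predicted household WTP per year

# Hypothetical low, central and high estimates of the user population N
scenarios = {n: societal_wtp(household_wtp, n) for n in (10_000, 20_000, 40_000)}

# Ratio of the high-N aggregate to the central-N aggregate
overstatement = scenarios[40_000] / scenarios[20_000]
```

Doubling the estimate of N doubles the societal WTP, whatever care was taken in estimating the household figure, which is why reporting the aggregate under a range of population estimates may be more informative than a single value.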

An internal credibility assessment of the predicted societal WTP should be undertaken
There are two opinions that should be expressed on the credibility of the WTP predicted, an internal one and an external one. The internal credibility assessment will be undertaken by the team carrying out the contingent valuation. An external assessment would be one undertaken by a person who was not part of this team. The credibility of the WTP is assessed in terms of both reliability and validity. Reliability refers to being able to replicate the results by executing the same method under the same circumstances. Validity refers to the extent to which the method measures what it is intended to measure.
Besides the correspondence, or lack of it, between the predictions of the bid function and economic expectations and theory, there are also many other "construct" factors to consider with respect to validity, for instance, the internal consistency of responses (no contradictions).
Ideally, the societal WTP deduced using the CV method should also be compared to equivalent service valuations generated with the same method but at different locations, or at the same location but using other methods such as the travel cost method or the hedonic pricing method. Any assessment of the validity of the CV based on this type of comparison is said to be a test for convergence (of value findings). Tests for convergence are desirable and can substantially add credibility to a CV finding. These tests are, however, fallible and the generated values are frequently themselves subject to considerable error, making the interpretation of divergence difficult.
Once all the assessments for the credibility of the predicted values have been completed, there arises a need to assess them as a whole (composite). One way of doing this is in the form of a score sheet, as done in Hosking et al. (2002).

The need for an external audit
The administration of tests for validity and reliability by the team generating the contingent valuation does not conclusively establish the credibility of the predicted WTP. These tests will not necessarily reveal errors in the way the information was elicited, captured and screened and may also not reveal errors in the population size, especially if these errors are carried over into alternative methods of valuation. In addition, any weaknesses that such internal assessment tests reveal may be unacceptably minimised by the way they are reported.
Having conducted several contingent valuations, this author is particularly aware of the scope for erring in objectivity with respect to the internal credibility assessment (Hosking & du Preez, 2004). The potential for lapses in objectivity arises due to factors including author bias (the influence of the author's preferences), client bias (the influence of the commissioning agency's preferences) and inexperience.
For these reasons there is a case for requiring an external audit opinion of the credibility assessment of contingent valuations before they are used to shape public choices. This type of audit would be best done, not by an auditing firm, but by a person recognised as having the required competence and experience. The following aspects should be included in the enquiry upon which the external opinion is formulated:
• An assessment of the sufficiency of the scope of the internal credibility assessment (what was included and what was excluded);
• An assessment of the extent to which the design of the contingent valuation study was influenced by the need for a credibility assessment;
• An assessment of the technical competence of the team conducting the contingent valuation;
• An assessment of the authenticity of the survey data reported;
• An assessment of the risk of misstatement of the predicted household WTP; and
• An assessment of the risk of misstatement of the predicted user population.

Concluding comment
There is no doubt that CVM makes a useful contribution to the valuation technology available to economists with respect to guiding public decision making on the management of environmental goods and services. The method is suitable for the specific valuation needs of management with respect to many policy and allocation issues, in South Africa and elsewhere. CVM has been widely applied and refined.
Notwithstanding this suitability, the method has many critics and if applied improperly can yield misleading results. Undoubtedly, the method will continue to be modified in the light of ongoing experience and what will constitute acceptable CV practice will continue to change over time.
Current practice in applying CVM entails various guidelines being followed, many of which are challenging. An important part of this practice is the internal credibility assessment, but on its own it is an insufficient basis for making important public choices. The internal credibility assessment is itself prone to various biases.
For this reason, as a matter of good practice, external opinions on the credibility of contingent valuations should be commissioned for those values which will influence public decisions on the environment. This function would be similar to the service contributed by the external auditor with respect to the credibility of financial statements.