Evaluating Sectoral Training: A Utility Tool for SETAs

The South African skills development framework mandates Sectoral Education and Training Authorities (SETAs) to initiate sector-specific training programmes. If SETA planning is to be proactive, the evaluation and forecasting of improvements in industry outcomes from these training programmes (such as productivity or profitability metrics) should be of concern. This article pursues this end through the well-established field of decision theoretic utility analysis. It suggests a method whereby SETAs may forecast or estimate the industry gains from a given training programme, and argues that percentage increases in output may be the utility measure of greatest interest and use to SETAs. The national accounts of South Africa are used to estimate appropriate industry-level input data for these techniques. Issues in application and further research are also discussed. JEL J24


INTRODUCTION
Under the South African skills levy system, one of the primary mandates of Sectoral Education and Training Authorities (SETAs) is to initiate sectoral training initiatives, especially in small and medium-sized enterprises (Department of Labour, 2001). Underlying assumptions of such training are that increased skill levels among employees will stimulate sectoral output, firm profitability, economic growth and employment levels (ibid: 2).
However, one of the difficulties of such programmes is estimating in advance the impact on industry productivity or profitability. While it may be possible to guess that certain training may increase metrics such as output or profitability, it would be far better if tools could be made available to help SETAs predict improvements. Such tools would enable SETAs to choose between programmes, and therefore to prioritise. It would also help them to budget and account for their activities.
While complex econometric methods exist, these techniques require highly skilled experts to apply and interpret, and often have considerable data requirements. However, it would not be feasible for SETAs to retain or hire such experts on a long-term and ongoing basis. SETAs need estimates that are quicker and easier to develop. Accessibility of the methodology is therefore an issue.
This article accordingly proposes a simpler solution, stemming from the well-established industrial psychology technique of decision theoretic utility analysis. After a brief introduction to training evaluation and a statement of the problem, classic decision theoretic utility analysis is restated for the training case, and South African data are presented as inputs to the model. Various implementation issues and examples are given, and recommendations for further research are made.

TRAINING EVALUATION
Classic training evaluation theory holds that there are four levels of training evaluation. From least to most difficult and useful, these are (Kirkpatrick, 1996):
Level 1. Reactions: How positively do trainees react to the training? Did they enjoy it, are they satisfied, inspired, etc.?
Level 2. Learning: Do trainees acquire the desired knowledge, skills, attitudes, etc.?
Level 3. Behaviour: Do actual on-the-job behaviours improve?
Level 4. Outcomes: Are key business indicators, such as profitability or productivity, improved by the training?
Level one evaluation is generally seen as inadequate for business use. Levels two and three evaluations are the most commonly utilised, usually through experimental designs: companies test the effect of the training on control and treatment groups, attempting to establish whether any difference can be detected. Differences in knowledge, skills, attitudes or behaviours between the control and experimental groups are measured as a standardised 'effect size', which, when measured on a metric scale, is fundamentally linked to the t-statistic.
Obviously, however, knowledge of business and industry results (level four evaluation) is most desirable to decision makers. For SETA training specifically, the ultimate construct of interest is productivity or profitability within a whole sector due to grant-funded training.
Unfortunately, it is often unfeasible to measure level four outcomes at the level of the individual employee. To calculate the profitability of an individual worker requires complex cost accounting techniques, tailored to the specific job and situation (Schmidt et al., 1979: 615). This is an unfeasible task in normal firm-based evaluation. The difficulty in this case is compounded by the fact that the evaluations are being conducted by SETAs, who have to generalise across companies without reference to the specific operations of each one. Therefore, as alluring as the possibility of actually calculating level four outcomes at the individual level may seem, it is generally not feasible.
However, decision theoretic utility analysis is a method that has long been used to shortcut this problem. It makes an overall estimate of the increased productivity or profitability attributable to an intervention, without needing to measure this at the individual level, thereby solving the inherent measurement problem.
The way in which such estimations of increased value are made is through an intermediate level two or three measurement. That is, as long as it can be proved that training is increasing knowledge, skills, attitudes or behaviour, and it can also be illustrated that variance in the level four outcome is dependent on variance in the intermediate variable, then an overall judgement of increased value can be made. This relationship is seen in Figure 1 below:

Figure 1: The simple training → intermediate variable → value relationship
Take, for example, the training of welders. In a pre-training experiment, SETAs can relatively easily assess whether the knowledge or skills about welding, or indeed the welding itself, has improved due to the training. This level two or three evaluation is useful, but now increases in productivity or profitability are desired. By estimating the variance in level four outcomes attributable to variance in welding knowledge, skills or ability, one has a link by which overall gains from the training can be estimated. This essentially is decision theoretic utility analysis, which will be derived and adapted to the SETA situation next.

Better knowledge/skills (level two) → Better performance (level three) → Higher economic value (level four)

DECISION THEORETIC UTILITY ANALYSIS FOR TRAINING
The following sections will derive traditional decision theoretic utility analysis, although framing it specifically for SETA training. First, it is worth explaining why the general statistical approach (in this case linear) is taken.
It has been established that SETAs, or firms, implement management interventions such as training with the ultimate intention of impacting a hard-to-measure dependent variable. In the case of training, the worth of employee behaviours to the industry or organisation is the variable of final interest. Now, there are different ways of operationalising employee worth. One could conceive of worth in monetary terms, which is of course the variable of real interest to firms. One could also measure worth in terms of an employee's output (which, it is argued below, is of greater interest in the industry-wide case). Levels two and three evaluation measure exactly the same underlying worth, just in behavioural, attitudinal or skill terms. At the enterprise level, rating scale points (i.e. performance appraisals) are most commonly used as a surrogate for worth. The same underlying construct is being measured, just in different ways.
What stops these measures of worth (monetary value, output, rating scale points etc.) from being exactly equal? The first reason is measurement error: if one or all of them are poorly measured, they will not come out the same. Performance appraisals, for example, are generally subject to much error. Error will be dealt with later; assume for now that there is no measurement error, in other words that true monetary worth and true performance scale worth can be assessed accurately. The other reason that measures of worth are not the same is that they are measured in different units: one performance appraisal 'unit' is generally not worth one Rand (monetary worth) or one unit of output.
Because both true (error-free) monetary value (Y_t) and true performance scores (R_t) measure the same underlying construct (employee worth), they are perfectly linearly related (congeneric), with the intercept and slope of the linear equation defined only by the difference in units. That is (Raju et al., 1990: 4):

Y_t = α + β·R_t

Remember that there is assumed to be no measurement error. In fact, measurement error in terms of intra-rater reliability is only problematic in individual or small-group measurements, not in aggregate measures over large groups. This is because classical test theory holds that the group mean of an observed score is equal to the group mean of the true score (Raju et al., 1990: 4).
Also, the expected value (mean) of an error term in large groups is generally zero. Therefore the only measurement error worth worrying about is scale or inter-rater reliability. This will be discussed later.
Therefore, the basis for decision theoretic utility analysis is linear regression (Raju et al., 1990: 4; Schmidt et al., 1979: 613). As mentioned before, it approximates the aggregate productive gain (in the chosen units of analysis) due to an increase in an intermediate variable (knowledge, skills, attitudes or behaviour) arising from a management intervention of some kind. In order to approximate the final monetary gain, the technique uses an estimated linking variable that translates intermediate change into value. The linking variable is generally amenable to global judgmental or empirical estimation techniques. Brogden (1946, 1949) and Cronbach & Gleser (1965) first developed the decision theoretic technique for the analysis of a selection method. Later, Schmidt et al. (1982) derived a version for training (or any performance enhancement intervention), which will be derived here (although using a different derivation to theirs).
A succinct statement of the problem is as follows. A SETA wishes to estimate the total change in the result-based 'utility' that the training brings about (productivity or profitability increases of some kind, hereafter to be called 'utility'). The problem is that the utility for the trained group cannot be measured every time that the training is being done. Estimation is needed.
The estimation model can be developed as follows (Schmidt et al., 1979: 611-12). Let the independent variable X be any intermediate, level two or three measure, i.e. knowledge, attitudes, skills or behaviours. Let the dependent variable Y be any results-based dependent variable, such as productive output. Based on the previous discussion a linear model can be assumed, so:

Y = β·X + μ_y + e   (1)

where Y = the monetary value of job performance; β = the linear regression weight on test scores for predicting job performance; X = knowledge, attitudes, skills or behaviours (the predictor of individual value); μ_y = the mean value of job performance of random untrained employees; and e = prediction error. This equation applies to an individual. If it is applied to a selected sample, taking expectations gives:

E(Y) = β·E(X) + μ_y + E(e)   (2)

Since E(e) = 0, and β and μ_y are constants, this can be rendered as:

Ȳ = β·X̄ + μ_y   (3)

In a case where the average knowledge, skills, attitudes or behaviours (the level two or three predictor, X) differ between the control and experimental groups (designated 'C' and 'E' respectively), the average change in utility per trainee is:

ΔU = Ȳ_E − Ȳ_C = β·(X̄_E − X̄_C)   (4)

Since generally in a congeneric case σ_Y = β·σ_X (Judiesch et al., 1993: 904):

ΔU = σ_Y·(X̄_E − X̄_C)/σ_X = d_t·σ_Y   (5)

where d_t is the effect size of the training spoken of earlier (i.e. the standardised change in the predictor X brought about by training). To apply the utility formula, therefore, only the following is necessary:
1. Calculate the effect size of the training on the intermediate variable (d_t). This requires a comparison between the performance ratings of the trained group and a control group, both standardised on the control group's standard deviation.
2. Calculate σ_Y. See below for various estimation techniques.
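The two steps above can be sketched in a few lines of code. The following is a minimal illustration of equation 5; all numbers are hypothetical placeholders, not data from this article:

```python
def effect_size(mean_exp, mean_ctrl, sd_ctrl):
    """Standardised effect size d_t of training on the predictor X,
    standardised on the control group's standard deviation."""
    return (mean_exp - mean_ctrl) / sd_ctrl

def utility_gain(d_t, sigma_y):
    """Average utility gain per trainee per period: delta U = d_t * sigma_Y."""
    return d_t * sigma_y

# Hypothetical inputs: group means on a skills assessment, sigma_Y in Rand.
d_t = effect_size(mean_exp=120.0, mean_ctrl=108.0, sd_ctrl=24.0)
gain = utility_gain(d_t, sigma_y=40_000.0)
print(round(d_t, 2), round(gain))  # 0.5 20000
```

The only non-trivial input is σ_Y, which the estimation techniques below are designed to supply.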
The above utility equation is formulated for one time period (generally a year) and for one trainee. Generally, users multiply by the number of years (T) and the number of people trained (N) to come to a complete measure of utility. Also, if there are direct costs attributable to the training (such as loss of productive time), it is common practice to subtract these (C below) from the utility estimate. Thus overall utility becomes:

ΔU = T·N·d_t·σ_Y − C   (6)

The direct cost term is omitted below merely for brevity, although it should always be included in practice where relevant and calculable.
It is also possible that the effect of training (i.e. the knowledge or skills incorporated by employees from training) may degrade over a certain number of years (T) at a constant rate i, in which case (if utility is cumulative) the equation becomes:

ΔU = N·d_t·σ_Y·Σ_{t=1}^{T} (1 − i)^(t−1) − C   (7)

Equations 6 and 7 represent ways of calculating how large a productivity or profitability increase can be expected by a SETA or firm when training with a certain effect size is implemented. One issue that has not been discussed fully is which measures of productivity or profitability can and should be used, and consequently how σ_Y is to be estimated. As will be seen next, this is perhaps the area in which practice will differ for firms as opposed to SETAs.
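Equations 6 and 7 can be combined into a single routine. This is only a sketch, under the assumption that decay is applied as a factor of (1 − i)^(t − 1) in year t; the function and its arguments are illustrative, not part of the original formulation:

```python
def total_utility(d_t, sigma_y, n_trainees, years, cost=0.0, decay=0.0):
    """Cumulative utility over `years` for `n_trainees`, with an annual
    decay rate `decay` (equation 7). With decay=0 this reduces to
    equation 6: T * N * d_t * sigma_Y - C."""
    per_year = d_t * sigma_y * n_trainees
    total = sum(per_year * (1 - decay) ** (t - 1) for t in range(1, years + 1))
    return total - cost

# Hypothetical: 100 trainees, 5 years, sigma_Y = R40 000, no decay.
print(total_utility(0.5, 40_000.0, 100, 5))              # 10000000.0
# The same programme with 10% annual skills decay yields less.
print(total_utility(0.5, 40_000.0, 100, 5, decay=0.10))  # 8190200.0
```

The decay factor acts per year inside the sum, which is why total utility falls short of the undecayed figure rather than being discounted once at the end.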

CALCULATING THE STANDARD DEVIATION OF Y (σY)
As can readily be seen, the calculation of the standard deviation of Y is the crux of the decision theoretic utility procedure: this construct is the linking variable that translates the effect size of a change in X directly into utility terms. It is, however, also the most difficult variable to estimate. While cost accounting techniques can be used, these are very time consuming and costly, which was the main reason for the slow adoption and application of decision theoretic utility in organisations prior to the 1980s (Schmidt et al., 1979: 615).
However, over the past two decades several feasible techniques have been introduced for estimating σ_Y. These include the following:
1. The global judgmental estimation procedure of Schmidt et al. (1979: 619-25), in which subject matter experts estimate the value of employees at the 15th, 50th and 85th percentiles of performance (this procedure is discussed further below).
2. Analyses of empirical studies of σ_Y found that its lower and upper limits correspond with 40 per cent and 60 per cent of wages and salary respectively (Schmidt et al., 1983: 407). Therefore, if the measure of utility Y is monetary value, a short estimation method is simply to take 40 per cent of average salary. This does, however, give a somewhat conservative value for σ_Y (Judiesch et al., 1992).
3. Cascio and Ramos (1986) derived the so-called CREPID method ('Cascio Ramos Estimate of Performance in Dollars'). This eight-step procedure essentially estimates σ_Y through salary (as a surrogate for employee worth) weighted by the estimated importance of each principal activity undertaken by the employee. Raju et al. (1990: 7) suggest that the CREPID method can be summarised in the following equation:

Y_a = M·Σ_{i=1}^{K} W_i·P_ai

where Y_a = the economic value of employee a, M = average annual salary, K = the number of principal job activities, W_i = the proportional importance of principal activity i (such that ΣW_i = 1.00) and P_ai = the performance rating of employee a on principal activity i (with P always between zero and two). For more on this procedure see Cascio and Ramos (1986). Note that the CREPID procedure assumes that the dependent variable (Y) is monetary value.
Having reported very briefly some of the many methods for estimating σ_Y, this article will next explore the specific issues in decision theoretic utility for SETA training, and suggest ways in which SETAs could maximise their planning from these techniques.
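As an illustration of the CREPID summary equation, the following sketch computes the economic value of a single hypothetical employee; the salary, activity weights and ratings are invented for the example:

```python
def crepid_value(mean_salary, weights, ratings):
    """Economic value of one employee under CREPID: average salary weighted
    by activity importance (weights sum to 1.0) and performance ratings
    (each between 0 and 2, with 1 representing average performance)."""
    assert abs(sum(weights) - 1.0) < 1e-9
    assert all(0.0 <= p <= 2.0 for p in ratings)
    return mean_salary * sum(w * p for w, p in zip(weights, ratings))

# A hypothetical employee with three principal job activities.
value = crepid_value(
    mean_salary=120_000.0,
    weights=[0.5, 0.3, 0.2],   # W_i: proportional importance of each activity
    ratings=[1.2, 1.0, 0.8],   # P_ai: performance on each activity
)
print(round(value))  # 127200
```

An employee rated at 1.0 on every activity is valued at exactly the average salary; above-average ratings on heavily weighted activities push the estimate above salary.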

AN INDUSTRY-WIDE AND OUTPUT-BASED APPLICATION FOR SETAS
In decision theoretic utility analysis, the measure of ultimate utility could be almost anything of interest to the users. In initiating sector-wide training, SETAs will very often be more interested in sectoral output than in the profitability of individual firms. This is because profit is largely a contextual, organisation-based construct, whereas output is at the heart of standard productivity measurements and is a vital input into competitiveness statistics. Profitability is also too dependent on external factors to be a reliable metric for industry-wide evaluations.
If output is standard across the industry (i.e. units of output are the same across firms), then the best way of estimating σ_Y for changes in output probably remains the Schmidt et al. (1979: 619-25) judgmental approach. In such a case, subject matter experts from across the industry would estimate the output for a certain standard task resulting from employees at the 15th, 50th and 85th percentiles of X (or at least for the 50th and one of the other percentiles). See Schmidt et al. (ibid) and later publications (e.g. Cascio, 1999: 226-32) for more on this procedure. Of course, for this to work, output would have to be fairly standard and measurable (as it might be in standard production environments). This condition does not always hold.
In cases where output is not standard enough to be directly converted into common units, which may often be the case, Schmidt et al. (1983) suggest a procedure for assessing percentage increases in output. This is a far more general construct, and should be especially useful in an industry context. The procedure involves substituting σ_p (more properly defined as the standard deviation of output expressed as a percentage of mean output) for σ_Y. This procedure could, of course, also be used where output is standard, although as will be seen below it may be less reliable. σ_p can be approximated from the 40 per cent rule: Schmidt and Hunter (1983) do so by multiplying the 40 per cent by the percentage of output made up by wages and salaries, which they estimated at 57 per cent for the U.S. economy.
If Y is defined as output (such that ∆U is percentage increases in output from training), then using the 40 per cent rule σ p should be 23 per cent of average salary (57 per cent of 40 per cent of salary). Again, this gives percentage increases in output, not absolute increases.
In South Africa, an output calculation based on the 40 per cent rule would require a different adjustment. Data from the national accounts (Statistics South Africa, 2003) suggest that wages and salaries have constituted approximately 49 per cent of the value of goods and services produced by the whole economy over the past ten years. If this figure were used, the linking variable for increases in output would be 49 per cent of 40 per cent = 19.6 per cent of salary.
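This adjustment is a single multiplication, sketched below with the wage shares quoted above (0.49 for the South African economy, 0.57 for the U.S.); the function name is arbitrary:

```python
def sigma_p(wage_share, rule=0.40):
    """Standard deviation of output as a fraction of mean output,
    approximated as the 40 per cent rule scaled by the share of
    output made up by wages and salaries."""
    return rule * wage_share

print(round(sigma_p(0.49), 3))  # South African economy-wide share: 0.196
print(round(sigma_p(0.57), 3))  # U.S. share cited in the text: 0.228
```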
There are, however, significant fluctuations in this figure across industries. As can be seen in Table 1 through Table 6, industry averages of wages as a percentage of output have ranged from 31 per cent (agriculture) to 98 per cent (other) over the past ten years. This suggests that each SETA should adjust its output calculations by the appropriate sectoral figure, not by the national percentage.
Table 2: Wages as percentage of GDP, primary industries
Table 3: Wages as percentage of GDP, secondary industries

An illustration might be in order. Take a simple example with the following parameters:
• The Construction SETA (CETA) is planning a training programme for welders.
• The effect of training on the predictor variable X (in this case a set of work sample assessments) is evaluated on control and experimental groups, on a scale of 0 to 200. Using a standard experimental design, it is shown that after training the experimental group scores 120 on average (X̄_E) and the control group only 108 (X̄_C), with no difference before training.
• The standard deviation of X scores in the control group (σ_X) is 24.
Following equation 5, it is first necessary to estimate the standardised effect size brought about by the training (d_t). Since it is highly unlikely that assessments of welding skill are perfectly reliable, d_t is also adjusted for scale or inter-rater unreliability (R_XX). Schmidt et al. (1982: 336) substitute the commonly utilised empirical estimate (based on meta-analyses of prior reliability studies) of .6 for R_XX. d_t is therefore calculated as:

d_t = (X̄_E − X̄_C)/(σ_X·√R_XX) = (120 − 108)/(24 × √0.6) ≈ 0.65

Now that d_t is estimated, it is multiplied by σ_p. Using the 40 per cent rule, σ_p can be found simply by taking 40 per cent of the percentage of wages and salaries that make up output in the construction sector.
From Table 1 it can be seen that, for the construction sector, wages and salaries make up 64 per cent of output (the ten-year average). Therefore σ_p is estimated as 40 per cent of this figure, and the expected percentage increase in output is:

ΔU% = d_t·σ_p

Thus an improvement of approximately 15 per cent in output can be expected (without skills decay) from welders exposed to this training programme. Note that in this case, time and the number of people trained are not multiplied into the equation, as the percentage figure is proportional across these. Of course, should absolute output be quantifiable in unitary form (e.g. number of welding hours per R100 000 of production), then one can multiply this percentage by the average output of trainees, the number of trainees and the time period to come to an absolute increase in welding output.
If, due to staff turnover or knowledge/skill degradation, this effect decays at a constant rate of i = 10 per cent per year over T = ten years, then the expected percentage improvement in output in the final year is:

ΔU%_T = ΔU%·(1 − i)^(T−1) = 0.15 × (0.90)^9 ≈ 0.058

Note that the discounting term is not additive here, as in Equation 7, because the output percentage is not a cumulative construct. That is, being a percentage, it is constant or constantly decaying over time, and is not compounded.
Thus an improvement of only 5.8 per cent can be expected in the case of the stipulated skills decay from welders exposed to this training programme.
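The whole worked example can be replayed in a few lines. This sketch follows the steps above (effect size corrected for unreliability, the 40 per cent rule applied to the construction wage share, then decay); small differences from the rounded in-text figures arise only from rounding at intermediate steps:

```python
import math

def d_t(mean_exp, mean_ctrl, sd_ctrl, reliability=1.0):
    """Effect size corrected for predictor unreliability:
    (XE - XC) / (sd * sqrt(Rxx))."""
    return (mean_exp - mean_ctrl) / (sd_ctrl * math.sqrt(reliability))

dt = d_t(120, 108, 24, reliability=0.6)   # corrected effect size
sp = 0.40 * 0.64                          # 40% rule x construction wage share
gain = dt * sp                            # expected fractional gain in output
final_year = gain * (1 - 0.10) ** (10 - 1)  # 10% annual decay, year 10
print(round(dt, 2), round(gain, 3), round(final_year, 3))  # 0.65 0.165 0.064
```

Run with the figures in the text, the decay factor (0.90)^9 ≈ 0.39 cuts the expected gain by more than half over the ten-year horizon, which is why the decayed figure is so much smaller than the immediate one.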
These figures for the percentage improvement in output can now be used as an input into overall industry calculations of productivity. It must be emphasised that these are relatively rough calculations: as stated above, the 40 per cent of salary rule has been shown to give conservative utility totals. In addition, various factors have been shown to affect utility. Hunter, Schmidt & Judiesch (1990) found that σ_p increases with the complexity of the job. Furthermore, Schmidt et al. (1983) illustrated that σ_p is lower for jobs with incentive pay (especially piece rates) than for purely salaried jobs. It is therefore probably preferable to obtain a better estimate of σ_p, if possible through a global estimation procedure such as that developed by Schmidt et al. (1979).
However the utility estimates achieved through procedures such as this need not be perfect. It is enough that SETAs are confident that at least sizeable gains are being made in industry output. Furthermore, even with relative unreliability, SETAs can still compare programmes using this methodology, choosing to implement the most productive training. Thus it is proposed that this sort of utility formulation could be very useful to SETAs, as it has been proved to be in firms.

RECOMMENDATIONS FOR FURTHER RESEARCH
Within the context of skills development, research is still at the validation stage. Concrete evidence is required of the impact upon crucial outcomes, both of the skills levy in general and of normative suggestions such as the one made here. Unfortunately, the complexity underlying productivity in any given industry makes it unlikely that research could detect whether utility estimates do in fact translate into the predicted overall changes. However, it may be possible, and indeed desirable, to create experimental situations to assess this. Task environments with standard output should be easiest in this regard.
Within the decision theoretic utility framework, ongoing and industry-specific research on the reliability of each element (effect size calculations, estimations of standard deviation etc.) should be conducted. Most pressing is a validation of the 40 per cent of salary rule for South African conditions. Since the suggested methodology for calculating σ_p consists of multiplying the 40 per cent rule by the percentage of industry value added made up by wages and salaries, it is vital to verify that this heuristic holds under local conditions.
Finally, it may be useful for future researchers to consider how techniques such as this could be used to evaluate the more general training funded out of the National Skills Fund. Perhaps a way could be found to adapt the techniques to utilise employment as the output, although obviously employment is at least partially demand-driven, and relies imperfectly on supply of skills. However, in the right contexts, it may be possible to estimate the effect of general training on employment chances or even levels.

CONCLUSION
The skills development system in South Africa has begun to settle into a 'business as usual' phase. It is important that the money being funnelled into the system is utilised as efficiently as possible. Intuitive analyses of what training is needed in any given industry should, where possible, be complemented by hard estimates of productivity or profitability improvements. This paper has therefore suggested one possible technique in this vein. Based on decades of research, and adapted for South African statistics, it is a natural addition to the skills development system. Easy conversion into a computerised information system should ease any computational jitters and resistance to implementation, and hopefully make for improved SETA and NSF decisions.