USING BENFORD’S LAW TO DETECT DATA ERROR AND FRAUD: AN EXAMINATION OF COMPANIES LISTED ON THE JOHANNESBURG STOCK EXCHANGE

Accounting numbers generally obey a mathematical law called Benford’s Law, and this outcome is so unexpected that manipulators of information generally fail to observe the law. Armed with this knowledge, it becomes possible to detect the occurrence of accounting data that are presented fraudulently. However, the law also allows for the possibility of detecting instances where data are presented containing errors. Given this backdrop, this paper uses data drawn from companies listed on the Johannesburg Stock Exchange to test the hypothesis that Benford’s Law can be used to identify false or fraudulent reporting of accounting data. The results support the argument that Benford’s Law can be used effectively to detect accounting error and fraud. Accordingly, the findings are of particular relevance to auditors, shareholders, financial analysts, investment managers, private investors and other users of publicly reported accounting data, such as the revenue services.
JEL M40


Introduction
Albert Einstein was playing his violin in a duet with Werner Heisenberg, who was accompanying him on the piano. After a while Heisenberg slammed his hands down on the keys and said: 'It's one, two, one, two, Einstein! Can't you count?' (Arthur, 1993)
From the mid-1990s investment markets witnessed a surge in the incidence of exposed accounting frauds and irregularities which, in turn, prompted a significant tightening of the regulatory environment in an effort to stamp out accounting deceit. 2 Although recent evidence suggests that this regulatory response has been effective in reducing the occurrence of dishonest accounting, the impact has not been comprehensive. Moreover, the experience of the past decade demonstrates that, whilst the country, industry and business detail behind the data distortions vary, the cases share a common harmful ailment: accounting frauds have resulted in considerable destruction of investor wealth. 3 In addition, recent evidence shows that the number and size of companies that are disclosing accounting irregularities and frauds have grown with time. For example, the number of restatements due to accounting irregularities in the United States (US) increased by over 150 percent between 1997 and 2001 (Floyd, 2003: 5). Moreover, the median size of companies making restatements in the US, measured by market capitalisation, increased from $500 million in 1997 to $2 billion in 2002 (Floyd, 2003: 7). In South Africa, the trends have been similar, with a growing number of firms reporting accounting irregularities and frauds over the past decade.
Against this backdrop, and as noted, numerous efforts are being made to improve accounting standards and auditing practices. Regulators are also at pains to make firms' managers and directors more sensitive to the consequences of financial malpractice. However, the pace of progress is slow and the effects unfinished. Moreover, human behaviour is such that fraudulent practices will linger even in a world of perfect accounting systems and watertight auditing practices. Thus, those interested in the accuracy of publicly reported accounting data (including auditors, shareholders, financial analysts, investment managers, private investors and other users of publicly reported accounting data, such as the revenue services) must remain vigilant for fraudulent accounting practices.
Helpfully, at this juncture, a little known but powerful mathematical law, called Benford's Law (Benford, 1938), presents itself as a potentially potent tool for rooting out fraudulent practices from a wide array of information sets, including accounting data. Significantly, the law has been used in a range of international settings to detect data error and fraud, including the case of accounting data. Despite this potential, it is surprising to find that whilst Benford's Law has been used by practitioners in the South African setting, no attempt has been made to publish evidence on the effectiveness of Benford's Law in detecting accounting data error or fraud in a domestic setting. This paper aims to address this gap in research by exploring the relevance of Benford's Law in the detection of anomalies in data presented by firms listed on the Johannesburg Stock Exchange (JSE).
The remainder of the paper is divided into five sections. Section 2 provides an overview of Benford's Law, while Section 3 examines the mechanics of employing Benford's Law to detect accounting data irregularities as well as the data set employed in this study. Section 4 is devoted to analysing the results, and provides comment on the reliability and relevance of the tool as a detector of fraudulent or erroneous accounting data. On this score, the findings of this study suggest that Benford's Law has the capacity to play a helpful role in assisting users of accounting data to detect error or fraud in financial information. These findings are in line with expectations and concur with the results of similar studies carried out in other countries. Some comment is also made in this section on areas for further research. Section 6 is devoted to concluding remarks.

An overview of Benford's Law
In 1881, the astronomer-mathematician Simon Newcomb published a short article in the 'American Journal of Mathematics' describing his observation that books of logarithms were more worn at the beginning and progressively less worn towards the end (Newcomb, 1881). From this, Newcomb inferred that researchers (including fellow astronomers and mathematicians, as well as biologists, sociologists and other scholars) using the logarithmic tables were looking up numbers starting with the digit 1 more often than numbers starting with the digit 2. Similarly, Newcomb inferred that researchers were looking up numbers starting with the digit 2 more often than those beginning with the digit 3, and so on (Hill, 1998: 1). After a short heuristic argument, Newcomb (1881: 40) concluded that the probability ($P$) that the first significant digit (that is, first non-zero digit) $D_1$ of a number is equal to $d_1$ is:
$$P(D_1 = d_1) = \log_{10}\left(1 + \frac{1}{d_1}\right), \quad d_1 = 1, 2, \ldots, 9$$
From Newcomb's rule, it can be calculated that the probability of 1 occurring as the first digit is 0.301 (or 30.1 percent). Similarly, the probability of 2 being the first digit is 0.176 (17.6 percent). In this vein, Table 1 shows the probabilities of first digits based on the above equation. That the digits are not equally likely comes as a surprise to most observers. However, it is even more striking that Newcomb (1881) was able to claim the existence of an exact rule describing the distribution of first digits. Despite the profound insights offered, Newcomb's article went unnoticed. However, more than half a century later, and independently of Newcomb's findings, American physicist Frank Benford made exactly the same observation about logarithmic books and concluded the same first-digit law. But Benford went further than Newcomb by testing his conjecture with an 'effort to collect data from as many fields as possible and to include a wide variety of types [of data]' (Benford, 1938: 551).
To be more specific, Benford's published findings were based on 20 229 observations from such diverse data sets as areas of rivers, atomic weights and street addresses (in all, 20 widely different data sets were sampled). Benford's findings indicated that the data closely fitted the logarithmic law. 4 Moreover, apart from this empirical advantage, Benford's paper benefited from a second factor: it was published adjacent to a soon-to-be famous physics paper. With Newcomb's contribution having become completely forgotten, the logarithmic probability law came to be known as Benford's Law. Before proceeding, it is useful to offer an intuitive explanation of Benford's Law. Consider making a deposit of R100 in a bank account that pays interest at the rate of 10 percent per annum. The first digit will continue to be 1 until the account balance rises to R200. This will take a 100 percent increase which, at an annual compound rate of 10 percent, would take about 7.3 years. When the account balance reaches R200, the first digit will be 2. However, growing at 10 percent per annum, the account balance will rise from R200 to R300 in about 4.2 years. Moving from R300 to R400 will take about three years, and from R900 to R1 000 will require roughly 1.1 years. However, moving from R1 000 (where the first digit is once again 1) to R2 000, will take 7.3 years. Thus earlier digits have higher frequencies of occurrence, with the law holding with any phenomenon that has a constant or erratic growth rate (Nigrini, 1999: 2-3).
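The arithmetic of the bank deposit illustration can be checked with a short calculation. The Python sketch below is illustrative only: the function name `years_to_grow` and the variable names are assumptions introduced here and are not part of the original study. It computes the time the balance spends on each first digit while growing from R100 to R1 000 at 10 percent per annum, and compares each digit's share of that time with the logarithmic first-digit probability.

```python
import math

def years_to_grow(start, end, rate=0.10):
    """Years needed for a balance to grow from start to end at an annual compound rate."""
    return math.log(end / start) / math.log(1.0 + rate)

# Time spent on each first digit d while the balance moves from R100*d to R100*(d+1).
durations = {d: years_to_grow(100 * d, 100 * (d + 1)) for d in range(1, 10)}
total_years = sum(durations.values())

# Share of the R100 -> R1 000 journey spent on each first digit.
time_shares = {d: years / total_years for d, years in durations.items()}

# First-digit probabilities under the logarithmic law.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(f"first digit {d}: {durations[d]:.2f} years, "
          f"share of time {time_shares[d]:.3f}, Benford {benford[d]:.3f}")
```

Under constant compound growth the share of time spent on each first digit coincides with the logarithmic probability for that digit, whatever the growth rate, which is the intuition the deposit example is intended to convey.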
Interestingly, there is also a general significant-digit law which includes first digits but also higher order digits (which may be equal to 0) (Hill, 1996). 5 For example, the general law holds that the probability that the second significant digit ($D_2$) of a number is equal to $d_2$ is:
$$P(D_2 = d_2) = \sum_{d_1 = 1}^{9} \log_{10}\left(1 + \frac{1}{10 d_1 + d_2}\right), \quad d_2 = 0, 1, \ldots, 9$$
From this general law it follows that the second significant digits, although monotonically decreasing in frequency through the digits (as in the case of first digits), are much more uniformly distributed than the first digits. As noted, the rule holds for higher order digits; to illustrate this point, Table 1 shows the unconditional probabilities of occurrence for the second, third and fourth significant digits. Furthermore, the general law also specifies the joint distribution of significant digits. For instance, the general law allows for calculation of the probability that the first and second digits are 1 and 2, respectively. Importantly, the joint distribution is not simply the probability of the first digit multiplied by the probability of the second digit. Rather, the significant digits are dependent. 6 To demonstrate this point, a simple calculation shows that the unconditional probability that the second digit is 2 is ≅ 0.109. But, the conditional probability that the second digit is 2 given that the first digit is 1 is ≅ 0.115 (Hill, 1998: 2). As an aside, Benford's Law is the only probability distribution on significant digits which is invariant under changes of scale (for example, converting from English to metric units or from Yen to Euros), or under changes of base (for example, replacing base 10 by base 8 or base 2, in which case the logarithmic base 10 is replaced by logarithm to the new base) (Hill, 1996). 7 In proceeding, it is worth noting that in the 65 years since Benford's article appeared there have been numerous attempts to 'prove' the law (Hill, 1998: 3).
Indeed, by 1990 close on 100 papers had been published focusing on explaining or deriving the law in theoretical terms. 8 But there have been two main stumbling blocks to explaining the law. First, some data sets satisfy the law, whilst others do not. Until recently, there has not been a clear definition of a general statistical experiment that would predict which tables would comply with the law. Second, although there was some success in showing that Benford's Law is the only set of digital frequencies which remain fixed under scale changes, none of the proofs were rigorous as far as probability theory is concerned. Recently, however, these stumbling blocks have been removed by the discovery of mathematical laws of probability which explain and predict the appearance of the logarithmic distribution (Hill, 1995a and. In this vein, Hill (1996: 2) shows that if probability distributions are selected at random, and random samples are then taken from each of these distributions so that the overall process is 'unbiased', then the leading significant digits of the combined sample will converge to Benford's Law (Hill, 1996: 2). More specifically, using modern mathematical probability theory it has been shown that the frequencies of significant digits will conform to the law when data distributions are selected at random and random samples are taken from these distributions. As an aside, not all writers are in agreement with Hill's (1996) conclusion. Brookes (2002), for instance, is critical of Benford's Law. However, Brookes (2002: 4) acknowledges that in the case of data sets that consist of 'quantisized items such as oranges, cows … trees [and money]' these criticisms are not serious.
Histrionics aside, the theorems alluded to above explain why many tables of numerical data follow the logarithmic distribution described by Benford's Law and why others do not. The latter set includes items such as telephone numbers in a given region that usually begin with the same few digits, administered numbers such as personal identity numbers, hourly wage rates, bank account numbers, postal codes and tax payer numbers. As already noted, however, and significantly in the current argument, the theorem also explains why a surprisingly diverse collection of information tends to obey Benford's Law. Examples of such data include large accounting tables, stock market figures, tables of physical constants, numbers appearing in newspaper articles, demographic data, numerical computations in computing and aspects of scientific calculations (Raimi, 1969; Ashcraft, 1992; Dehaene and Mehler, 1992; Hill, 1996 and 1998; Ley, 1996; Nigrini, 1999). The explanation for conformity with Benford's Law is now well established: the data sets are composed of samples from many different distributions.
Returning to the main focus of this paper, the prevalence of the logarithmic distribution in true accounting data sets has led to Benford's Law being used in an international setting to detect fraud or fabrication of data in financial documents under the hypothesis that when people fabricate data they do not choose numbers which follow a logarithmic distribution (Hill, 1996). Moreover, it is well documented that people cannot behave truly randomly even when such behaviour is to their advantage (Chapanis, 1953; Bakan, 1960; Neuringer, 1986; Hill, 1999). Further to this, recent studies support the hypothesis that concocted data do not follow Benford's Law closely. Nigrini (1996 and has led the way in this respect, by amassing extensive empirical evidence of the occurrence of Benford's Law in many areas of accounting data. On the back of the accumulated evidence, Nigrini has come to the conclusion that in a wide variety of accounting situations, the significant-digit frequencies of true data conform closely to Benford's Law (see also Carslaw, 1988; Thomas, 1989). Conversely, then, Benford's Law serves as an ideal tool for detecting variances between true accounting data and data that have been manipulated or that contain errors. However, apart from providing a tool that can alert users to possible errors or potential fraud, Benford's Law holds a second advantage over other methods used to detect data corruption: the law is easily applied (Nigrini, 1999: 1). Such a tool for testing data conformity is described in Section 3 below.
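The digit probabilities quoted earlier in this section can be reproduced directly from the logarithmic law. The following Python sketch is offered as an illustration only; the function names are assumptions introduced here. It evaluates the first-digit and second-digit probabilities, and the conditional probability that the second digit is 2 given that the first digit is 1, using the fact that under the general law the probability of the leading digit pair (d1, d2) is log10(1 + 1/(10·d1 + d2)).

```python
import math

def first_digit_prob(d1):
    """Probability that the first significant digit equals d1 (1..9)."""
    return math.log10(1 + 1 / d1)

def joint_first_two_prob(d1, d2):
    """Probability that the first two significant digits are d1 (1..9) and d2 (0..9)."""
    return math.log10(1 + 1 / (10 * d1 + d2))

def second_digit_prob(d2):
    """Unconditional probability that the second significant digit equals d2 (0..9)."""
    return sum(joint_first_two_prob(d1, d2) for d1 in range(1, 10))

p_second_is_2 = second_digit_prob(2)
p_second_is_2_given_first_is_1 = joint_first_two_prob(1, 2) / first_digit_prob(1)

print(f"P(D1 = 1) = {first_digit_prob(1):.3f}")
print(f"P(D1 = 2) = {first_digit_prob(2):.3f}")
print(f"P(D2 = 2) = {p_second_is_2:.3f}")
print(f"P(D2 = 2 | D1 = 1) = {p_second_is_2_given_first_is_1:.3f}")
```

The gap between the unconditional and conditional values illustrates the dependence between significant digits noted above.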

Test method
The aim of the current study is to test the potential effectiveness of Benford's Law in detecting data error or fraud in accounting information produced by JSE-listed companies. As a point of departure, it should be recognised that testing need not be confined to the first-digit level. Nigrini and Mittermaier (1997) provide a review of the range of tests available. To start with, because of the general law, testing can be applied to higher-order digits as easily as to first digits (Nigrini, 1999: 4). The law can also be used to test joint frequencies, such as the first-two, first-three or, more generally, first-n digit combinations. Other tests are available. For instance, the analyst can test for rounding of numbers, which suggests estimation. Testing for duplication of numbers or combinations of numbers is also a potential investigative tool that hints at fraudulent or administrative manipulation. Thus, numbers can be binned to test for conformity in various ways. Most commonly, though, testing is done at the level of first or first-two significant digits. This paper tests data conformity with Benford's Law at the level of the first significant digit. This basis for testing conforms to the broad-level testing criteria established by Nigrini (2000).
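By way of illustration, binning at the level of the first significant digit might be sketched as follows in Python. The function names and the line-item values are hypothetical, introduced purely for the example; they are not drawn from the data set used in this study.

```python
from collections import Counter

def first_significant_digit(value):
    """Return the first non-zero digit of a non-zero number, ignoring sign and scale."""
    # Scientific notation places the first significant digit first, e.g. '1.25...e+03'.
    return int(f"{abs(value):.15e}"[0])

def first_digit_frequencies(values):
    """Relative frequency of each first digit 1..9 among the non-zero values."""
    counts = Counter(first_significant_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no non-zero values to bin")
    return [counts.get(d, 0) / n for d in range(1, 10)]

# Hypothetical income statement line items (not taken from the study's sample).
line_items = [1250, -340, 18700, 0, 96, 2100, 1.5]
frequencies = first_digit_frequencies(line_items)
print(frequencies)
```

Zero entries are skipped because they have no significant digit, and negative items are binned on their absolute value; both are choices made for this sketch rather than procedures documented in the paper.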
Having identified the test level, the process turns to establishing whether the observed digit(s) deviate(s) significantly from the expected frequencies derived from Benford's Law. In this regard, following Nigrini (2000) a simple regression analysis is employed to assess the significance of any observed deviations from the expected frequencies.
Specifically, to test for conformity with Benford's Law, a regression line is estimated of the form:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where $Y_i$ is the value of the frequency of the $i$-th significant digit(s) drawn from the sample data; $\beta_0$ and $\beta_1$ are parameters; $X_i$ is a known constant, namely the value of the independent variable (frequency of the $i$-th significant digit[s]) as per Benford's Law; and $\varepsilon_i$ is a random error term with mean $E\{\varepsilon_i\} = 0$ and variance $\sigma^2\{\varepsilon_i\} = \sigma^2$; and $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated so that the covariance $\sigma_{ij} = 0$ for all $i, j$ where $i \neq j$ and $i = 1, 2, \ldots, n$. A perfect correlation between the sample data and Benford's Law would yield:
$$\beta_0 = 0 \quad \text{and} \quad \beta_1 = 1$$
From this, a t-test is used to test the joint null hypotheses that β 0 = 0 and β 1 = 1, which are the necessary conditions for observed data to conform to Benford's Law.
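A minimal sketch of such a regression test, written in plain Python, is given below. It is an illustration under stated assumptions rather than the procedure actually run for this paper: the function names, the `band` parameter and the digit counts are hypothetical, and conformity is judged by whether 0 and 1 lie within two standard errors of the estimated intercept and slope, which mirrors the two-standard-deviation criterion applied in the test results.

```python
import math

# Expected first-digit frequencies under Benford's Law (the X values).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def benford_regression(observed, expected=BENFORD):
    """Ordinary least squares fit of observed on expected digit frequencies.

    Returns (b0, b1, se_b0, se_b1) for the line Y = b0 + b1 * X.
    """
    n = len(observed)
    x_bar = sum(expected) / n
    y_bar = sum(observed) / n
    sxx = sum((x - x_bar) ** 2 for x in expected)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(expected, observed))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(expected, observed))
    s2 = sse / (n - 2)                      # residual variance, n - 2 degrees of freedom
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return b0, b1, se_b0, se_b1

def conforms(observed, band=2.0):
    """True if b0 = 0 and b1 = 1 both lie within `band` standard errors of the estimates."""
    b0, b1, se_b0, se_b1 = benford_regression(observed)
    return abs(b0) <= band * se_b0 and abs(b1 - 1.0) <= band * se_b1

# Hypothetical first-digit counts for 30 line items (digits 1 to 9), not real company data.
close_to_benford = [c / 30 for c in [9, 5, 4, 3, 2, 2, 2, 2, 1]]
reversed_pattern = close_to_benford[::-1]   # frequency rises with the digit

print(conforms(close_to_benford), conforms(reversed_pattern))
```

With only nine first-digit bins the regression has seven degrees of freedom, so a formal two-sided t-test at the five percent level would use a critical value of roughly 2.365 rather than 2; the value of `band` can be adjusted accordingly.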
Given the testing method, it becomes necessary to establish the data sampling technique adopted. Unfortunately, Benford (1938) offered no comment in this regard. Indeed, some writers have gone so far as to hint at Benford having mined the data analysed (Scott and Fasli, 2001: 7). 9 Elsewhere, little insight is offered into suitable data sampling techniques. For this reason, this paper adopts a more 'classical' sampling stance by observing principles that are widely recognised as the basis for generating adequate samples: the samples used are random and sufficiently large and variable to deliver test statistics that offer an appropriate level of precision. The data set is described below.

Data set
To test the potential of Benford's Law to detect error or fraud in accounting data, two data sets are employed. The first consists of a sample of 'errant' companies that were listed on the JSE during the five-year period 1 July 1998 to 30 June 2003. These companies are commonly suspected or known to have committed accounting fraud or produced erroneous data, and their shares were either suspended or delisted during the reference period as a consequence. 10 This sample of 17 so-called 'errant' companies is detailed in Table 2. One firm, Amalia Gold Mining and Exploration Company Limited, was dropped from the sample due to lack of data. The second consists of a sample of 17 'compliant' companies, which is detailed in Table 3.
Data drawn from the financial statements of the two sets of companies are confined by three additional parameters. First, the tests run are confined to raw data whose significant number frequencies are expected to follow a geometric sequence when ordered and counted. Raw accounting data read as line items are appropriate for testing. Numbers that are a function of more than one set of other numbers (such as earnings per share, which is a function of earnings and the number of shares in issue) are not expected to follow Benford's Law. 12 To ensure data homogeneity, the same line items are used for all companies, as published by data vendor I-Net Bridge's Financial Analysis System (FAS). Moreover, the data that are sampled are 'as reported', which thus excludes all possible influences of adjustments that are typically made by data vendors in their efforts to standardise accounting data. In proceeding, it should be noted that the raw data identified satisfy the main criteria for having expected digit frequencies that are Benford-like, namely: the numbers describe the sizes of similar phenomena; the numbers have no built-in maximums or minimums; and the numbers are not assigned numbers (such as bank account numbers) (Nigrini and Mittermaier, 1997).
Second, because the aim of the tests is to identify data manipulation, it makes sense to explore statements that are more likely to include errant data. The most obvious place to search for data error is in the income statement. Thus, testing is done on first-digit data drawn from the income statement. The other principal statements produced by firms in their annual financial reports (namely the cash flow, change of equity and balance sheet statements) are less prone to manipulation. That said, data error or fraud that arises in the income statement is likely to percolate into derived statements that include statements of change in equity and balance sheets. So, to eliminate the potential for double-counting of errors, the data set is based on income statement data. Third, in the case of errant firms, only the last set of publicly reported information is used. For 'compliant' companies, the sample set is drawn from the 2002 financial year, as explained above.
Thus, two sets of data are produced by the sampling method, namely: (a) income statement first-digit data drawn from 'errant' companies on a per company basis and (b) income statement first-digit data drawn from 'compliant' companies on a per company basis, with the income statement data consisting of 30 line items as reported by the companies. Accordingly, the full data set consists of 1 020 income statement observations as reported by the companies. The sampling method then binned data on a per company basis, with testing at the company level justified by the argument that knowing that a group of companies employ errant or questionable reporting practices is of marginal use when compared to the knowledge that a single company adopts such reporting practices.
Thus, for each company the reported income statement data are binned. The binned data frequencies are then regressed on theoretical frequencies to test for significant deviations from Benford's Law. It is expected that the testing process would reveal significant deviations from Benford's Law in the case of 'errant' companies, whilst the frequencies generated by 'compliant' company data are expected to observe Benford's Law. In proceeding, it ought to be noted that in a priori testing, rejection of the null hypothesis does not prove data error, bias or fraud -legitimate explanations for deviations are sometimes found. Rather, a positive test result signals potential data problems, which the data user should then employ as grounds for a more detailed examination of the information. This argument, however, does not necessarily apply in the case of backward-looking tests. Related to this point, it must be recognised that the unit of analysis is the firm, although clearly it is not firms that falsify data, but rather agents of the firm. However, detection of data error at the firm level is arguably a first, necessary step required in any search for the existence of fraudulent company data (this point is returned to below).

Test results
Tables 4 and 5 set out the test results on a per company basis. Table 4 deals with the results of tests conducted on 'errant' companies, and shows the estimated values of β 0 and β 1 ; the standard deviation of the estimated values; and the t-statistic on the estimated values.
The acceptance of the independent null hypotheses that β 0 = 0 and β 1 = 1 at the five percent level of significance is indicated by an asterisk on cell entries in Table 4. However, to satisfy the test requirements, it is necessary that β 0 = 0 and β 1 = 1 lie within two standard deviations of the estimated values of β 0 and β 1. Accordingly, the test results lead to acceptance of the null hypothesis that β 0 = 0 in 13 of 17 cases. However, as can be inferred from the estimates of β 1, in all 13 cases the test results reject the null hypothesis that β 1 = 1 at the five percent level. Hence, the joint requirement that β 0 = 0 and β 1 = 1 is rejected in all of these cases. As an aside, there are three instances of significant estimates of β 1. But all three results fail to meet the criterion of β 1 = 1 lying within two standard deviations of the estimated value of β 1. Moreover, none of these three cases coincides with acceptance of the null hypothesis that β 0 = 0. Further to this, it is interesting to note that four estimates of β 1 carry the wrong sign. These cases hint at 'extreme' violation of Benford's Law: as first-digits increase from one through to nine, the frequency of first-digits increases. Accordingly, first-digit distributions in these data sets are highly suspect. That aside, and in short, none of the data sets tested passes the test conditions established for conformity with Benford's Law.
Thus, the preliminary finding, based on the above sample set, is that Benford's Law is a useful indicator of the existence of fraudulent or erroneous data. All 17 companies that are believed or found to have generated fraudulent data over the sample period fail the test of conformity of the distribution of first significant digits with Benford's Law. It is unsurprising to note that the estimated values based on pooled data for the 17 'errant' companies indicate that, if measured as a group, the first significant digit frequencies fail to conform to Benford's Law.
As an aside, in the case of 'errant' companies it is evident that the non-compliance of the data with Benford's Law can occur due to manipulation of line items at any level of the income statement. However, that all companies fail to satisfy the intercept and slope aspects of the test implies that data manipulation in the sample occurs in line items that appear close to the top of the income statement (the overstatement of revenue is the most obvious culprit). More to the point, the higher up the statement that manipulation occurs, the greater the deviance of the balance of the statement as the 'early' data manipulation has a cascading effect. To put the argument differently, misstatement of line items that occur low down in the income statement would mean that a random sample of first digits may comply with Benford's Law due to the possible compliance of earlier numbers which, in the case of 'late' manipulation would make up the majority of first digits. Thus, from the findings presented in Table 4 it is inferred that it is more likely that data manipulation in the current sample occurred early in the income statement rather than late in the statement. This, then, sharpens the fraud detection tool as it is not the company that perpetrates a fraud, but rather agents of the company and, given the above arguments, most likely agents that are able to influence 'early' line items. However, as noted, the unit of analysis in the current study is the firm, and so a more detailed study is left for investigation elsewhere. These comments aside, continuing with the argument, whilst it may be useful to know that 'errant' companies fail to comply with Benford's Law, the test only becomes a useful screening tool if it can be shown that 'compliant' companies generate first-digit frequencies that conform to Benford's Law. Consequently, the second set of tests ensures against Type II error. Given this backdrop, the test results for the 17 'compliant' companies are reported in Table 5, which sets out the estimated values of β 0 and β 1; the standard deviation of the estimated values; and the t-statistics on the estimated values.
The results set out in Table 5 for the sample of 'compliant' companies indicate that the null hypothesis that β 0 = 0 cannot be rejected at the five percent level for any of the companies. Moreover, in all 17 cases, β 0 = 0 lies within two standard deviations of the estimated values of β 0. Thus, all 17 of the 'compliant' companies have significant first-digit frequencies that indicate conformity with Benford's Law in the case of β 0 = 0. In considering the estimates of β 1, the coefficient is significant in 16 of the 17 cases. The estimate of β 1 on Goldfields (β 1 = 0.64) is not found to be significantly different from zero at the five percent level (although the estimate is significant at the 10 percent level). Importantly, of the 16 estimates of β 1 that are found to be significantly different from 0, only three estimates fail to meet the further condition that β 1 = 1 lies within two standard deviations of the estimated value of β 1. Thus, of the set of 'compliant' companies, 13 of the 17 firms pass the joint test of β 0 = 0 and β 1 = 1, indicating conformity with Benford's Law. It is interesting to note that the estimated values based on pooled data for the 17 'compliant' companies indicate that the group's first significant-digit frequencies conform to Benford's Law, with β 0 = 0 and β 1 = 1 for the group. As a final comment on the estimated values of β 0 and β 1, it is noteworthy that the standard errors on the estimates in the case of 'errant' companies (0.10 and 0.75, respectively) are more than twice the size of standard errors on the estimates β 0 and β 1 in the case of 'compliant' companies. This result offers further anecdotal evidence of the superior 'quality' of 'compliant' company data over 'errant' company data.

Implications and limitations
In short, the results of the testing procedure indicate that conformity to Benford's Law may serve as a robust tool for forewarning users of accounting data of the potential existence of data error or fraud. The results are particularly encouraging in this regard in that the test procedure yielded a false-positive result in only four of 34 cases (11.8 percent of the sample). Put differently, when applied at the time of annual financial reporting to the above sample of 'errant' and 'compliant' companies, the test of Benford's Law correctly identified 88.2 percent of the cases (30 of 34 companies), and correctly identified 100 percent of 'errant' cases. 13 The reason for this appears to be elegantly simple: like supernovae, fraudulent companies give themselves away by shining more brightly than their peers as they zealously thrash away their final moments.
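The classification rates quoted above follow from simple arithmetic on the test outcomes. The short Python sketch below restates that arithmetic; the variable names are assumptions introduced for the illustration, while the counts are those reported in the test results.

```python
# Outcomes of the conformity test, as reported in the test results.
errant_total, errant_flagged = 17, 17          # every 'errant' company fails the test
compliant_total, compliant_flagged = 17, 4     # 'compliant' companies that fail (false positives)

total = errant_total + compliant_total
correct = errant_flagged + (compliant_total - compliant_flagged)

accuracy = correct / total                     # share of companies correctly classified
false_positive_share = compliant_flagged / total
errant_detection_rate = errant_flagged / errant_total

print(f"correctly classified: {correct} of {total} ({100 * accuracy:.1f} percent)")
print(f"false positives: {100 * false_positive_share:.1f} percent of the sample")
print(f"'errant' companies detected: {100 * errant_detection_rate:.1f} percent")
```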
Nevertheless, whilst these early results of the application of Benford's Law yield encouraging findings, the test procedure and data set have limitations that suggest further research is required. Some of the more obvious limitations are identified below.
First, the data collection method may include an obvious source of sample bias in that, with the benefit of hindsight, the status of 'errant' and 'compliant' companies was known before testing was conducted. This raises the question of whether the test method would be as reliable in the case of live data, that is, as a prediction tool (where the value of the instrument is unambiguously greatest). There is no cause to doubt that this is the case. Nevertheless, testing of live data would go some way in confirming the tool's validity.
Second, and related to this point, the results reveal that the test functions in a highly effective fashion in the tails of the distribution -correctly failing 'errant' companies and passing 'compliant' companies. However, the data set used in this study offers no insight as to 'what goes on in between'. Over most of the sample period there were in excess of 500 listed companies on the JSE. Thus, this study covers less than 10 per cent of the population. A broader study is required to establish the effectiveness of the tool across all firms. Until such time, then, the instrument is arguably best used as an indicator of potential data error or fraud rather than a corroborator of data problems.
Third, the results offer no guide as to whether all companies that fail the test ultimately fail and, if so, what the extent of the lag in time is between detection and failure.
Fourth, in the international setting, Benford's Law has been applied more widely than accounting data as the basis for detecting data error or fraud. Indeed, the potential applications of the law are wide. For instance, the law has been identified as relevant to the interrogation of design efficiency (Hamming, 1970 and Knuth, 1981 in Scott and Fasli, 2001), the examination of authenticity of mathematical models (Varian, 1972 in Scott and Fasli; Nigrini, 1996), assessment of the validity of research results (Matthews, 1999: 26) and the examination of data storage and data management efficiency (Nigrini, 1999). Moreover, as noted in Section 2, the tool also is applicable as an instrument for detecting fraud in claims (such as insurance claims and expense account claims), payments (bank payments and payroll disbursements) and tax fraud (income declarations and expense claims). However, constraints of time confine the extant study to a consideration of accounting data problems amongst listed firms. Broader, and more detailed, studies of Benford's Law should address these limitations.

Conclusion
Over the past decade, the frequency of accounting data error and fraud has increased in the international and domestic settings. The adverse economic effects of these data problems are considered to be material. For this reason, broad-based efforts are being made by the accounting and auditing professions and regulatory authorities to reduce the incidence of data error and fraud. However, even in a world where recording and reporting of data is potentially error free, elements of human behaviour (such as greed and deceit) will linger on, causing data error and fraud to persist. Moreover, the pace at which accounting, auditing and regulatory advances are being made is slow. For these reasons, error and fraud detection instruments are likely to remain important in the toolkits of auditors, shareholders, financial analysts, investment managers, private investors and other users of publicly reported accounting data, such as the revenue services. One such potential tool is Benford's Law. However, whilst the potential effectiveness of the law has been established in the international literature, the domestic research environment is silent on the topic.
Accordingly, this paper examines the potential effectiveness of Benford's Law in the detection of data error and fraud in a South African setting. To examine the case, a simple regression tool is applied to data generated by a set of 34 companies listed on the JSE. For the sake of the study, the test sample consists of data drawn from an equal number of so-called 'errant' and 'compliant' companies. The results of the study are convincing: the tool correctly fails all 17 of the 'errant' companies, whilst three of the 17 'compliant' companies fail the test. Despite the incidence of false-positive results, the number is considered to be sufficiently small (three of 34 companies, or 8.8 percent of the full sample) to conclude that Benford's Law has the capacity to serve as an effective indicator of data problems in accounting information. Moreover, under test conditions that are broader than the a priori conditions that were set, the success rate of the test climbs to 97.1 percent. Further, whilst the study has some limitations, none of these is considered to be sufficient to challenge the basic result: Benford's Law has the potential to act as a highly effective detector of data error or fraud in accounting information.

Endnotes
Floyd (2003), where comment is made on the growing incidence of accounting irregularities amongst listed firms.
4 It ought to be noted that the validity of Benford's (1938) findings has been drawn into question by some researchers. For example, Scott and Fasli (2001: 5) note that Benford's claim that the tested data sets conformed to his law rested entirely on the apparent similarity of the numbers. To be sure, Benford made no attempt to test the goodness of fit of the data. However, this has not led to the rejection of Benford's Law. Rather, this shortcoming in Benford's work has led to the refinement of our understanding of the types of data to which the law applies (Scott and Fasli, 2001: 2).
5 In his paper, Newcomb (1881) also determined the probability of the ten second digits, independent of the first digits (Brookes, 2002: 1).
6 Hill (1995a) provides the exact formulas of the joint probability calculations.
7 Pinkham (1961) provided a key development in the understanding of Benford's Law by arguing that for any digit-distribution law to hold consistently, it would have to be scale invariant. Pinkham's (1961) proof was later extended by Hill (1995b).
8 See Raimi (1976) for an early review of the literature and Scott and Fasli (2002) for a more recent literature survey. Three main groups of explanations emerge from these literature surveys. The first set argues that Benford's Law is due to the numbering system that we use to count upward through the natural numbers. The second group of mathematical explanations is based on the notion of 'randomness' and the central limit theorem. The third approach to deriving Benford's Law is termed 'ontological' because it asks: 'What form would a digit law take if such a law existed?' Scott and Fasli (2001: 3-5) and Brookes (2002) offer comment in this regard. That aside, of these three approaches, the second remains the most widely accepted plausible explanation for conformity of a data set to Benford's Law (Scott and Fasli, 2001: 15).
9 It is noteworthy that the test statistics generated by Scott and Fasli (2001: 6) to interrogate Benford's (1938) results conform to Benford's Law.
10 The companies identified by the author as commonly suspected or believed to have published false or fraudulent data were supplied by a group of ten investment brokers and managers representing five different financial services firms who dealt in listed companies over the reference period.
11 It is acknowledged that Ernst and Young's 'Excellence in Financial Reporting' is not intended by the authors to test or validate the authenticity (correctness) of the numbers reported in financial statements. Rather, in the absence of such a tool, the report is used here as a proxy for indicating the authenticity of reported accounting data.
12 To illustrate this point, all ending digits in earnings per share figures are expected to be distributed with equal probability. Further, first digit counts on financial ratios, such as return on equity or return on assets, are, in many instances, likely to conform more closely to a binary distribution than to the distribution implied by Benford's Law.
13 It is interesting to note that at the 10.0 percent level of significance, and allowing for true values of β_i to lie within three standard deviations of the estimated β_i values, all of the 17 'errant' companies continue to fail the test, whilst the number of false-positive results declines to one. Thus, under this set of broader test criteria, the overall success rate of the test climbs to 97.1 percent.
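
The tolerance-band criterion described in endnote 13 can be illustrated with a short Python sketch. This is not the code used in the study, and the specification is an assumption made purely for illustration: observed first-digit proportions are regressed on the proportions expected under Benford's Law, log10(1 + 1/d), and a data set is treated as conforming when an intercept of 0 and a slope of 1 both lie within k standard errors of the estimated coefficients. The function names and the parameter k are hypothetical.

```python
import math


def benford_expected():
    """Expected first-digit proportions under Benford's Law, digits 1 to 9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading non-zero digit of a non-zero number (sign is ignored)."""
    # Scientific notation puts the leading digit first, e.g. '4.52...e-03'.
    return int(f"{abs(x):.15e}"[0])


def observed_proportions(values):
    """Share of non-zero values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 9
    for v in values:
        if v != 0:
            counts[first_digit(v) - 1] += 1
    n = sum(counts)
    if n == 0:
        raise ValueError("no non-zero values supplied")
    return [c / n for c in counts]


def ols(x, y):
    """OLS of y on x: returns intercept, slope and their standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance with n - 2 degrees of freedom
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return intercept, slope, se_intercept, se_slope


def conforms(values, k=3.0):
    """Assumed criterion: 0 and 1 lie within k standard errors of the
    estimated intercept and slope respectively."""
    b0, b1, se0, se1 = ols(benford_expected(), observed_proportions(values))
    eps = 1e-12  # guards against rounding when the fit is exact
    return abs(b0) <= k * se0 + eps and abs(b1 - 1) <= k * se1 + eps
```

Calling conforms(amounts) on a list of reported figures returns True or False under this assumed criterion. Raising k widens the band and so makes the check more lenient, which is the sense in which the criteria in endnote 13 are 'broader' than the a priori conditions.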