<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1d1 20130915//EN" "http://jats.nlm.nih.gov/publishing/1.1d1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">SAJEMS</journal-id>
<journal-title-group>
<journal-title>South African Journal of Economic and Management Sciences</journal-title>
</journal-title-group>
<issn pub-type="ppub">1015-8812</issn>
<issn pub-type="epub">2222-3436</issn>
<publisher>
<publisher-name>AOSIS</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">SAJEMS-29-6348</article-id>
<article-id pub-id-type="doi">10.4102/sajems.v29i1.6348</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Note</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Examining different artificial intelligence models&#x2019; ability to pass Certificate of Theory in Accountancy-level tax questions</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-8212-2487</contrib-id>
<name>
<surname>Ram</surname>
<given-names>Asheer J.</given-names>
</name>
<xref ref-type="aff" rid="AF0001">1</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-3300-9664</contrib-id>
<name>
<surname>van Zijl</surname>
<given-names>Wayne</given-names>
</name>
<xref ref-type="aff" rid="AF0001">1</xref>
</contrib>
<aff id="AF0001"><label>1</label>Margo Steele School of Accountancy, Faculty of Commerce, Law and Management, University of the Witwatersrand, Johannesburg, South Africa</aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><bold>Corresponding author:</bold> Asheer Jaywant Ram, <email xlink:href="asheer.ram@wits.ac.za">asheer.ram@wits.ac.za</email></corresp>
</author-notes>
<pub-date pub-type="epub"><day>23</day><month>01</month><year>2026</year></pub-date>
<pub-date pub-type="collection"><year>2026</year></pub-date>
<volume>29</volume>
<issue>1</issue>
<elocation-id>6348</elocation-id>
<history>
<date date-type="received"><day>12</day><month>06</month><year>2025</year></date>
<date date-type="accepted"><day>20</day><month>11</month><year>2025</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2026. The Authors</copyright-statement>
<copyright-year>2026</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>Licensee: AOSIS. This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.</license-p>
</license>
</permissions>
<abstract>
<p>As artificial intelligence (AI) models become more sophisticated and entrenched in the accountancy professions, questions arise about their ability to outperform humans. This article is one of the first to examine the ability of five different AI models to pass professional tax examinations.</p>
<sec id="st1">
<title>Contribution</title>
<p>This article provides evidence about AI&#x2019;s current ability to support or replace tax practitioners. It provides a baseline to track the progress of different AI models as they evolve. Only Grok passed, while ChatGPT, Claude, Copilot, and Gemini failed. Notably, the AI models provided persuasive answers despite being incorrect, negating their ability to replace tax practitioners.</p>
</sec>
</abstract>
<kwd-group>
<kwd>artificial intelligence</kwd>
<kwd>education</kwd>
<kwd>ChatGPT</kwd>
<kwd>Claude</kwd>
<kwd>Copilot</kwd>
<kwd>Gemini</kwd>
<kwd>Grok</kwd>
<kwd>taxation</kwd>
</kwd-group>
<funding-group>
<funding-statement><bold>Funding information</bold> This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.</funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s0001">
<title>Introduction</title>
<p>Artificial intelligence (AI) is rapidly becoming commonplace in our lives. As part of this, different professions are grappling with the extent to which AI supports or replaces tasks historically carried out by their professionals. The accounting profession is no different, and a key area of professional advice stems from the tax practitioner community. Tax practitioners are often a necessity for businesses and individuals alike because of the complexity and constantly changing nature of tax. It is currently unclear to what extent different AI models can support or replace tax practitioners. This article addresses this gap by evaluating different AI models&#x2019; ability to cope with tax queries through a qualitative analysis.</p>
<p>Tax examination questions written by final-year Certificate of Theory in Accountancy (CTA) students studying for their Chartered Accountant (South Africa) (CA[SA]) designation were given to five different AI models. Chartered Accountant (South Africa) is a designation conferred by the South African Institute of Chartered Accountants (SAICA). The SAICA is regularly lauded as the number one chartered accountant institute in the world (SAICA <xref ref-type="bibr" rid="CIT0014">2023</xref>). Students who seek to become SAICA members need to complete an approved undergraduate degree and a postgraduate degree (commonly referred to as CTA) (Van Wyk <xref ref-type="bibr" rid="CIT0021">2011</xref>). As part of this, they need to complete advanced taxation studies to complete their CTA. The CTA examinations do not have any multiple-choice questions (MCQs) or true or false questions, as they focus on detailed calculations and interpretive-essay-type questions. This is to simulate what work one would perform as a tax practitioner.</p>
<p>To be a tax practitioner in South Africa, one must belong to a professional body that is a recognised controlling body by the South African Revenue Service (SARS <xref ref-type="bibr" rid="CIT0017">2024</xref>). The SAICA is one such body. As a result, there is an important link between tax practitioners and their education and one would expect that tax practitioners are able to appropriately handle CTA tax questions. Tax practitioners support businesses and natural persons in ensuring that they comply with tax laws and regulations. Any tax practitioner would also need to comply with relevant levels of knowledge and due care as per the professional requirements of their recognised controlling bodies.<xref ref-type="fn" rid="FN0001"><sup>1</sup></xref> This article asks the question: <italic>Can current AI models reliably provide tax advice at the level expected of professional tax practitioners?</italic></p>
<p>This article makes multiple important contributions. Firstly, the article evaluates how well different AI models, in comparison to students studying to become CA(SA)s, are able to pass a real-world tax examination that requires different skill sets. Secondly, the article sets an important benchmark that can be used in the future to evaluate the progress of AI models in providing tax-related business advice. Finally, the article provides important insights from running the experiment and builds on the limited non-United States (US) research evaluating AI&#x2019;s ability to support and/or replace accounting professionals.</p>
</sec>
<sec id="s0002">
<title>Literature review</title>
<sec id="s20003">
<title>Artificial intelligence models</title>
<p>Large language models (LLMs) are designed to simulate conversations, with users entering natural-language prompts and questions to receive human-like responses (Borger et al. <xref ref-type="bibr" rid="CIT0005">2023</xref>; Teubner et al. <xref ref-type="bibr" rid="CIT0020">2023</xref>). However, the response is not generated by a free-thinking human. It is generated using the probabilities and patterns inherent in the data used to train the respective AI models, reflecting the importance and impact of the training data provided. Different models use different approaches. For example, some focus on large quantities of unspecified data and often use the internet as a primary source (Roberts, Baker &#x0026; Andrew <xref ref-type="bibr" rid="CIT0013">2024</xref>). Others may follow a more focused strategy and selectively feed it information that facilitates its knowledge base in a particular field, discipline, or function. Regardless of the strategy, extensive neural networks act as a system for working through and making sense of the training data (Borger et al. <xref ref-type="bibr" rid="CIT0005">2023</xref>; Roberts et al. <xref ref-type="bibr" rid="CIT0013">2024</xref>).</p>
<p>An AI model&#x2019;s training will dictate its strengths and weaknesses. Artificial intelligence models trained in specific tasks will be better at those tasks. This comes at the expense of handling more general, perhaps conversational tasks. Other models are trained on a wide variety of information from different disciplines and functions. These models may perform better in what is known as zero-shot performance. This refers to an AI model&#x2019;s ability to be given an entirely new task in which it has not been specifically trained (Eulerich et al. <xref ref-type="bibr" rid="CIT0009">2024</xref>).</p>
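<p>To make the zero-shot idea concrete, the following minimal Python sketch contrasts a zero-shot prompt (the task alone) with a few-shot prompt (worked examples placed before the task). The function name, the prompt layout and the sample questions are illustrative assumptions only; they are not the prompts used in this article or in any of the studies cited.</p>
<preformat preformat-type="code"><![CDATA[
```python
def build_prompt(task, worked_examples=()):
    """Build a prompt string (hypothetical helper, for illustration only).

    With no worked examples the result is a zero-shot prompt: the model
    sees only the task. With n examples it is an n-shot prompt.
    """
    parts = []
    for number, (question, answer) in enumerate(worked_examples, start=1):
        # Each worked example shows the model a question with its answer.
        parts.append(f"Example {number}\nQuestion: {question}\nAnswer: {answer}")
    # The task itself always comes last and is left unanswered.
    parts.append(f"Question: {task}\nAnswer:")
    return "\n\n".join(parts)


# Placeholder text, not taken from any examination paper.
zero_shot = build_prompt("Calculate the taxable income of the estate.")
two_shot = build_prompt(
    "Calculate the taxable income of the estate.",
    worked_examples=[
        ("Sample question A", "Sample answer A"),
        ("Sample question B", "Sample answer B"),
    ],
)

if __name__ == "__main__":
    print(zero_shot)
    print("-----")
    print(two_shot)
```
]]></preformat>
<p>In this sketch, the only difference between the two settings is whether solved examples precede the task, which is why zero-shot results are a stricter test of what a model already knows.</p>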
<p>Considering the models used in this article, ChatGPT, being trained on OpenAI&#x2019;s proprietary data and web data, is considered more of a conversational AI well-suited for content creation. Gemini, being trained on Google-scale datasets (text, images, audio, video), appears to be aimed towards multimedia content creation, cross-modal analysis, and multimodal AI tasks. Claude, trained on public web content, open-source code, documentation, and books, is suitable for programming, in-depth scientific studies and lengthy essays. Copilot, trained on a vast amount of code from public repositories, is useful for generating efficient solutions to algorithmic problems (Jabbar, Ul Islam &#x0026; Boudjadar <xref ref-type="bibr" rid="CIT0010">2025</xref>). Grok, trained on X.com (formerly Twitter) data, is suited to analysing trends and interactions on social networks (De Carvalho Souza &#x0026; Weigang 2024).</p>
<p>Four key concerns arise with AI models. Firstly, AI models can hallucinate: when an AI model provides an answer despite lacking the requisite knowledge and capabilities to form one, the output may be a hallucination. This is a difficult area, as there is a trade-off. On the one hand, allowing AI models to be creative enhances their ability to solve problems in new ways, thereby leveraging the benefits AI can offer. On the other hand, creativity increases the likelihood that AI models will provide false answers, with no indication to the user that the response may be a hallucination, or of the model&#x2019;s degree of confidence in its answer (Dahl et al. <xref ref-type="bibr" rid="CIT0007">2024</xref>).</p>
<p>Secondly, AI models typically lack the ability to discern truth from untruth and bias (Roberts et al. <xref ref-type="bibr" rid="CIT0013">2024</xref>). The training material given to a model is, consequently, a significant factor in determining the credibility and potential bias of the AI model&#x2019;s output. Using general internet data as training material may, accordingly, have negative implications for a model&#x2019;s ability to provide credible and reliable solutions to queries in, for example, highly regulated disciplines. Thirdly, Roberts et al. (<xref ref-type="bibr" rid="CIT0013">2024</xref>) also raised the idea of AI models being &#x2018;overconfident&#x2019;. Said differently, AI models may <italic>want</italic> to provide you with an answer, whether or not they can do so. This may result in hallucination to satisfy the user&#x2019;s request. Finally, Borger et al. (<xref ref-type="bibr" rid="CIT0005">2023</xref>) and Roberts et al. (<xref ref-type="bibr" rid="CIT0013">2024</xref>) raise important concerns about an AI model&#x2019;s likelihood of reinforcing stereotypes, especially when the model is trained on biased data and biased users accept its outputs.</p>
</sec>
<sec id="s20004">
<title>Research on the use of artificial intelligence models to answer assessments</title>
<p>There is limited research into AI models&#x2019; ability to answer formal assessments. Many studies focus only on MCQs or on a single AI model, leaving two major gaps. Firstly, little is known about AI&#x2019;s ability to address essay and discipline-specific assessments. Secondly, there is no comparison of different AI models&#x2019; ability to handle the same assessments that evaluate different types of assessment components.</p>
<p>In one of the first articles to consider the efficacy of AI in answering accounting assessment questions, Bommarito et al. (<xref ref-type="bibr" rid="CIT0004">2023</xref>) evaluated the performance of OpenAI&#x2019;s earlier versions of ChatGPT and Text-davinci-003 on the uniform CPA examination written in the US. Text-davinci-003 achieved a correct rate of 14.4&#x0025; on Regulation examination questions, underperforming human capabilities on quantitative reasoning in zero-shot prompts. The model achieved human-level performance in remembering, understanding, and application, answering 57.6&#x0025; of questions correctly. ChatGPT 3 outperformed the text-davinci models.</p>
<p>Wood et al. (<xref ref-type="bibr" rid="CIT0022">2023</xref>), also with a US focus, evaluated ChatGPT 3&#x2019;s performance on accounting questions in comparison to students&#x2019; performance. ChatGPT achieved an average of 56.5&#x0025; with partial credit<xref ref-type="fn" rid="FN0002"><sup>2</sup></xref>, significantly underperforming in comparison to students&#x2019; average score of 76.7&#x0025;. ChatGPT performed better on true or false (68.7&#x0025;) and MCQs (59.5&#x0025;) but struggled with calculation (28.7&#x0025;) and short-answer questions (39.1&#x0025;). ChatGPT performed relatively well in accounting information systems and auditing assessments but had lower accuracy in taxation, financial, and managerial accounting questions. Higher-order learning questions posed a challenge to ChatGPT. Notably, &#x2018;ChatGPT struggled to handle long, written questions with multiple parts, even when allowing for &#x201C;carry over&#x201D; mistakes&#x2019; (Wood et al. <xref ref-type="bibr" rid="CIT0022">2023</xref>:15). ChatGPT often &#x2018;made up&#x2019; facts and provided descriptive explanations for its answers, even if incorrect, which can easily, but incorrectly, convince AI users of AI&#x2019;s correctness.</p>
<p>Atanasovski et al. (<xref ref-type="bibr" rid="CIT0003">2023</xref>) explored the effectiveness of ChatGPT 3.5 in answering examination questions in accounting and auditing in North Macedonia. The research involved 11 subject examinations with a total of 401 questions. ChatGPT 3.5 successfully passed 8 out of the 11 subjects, achieving a pass rate of 73&#x0025;. For true or false questions, ChatGPT had a 65&#x0025; correct response rate. For MCQs with a single correct answer, ChatGPT 3.5 achieved a 72&#x0025; correct response rate, while for MCQs with multiple correct answers, ChatGPT 3.5 only achieved a 48&#x0025; correct response rate. This is similar to a study in Portugal where ChatGPT failed to pass the Portuguese Order of Chartered Accountants examination (Albuquerque &#x0026; Dos Santos <xref ref-type="bibr" rid="CIT0002">2024</xref>). In open-ended short questions, a strong performance was demonstrated with a 78&#x0025; correct response rate. However, for essay questions, ChatGPT 3.5 earned a 55&#x0025; score.</p>
<p>ChatGPT 3.5 excelled in subjects such as Principles of Accounting, Auditing, and Internal Auditing but struggled with Management Accounting II, Governmental Accounting, and International Accounting. The study concludes that while ChatGPT is proficient in qualitative questions and simpler MCQs, it faces challenges with quantitative calculations, complex MCQs and essay questions (Atanasovski et al. <xref ref-type="bibr" rid="CIT0003">2023</xref>). Given the nature of the CTA examination as discussed in the introduction, one would expect ChatGPT to struggle with passing.</p>
<p>Later, Eulerich et al. (<xref ref-type="bibr" rid="CIT0009">2024</xref>) examined ChatGPT&#x2019;s performance in the US CPA and other accounting certification examinations (CMA, CIA, EA). They found that ChatGPT 3.5 scored an average of 53.1&#x0025; across all examinations, failing to pass any<xref ref-type="fn" rid="FN0003"><sup>3</sup></xref>. However, ChatGPT 4 improved its scores by 16.5&#x0025; after 10-shot training<xref ref-type="fn" rid="FN0004"><sup>4</sup></xref>; it achieved an average score of 85.1&#x0025;, passing all the areas. Their work concurs with that of Bommarito et al. (<xref ref-type="bibr" rid="CIT0004">2023</xref>), evidencing that more recent versions of ChatGPT perform better than their older counterparts. However, zero-shot attempts still showed issues with passing examinations.</p>
<p>Following a legal perspective, Katz et al. (<xref ref-type="bibr" rid="CIT0011">2024</xref>) found that ChatGPT 4 substantially outperformed human students and previous ChatGPT models in the US Bar examinations. ChatGPT achieved roughly 297 points, well above the passing threshold for all universal Bar jurisdictions. On the multistate Bar examination, which consists solely of MCQs, ChatGPT 4 achieved a 75.7&#x0025; accuracy rate, outperforming the average human test-taker by more than 7&#x0025; and demonstrating a 26&#x0025; increase over previous ChatGPT versions. On the multistate essay examination, ChatGPT 4 scored 4.2 out of 6 points, with ChatGPT 3 scoring 3 out of 6 points. A passing grade is considered 4 out of 6 points, which indicates that ChatGPT 4 was able to marginally pass the multistate essay examination. The decrease in its score from the MCQs is noticeable, indicating once more that essay- and discussion-type questions prove more challenging for the AI model.</p>
<p>Cheng et al. (<xref ref-type="bibr" rid="CIT0006">2024</xref>) explored the capabilities of ChatGPT 3.5 and 4 to answer educational accounting cases in the US. They found that ChatGPT 4 performs better than ChatGPT 3.5, especially in tasks requiring explanation, application of rules, and ethical evaluation. However, both ChatGPT 3.5 and ChatGPT 4 struggled with tasks requiring financial statement creation, journal entries, or software use. As indicated in the previous studies, much of the research has been centred in the US (Bommarito et al. <xref ref-type="bibr" rid="CIT0004">2023</xref>; Eulerich et al. <xref ref-type="bibr" rid="CIT0009">2024</xref>; Katz et al. <xref ref-type="bibr" rid="CIT0011">2024</xref>; Wood et al. <xref ref-type="bibr" rid="CIT0022">2023</xref>).</p>
<p>Pinto et al. (<xref ref-type="bibr" rid="CIT0012">2024</xref>) compared and contrasted different AI models. They evaluated the performance of ChatGPT 3.5, ChatGPT 4, and Gemini in the Portuguese Chartered Accountant Examination. This examination consists solely of MCQs. With an average accuracy for tax questions of 48&#x0025;, ChatGPT 4 outperformed Gemini (38&#x0025;) and ChatGPT 3.5 (36&#x0025;). All these models failed, reinforcing the findings of Albuquerque and Dos Santos (<xref ref-type="bibr" rid="CIT0002">2024</xref>), who found that ChatGPT struggled with tax questions where judgement was required. The AI models struggled most with management and financial accounting questions, but performed better in taxation and ethics. Interestingly, this result in respect of taxation is contrary to Wood et al. (<xref ref-type="bibr" rid="CIT0022">2023</xref>). In further contrast, the US tax system (22nd most complex out of 64 countries) is regarded as being less complex than that of Portugal (17th most complex out of 64 countries) (Tax Complexity Index <xref ref-type="bibr" rid="CIT0019">2022</xref>). Consequently, one would expect the AI models to perform better in the US than in Portugal. However, the Portuguese examination, consisting only of MCQs (Albuquerque &#x0026; Dos Santos <xref ref-type="bibr" rid="CIT0002">2024</xref>; Pinto et al. <xref ref-type="bibr" rid="CIT0012">2024</xref>), demonstrates, again, that ChatGPT performed better in the MCQs compared to the written questions considered by Wood et al. (<xref ref-type="bibr" rid="CIT0022">2023</xref>). As the South African CTA tax examinations do not include any MCQs, it is expected that these models will struggle with the written and complex calculations required. These studies clearly demonstrate that, while AI models show potential, they are not yet capable of consistently passing rigorous professional examinations without further training and improvement.</p>
<p>Most existing studies have focused on MCQs or assessments that include both essays and MCQs. The South African approach to CTA tax is very different, with no MCQs used at all. Instead, higher-order thinking and critical evaluation skills via complex calculation and application-type questions are used. Accordingly, this article contributes to the existing literature and evaluates different AI models&#x2019; ability to handle different types of questions. How this was achieved is discussed next.</p>
</sec>
</sec>
<sec id="s0005">
<title>Method</title>
<p>The article takes an exploratory descriptive empirical research approach to assessing the tax capabilities of the AI models. The article posed one theory-based discussion tax question and one numerical tax question to five different AI models. The purpose was to assess their ability to pass a final-year exit-level examination at a SAICA-accredited South African university with a zero-shot prompt. Because of the extensive limitations of most free AI models (AI for Education <xref ref-type="bibr" rid="CIT0001">2025</xref>), the paid versions of ChatGPT 4, Claude, Copilot, Gemini, and Grok were used. This also ensures that the AI models provide their best possible answers to examination questions. In addition, the latest AI models as of June 2025 were used. The article did not opt to use older models that have been assessed in prior articles, as the purpose is not to assess any particular version but rather the latest and best AI capabilities at present.</p>
<p>The only prompts given to each AI were the purpose of the study, the scenario, and the two tasks required to be completed (as given to 2024 students). The question paper with the scenario information and the <xref ref-type="table" rid="T0001">Table 1</xref> tasks was uploaded to each AI model. There was no attempt to alter or amend the information or tasks to possibly assist readability by the AI models. This is a first zero-shot look at the ability of the AI models to respond to these tax questions with no training or prior preparation, to better simulate how one may use an AI model to replace a tax practitioner.</p>
<table-wrap id="T0001">
<label>TABLE 1</label>
<caption><p>Tasks provided to the artificial intelligence models.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">Number</th>
<th valign="top" align="left">Description</th>
<th valign="top" align="center">Marks</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><bold>1.</bold></td>
<td align="left"><bold>Refer to the section titled &#x2018;The case of Melusi Gwabe&#x2019;.</bold><break/><break/>You are required to assist O&#x2019;lerato Gwabe with the determination of the Estate Duty payable amount in respect of her late father, Melusi Gwabe.<break/><break/>Using only the information provided in the scenario, draft a list of questions which you would need to ask O&#x2019;lerato Gwabe to accurately and completely assist with completing the Estate Duty payable calculation. For each question, provide a reason as to why you are asking that question.<break/><break/>Supporting calculations are not required. References to any relevant legislation are not required.</td>
<td align="center">12</td>
</tr>
<tr>
<td align="left"><bold>2.</bold></td>
<td align="left"><bold>Refer to the section titled &#x2018;The case of Melusi Gwabe&#x2019;.</bold><break/><break/>Assume that the bank account had a balance of R428 000 invested, the BMW 1-series had a market value of R380 000 and the shares in the unlisted company had a market value of R240 000 on the date of Melusi&#x2019;s death.<break/><break/>Furthermore, assume the following:
<list list-type="bullet">
<list-item><p>The funds in the bank account and the BMW 1-series were bequeathed to Zinhle Gwabe,</p></list-item>
<list-item><p>The unlisted company shares were bequeathed to O&#x2019;lerato Gwabe,</p></list-item>
<list-item><p>All the transfers took place in February 2024.</p></list-item>
</list>Calculate the taxable income of the deceased estate of the late Melusi Gwabe for the 2024 year of assessment.<break/><break/>Provide reasons for nil amounts.</td>
<td align="center">11</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Given the localised nature of the tax examination information and tasks provided (<xref ref-type="table" rid="T0001">Table 1</xref>), the fact that this was an in-person written examination, and the fact that the examination was newly set and had never been used previously, it was not considered a risk that any of the data had leaked into the training data used in the AI models. Considering this, no contamination checks were run on the AI models (see Katz et al. <xref ref-type="bibr" rid="CIT0011">2024</xref>).</p>
<p>The same solution as that used for the 2024 students was used, and the same marker who graded the 2024 students graded each AI model&#x2019;s answer. This enhances the validity of the comparison between the AI models&#x2019; marks and the marks of the students. The specific tasks provided to each AI model are detailed in <xref ref-type="table" rid="T0001">Table 1</xref>.</p>
<p>A qualitative approach is taken to assess the quantitative results of the AI models in responding to <xref ref-type="table" rid="T0001">Table 1</xref> tax tasks. This research approach is consistent with other AI studies in this area (Atanasovski et al. <xref ref-type="bibr" rid="CIT0003">2023</xref>; Bommarito et al. <xref ref-type="bibr" rid="CIT0004">2023</xref>; Eulerich et al. <xref ref-type="bibr" rid="CIT0009">2024</xref>; Wood et al. <xref ref-type="bibr" rid="CIT0022">2023</xref>). As an exploratory study, the aim is not to present a generalisable positivist conclusion, but to provide insight into the real-world ability of AI models to handle tax questions.</p>
</sec>
<sec id="s0006">
<title>Results</title>
<p><xref ref-type="table" rid="T0002">Table 2</xref> presents the results. Notably, four of the five models failed overall<xref ref-type="fn" rid="FN0005"><sup>5</sup></xref>, with Grok standing out by passing with 56.52&#x0025; (and exceeding the next best mark by 17.39 percentage points). As most AI models are LLMs, the intuitive expectation would be that the AI models would perform better in the discussion questions. This was not generally the case and supports the findings of Wood et al. (<xref ref-type="bibr" rid="CIT0022">2023</xref>) and Atanasovski et al. (<xref ref-type="bibr" rid="CIT0003">2023</xref>). Our results reflect mixed outcomes, with three models performing better in the calculation-style questions and the other two achieving better discussion question results.</p>
<table-wrap id="T0002">
<label>TABLE 2</label>
<caption><p>Results from the artificial intelligence models.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" align="left">AI model (in alphabetical order)</th>
<th valign="top" align="center">Discussion (question 1) score achieved (&#x0025;)</th>
<th valign="top" align="center">Calculation (question 2) score achieved (&#x0025;)</th>
<th valign="top" align="center">Total for <xref ref-type="table" rid="T0001">Table 1</xref> tasks (&#x0025;)</th>
<th valign="top" align="center">Deviation of the total percentage from the student average</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">ChatGPT 4</td>
<td align="center">16.67</td>
<td align="center">27.27<xref ref-type="table-fn" rid="TFN0001">&#x2020;</xref></td>
<td align="center">21.74</td>
<td align="center">&#x2212;41.26</td>
</tr>
<tr>
<td align="left">Claude</td>
<td align="center">50.00<xref ref-type="table-fn" rid="TFN0001">&#x2020;</xref></td>
<td align="center">27.27</td>
<td align="center">39.13</td>
<td align="center">&#x2212;23.87</td>
</tr>
<tr>
<td align="left">Copilot</td>
<td align="center">16.67</td>
<td align="center">18.18<xref ref-type="table-fn" rid="TFN0001">&#x2020;</xref></td>
<td align="center">17.39</td>
<td align="center">&#x2212;45.61</td>
</tr>
<tr>
<td align="left">Gemini</td>
<td align="center">33.33<xref ref-type="table-fn" rid="TFN0001">&#x2020;</xref></td>
<td align="center">18.18</td>
<td align="center">26.09</td>
<td align="center">&#x2212;36.91</td>
</tr>
<tr>
<td align="left">Grok</td>
<td align="center">50.00</td>
<td align="center">63.64<xref ref-type="table-fn" rid="TFN0001">&#x2020;</xref></td>
<td align="center">56.52</td>
<td align="center">&#x2212;6.48</td>
</tr>
<tr>
<td align="left">Student average</td>
<td align="center">63.06</td>
<td align="center">62.94</td>
<td align="center">63.00</td>
<td align="center">-</td>
</tr>
<tr>
<td align="left">Student pass rate</td>
<td align="center">75.86</td>
<td align="center">72.80</td>
<td align="center">74.33</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>AI, artificial intelligence.</p></fn>
<fn id="TFN0001"><label>&#x2020;</label><p>, Values show the highest component (discussion or calculation) for each AI model.</p></fn>
</table-wrap-foot>
</table-wrap>
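<p>The arithmetic behind <xref ref-type="table" rid="T0002">Table 2</xref> can be sketched in a few lines of Python. The sketch is illustrative only and rests on one assumption: the article reports percentages rather than raw marks, so the raw marks below are back-calculated from those percentages and the mark allocations in <xref ref-type="table" rid="T0001">Table 1</xref> (12 and 11 marks). All function and variable names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Marks available per task, taken from Table 1.
MAX_MARKS = {"discussion": 12, "calculation": 11}

# ASSUMPTION: raw marks are not reported in the article. These values are
# back-calculated from the reported percentages (e.g. 16.67% of 12 = 2).
RAW_MARKS = {
    "ChatGPT 4": {"discussion": 2, "calculation": 3},
    "Claude": {"discussion": 6, "calculation": 3},
    "Copilot": {"discussion": 2, "calculation": 2},
    "Gemini": {"discussion": 4, "calculation": 2},
    "Grok": {"discussion": 6, "calculation": 7},
}

# Student average for both tasks combined (%), as reported in Table 2.
STUDENT_AVERAGE_TOTAL = 63.00


def score_row(marks):
    """Return component, total and deviation percentages for one model."""
    total_available = sum(MAX_MARKS.values())  # 23 marks in all
    row = {
        task: round(100 * marks[task] / MAX_MARKS[task], 2)
        for task in MAX_MARKS
    }
    # The total is mark-weighted, not the mean of the two percentages.
    total = 100 * sum(marks.values()) / total_available
    row["total"] = round(total, 2)
    row["deviation"] = round(total - STUDENT_AVERAGE_TOTAL, 2)
    return row


def table2_rows():
    """Recompute every AI model row under the assumed raw marks."""
    return {model: score_row(marks) for model, marks in RAW_MARKS.items()}


if __name__ == "__main__":
    for model, row in table2_rows().items():
        print(
            f"{model:<10} {row['discussion']:>6.2f} {row['calculation']:>6.2f} "
            f"{row['total']:>6.2f} {row['deviation']:>7.2f}"
        )
```
]]></preformat>
<p>Because the two tasks carry different mark allocations, each total is weighted by marks rather than being the simple mean of the two component percentages. Each deviation is the model&#x2019;s total minus the student average of 63.00&#x0025;.</p>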
<p>The students&#x2019; pass rates of 75.86&#x0025; and 72.80&#x0025;, respectively, for the discussion (question 1) and calculation (question 2) questions are high. This suggests that the average entry-level tax practitioner should be able to cope with them relatively easily. Similarly, the students&#x2019; average mark for both questions was approximately 63&#x0025;. Given the intensive training and knowledge required for tax practitioners, as discussed in the introduction, this reinforces that the questions posed to the AI models were reasonable and not so complex as to indicate that only very experienced tax practitioners would be able to cope with them.</p>
<p>In the discussion component, nothing stood out between the AI models&#x2019; quality of language and that of the students. Both were fairly professional and on topic. A key insight gained is that where Grok simply supplied its answer, all other AI models sought to convince the reader that their answer was justified and correct, supporting the findings of Wood et al. (<xref ref-type="bibr" rid="CIT0022">2023</xref>). The important message this finding sends is that these AI models make it particularly difficult for users to gauge the credibility of their advice and solutions. In contrast, a critical distinction exists in professional practice. While tax practitioners may provide exploratory explanations during initial client consultations, professional conduct codes (as discussed in the introduction) require that formal tax advice (such as tax return positions or formal opinions) only be provided when the practitioner has sufficient knowledge and certainty. Artificial intelligence models do not make this contextual distinction, providing equally persuasive answers regardless of whether they represent exploratory discussion or formal advice. Said differently, it would take an up-to-date expert to determine whether these AI models&#x2019; answers are correct, negating their value as providers of tax support and advice.</p>
<p>A noticeable difference between the AI models&#x2019; responses and those of students is that students who do not know the answer typically provide short solutions, if any. Artificial intelligence models provided full solutions, even though their marks clearly indicate that they lacked the knowledge and capabilities required to answer correctly. This can be linked to current concerns about AI models&#x2019; hallucinations, which reduce their value for tasks where correct answers, without any creativity on the AI model&#x2019;s part, are required (Dahl et al. <xref ref-type="bibr" rid="CIT0007">2024</xref>; Roberts et al. <xref ref-type="bibr" rid="CIT0013">2024</xref>). This may suggest that, should businesses and individuals want AI models that can support or replace tax advice, AI models with limited scope for creativity may be preferable.</p>
<p>The results, in a tax sense, align with Wood et al. (<xref ref-type="bibr" rid="CIT0022">2023</xref>) but are contrary to Pinto et al. (<xref ref-type="bibr" rid="CIT0012">2024</xref>). This raises the point that the AI models may perform better on tax in certain jurisdictions than in others. South Africa&#x2019;s tax system is ranked as the 45th most complex out of 64 countries<xref ref-type="fn" rid="FN0006"><sup>6</sup></xref> (Tax Complexity Index <xref ref-type="bibr" rid="CIT0019">2022</xref>), making it less complex than the tax systems of Portugal (17th most complex out of 64 countries) (Tax Complexity Index <xref ref-type="bibr" rid="CIT0019">2022</xref>) (see Pinto et al. <xref ref-type="bibr" rid="CIT0012">2024</xref>) and the US (22nd most complex out of 64 countries) (Tax Complexity Index <xref ref-type="bibr" rid="CIT0019">2022</xref>) (see Wood et al. <xref ref-type="bibr" rid="CIT0022">2023</xref>). Accordingly, one would have expected the AI models to perform better given the lower complexity of the South African tax system, but this does not seem to be the case. This may point to differences in the jurisdiction-specific training data used in the AI models or to issues in how complexity is interpreted. Furthermore, Pinto et al. (<xref ref-type="bibr" rid="CIT0012">2024</xref>) showed that ChatGPT outperformed Gemini, contrary to this study, which found that Gemini outperformed ChatGPT by 4.35 percentage points. Given the mixed nature of these findings, what is clear is that no AI model can confidently be used to support or replace tax practitioners at present.</p>
</sec>
<sec id="s0007">
<title>Conclusion</title>
<p>Overall, the results, especially in the light of other studies&#x2019; findings, indicate that AI models are still untrustworthy as far as providing tax support and advice is concerned. They may be useful as a type of &#x2018;sounding board&#x2019; to facilitate knowledgeable tax practitioners&#x2019; own reasoning and decision-making rather than as a source of absolute solutions. However, they cannot replace tax practitioners and consultants.</p>
<p>Interestingly, Grok was at least able to pass both questions, indicating that AI has the potential to play a significant role in tax practitioners&#x2019; work. Given the rapid advancement of AI (Teubner et al. <xref ref-type="bibr" rid="CIT0020">2023</xref>), this study should be repeated regularly to track the progress of AI models and to provide empirical, comparable observations about their ability to perform as tax practitioners. This information should inform the planning of regulators, universities and students so that they stay relevant and efficient while maintaining high standards of tax advice and support.</p>
<p>This article finds that AI models&#x2019; persuasiveness and creativity are key risks and areas that require more research. In addition, regulators and universities need to track AI&#x2019;s performance to design acceptable uses for AI that take advantage of its efficiencies yet protect the integrity of the profession. Research is also required to better understand the impact of different countries&#x2019; tax complexity on AI&#x2019;s capabilities.</p>
<p>This study has some limitations. The sample comprised only two tasks, and a single marker was used (although this was the same marker who marked the students&#x2019; answers). The AI models were used at a specific point in time and may change over time. As this is an exploratory study grounded in a South African context, the generalisability of these results is limited.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<sec id="s20008" sec-type="COI-statement">
<title>Competing interests</title>
<p>The authors declare that they have no financial or personal relationships that may have inappropriately influenced them in writing this article. The author, Wayne van Zijl, serves as an editorial board member of this journal. Wayne van Zijl has no other competing interests to declare.</p>
</sec>
<sec id="s20009">
<title>CRediT authorship contribution</title>
<p>Asheer J. Ram: Methodology, formal analysis, investigation, writing &#x2013; original draft, resources and writing &#x2013; review &#x0026; editing. Wayne van Zijl: Conceptualisation, formal analysis, investigation, writing &#x2013; original draft, visualisation and writing &#x2013; review &#x0026; editing. All authors reviewed the article, contributed to the discussion of results, approved the final version for submission and publication, and take responsibility for the integrity of its findings.</p>
</sec>
<sec id="s20010" sec-type="data-availability">
<title>Data availability</title>
<p>The authors declare that all data that support this research article and findings are available in the article and its references.</p>
</sec>
<sec id="s20011">
<title>Disclaimer</title>
<p>The views and opinions expressed in this article are those of the authors and are the product of professional research. They do not necessarily reflect the official policy or position of any affiliated institution, funder, agency or that of the publisher. The authors are responsible for this article&#x2019;s results, findings and content.</p>
</sec>
</ack>
<ref-list id="references">
<title>References</title>
<ref id="CIT0001"><mixed-citation publication-type="journal"><person-group person-group-type="author"><collab>AI for Education</collab></person-group>, <year>2025</year>, <source><italic>AI model comparison: Free vs Paid tiers</italic></source>, <comment>viewed 26 October 2025, from <ext-link ext-link-type="uri" xlink:href="https://www.aiforeducation.io/ai-resources/ai-model-comparison-free-vs-paid-tiers">https://www.aiforeducation.io/ai-resources/ai-model-comparison-free-vs-paid-tiers</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0002"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Albuquerque</surname>, <given-names>F</given-names></string-name>. &#x0026; <string-name><surname>Dos Santos</surname>, <given-names>P.G</given-names></string-name></person-group>., <year>2024</year>, &#x2018;<article-title>Can ChatGPT be a certified accountant? Assessing the responses of ChatGPT for the professional access exam in Portugal</article-title>&#x2019;, <source><italic>Administrative Sciences</italic></source> <volume>14</volume>(<issue>7</issue>), <fpage>1</fpage>&#x2013;<lpage>15</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/admsci14070152">https://doi.org/10.3390/admsci14070152</ext-link></comment></mixed-citation></ref>
<ref id="CIT0003"><mixed-citation publication-type="conference"><person-group person-group-type="author"><string-name><surname>Atanasovski</surname>, <given-names>A</given-names></string-name>., <string-name><surname>Tocev</surname>, <given-names>T</given-names></string-name>., <string-name><surname>Dionisijev</surname>, <given-names>I</given-names></string-name>., <string-name><surname>Minovski</surname>, <given-names>Z</given-names></string-name>. &#x0026; <string-name><surname>Jovevski</surname>, <given-names>D</given-names></string-name></person-group>., <year>2023</year>, &#x2018;<article-title>Evaluating the performance of ChatGPT in accounting and auditing exams: An experimental study in North Macedonia</article-title>&#x2019;, in <person-group person-group-type="editor"><string-name><given-names>M.</given-names> <surname>Trpeska</surname></string-name> (ed.)</person-group>, <conf-name>4th international scientific conference: Economic and Business Trends Shaping the Future, online conference proceedings</conf-name>, <conf-loc>Skopje, North Macedonia</conf-loc>, <conf-date>November 9&#x2013;10, 2023</conf-date>, pp. <fpage>40</fpage>&#x2013;<lpage>50</lpage>, <comment>viewed n.d., from <ext-link ext-link-type="uri" xlink:href="http://hdl.handle.net/20.500.12188/28871">http://hdl.handle.net/20.500.12188/28871</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0004"><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name><surname>Bommarito</surname>, <given-names>J</given-names></string-name>., <string-name><surname>Bommarito</surname>, <given-names>M</given-names></string-name>., <string-name><surname>Katz</surname>, <given-names>D.M</given-names></string-name>. &#x0026; <string-name><surname>Katz</surname>, <given-names>J</given-names></string-name></person-group>., <year>2023</year>, &#x2018;<article-title>GPT as knowledge worker: A zero-shot evaluation of (AI) CPA capabilities</article-title>&#x2019;, <comment>Report, arXiv. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2301.04408">https://doi.org/10.48550/arXiv.2301.04408</ext-link></comment></mixed-citation></ref>
<ref id="CIT0005"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Borger</surname>, <given-names>J.G</given-names></string-name>., <string-name><surname>Ng</surname>, <given-names>A.P</given-names></string-name>., <string-name><surname>Anderton</surname>, <given-names>H</given-names></string-name>., <string-name><surname>Ashdown</surname>, <given-names>G.W</given-names></string-name>., <string-name><surname>Auld</surname>, <given-names>M</given-names></string-name>., <string-name><surname>Blewitt</surname>, <given-names>M.E</given-names></string-name>. <etal>et al</etal></person-group>., <year>2023</year>, &#x2018;<article-title>Artificial intelligence takes center stage: Exploring the capabilities and implications of ChatGPT and other AI-assisted technologies in scientific research and education</article-title>&#x2019;, <source><italic>Immunology &#x0026; Cell Biology</italic></source> <volume>101</volume>(<issue>10</issue>), <fpage>923</fpage>&#x2013;<lpage>935</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/imcb.12689">https://doi.org/10.1111/imcb.12689</ext-link></comment></mixed-citation></ref>
<ref id="CIT0006"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cheng</surname>, <given-names>X</given-names></string-name>., <string-name><surname>Dunn</surname>, <given-names>R</given-names></string-name>., <string-name><surname>Holt</surname>, <given-names>T</given-names></string-name>., <string-name><surname>Inger</surname>, <given-names>K</given-names></string-name>., <string-name><surname>Jenkins</surname>, <given-names>J.G</given-names></string-name>., <string-name><surname>Jones</surname>, <given-names>J</given-names></string-name>. <etal>et al</etal></person-group>., <year>2024</year>, &#x2018;<article-title>Artificial intelligence&#x2019;s capabilities, limitations, and impact on accounting education: Investigating ChatGPT&#x2019;s performance on educational accounting cases</article-title>&#x2019;, <source><italic>Issues in Accounting Education</italic></source> <volume>39</volume>(<issue>2</issue>), <fpage>23</fpage>&#x2013;<lpage>47</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.2308/ISSUES-2023-032">https://doi.org/10.2308/ISSUES-2023-032</ext-link></comment></mixed-citation></ref>
<ref id="CIT0007"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dahl</surname>, <given-names>M</given-names></string-name>., <string-name><surname>Magesh</surname>, <given-names>V</given-names></string-name>., <string-name><surname>Suzgun</surname>, <given-names>M</given-names></string-name>. &#x0026; <string-name><surname>Ho</surname>, <given-names>D</given-names></string-name></person-group>., <year>2024</year>, &#x2018;<article-title>Large legal fictions: Profiling legal hallucinations in large language models</article-title>&#x2019;, <source><italic>Journal of Legal Analysis</italic></source> <volume>16</volume>(<issue>1</issue>), <fpage>64</fpage>&#x2013;<lpage>93</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/jla/laae003">https://doi.org/10.1093/jla/laae003</ext-link></comment></mixed-citation></ref>
<ref id="CIT0008"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>de Carvalho Souza</surname>, <given-names>M.E</given-names></string-name>. &#x0026; <string-name><surname>Weigang</surname>, <given-names>L</given-names></string-name></person-group>., <year>2025</year>, &#x2018;<chapter-title>Grok, Gemini, ChatGPT and DeepSeek: Comparison and applications in conversational artificial intelligence</chapter-title>&#x2019;, <source><italic>Report, Dept. of Computer Science</italic></source>, <publisher-name>University of Bras&#x00ED;lia</publisher-name>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.14885243">https://doi.org/10.5281/zenodo.14885243</ext-link></comment></mixed-citation></ref>
<ref id="CIT0009"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Eulerich</surname>, <given-names>M</given-names></string-name>., <string-name><surname>Sanatizadeh</surname>, <given-names>A</given-names></string-name>., <string-name><surname>Vakilzadeh</surname>, <given-names>H</given-names></string-name>. &#x0026; <string-name><surname>Wood</surname>, <given-names>D.A</given-names></string-name></person-group>., <year>2024</year>, &#x2018;<article-title>Is it all hype? ChatGPT&#x2019;s performance and disruptive potential in the accounting and auditing industries</article-title>&#x2019;, <source><italic>Review of Accounting Studies</italic></source> <volume>29</volume>(<issue>3</issue>), <fpage>2318</fpage>&#x2013;<lpage>2349</lpage>.</mixed-citation></ref>
<ref id="CIT0010"><mixed-citation publication-type="conference"><person-group person-group-type="author"><string-name><surname>Jabbar</surname>, <given-names>A</given-names></string-name>., <string-name><surname>Ul Islam</surname>, <given-names>S</given-names></string-name>. &#x0026; <string-name><surname>Boudjadar</surname>, <given-names>J</given-names></string-name></person-group>., <year>2025</year>, &#x2018;<article-title>A comparative review of LLM-based conversational systems: Insights from DeepSeek, ChatGPT, Gemini, Claude, and Copilot</article-title>&#x2019;, in <person-group person-group-type="editor"><collab>Institution of Engineering and Technology</collab> (ed.)</person-group>, <source><italic>International Conference on AI and the Digital Economy (CADE 2025)</italic></source>, <conf-name>Hybrid Conference</conf-name>, <conf-loc>Venice, Italy</conf-loc>, <conf-date>July 14&#x2013;16, 2025</conf-date>, pp. <fpage>167</fpage>&#x2013;<lpage>173</lpage>.</mixed-citation></ref>
<ref id="CIT0011"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Katz</surname>, <given-names>D.M</given-names></string-name>., <string-name><surname>Bommarito</surname>, <given-names>M.J</given-names></string-name>., <string-name><surname>Gao</surname>, <given-names>S</given-names></string-name>. &#x0026; <string-name><surname>Arredondo</surname>, <given-names>P</given-names></string-name></person-group>., <year>2024</year>, &#x2018;<article-title>GPT-4 passes the bar exam</article-title>&#x2019;, <source><italic>Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences</italic></source> <volume>382</volume>(<issue>2270</issue>), <fpage>1</fpage>&#x2013;<lpage>17</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1098/rsta.2023.0254">https://doi.org/10.1098/rsta.2023.0254</ext-link></comment></mixed-citation></ref>
<ref id="CIT0012"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Pinto</surname>, <given-names>A.S</given-names></string-name>., <string-name><surname>Abreu</surname>, <given-names>A</given-names></string-name>., <string-name><surname>Costa</surname>, <given-names>E</given-names></string-name>. &#x0026; <string-name><surname>Paiva</surname>, <given-names>J</given-names></string-name></person-group>., <year>2024</year>, &#x2018;<chapter-title>AI in accounting: Can AI models like ChatGPT and Gemini successfully pass the Portuguese chartered accountant exam?</chapter-title>&#x2019;, in <person-group person-group-type="editor"><string-name><given-names>A.</given-names> <surname>Abreu</surname></string-name>, <string-name><given-names>J.V.</given-names> <surname>Carvalho</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Mesquita</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Sousa Pinto</surname></string-name> &#x0026; <string-name><given-names>M.</given-names> <surname>Mendon&#x00E7;a Teixeira</surname></string-name> (eds.)</person-group>, <source><italic>Perspectives and trends in education and technology</italic></source>, pp. <fpage>429</fpage>&#x2013;<lpage>438</lpage>, <publisher-name>Springer Nature Switzerland</publisher-name>, <publisher-loc>Cham</publisher-loc>.</mixed-citation></ref>
<ref id="CIT0013"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Roberts</surname>, <given-names>J</given-names></string-name>., <string-name><surname>Baker</surname>, <given-names>M</given-names></string-name>. &#x0026; <string-name><surname>Andrew</surname>, <given-names>J</given-names></string-name></person-group>., <year>2024</year>, &#x2018;<article-title>Artificial intelligence and qualitative research: The promise and perils of large language model (LLM) &#x201C;assistance&#x201D;</article-title>&#x2019;, <source><italic>Critical Perspectives on Accounting</italic></source> <volume>99</volume>, <fpage>102722</fpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.cpa.2024.102722">https://doi.org/10.1016/j.cpa.2024.102722</ext-link></comment></mixed-citation></ref>
<ref id="CIT0014"><mixed-citation publication-type="journal"><person-group person-group-type="author"><collab>South African Institute of Chartered Accountants (SAICA)</collab></person-group>, <year>2023</year>, <source><italic>CA(SA) and SAICA are back to number 1 in the world</italic></source>, <comment>viewed 06 May 2025, from <ext-link ext-link-type="uri" xlink:href="https://www.saica.org.za/news/south-african-chartered-accountants-lead-in-global-trustworthiness">https://www.saica.org.za/news/south-african-chartered-accountants-lead-in-global-trustworthiness</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0015"><mixed-citation publication-type="journal"><person-group person-group-type="author"><collab>South African Institute of Chartered Accountants (SAICA)</collab></person-group>, <year>2024</year>, <source><italic>SAICA code of conduct</italic></source>, <comment>viewed 31 May 2025, from <ext-link ext-link-type="uri" xlink:href="https://www.saica.org.za/about/general/ethics/saica-code-of-conduct">https://www.saica.org.za/about/general/ethics/saica-code-of-conduct</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0016"><mixed-citation publication-type="journal"><person-group person-group-type="author"><collab>South African Institute of Taxation (SAIT)</collab></person-group>, <year>2025</year>, <source><italic>SAIT member code of conduct</italic></source>, <comment>viewed 31 May 2025, from <ext-link ext-link-type="uri" xlink:href="https://thesait.org.za/wp-content/uploads/2025/02/SAIT_code_of_ethics7.pdf">https://thesait.org.za/wp-content/uploads/2025/02/SAIT_code_of_ethics7.pdf</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0017"><mixed-citation publication-type="journal"><person-group person-group-type="author"><collab>South African Revenue Service (SARS)</collab></person-group>, <year>2024</year>, <source><italic>Register as a tax practitioner</italic></source>, <comment>viewed 26 October 2025, from <ext-link ext-link-type="uri" xlink:href="https://www.sars.gov.za/tax-practitioners/register-as-a-tax-practitioner/">https://www.sars.gov.za/tax-practitioners/register-as-a-tax-practitioner/</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0018"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Schneid</surname>, <given-names>S.D</given-names></string-name>., <string-name><surname>Armour</surname>, <given-names>C</given-names></string-name>. &#x0026; <string-name><surname>Brandl</surname>, <given-names>K</given-names></string-name></person-group>., <year>2025</year>, &#x2018;<article-title>Beyond right or wrong: How partial credit scoring on multiple-choice questions improves student performance and assessment perceptions</article-title>&#x2019;, <source><italic>British Journal of Clinical Pharmacology</italic></source>, <fpage>1</fpage>&#x2013;<lpage>7</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/bcp.70127">https://doi.org/10.1002/bcp.70127</ext-link></comment></mixed-citation></ref>
<ref id="CIT0019"><mixed-citation publication-type="journal"><person-group person-group-type="author"><collab>Tax Complexity Index</collab></person-group>, <year>2022</year>, <source><italic>Tax complexity index</italic></source>, <comment>viewed 08 May 2025, from <ext-link ext-link-type="uri" xlink:href="https://www.taxcomplexity.org/">https://www.taxcomplexity.org/</ext-link>.</comment></mixed-citation></ref>
<ref id="CIT0020"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Teubner</surname>, <given-names>T</given-names></string-name>., <string-name><surname>Flath</surname>, <given-names>C.M</given-names></string-name>., <string-name><surname>Weinhardt</surname>, <given-names>C</given-names></string-name>., <string-name><surname>Van der Aalst</surname>, <given-names>W</given-names></string-name>. &#x0026; <string-name><surname>Hinz</surname>, <given-names>O</given-names></string-name></person-group>., <year>2023</year>, &#x2018;<article-title>Welcome to the era of ChatGPT et al.</article-title>&#x2019;, <source><italic>Business &#x0026; Information Systems Engineering</italic></source> <volume>65</volume>, <fpage>95</fpage>&#x2013;<lpage>101</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s12599-023-00795-x">https://doi.org/10.1007/s12599-023-00795-x</ext-link></comment></mixed-citation></ref>
<ref id="CIT0021"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Van Wyk</surname>, <given-names>E</given-names></string-name></person-group>., <year>2011</year>, &#x2018;<article-title>A note: The SAICA part I qualifying examinations: Factors that may influence candidates&#x2019; success</article-title>&#x2019;, <source><italic>South African Journal of Accounting Research</italic></source> <volume>25</volume>(<issue>1</issue>), <fpage>145</fpage>&#x2013;<lpage>174</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/10291954.2011.11435157">https://doi.org/10.1080/10291954.2011.11435157</ext-link></comment></mixed-citation></ref>
<ref id="CIT0022"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wood</surname>, <given-names>D.A</given-names></string-name>., <string-name><surname>Achhpilia</surname>, <given-names>M.P</given-names></string-name>., <string-name><surname>Adams</surname>, <given-names>M.T</given-names></string-name>., <string-name><surname>Aghazadeh</surname>, <given-names>S</given-names></string-name>., <string-name><surname>Akinyele</surname>, <given-names>K</given-names></string-name>., <string-name><surname>Akpan</surname>, <given-names>M</given-names></string-name>. <etal>et al</etal></person-group>., <year>2023</year>, &#x2018;<article-title>The ChatGPT artificial intelligence chatbot: How well does it answer accounting assessment questions?</article-title>&#x2019;, <source><italic>Issues in Accounting Education</italic></source> <volume>38</volume>(<issue>4</issue>), <fpage>1</fpage>&#x2013;<lpage>28</lpage>. <comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.2308/ISSUES-2023-013">https://doi.org/10.2308/ISSUES-2023-013</ext-link></comment></mixed-citation></ref>
</ref-list>
<fn-group>
<fn><p><bold>How to cite this article:</bold> Ram, A.J. &#x0026; Van Zijl, W., 2026, &#x2018;Examining different artificial intelligence models&#x2019; ability to pass Certificate of Theory in Accountancy-level tax questions&#x2019;, <italic>South African Journal of Economic and Management Sciences</italic> 29(1), a6348. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.4102/sajems.v29i1.6348">https://doi.org/10.4102/sajems.v29i1.6348</ext-link></p></fn>
<fn id="FN0001"><label>1</label><p>The SAICA Code of Professional Conduct (SAICA <xref ref-type="bibr" rid="CIT0015">2024</xref>) and the South African Institute of Taxation (SAIT) Code of Conduct (SAIT <xref ref-type="bibr" rid="CIT0016">2025</xref>) require professional competence and due care of members.</p></fn>
<fn id="FN0002"><label>2</label><p>Partial credit means that points are assigned to very close and moderately close answers in multiple-choice questions (MCQs), as opposed to only awarding points to a fully correct answer (Schneid, Armour &#x0026; Brandl <xref ref-type="bibr" rid="CIT0018">2025</xref>).</p></fn>
<fn id="FN0003"><label>3</label><p>These examinations do not use a 50&#x0025; pass mark. Rather, they use thresholds for various sections and parts, and the 53.1&#x0025; does not meet the passing threshold in any section.</p></fn>
<fn id="FN0004"><label>4</label><p>Ten-shot training means that the model is given ten worked examples of the task before it is asked to respond. The aim is to assist the model in applying what it has learnt to more generalisable, unseen examples.</p></fn>
<fn id="FN0005"><label>5</label><p>Failed, in a South African university context, means that an overall mark of less than 50&#x0025; was achieved.</p></fn>
<fn id="FN0006"><label>6</label><p>Where a country ranked 1st represents the most complex tax system.</p></fn>
</fn-group>
</back>
</article>