As more enterprises and consumers adopt AI-based applications, it becomes increasingly critical to ensure that artificial intelligence (AI) is responsible and trustworthy. Government agencies and intergovernmental organizations are rapidly enacting regulations and publishing frameworks to drive increased oversight, accountability, and responsibility in the use of AI. For example, the EU AI Act requires developers of high-risk AI systems to adhere to several obligations, including conducting model evaluations, assessing and mitigating systemic risk, and performing adversarial testing (including testing of generative AI systems) to comply with transparency requirements.1 In the United States, a key principle of the AI Bill of Rights is protection from algorithmic discrimination, according to which algorithms and systems should be designed and used equitably. The blueprint calls for proactive equity assessments, the use of representative data, and ongoing disparity testing and mitigation.2
While AI governance frameworks comprise several elements, the auditing of the algorithms that underlie AI applications is an essential and mandatory component. Algorithmic audits provide insight into the inner workings of an AI system, including its training data, model development and training processes, and underlying logic. This information enhances confidence in the outputs of AI systems by verifying adherence to the other principles of responsible AI.
Algorithmic Auditing and Challenges
Algorithmic auditing refers to the process of evaluating the functionality of machine learning (ML) applications, including the context and purpose of the system, to assess their utility and fairness. These audits help systematically uncover where bias emerges at each step of the model-building process.3 Though the algorithmic audit may be the latest buzzword in the IT audit world, government regulators have been examining data analytics and ML-based systems for some time. For example, the Australian Competition and Consumer Commission (ACCC) audited a popular hotel search engine and found that its ranking algorithm unfairly favored hotels that paid higher commissions.4 In proceedings brought by the ACCC, the Federal Court ordered the company to pay penalties for misleading representations about hotel room rates on its website and in television advertising. This case reinforces the importance of auditing algorithms and their ability to uncover the logic driving the output of AI-based applications. Additionally, the recent slate of regulations has made algorithmic audits a key point of interest for AI practitioners.
The biggest obstacle to auditing algorithms is the lack of a mature framework that details the AI subprocesses on which to base the audits. The lack of widely adopted precedents for handling AI use cases is another challenge.5 Moreover, the specific techniques used by researchers and regulators vary widely, both in the technical aspects audited and in the audit procedures used, and each corresponds to a narrow set of objectives. Furthermore, the absence of baseline controls can render audit results inconsistent and unreliable.
Despite these challenges, it is important to note that the field of algorithmic auditing is still in its early stages. As digital professionals deepen their understanding of these tools, the procedures and controls they establish will become more refined, improving not only the consistency of audit results but also their reliability and usefulness. Nevertheless, for algorithmic audits to be comprehensive and deliver impactful, reliable results, they must cover certain key control areas.
System Inputs: Data Controls
Data is the fuel that drives the models underlying algorithms. In AI development, data is paramount at every phase; access to relevant datasets and the creation of appropriate data pipelines are critical.6 For this reason, the audit must address controls across the data lifecycle, including data accuracy, preparation, and protection. Beyond data quality attributes, auditors must ensure that datasets are diverse, inclusive, and representative of the populations who will use the system, because algorithms are only as good as their data pipelines. Auditors need to collaborate with the data engineers on the AI engineering team to understand the sources of training data and the steps taken to address privacy risk. Subsequently, auditors should assess the data preparation techniques employed to confirm that they did not alter key properties of the data, including completeness, accuracy, and other statistical attributes.
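To illustrate, the following is a minimal sketch, in Python, of the kind of automated data-quality evidence an auditor might request. The column name, thresholds, and benchmark distribution are hypothetical placeholders, not values prescribed by any framework.

```python
# A minimal sketch of automated data-quality checks an auditor might review.
# The "group" column, the 2% null threshold, and the benchmark shares are
# hypothetical assumptions for illustration only.
import pandas as pd
from scipy.stats import chisquare

def audit_dataset(df: pd.DataFrame, benchmark_shares: dict[str, float]) -> dict:
    """Run basic completeness and representativeness checks on a training set."""
    findings = {}

    # Completeness: flag columns whose null rate exceeds a (hypothetical) 2% threshold.
    null_rates = df.isna().mean()
    findings["incomplete_columns"] = null_rates[null_rates > 0.02].to_dict()

    # Representativeness: compare observed counts per demographic group against
    # a benchmark population distribution using a chi-square goodness-of-fit test.
    # Assumes every value in df["group"] appears in benchmark_shares.
    n = len(df)
    counts = df["group"].value_counts()
    groups = sorted(benchmark_shares)
    observed = [counts.get(g, 0) for g in groups]
    expected = [benchmark_shares[g] * n for g in groups]
    _, p_value = chisquare(f_obs=observed, f_exp=expected)
    findings["representative"] = p_value > 0.05  # fail if distributions diverge

    return findings
```

Running such checks on both the raw and the prepared datasets gives the auditor before-and-after evidence that preparation did not distort the data's statistical properties.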
Adversarial AI Testing
Various regulations mandate that developers of high-risk and powerful AI systems test their models for robustness using simulated adversarial attacks such as data poisoning or prompt injection. Adversarial testing refers to the stress testing of learning models and algorithms by feeding them malicious or malformed inputs in an attempt to mislead the algorithm and cause it to fail. Any algorithmic audit must include an assessment of how an organization approaches adversarial testing. There should be a dedicated policy outlining how adversarial testing of algorithms will be carried out to improve the security and accuracy of the models. The policy should specify what the tests will entail, who will execute them, and where in the development pipeline they will run. Subsequently, controls around the review of test results, such as fine-tuning the models to remediate anomalies and the associated documentation, should be tested. One such test is an AI red teaming exercise. In the cybersecurity community, red teaming refers to a team of ethical hackers or security researchers who simulate the activities of different types of adversaries, including nation-states and malicious insiders, and attempt to break the system in question. It is also a useful method to test how well the defenses of AI algorithms hold up against unexpected inputs. Red teaming is a control that many organizations may adopt following a US White House-sponsored event that brought together the AI community to probe the models of top tech enterprises and uncover novel risk.7
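As a simple illustration, the sketch below probes prediction stability under small random input perturbations. The predict() interface and the noise budget are assumptions; real adversarial suites use targeted techniques such as gradient-based attacks or crafted prompt injections, not random noise alone.

```python
# A minimal sketch of one adversarial robustness check: perturb numeric inputs
# with bounded random noise and measure how often the model's prediction flips.
# The predict() callable and epsilon value are hypothetical assumptions.
import numpy as np

def perturbation_flip_rate(predict, X: np.ndarray, epsilon: float = 0.05,
                           trials: int = 10, seed: int = 0) -> float:
    """Fraction of inputs whose predicted label changes under bounded noise."""
    rng = np.random.default_rng(seed)
    baseline = predict(X)                 # predictions on clean inputs
    flipped = np.zeros(len(X), dtype=bool)
    for _ in range(trials):
        noise = rng.uniform(-epsilon, epsilon, size=X.shape)
        flipped |= (predict(X + noise) != baseline)
    return float(flipped.mean())
```

A flip rate above a threshold agreed on in the adversarial testing policy would be evidence that the model is fragile near its input distribution and warrants remediation.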
AI Model Monitoring
After AI applications are deployed to production, they must be continuously monitored to identify any shift in model performance, accuracy and precision of outputs, or security and safety. Many third-party AI model monitoring tools track and report on a set of out-of-the-box metrics. An algorithmic audit must include a review of model performance parameters and an in-depth analysis of the observability processes implemented by the developers, covering the selection of monitoring metrics, their relevance and suitability, their ability to promptly detect issues, and the procedures for triaging and rectifying identified problems. Stanford University's Holistic Evaluation of Language Models (HELM)8 approach for evaluating language models examines metrics such as accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency, and can be used as a reliable benchmark. The auditor must verify that the AI model is continuously monitored for errors or issues by analyzing whether the measurements, associated metrics, and thresholds are appropriate, relevant, and sufficient, and must ensure that the monitoring mechanisms in place can detect model drift or anomalies in the performance of the system.
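One widely used drift measure an auditor might expect to see in monitoring evidence is the population stability index (PSI). The sketch below is a minimal implementation; the bin count and the alert thresholds are conventional rules of thumb, not values mandated by HELM or any standard.

```python
# A minimal sketch of the population stability index (PSI), comparing a
# production feature or score distribution against its training baseline.
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI = sum((a% - e%) * ln(a% / e%)) over shared histogram bins."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid division by zero in sparse bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Common rule of thumb: PSI < 0.1 is stable, 0.1-0.2 warrants investigation,
# and > 0.2 indicates significant drift to escalate to the model owners.
```

When reviewing monitoring controls, the auditor can ask how such thresholds were chosen, how often the metric is computed, and who responds when an alert fires.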
AI Model Development
AI model development is a concept that resonates with IT audit professionals because of its apparent similarities to the traditional system development lifecycle (SDLC). An audit of AI model development processes would involve a review of the general IT controls surrounding the AI system development lifecycle and change management processes. However, in the context of algorithm and model development, special emphasis must be placed on the extent and detail of the design specifications, testing, and approval requirements. The auditor must evaluate the design documentation and associated threat models to understand the purpose and logic of the algorithm and to ensure that issues, including concerns about the system's impact on its users and society, are addressed. The auditor should confirm that appropriate validation procedures are executed before the algorithms are deployed: a series of functional tests verifying the various features and functionalities of the system must be carried out with satisfactory results. Finally, deploying models to production should require approval from individuals with the appropriate level of authority to drive accountability; a peer approval following a code review would not suffice. This is a key control to test in every audit, verifying that no unauthorized models or changes were deployed and that every deployment was approved by appropriate management. Organizations that leverage third-party libraries or pretrained models from external sources as part of algorithm development should consider integrating a static scan into their continuous integration/continuous delivery (CI/CD) pipelines to identify software vulnerabilities and thereby reduce the risk of model supply chain attacks.
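As an illustration, the following sketch shows how such a scan might be wired into a pipeline using pip-audit, a real dependency-scanning tool from the Python Packaging Authority. The requirements path, the JSON parsing, and the failure policy are assumptions about one possible setup, not a prescribed configuration.

```python
# A minimal sketch of a CI/CD gate that fails a deployment if a dependency
# scan finds known vulnerabilities. The JSON shape assumed below matches
# recent pip-audit releases; verify against the version in your pipeline.
import json
import subprocess
import sys

def scan_dependencies(requirements: str = "requirements.txt") -> int:
    """Return a nonzero exit code if any known vulnerability is found."""
    result = subprocess.run(
        ["pip-audit", "-r", requirements, "--format", "json"],
        capture_output=True, text=True,
    )
    report = json.loads(result.stdout or "{}")
    vulnerable = [d for d in report.get("dependencies", []) if d.get("vulns")]
    for dep in vulnerable:
        print(f"{dep['name']} {dep['version']}: {len(dep['vulns'])} advisories")
    return 1 if vulnerable else 0

if __name__ == "__main__":
    sys.exit(scan_dependencies())
```

In an audit, evidence that this gate ran, and that any findings were remediated or formally accepted before deployment, supports the supply chain control described above.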
Conclusion
Algorithm audits are an effective tool to help ensure the safety, fairness, explainability, and security of AI technologies. The recently published International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) standard ISO/IEC 42001:2023 Information Technology — Artificial Intelligence — Management System contains a set of required controls that enterprises and auditors may use as a reference.9 The European Union's delegated regulation on independent audits to assess the compliance of very large online platforms10 and search engines with the Digital Services Act11 contains several guidelines for planning algorithmic audits, along with example audit test procedures. The audit of AI algorithms is an evolving discipline; as these practices mature, we can expect them to gain wider and more consistent adoption.
Endnotes
1 European Parliament, Artificial Intelligence Act: Deal on Comprehensive Rules for Trustworthy AI, European Union, 9 December 2023
2 The White House, Blueprint for an AI Bill of Rights, USA, 4 October 2022
3 Kassir, S.; Algorithmic Auditing: The Key to Making Machine Learning in the Public Interest, Business of Government, 2020
4 Australian Competition and Consumer Commission (ACCC), “Trivago Misled Consumers About Hotel Room Rates,” 21 January 2020
5 ISACA, Auditing Artificial Intelligence, USA, 2018
6 McMullen, M.; “An Overview of the Role Data Plays in AI Development,” Data Science Central, 20 April 2023
7 The White House, “Red-Teaming Large Language Models to Identify Novel AI Risks,” USA, 29 August 2023
8 Bommasani, R.; Liang, P.; et al.; “Language Models are Changing AI: The Need for Holistic Evaluation,” Stanford University, California, USA, 2022
9 ISO, “AI Management Systems: What Businesses Need to Know,” 31 January 2024
10 European Commission, Delegated Regulation on Independent Audits Under the Digital Services Act, European Union, 20 October 2023
11 European Commission, “The Digital Services Act: Ensuring a Safe and Accountable Online Environment,” 2024
Varun Prasad, CISA, CISM, CCSK, CIPM, PMP
Senior manager of third-party attestation at BDO USA P.C.