
Auditing AI systems


This is still Work In Progress (WIP)


  • Djalel Benbouzid (Volkswagen Group)
  • Laura Lucaj (Volkswagen Group)
  • Aljoscha Burchardt (DFKI)
  • Marc P. Hauer (TU Kaiserslautern)
  • Mihai Maftei (DFKI)
  • Iris Merget (DFKI)
  • Christiane Plociennik (DFKI)

Why auditing?

Algorithmic auditing of AI systems has gained considerable recognition as a way to harness the potential of AI models and to detect and mitigate the problematic patterns and consequences of their deployment in sensitive contexts such as healthcare, hiring or mobility. Audit procedures for AI systems can provide an overview of a system's present and past performance and enable monitors to preemptively address, manage and mitigate potential risks.

How are audits conducted and what is missing in the field of auditing?

Auditing is already standard practice in numerous domains and industries that rely heavily on standardized procedures to assess the quality and security of the systems they develop, such as finance, aerospace or healthcare. Audits can be conducted either internally, within the organization that develops the AI system, or by an independent third party. Both approaches have advantages and drawbacks. Internal audits allow deeper access when addressing the potential negative impact of a system, but they are rarely disclosed to the public and lack transparency about the actual outcomes of the audit and the weaknesses detected in the system's performance. Independent organizations, on the other hand, often struggle to obtain a proper overview of a system and therefore to guarantee that the audit can really assess its impact.

For AI systems, the auditing practices explored so far mainly focus on specific phases of a system's pipeline. For instance, audit trails, verification and bias testing, and explainable user interfaces are methods that can form part of an audit: they help to understand how a system works, to collect the information needed to assess its performance continuously, and to uncover its weaknesses. Templates that support documentation practices have also been explored, both for individual phases such as data collection or model training and for the entire development process. Such practices are essential for an audit because they create a record of the choices made during development, so that their potential impact at a later deployment phase can be understood.
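As an illustration of what an audit trail could look like in practice, the following is a minimal sketch of an append-only decision log in which every entry stores the hash of the previous one, so that later modifications become detectable. The class name `AuditTrail`, the field names and the example entries are assumptions made for this sketch and are not part of any etami tooling; timestamps are omitted to keep the example deterministic.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder hash for the first entry (assumption)


def _digest(body: dict) -> str:
    """Hash the entry body in a stable (key-sorted) JSON form."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    """Hypothetical append-only log of development decisions."""
    entries: list = field(default_factory=list)

    def record(self, phase: str, decision: str, author: str) -> dict:
        """Append a decision and chain it to the previous entry."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {"phase": phase, "decision": decision,
                "author": author, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True only if no entry was altered, removed or reordered."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


trail = AuditTrail()
trail.record("data collection", "excluded records without consent", "data team")
trail.record("model training", "selected gradient boosting as model family", "ML team")
print(trail.verify())
```

A log of this kind only helps an audit if it is filled in while decisions are being made; reconstructing it afterwards loses exactly the information the audit team needs.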

The practices outlined above often focus on specific phases and are, on their own, insufficient to analyze the entire pipeline of an ML system. Auditing complex ML models requires bridging the work of many stakeholders and predicting and addressing a variety of potential harms that depend on the context. What actually constitutes an audit, and which tools and documentation practices are necessary, is therefore still being debated. As a result, a general procedure that brings consistency across all phases and components of an ML system is missing.

How etami addresses auditing

Effective, actionable and comprehensive methodologies to understand, address and mitigate the impact of AI systems therefore remain under-investigated. Moreover, in light of the upcoming regulation in the European Union, the AI Act, developers lack operational guidance on the practices that would enable them to assess the compliance of their system with the law. To address this gap, etami is developing an auditing framework based on the lifecycle model that aims to cover all the practices needed to assess the compliance of a system with the upcoming EU regulation. The framework aims to enable organizations to adopt the necessary practices to assess the impact of their technology throughout its entire lifecycle, in a continuous process that also audits the system after market entry.

Audit pilots within etami


Audit pilot process overview

The audit process starts with mapping the system's lifecycle onto the lifecycle model (LCM) developed by etami. This step is essential: it provides an initial documentation of the system's phases and enables all parties working together on the audit to understand the overall architecture of the system as well as the practices that need to be adopted at each phase in order to conduct the audit.

To align the audit process with the risk-based approach adopted by the European Commission to regulate AI systems, etami's audit introduces a risk analysis mapped onto the different steps of the LCM. The risk-assessment methodology is based on the Assessment List for Trustworthy Artificial Intelligence (ALTAI), which raises many of the questions relevant to assessing the risk and impact of a system. To make the assessment list more operational, we mapped the questions to the different phases of the LCM, to guide the audit team in addressing the relevant questions arising at each stage.
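The mapping described above can be pictured as a simple lookup from lifecycle phase to assessment questions. The sketch below is only an illustration: the requirement names are those used by ALTAI, but the phase names, the paraphrased question texts and the helper `questions_by_phase` are assumptions and do not reproduce the actual etami mapping or the official ALTAI wording.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Question:
    requirement: str         # ALTAI requirement the question belongs to
    text: str                # paraphrased question (illustrative only)
    phases: Tuple[str, ...]  # hypothetical LCM phases where it is relevant


QUESTIONS = [
    Question("Privacy and Data Governance",
             "Is personal data involved, and how was it collected?",
             ("data collection",)),
    Question("Diversity, Non-discrimination and Fairness",
             "Were the data and the model tested for unfair bias?",
             ("data collection", "model training")),
    Question("Human Agency and Oversight",
             "Can a human intervene in or override the system's decisions?",
             ("deployment",)),
    Question("Technical Robustness and Safety",
             "Is performance monitored after release?",
             ("deployment", "post-market monitoring")),
]


def questions_by_phase(questions: List[Question]) -> Dict[str, List[Question]]:
    """Invert the question list into a phase -> questions lookup."""
    mapping = defaultdict(list)
    for question in questions:
        for phase in question.phases:
            mapping[phase].append(question)
    return dict(mapping)


for phase, items in questions_by_phase(QUESTIONS).items():
    print(phase)
    for item in items:
        print(f"  [{item.requirement}] {item.text}")
```

Keeping the phases on the question, rather than maintaining one list per phase, lets a single question appear at every stage where it matters without being duplicated.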

Procedure delineation

The audit procedure of etami is the result of an ongoing process of conducting pilots within the consortium. The current state therefore does not aim to provide a “one-size-fits-all” solution, but rather illustrates the knowledge developed through those pilots.


Lifecycle Model mapping

The planning phase consists of defining the phases and processes that have to be audited. Initially, the audit team collaborates to define a clear map of the lifecycle of the AI model being analysed.

Risk assessment

The next step is to map onto the lifecycle model the risk assessment procedure derived from the Assessment List for Trustworthy AI (ALTAI) published by the European Commission. This step makes it possible to identify the potentially harmful impact of the system early on, by carefully investigating the context of deployment and assessing which risks can emerge at which stage of the lifecycle.

Audit team requirements and resource planning

This step is fundamental for distributing tasks and responsibilities among the different stakeholders. Moreover, it makes it possible to define the timeline of the audit and to identify which tools and documentation practices are needed throughout the lifecycle, in order to understand whether the team is able to address every step of the

This phase is optional. It can, however, provide the audit team with initial evidence of the potential risks they need to uncover, and help them understand how other companies dealt with those risks and what can be done to mitigate them.

Fieldwork and documentation phase

Audits usually start with a description of the system: the different parties exchange documentation and information about the system's design and development details and establish a shared understanding of its exact intended purpose. This phase is essential for defining the goal of the audit and delineating the necessary practices along the lifecycle. It is already challenging, as no standardized practice is currently available that enables an audit team to document each phase of the lifecycle of a system. Hence, there is a pressing need to develop such processes and provide them to developers early in the lifecycle, as compiling such documentation later in the pipeline can be challenging and would likely lead to a significant loss of information about important decisions made at specific stages that might affect the overall performance of the system at later stages.

Collection of evidence

The first step of this phase is the collection of evidence, where the necessary documentation is gathered and processed by the audit team to enable a deep understanding of how the system operates for its intended purpose.

Regulatory compliance assessment

This step gives the audit team a clear overview of the regulatory landscape in the markets where the system is deployed. Regulatory compliance assessment is a fundamental part of an audit: it helps to assess the impact of a system, to address risks and mitigate harmful consequences, and to produce the documentation the developing company needs to prove compliance with the law.

Compliance testing

This phase follows the collection of the necessary documentation along the development of the system. Its purpose is to analyze potential misalignment between the specifications set during the formalization of an ML system and their actual implementation. Several challenges are addressed in this phase, for instance insufficient documentation of the data collection and processing phase, insufficient transparency and explainability regarding the exact impact of the system on the user, and quality-assessment measures built on metrics that do not fit the context of deployment. This phase is essential not only for ensuring the quality of an ML model, but also for assessing whether the model is fit for deployment in the chosen domain, which is one of the major goals of an audit procedure. The audit team verifies the definition of quality adopted by the developing team and assesses whether it is reasonable. Systems are sometimes developed with the best intentions, and yet the lack of processes to assess their fitness for the domain and the quality of the data they were trained on can cause significant negative impact once they are deployed. Researchers might use some data believing that it will improve the accuracy of the system, but once the system is deployed, such data can lead to disproportionate discrimination against individuals based on their gender or ethnic background.
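One way an audit team might probe for the kind of group-level disparity mentioned above is to compare the rate of positive decisions across groups. The following is a minimal sketch using the demographic parity difference, which is only one of many possible fairness metrics and is not prescribed by the etami procedure; the group labels and predictions are made-up illustration data, and the function names are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def selection_rates(groups: Sequence[Hashable],
                    predictions: Sequence[int]) -> Dict[Hashable, float]:
    """Share of positive predictions (1) within each group."""
    if len(groups) != len(predictions):
        raise ValueError("groups and predictions must have the same length")
    totals = Counter(groups)
    positives = Counter(g for g, p in zip(groups, predictions) if p == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(rates: Dict[Hashable, float]) -> float:
    """Gap between the most and the least favoured group (0 means parity)."""
    return max(rates.values()) - min(rates.values())


# Made-up example: binary hiring decisions for two groups of applicants.
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]
predictions = [1, 0, 0, 0, 1, 1, 1, 0]

rates = selection_rates(groups, predictions)
print(rates)
print(demographic_parity_difference(rates))
```

Whether a given gap is acceptable depends on the deployment context and the applicable regulation, which is why the audit team assesses the developing team's own definition of quality rather than applying a fixed threshold.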

Custom Testing

This phase enables the audit team to develop test batteries specifically tailored for the system and task that is being analysed. This step is highly dependent on the deployment context, the type of technology used, and the underlying data.
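A tailored test battery can be organised as a list of named checks that are run against the system under audit. The sketch below assumes a toy hiring model that maps an applicant record to a binary decision; the checks (`gender_invariance`, `experience_monotonic`), the `AuditCheck` type and the input fields are hypothetical and would have to be replaced by checks that fit the actual deployment context, technology and data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Model = Callable[[dict], int]  # assumption: model maps a record to 0 or 1


@dataclass(frozen=True)
class AuditCheck:
    name: str
    check: Callable[[Model], bool]


def gender_invariance(model: Model) -> bool:
    """The decision should not change when only the gender field is flipped."""
    base = {"years_experience": 5, "gender": "female"}
    flipped = {**base, "gender": "male"}
    return model(base) == model(flipped)


def experience_monotonic(model: Model) -> bool:
    """More experience should never turn an accept into a reject."""
    outputs = [model({"years_experience": y, "gender": "female"}) for y in range(11)]
    return all(a <= b for a, b in zip(outputs, outputs[1:]))


BATTERY: List[AuditCheck] = [
    AuditCheck("gender_invariance", gender_invariance),
    AuditCheck("experience_monotonic", experience_monotonic),
]


def run_battery(model: Model, battery: List[AuditCheck]) -> Dict[str, bool]:
    """Run every check and report pass/fail by name."""
    return {item.name: item.check(model) for item in battery}


def toy_model(applicant: dict) -> int:
    """Stand-in for the audited system (illustration only)."""
    return 1 if applicant["years_experience"] >= 3 else 0


print(run_battery(toy_model, BATTERY))
```

Because each check is just a named function, the battery can grow as new risks are identified, and the pass/fail results can be carried over directly into the audit report.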


This phase takes place at the end of the development phase, where the results of the different tests and the implemented controls are collected; these provide guidance on the information that must be considered at later stages such as post-market analysis. Reporting is an ongoing procedure that takes place on a regular basis: the system might be deployed in a context different from the initial one, new feedback might arrive from users, and many other factors might make it necessary to understand potential negative impacts and to iterate on the mitigation measures.

Recommendations and mitigation measures

The auditee is provided with documentation on mitigating the risks that were unveiled throughout the audit as well as tools and practices to enable such actions. Moreover, recommendations are provided to enable the developing team to be aware of potential risks that might emerge upon deployment due to the sensitive contexts in which some AI systems operate.

Follow-up Schedule

As auditing is an iterative process throughout the entire lifecycle of an AI system, the audit team has to stay constantly in contact and meet regularly to monitor the performance of the algorithm and proactively work on risk mitigation.
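As a rough illustration of such monitoring, the sketch below flags the monitoring periods in which a performance metric has fallen noticeably below the value measured at audit time. The function name `needs_review`, the choice of metric and the tolerance of 0.05 are assumptions for this example; appropriate thresholds depend on the system and its deployment context.

```python
from typing import List, Sequence


def needs_review(baseline: float,
                 observed: Sequence[float],
                 tolerance: float = 0.05) -> List[int]:
    """Return the indices of periods whose metric dropped more than
    `tolerance` below the baseline established during the audit."""
    return [i for i, value in enumerate(observed) if baseline - value > tolerance]


# Made-up example: accuracy measured in four consecutive monitoring periods.
flagged = needs_review(baseline=0.90, observed=[0.91, 0.88, 0.82, 0.84])
print(flagged)
```

The flagged periods can then serve as the agenda for the regular follow-up meetings of the audit team.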

Future testing preparation

This step is optional, depending on the complexity of the algorithm as well as the necessity to develop new practices to address the challenges that have emerged during the initial phases of the audit.


Akula, R., & Garibay, I. (2021). Audit and Assurance of AI Algorithms: A framework to ensure ethical algorithmic practices in Artificial Intelligence. arXiv preprint arXiv:2107.14046.

Arnold, M., Bellamy, R. K., Hind, M., Houde, S., Mehta, S., Mojsilović, A., … & Varshney, K. R. (2019). FactSheets: Increasing trust in AI services through supplier’s declarations of conformity. IBM Journal of Research and Development, 63(4/5), 6-1.

Brundage, M., Avin, S., Wang, J., Belfield, H., Krueger, G., Hadfield, G., … & Anderljung, M. (2020). Toward trustworthy AI development: mechanisms for supporting verifiable claims. arXiv preprint arXiv:2004.07213.

Buolamwini, J., & Gebru, T. (2018, January). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency (pp. 77-91). PMLR.

Cheong, S. M., Sankaran, K., & Bastani, H. (2022). Artificial intelligence for climate change adaptation. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1459.

Dattner, B., Chamorro-Premuzic, T., Buchband, R., & Schettler, L. (2019). The legal and ethical implications of using AI in hiring. Harvard Business Review, 25.

Eastwood, N., Stubbings, W. A., Abdallah, M. A. A. E., Durance, I., Paavola, J., Dallimer, M., … & Orsini, L. (2021). The Time Machine framework: monitoring and prediction of biodiversity loss. Trends in ecology & evolution.

Englund, C., Aksoy, E. E., Alonso-Fernandez, F., Cooney, M. D., Pashami, S., & Åstrand, B. (2021). AI perspectives in Smart Cities and Communities to enable road vehicle automation and smart traffic control. Smart Cities, 4(2), 783-802.

European Commission, Directorate-General for Communications Networks, Content and Technology. (2019). Ethics guidelines for trustworthy AI. Publications Office.

European Commission, Directorate-General for Communications Networks, Content and Technology. (2020). The Assessment List for Trustworthy Artificial Intelligence (ALTAI) for self assessment. Publications Office.

European Commission. (2020). White Paper on Artificial Intelligence: A European approach to excellence and trust. COM(2020) 65 final.

European Commission. (2021). Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts. COM(2021) 206 final.

Falco, G., Shneiderman, B., Badger, J., Carrier, R., Dahbura, A., Danks, D., … & Yeong, Z. K. (2021). Governing AI safety through independent audits. Nature Machine Intelligence, 3(7), 566-571.

Hagendorff, T. (2020). The ethics of AI ethics: An evaluation of guidelines. Minds and Machines, 30(1), 99-120.

Jiang, F., Jiang, Y., Zhi, H., Dong, Y., Li, H., Ma, S., … & Wang, Y. (2017). Artificial intelligence in healthcare: past, present and future. Stroke and vascular neurology, 2(4).

Knowles, B., & Richards, J. T. (2021, March). The sanction of authority: Promoting public trust in AI. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 262-271).

Koshiyama, A., Kazim, E., Treleaven, P., Rai, P., Szpruch, L., Pavey, G., … & Lomas, E. (2021). Towards algorithm auditing: A survey on managing legal, ethical and technological risks of AI, ML and associated algorithms.

Manyika, J., & Sneader, K. (2018). AI, automation, and the future of work: Ten things to solve for.

McKay, C. (2020). Predicting risk in criminal procedure: actuarial tools, algorithms, AI and judicial decision-making. Current Issues in Criminal Justice, 32(1), 22-39.

Morley, J., Machado, C. C., Burr, C., Cowls, J., Joshi, I., Taddeo, M., & Floridi, L. (2020). The ethics of AI in health care: a mapping review. Social Science & Medicine, 260, 113172.

O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.

Panch, T., Mattie, H., & Celi, L. A. (2019). The “inconvenient truth” about AI in healthcare. NPJ digital medicine, 2(1), 1-3.

Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., … & Barnes, P. (2020, January). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 conference on fairness, accountability, and transparency (pp. 33-44).

Richards, J., Piorkowski, D., Hind, M., Houde, S., & Mojsilović, A. (2020). A methodology for creating AI FactSheets. arXiv preprint arXiv:2006.13796.

Shneiderman, B. (2020). Bridging the gap between ethics and practice: guidelines for reliable, safe, and trustworthy human-centered AI systems. ACM Transactions on Interactive Intelligent Systems (TiiS), 10(4), 1-31.

Van Wynsberghe, A., & Guimarães Pereira, Â. (2021). Mobility Imaginaries: The Social & Ethical Issues of

Last update: 2022.09.04, v0.1