AI Documentation Practices

Contributors:

  • Chiara Gei (AVL)
  • Håkan Jonsson (Zalando)

Introduction

AI models are being used in an increasingly broad variety of fields, often supporting, or even replacing, humans in assessment and decision-making tasks. The extensive use of AI models in critical tasks, together with the rising awareness of their impact on final decisions and of the consequences of those decisions, has led to growing interest in AI documentation practices whose objective is to make AI models more transparent and increase trust in them. Recent work in this direction focuses on documenting the datasets used for training and evaluating AI models, the AI models themselves, and more complex services that may comprise several AI models trained and tested on different datasets.

Useful insight is provided by existing resources on DataSheets, Model Cards and FactSheets, each focusing on a different aspect of the outlined problem.

These resources provide a good starting point for creating a template for such documentation artifacts by suggesting important questions that should be answered by all the actors involved in the AI lifecycle. However, little attention is given to the process of creating these documents, the actors involved and their responsibilities, the value of these artifacts for those actors, the degree of overlap in the information provided (for example between a DataSheet and a Model Card), and the degree to which they fulfil the provisions of the AI Act. Our main objective is to support the actors involved in the creation of these artifacts and to propose answers to some of these open questions.

What documents should be created?

As the performance of AI models depends heavily on the data used to develop them, the literature suggests creating documentation both for the dataset considered and for the model itself. Furthermore, when a more complex service comprises several models and datasets, documentation about the whole service should be provided along with documentation for each separate dataset and model.

In the literature, these documents are called DataSheets, Model Cards and FactSheets, respectively.

This distinction is reasonable, especially considering that the same dataset can be used to develop different models and the same model can be trained on or applied to different datasets. However, to avoid redundant information we suggest the following:

  • Provide details about the dataset in the Datasheet only
  • Refer to the Datasheet of the dataset used to train and test the model in the Model Card (no more information about the dataset in the Model Card is needed)
  • Mention in the Model Card the train/test split of the dataset described in the related Datasheet

If the service comprises several datasets and models, provide a collection of the above artifacts for all datasets and models involved, and additionally document the considerations, tests and caveats that are specific to the combination of the different machine learning models and the interfaces between them.
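As an illustration of this cross-referencing, the sketch below shows how a service-level FactSheet could point to its component Model Cards and, through them, to the Datasheets and the train/test split, without duplicating any dataset details. The field names and identifiers are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of the cross-referencing suggested above.
# Field names and identifiers are assumptions, not a prescribed schema.

@dataclass
class ModelCardRef:
    model_card_id: str        # unique Model Card identifier
    datasheet_id: str         # Datasheet describing the data; details live there only
    datasheet_version: str    # exact Datasheet version used for training/evaluation
    train_test_split: str     # e.g. "measurements 0001-0800 train / 0801-1000 test"

@dataclass
class FactSheet:
    """Service-level document: only references, no duplicated dataset details."""
    factsheet_id: str
    service_description: str
    component_models: List[ModelCardRef] = field(default_factory=list)
    integration_caveats: List[str] = field(default_factory=list)  # tests/caveats specific to model interfaces
```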

What format should this documentation have?

Some papers present these artifacts as static documents that describe the current status of the dataset or the developed model; others treat them as documents that are updated and grow over time, or as interactive dashboards.

In order to ensure reproducibility, we believe it is important to document the status of the AI model at every point in time separately, so that everything can be traced back and past results can be easily reconstructed. Hence, a specific version of a Model Card or DataSheet at a certain point in time should be treated as an immutable artifact.

However, for monitoring and benchmarking purposes, the information contained in these separate versions could be displayed in dashboards that, for example, show the evolution of the dataset and/or model over time or compare the performance of different models, depending on the objective and needs of the review.

All the information needed for a certain version of a Model Card or DataSheet could be entered into a database, against which users could run different queries depending on their needs.
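A minimal sketch of such a store, assuming a simple relational database: each Model Card version is written once and never overwritten, and dashboards or users query the accumulated versions. Table and column names are illustrative.

```python
import sqlite3

# Minimal sketch of an immutable, queryable store for Model Card versions.
# Table and column names are illustrative assumptions.
con = sqlite3.connect("ai_docs.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS model_card_versions (
        model_card_id TEXT NOT NULL,
        version       TEXT NOT NULL,
        created_at    TEXT NOT NULL,
        datasheet_id  TEXT NOT NULL,
        datasheet_ver TEXT NOT NULL,
        metric_name   TEXT NOT NULL,
        metric_value  REAL NOT NULL,
        PRIMARY KEY (model_card_id, version)   -- versions are never overwritten
    )
""")
con.execute(
    "INSERT INTO model_card_versions VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("mc-042", "1.3", "2022-01-06", "ds-007", "2.1", "f1_score", 0.87),
)
con.commit()

# Example query for a monitoring dashboard: performance of one model over time.
for row in con.execute(
    "SELECT version, created_at, metric_value FROM model_card_versions "
    "WHERE model_card_id = ? ORDER BY created_at",
    ("mc-042",),
):
    print(row)
```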

Where to start? Template definition and actors involved

Although there is general information about the dataset and the model that should always be documented, in many cases there will also be organization- or domain-specific questions to be answered in these documents. Hence, in an initial phase, a template tailored to the specific organization or use case should be defined. The literature provides a good starting point for determining the right questions to be answered in these documents. In the next two sections, we report some useful topics that could be considered when defining the template.

The template definition phase is also the right moment to define possible access restrictions on parts of these documents depending on the final consumer, as some consumers should not have access to, or might not be interested in, all aspects covered. Hence, in this phase it would also be beneficial to define what information should be shared with the different final consumers, such as auditors, end users of the AI model, or other developers.

Many actors are usually involved in the AI model specification, development and monitoring phases. We believe that the creation of such documents should not be the responsibility of a single person, but should ideally include input from all actors involved. This would reduce the burden of creating the documentation and naturally promote communication inside the team. In this initial phase it would also be important to identify the actors that could be involved in the creation and review of these documents, such as:

  • Product owner:

    • Could support in the context, goal, metrics and boundaries definition.
    • Could support in the definition of a template and the access restrictions to some final consumers.
    • Could review the obtained performance and compliance with the specification.
    • Could track the evolution over time and benchmark the performance against that of existing solutions.
  • Model developer:

    • Could support in the context, goal, metrics and boundaries definition.
    • Could support in the definition of a template and the access restrictions to some final consumers.
    • Could provide all the model development related information.
    • Could query model cards dealing with similar problems to have a starting point and compare results.
    • Could review the quality of the dataset and trigger the implementation of needed changes.
  • Test designer:

    • Could support in the context and boundaries definition.
    • Could define the dataset specifications.
    • Could monitor the evolution over time of the dataset.
  • Data collector:

    • Could support in the context and boundaries definition.
    • Could review the dataset specification.
    • Could provide all dataset collection related information.
  • Domain expert:

    • Could support in the context, goal, metrics and boundaries definition.
    • Could review the obtained performance and compliance with the specification.
  • Final user:

    • Could support in the context, goal, metrics and boundaries definition.
  • Legal/ethics expert:

    • Could support in the creation of the template by highlighting, depending on the use case, critical topics from an ethical/legal point of view.
    • Could support defining the dataset specifications.
    • Could review the compliance of the dataset with the defined legal/ethics specifications.
    • Could review the performance of the model from a legal/ethics perspective.

What should be documented about the dataset? DataSheet content

Datasheet Metadata

  • Unique Datasheet identifier
  • Version of the Datasheet (to track evolution)
  • Dataset creation start date and expected/actual end date
  • Dataset collection start date and expected/actual end date
  • Measurement IDs of the data included in the dataset at that point in time
  • File naming convention
  • File/measurement format
  • Size (number of measurements, file size, duration)
  • Model Card IDs of models using this dataset
  • Datasheet IDs of datasets used in combination with this dataset
  • Location where the data can be found
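As a sketch of how this metadata block could be captured in machine-readable form, the record below mirrors the bullet points above; field names and types are assumptions rather than a fixed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Minimal sketch of the Datasheet metadata block as a typed record.
# Field names mirror the bullet points above; types are assumptions.

@dataclass
class DatasheetMetadata:
    datasheet_id: str                       # unique Datasheet identifier
    version: str                            # to track evolution
    creation_start: str                     # ISO dates kept as strings for simplicity
    creation_end_expected: Optional[str]
    collection_start: str
    collection_end_expected: Optional[str]
    measurement_ids: List[str] = field(default_factory=list)  # data included at this point in time
    file_naming_convention: str = ""
    file_format: str = ""
    n_measurements: int = 0
    total_size_bytes: int = 0
    total_duration_s: float = 0.0
    model_card_ids: List[str] = field(default_factory=list)   # models using this dataset
    related_datasheet_ids: List[str] = field(default_factory=list)
    data_location: str = ""                 # path or URI where the data can be found
```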

Intended use and background

  • Dataset requirements:

    • Why does this dataset need to be collected?
    • What is its primary intended use?
    • What properties should it have (e.g. amount of data, features that need to be extracted, variety, …)?
  • Dataset design:

    • What tests/data collection campaigns should be run?
    • Under which conditions should the tests be run?
    • Which instrumentation should be used?
  • Dataset implementation:

    • How were the design decisions implemented and why?
    • Were there major changes with respect to the design prescriptions?
    • Important information about the collection process
  • Dataset status summary:

    • Notes regarding the quality of the data collected (e.g. data meets the requirements, something is different than expected, limitations, …)
  • Keywords to query the dataset

  • Subjects/objects/components considered

Provenance

  • People involved in the collection of the data, contact details and organizations

Variables description

  • Overview per variable:

    • Name
    • Description
    • Units
    • Frequency
    • Plausibility limits
    • Aggregations/KPIs (to monitor evolution)
    • Plots (e.g. distribution, box/violin plots, histograms, to monitor the evolution)
  • Overview entire dataset

    • Correlation among variables (to monitor changes over time)
  • Evolution monitoring (see the sketch after this list):

    • KPIs over time
    • Correlations over time
    • Distribution shifts
  • Issues encountered in the dataset:

    • Plausibility limits exceeded
    • Verbal description of other observed problems in the dataset, if any (e.g. naming convention, units, …), to trigger changes
  • Changes with respect to previous version

    • Only new data
    • Also changes to old data (e.g. corrupted or wrong measurements)
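The evolution monitoring described above can be partly automated. The sketch below computes per-variable KPIs and quantifies a distribution shift between two dataset versions using the population stability index; the choice of KPIs and of PSI as the shift measure is an assumption, not something the template prescribes.

```python
import numpy as np

def variable_kpis(values: np.ndarray) -> dict:
    """Per-variable aggregations to record in the Datasheet (illustrative KPI set)."""
    return {
        "mean": float(np.nanmean(values)),
        "std": float(np.nanstd(values)),
        "min": float(np.nanmin(values)),
        "max": float(np.nanmax(values)),
        "missing_fraction": float(np.mean(np.isnan(values))),
    }

def population_stability_index(old: np.ndarray, new: np.ndarray, bins: int = 10) -> float:
    """Quantify a distribution shift between two dataset versions.

    PSI is one common choice; the template itself does not prescribe a measure."""
    old, new = old[~np.isnan(old)], new[~np.isnan(new)]
    edges = np.histogram_bin_edges(old, bins=bins)
    p, _ = np.histogram(old, bins=edges)
    q, _ = np.histogram(new, bins=edges)
    p = np.clip(p / p.sum(), 1e-6, None)
    q = np.clip(q / q.sum(), 1e-6, None)
    return float(np.sum((q - p) * np.log(q / p)))

# Example: compare one variable between two Datasheet versions (synthetic data).
old_speed = np.random.default_rng(0).normal(50.0, 5.0, 1000)
new_speed = np.random.default_rng(1).normal(55.0, 6.0, 1000)
print(variable_kpis(new_speed))
print("PSI:", population_stability_index(old_speed, new_speed))
```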

Labelling

  • If data is labelled, information about the labelling procedure

What should be documented about the model? Model Card content

Model Card Metadata

  • Unique Model Card identifier.
  • Version and release date of the Model Card (to track evolution).
  • The AI model is usually developed in the scope of a single use case/project. It might be beneficial to include the unique project identifier.
  • The same model (same training set and hyperparameters) might be re-used for other use cases/projects. It might be beneficial to include the identifiers of the projects that use the same model (to keep track of related activities that would be affected by changes to this model).
  • More models might be developed in the scope of a single use case/project. It might be beneficial to include the unique identifiers of the related Model Cards.
  • The model depends on the dataset used for training and evaluation. The same model (model type and hyperparameters) could be trained and evaluated with different datasets so the unique Datasheet identifier and version for every dataset used should be included here. More details are provided in the data sections.
  • Location where the actual model can be found (repository).
  • People involved in the development of the model, contact details and organizations.
  • Short model verbal description.
  • Keywords (for querying).
  • Other useful resources (e.g. papers, books, …).
  • Changes with respect to the previous version (e.g. commit message and automatic comparisons).
  • Model type and parameters.
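A sketch of the same metadata as a machine-readable record is shown below; the keys mirror the bullet points above, and all example values (identifiers, URL, contact details, hyperparameters) are illustrative assumptions.

```python
# Minimal sketch of the Model Card metadata block as a plain record.
# Keys mirror the bullet points above; all example values are illustrative assumptions.
model_card_metadata = {
    "model_card_id": "mc-042",
    "version": "1.3",
    "release_date": "2022-01-06",
    "project_id": "proj-001",                    # project in whose scope the model was developed
    "reusing_project_ids": ["proj-007"],         # other projects re-using the same model
    "related_model_card_ids": ["mc-041"],        # other models developed for the same project
    "datasheets": [                              # every dataset used for training/evaluation
        {"datasheet_id": "ds-007", "version": "2.1"},
    ],
    "model_location": "https://example.org/repos/mc-042",  # repository (illustrative URL)
    "contacts": [{"name": "Jane Doe", "organization": "Example Org", "email": "jane@example.org"}],
    "description": "Short verbal description of the model.",
    "keywords": ["regression", "wear-estimation"],
    "resources": [],                             # papers, books, ...
    "changes": "Retrained on Datasheet ds-007 v2.1; see commit message.",
    "model_type": "gradient boosted trees",
    "hyperparameters": {"n_estimators": 200, "max_depth": 4},
}
```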

Intended use and background

  • Description of the use case and objective.
  • Intended use and users involved.
  • Need for a machine learning approach and benchmarking against traditional approaches.
  • Subjects/objects/components considered.
  • Out of scope use cases and limitations, examples of not suitable use cases.
  • Caveats and recommendations.

Performance across relevant factors

  • Relevant factors for which model performance might vary
  • Evaluation factors for which model performance was evaluated

Metrics and parameters

  • Model performance measures (defined in the initial phases).
  • Hyperparameters chosen and reason behind choice.
  • Decision thresholds and reason behind choice.

Evaluation Data

  • No actual dataset description; only a reference to the Datasheet, which contains information about the location and content of the data. As models are trained with a specific dataset that varies over time, the unique ID of the corresponding Datasheet should be included, as well as the version of the Datasheet (which changes over time).
  • File/measurement IDs that identify the portion of the dataset used for evaluation.
  • Preprocessing:
    • Basic data preparation: included here in the Model Card (e.g. scaling, imputation, …)
    • If the inputs of the ML model are aggregations coming from different developers/tools (e.g. simulation), the ID and version of the Model Card that describes the performed calculation should be included. The details about the tool used and the calculation performed should be available in that Model Card.

Training Data

  • Same entries as in the “Evaluation Data” section.

Quantitative analysis

  • Metrics broken down by the relevant factors (unitary and intersectional results)
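A minimal sketch of such a disaggregated analysis is shown below, assuming an accuracy metric and two illustrative factors; the factor names, metric and data are assumptions.

```python
import pandas as pd

# Minimal sketch of disaggregated evaluation: the same metric computed per factor
# (unitary) and per combination of factors (intersectional). Factors, metric and
# data are illustrative assumptions.

def accuracy(df: pd.DataFrame) -> float:
    return float((df["prediction"] == df["label"]).mean())

results = pd.DataFrame({
    "label":      [1, 0, 1, 1, 0, 1, 0, 0],
    "prediction": [1, 0, 0, 1, 0, 1, 1, 0],
    "vehicle":    ["car", "car", "truck", "truck", "car", "truck", "car", "truck"],
    "region":     ["EU", "EU", "EU", "US", "US", "US", "EU", "US"],
})

overall = accuracy(results)                                           # single aggregate number
unitary = {f: results.groupby(f).apply(accuracy) for f in ["vehicle", "region"]}
intersectional = results.groupby(["vehicle", "region"]).apply(accuracy)

print("overall:", overall)
print(unitary["vehicle"], unitary["region"], intersectional, sep="\n")
```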

Ethical Considerations

  • Depending on the subjects involved.

Process creation: actors involved and lifecycle

etami lifecycle model

These documentation artifacts should not only provide all the information needed to reproduce the steps taken and the results obtained, and to assess the quality and limitations of the AI model, but should also promote interaction among all the actors involved. This gives the people involved in problem definition, data collection and model development the chance to align on the goal and the technical limitations at early stages, to gain a better overview of the whole process and, if needed, to steer the goal, collect more or different data, or limit the scope to a more restricted set of scenarios. The required information should be collected by different actors and at different phases of the lifecycle.

System formulation phase

In the system formulation phase:

  • The template for these documents should be adapted according to the needs of the use case. For example, the metrics to assess the model and monitor its performance in later phases should be defined.
  • Some useful information could be collected already in this phase like:
    • The objective of the model (Model Card)
    • The type of data that should be collected (DataSheet)
    • The intended use and background information (Model Card and DataSheet)
    • General use case description (Model Card)
    • Keywords (Model Card and DataSheet)
    • Paths where data can be found (DataSheet)
    • Repository where the model can be found (Model Card)
    • Subjects/components considered (Model Card and DataSheet)
    • Users (Model Card)
    • Out of scope use cases and limitations/boundaries of the model (Model Card)
    • Ethical considerations (Model Card and DataSheet)

The actors involved in this phase could be:

  • Product owner
  • Model developer
  • Test/data collection designer
  • Data collector
  • Domain expert
  • Final user
  • Legal/ethics expert

The data phase

In this phase, data-related information would be collected and included in both the DataSheet and the Model Card, such as:

  • Files/measurements naming convention (DataSheet)
  • Dataset design (DataSheet)
  • Dataset implementation (DataSheet)
  • Dataset status summary (DataSheet)
  • Contact details of the people involved in the data collection campaign (DataSheet)
  • Description of problems encountered or differences with respect to the design (DataSheet)
  • Changes with respect to previous versions (DataSheet)
  • Instrumentation used (DataSheet)
  • Basic preprocessing, features extraction, cleaning (Model Card)

The actors involved in this phase could be:

  • Model developer
  • Test/data collection designer
  • Data collector
  • Domain expert

The model management phase

In this phase, model-related information would be collected and included in both the DataSheet and the Model Card, such as:

  • Labelling (DataSheet)
  • Need for ML model and benchmarking (Model Card)
  • Model type and hyperparameters selected and reason behind choice (Model Card)
  • Quantitative performance analysis (Model Card)
  • Contact details (Model Card)
  • Caveats, recommendations, limitations (Model Card)

The actors involved in this phase could be:

  • Product owner
  • Model developer
  • Domain expert

The deployment and operations phase

In this phase, the performance of the model and the evolution of the dataset can still be monitored by automatically extracting the predefined metrics.
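A minimal sketch of this kind of operations-phase check, recomputing a predefined metric on fresh data and comparing it with the value documented in the latest Model Card version; the metric, threshold and data below are illustrative assumptions.

```python
import numpy as np

# Sketch of operations-phase monitoring: the metric defined in the system formulation
# phase is recomputed on fresh data and compared against the value documented in the
# latest Model Card version. Names, threshold and data are assumptions.

DOCUMENTED_F1 = 0.87          # value recorded in the latest Model Card version
ALERT_TOLERANCE = 0.05        # acceptable degradation before a review is triggered

def f1_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Fresh labelled data collected during operations (synthetic, illustrative).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
y_pred = np.where(rng.random(500) < 0.9, y_true, 1 - y_true)  # ~90% agreement

current = f1_score(y_true, y_pred)
if current < DOCUMENTED_F1 - ALERT_TOLERANCE:
    print(f"Performance drop detected: F1 = {current:.2f} vs documented {DOCUMENTED_F1:.2f}")
else:
    print(f"Model within documented performance: F1 = {current:.2f}")
```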

What is the value of these documentation artifacts?

The implementation of these AI documentation practices should help achieve:

  • Reproducibility and Explainability: Make it easier to reproduce and justify the results obtained by collecting, in dedicated documents, all the variables that define the model: the data used for training and evaluation, the features considered, the hyperparameters, and the model type.

  • Transparency and Boundaries definition: The objective is not only to provide all the information needed to reproduce the results, but also to make clear the context in which the model was developed, its intended use, the scenarios it cannot cover and its limitations. Furthermore, users subject to AI have the right to be informed about how their data is processed; this information could be made available through these documents.

  • Reduction of technical debt: These documents give the people involved in the whole AI lifecycle the chance to align on the goal and technical limitations at early stages, gain a better overview of the whole process, and make clear any need to steer the goal, collect more or different data, or limit the scope to a more restricted set of scenarios.

  • Standardization: Although a certain level of flexibility in content is needed to accommodate the different needs of different use cases and organizations, the literature provides suggestions on the content of these documents, which can help define a standardized template and partly automate the creation process.

  • Accountability: DataSheets and Model Cards include information about all actors involved in the creation, update and review of the documentation and hence in the test design, data collection and model development.

  • Queryability: Makes it easier to query and compare solutions already developed for the same or similar use cases (on different datasets), and consequently reduces the effort of implementing new solutions and provides benchmarks.

  • Monitoring: These documents provide information about the performance of the model at a certain point in time, when trained and evaluated with a certain dataset. The evolution of the performance over time and the causes of changes can thus be monitored more easily. The information reported also allows benchmarking between different models.

  • Auditability: Auditors can use Model Cards as the starting point to find all information regarding a model, thus minimizing audit effort for both auditor and auditee.


Last update: 2022.01.06, v0.1