Quality-centric lifecycle models¶
- Djalel Benbouzid (Volkswagen Group)
- John Hall (Atos SE)
Lifecycle models (LCM) help visualise the entire development process of a system. They serve as a common language that aligns practitioners and managers and, consequently, offers a level of standardisation across teams and projects.
Despite the rapid evolution of ML algorithms, ML lifecycle models have changed little in recent years. They are still commonly depicted as a quasi-linear process composed of four phases: specification, data management, model management, and deployment.
Such a representation conveys the idea that the four phases are mostly sequential; back-arrows are occasional, if not exceptional. Reality, however, does not match this description: iterations are frequent and the interdependencies between the phases are complex. Worse, proofs of concept that never reach the deployment phase are almost the norm.
This motivated us to rethink lifecycle models for ML systems, not only to reflect what practitioners experience in real life but, more importantly, to integrate quality assurance practices into each phase. We draw inspiration from the rich literature on lifecycle models for traditional software systems, where the V-Model, the Spiral Model, and the Incremental Model were among the many attempts to organise software practices. In this regard, ML systems still follow the deprecated waterfall model.
Quality-centric lifecycle models¶
Without reinventing the wheel, we advocate a more quality-centric lifecycle model. The aforementioned four phases remain important to highlight: every ML system needs precise specifications, data and model management processes, and an explicit deployment procedure. However, each phase is treated as an iterative process that starts with specifications and concludes with quality assessment.
We also promote the following principles to be followed throughout the lifecycle:
Specs before data or code: contrary to a common belief from the early days of the data science era, not all data is useful, and data should therefore not be harvested systematically. Data often has an expiry date, comes with errors, and is sampled with biases. Data is also regulated. Specifying the business needs that in turn translate into data specifications is a way to proactively avoid poor-quality data and ensure compliance with the law. Furthermore, such a practice makes the different actors in a project exchange information explicitly and hence reduces communication errors.
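To make this concrete, here is a minimal sketch of what "specs before data" can look like in practice. All names (`FieldSpec`, `DataSpec`, the `age` field) are hypothetical, not part of the etami material: a declared specification, including an expiry horizon and value ranges, is checked before any record enters the pipeline.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical data specification derived from business needs,
# applied before any record is ingested.
@dataclass
class FieldSpec:
    name: str
    min_value: float
    max_value: float

@dataclass
class DataSpec:
    fields: list
    max_age_days: int  # the data "expiry date" from the specification

    def validate(self, record: dict, collected_on: date, today: date) -> list:
        """Return a list of violations; an empty list means the record conforms."""
        violations = []
        if (today - collected_on).days > self.max_age_days:
            violations.append("data expired")
        for f in self.fields:
            value = record.get(f.name)
            if value is None:
                violations.append(f"missing field: {f.name}")
            elif not (f.min_value <= value <= f.max_value):
                violations.append(f"out of range: {f.name}")
        return violations

spec = DataSpec(fields=[FieldSpec("age", 0, 120)], max_age_days=365)
issues = spec.validate({"age": 150}, date(2020, 1, 1), date(2022, 1, 1))
```

Because the specification is an explicit, shared artefact, domain experts and engineers can review the same object, which is exactly the communication benefit argued for above.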
Design with social, legal, and UX experts: as part of the specifications, social and legal considerations should be explicitly studied. Offering the environment and conditions for actors from different backgrounds to collaborate and co-design the system is part of what reduces the risk of failure down the road. The user experience (UX) is also central: good systems can fail not because their underlying function is flawed but because user interaction modalities were understudied.
Privilege interpretable models over explainability methods: explainability methods are useful for debugging specific parts of ML systems; however, they come with their share of uncertainties and hence risk. Methods addressing the same question can provide different answers, their level of reliability is in many cases unspecified, and, more fundamentally, some methods resort to a proxy (simplified) model in lieu of the one actually deployed, raising questions about the reliability of the proxy itself.
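The contrast can be illustrated with a toy, purely hypothetical example: in an inherently interpretable linear model, the learned weights are the explanation, faithful by construction, whereas a post-hoc explainer approximates a black box with a proxy whose own fidelity must be assessed.

```python
import math

# Illustrative sketch (names and weights are made up): a logistic scorer
# whose coefficients can be read off directly, instead of a black-box model
# explained after the fact by a proxy.
class LogisticScorer:
    def __init__(self, weights, bias, feature_names):
        self.weights = weights
        self.bias = bias
        self.feature_names = feature_names

    def predict_proba(self, x):
        z = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def explain(self):
        # The explanation *is* the model: one exact contribution per feature.
        return dict(zip(self.feature_names, self.weights))

model = LogisticScorer([1.2, -0.8], 0.1, ["income", "debt_ratio"])
explanation = model.explain()
```

No separate proxy model is involved, so there is no gap between what is explained and what is deployed.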
Quantify uncertainty when relevant: part of what makes ML systems reliable is their ability to report the uncertainty of their predictions. This can be achieved in various technical ways; however, assessing the quality of the uncertainty estimates should also be part of the design and development process.
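One common technical route, shown here as a hedged sketch with a toy ensemble (the threshold and models are illustrative, not from the original text), is to use the disagreement of an ensemble as an uncertainty estimate and gate decisions on it:

```python
import statistics

# Sketch: quantify predictive uncertainty as the spread of an ensemble's
# predictions, then flag high-variance inputs for review.
def predict_with_uncertainty(models, x):
    preds = [m(x) for m in models]
    return statistics.mean(preds), statistics.stdev(preds)

# Toy "ensemble": three stand-in models that disagree slightly.
ensemble = [lambda x: 0.70, lambda x: 0.74, lambda x: 0.72]
mean, std = predict_with_uncertainty(ensemble, x=None)

# The gate itself (threshold 0.1 is an assumed, use-case-specific value).
needs_review = std > 0.1
```

Assessing the quality of such estimates, e.g. via calibration checks on held-out data, is the part that, as argued above, belongs in the development process itself.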
Care about privacy: when it comes to privacy, algorithmic solutions such as differential privacy exist and should be privileged whenever the context allows, but non-technical measures should also be put in place, such as clear data access policies and access logging.
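As a brief sketch of what the algorithmic side can look like, here is the Laplace mechanism, a standard differential-privacy building block: a numeric statistic is perturbed with noise calibrated to its sensitivity and the privacy budget epsilon before release. The function name and parameter values are illustrative.

```python
import math
import random

def laplace_release(true_value, sensitivity, epsilon, rng):
    """Release a statistic with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

rng = random.Random(42)
# Release a count (sensitivity 1) under an assumed budget epsilon = 0.5.
noisy_count = laplace_release(100, sensitivity=1.0, epsilon=0.5, rng=rng)
```

The non-technical controls mentioned above still apply around such a mechanism: noise addition does not replace access policies and logging.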
Last but not least, put quality at the centre: just as nobody considers undocumented and untested code to be of acceptable quality, ML systems should be held to the same standard. While documentation is mostly straightforward, proper testing remains highly dependent on the use case at hand and the expertise of the practitioners. Testing also often benefits from exchanges with domain and business experts to collect all the hypotheses to be tested during and after development.
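A hypothesis collected from domain experts can often be turned directly into an executable test. The sketch below assumes a made-up business rule (raising income, all else equal, must never lower a credit score) and a stand-in model; in practice the trained system would be under test.

```python
def credit_score(income, debt):
    # Stand-in for the trained model; coefficients are illustrative.
    return 0.01 * income - 0.05 * debt

def test_income_monotonicity():
    """Domain hypothesis: higher income, same debt, never lowers the score."""
    base = credit_score(income=1000, debt=200)
    higher = credit_score(income=2000, debt=200)
    assert higher >= base, "model violates the income-monotonicity hypothesis"

test_income_monotonicity()  # raises AssertionError if the hypothesis fails
```

Encoding each hypothesis this way makes the test suite a living record of the exchanges with domain and business experts.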
How to orchestrate the lifecycle model¶
In its simplest instantiation, the etami lifecycle boils down to the common “waterfall”-like model: the four phases are executed with no inner iterations. For most high-risk applications, however, one should expect each phase to iterate a number of times before moving to the subsequent one. Importantly, iterating implies rigorous versioning that covers the specifications, the data (and feature) versions, and the different model versions.
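A minimal sketch of what such versioning can pin down per iteration (the structure and field names are assumed for illustration, not an etami artefact): recording the specification, data, and model versions together makes every model traceable to the exact inputs it was built from.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IterationRecord:
    """One lifecycle iteration, pinning all three versioned inputs."""
    spec_version: str
    data_version: str
    model_version: str

    def tag(self) -> str:
        # A single tag usable in experiment tracking or artefact storage.
        return (f"spec-{self.spec_version}"
                f"_data-{self.data_version}"
                f"_model-{self.model_version}")

run = IterationRecord(spec_version="1.2",
                      data_version="2022-11-01",
                      model_version="0.4.1")
```

Freezing the record (`frozen=True`) reflects the point above: once an iteration is closed, its versions should be immutable.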
Each iteration, as described in the diagrams below, comprises the most common tasks of an ML project, in addition to a specs/quality-assessment duality that is particularly inherent to the data and model management phases. Being strict about specifications before data collection and model development makes it possible to set boundaries and eventually to define go/no-go schemes, which should be explicit within the development team as well as properly documented.
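An explicit go/no-go scheme can be as simple as the following sketch. The metric names and thresholds are hypothetical; the point is that the bounds come from the specification phase and the check runs, documented, at the end of each iteration.

```python
# Thresholds set during specification; each carries its comparison direction.
THRESHOLDS = {
    "accuracy": (0.90, "min"),          # must be at least this
    "max_subgroup_gap": (0.05, "max"),  # must be at most this
}

def go_no_go(metrics: dict):
    """Return (go, failures): go is True only if every threshold is met."""
    failures = []
    for name, value in metrics.items():
        bound, kind = THRESHOLDS[name]
        if (kind == "min" and value < bound) or (kind == "max" and value > bound):
            failures.append(name)
    return len(failures) == 0, failures

ok, failures = go_no_go({"accuracy": 0.92, "max_subgroup_gap": 0.08})
# Here the result is a documented no-go: the subgroup gap exceeds its bound.
```

Recording `failures` alongside the iteration makes the no-go decision auditable rather than informal.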
The link to auditing¶
etami advocates auditing as part of development. The motto is: audit early and audit often. This contrasts with the more common usage of auditing, which is closer to conformity assessment. Auditing an ML system is in fact part of its quality assurance. We advocate an audit session at the end of each cycle, whether one of the four inner phases or, crucially, the outer, larger development cycle.
Last update: 2022.11.17, v0.1