Machine learning and artificial intelligence in physiologically based pharmacokinetic (PBPK) modeling

Artificial Intelligence (AI) and Machine Learning (ML) are two rapidly advancing fields of research that will clearly play important roles in the study of gene-environment interactions (GxE), as well as in almost every other avenue of scientific study. Hence, GEITP is introducing this topic here.

AI is a branch of computer science that seeks to develop machines or computational approaches capable of solving various cognitive tasks at a level similar to, or even exceeding, human intelligence (and by far exceeding the intelligence of politicians).

ML (a subset of AI) applies mathematical or computational algorithms to perform complex tasks by automatically learning from past data or knowledge. The three main types of ML methods are: [a] supervised learning (train a model on known input and output data, with the goal of predicting new ‘outputs’ from new ‘inputs’); [b] unsupervised learning (let the model cluster data in meaningful ways to identify intrinsic patterns or structures when the input-output relationships are unknown); and [c] reinforcement learning (a feedback-based approach that learns which actions in an environment yield the maximum reward).
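To make the first two categories concrete, here is a minimal Python sketch on made-up toy numbers (not data from the review). The supervised part fits a straight line to known input-output pairs by ordinary least squares; the unsupervised part splits unlabeled values into two groups with a simple two-cluster k-means. The function names `fit_line` and `two_means` are illustrative only, and reinforcement learning is not shown.

```python
def fit_line(xs, ys):
    """Supervised learning: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_means(values, iterations=20):
    """Unsupervised learning: split 1-D values into two clusters (k-means, k=2)."""
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")
    c_lo, c_hi = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        # Assign each value to its nearest center, then recompute the centers.
        lo = [v for v in values if abs(v - c_lo) <= abs(v - c_hi)]
        hi = [v for v in values if abs(v - c_lo) > abs(v - c_hi)]
        c_lo, c_hi = sum(lo) / len(lo), sum(hi) / len(hi)
    return lo, hi


# Supervised: known inputs and outputs, then a prediction for a new input.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
prediction = slope * 5.0 + intercept

# Unsupervised: no outputs given; the algorithm finds the two groups itself.
low_group, high_group = two_means([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
```

Real applications use far larger datasets and dedicated ML libraries; the point here is only the difference between learning from labeled pairs and finding structure in unlabeled values.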

A newer class of ML, called deep learning, enables one to create more complex models with a layered structure loosely inspired by the human brain. ML and deep-learning algorithms form the essential building blocks of AI systems; these algorithms provide a data-driven approach to evaluating chemical/drug ADME (absorption, distribution, metabolism, and excretion) and toxicity properties.

Physiologically based pharmacokinetic (PBPK) models are useful tools in drug development and in risk assessment of environmental chemicals. PBPK model development requires the collection of species-specific physiological parameters and chemical-specific ADME parameters, which can be very time-consuming and expensive. This creates a need for computational models capable of predicting input parameter values for PBPK models (especially for new compounds). In this review [see attached], the authors summarize an emerging paradigm for integrating PBPK modeling with AI- and ML-based computational methods. This paradigm includes three steps: [a] extract concentration-time PK data and/or ADME parameters from publicly available databases; [b] develop AI/ML-based approaches to predict ADME parameters; and [c] incorporate the AI/ML models into PBPK models to predict PK summary statistics (e.g., areas-under-the-curve and maximum plasma concentrations).
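Step [c] can be sketched in a few lines of Python. This is not the method from the review: a one-compartment oral-dosing model stands in for a full multi-compartment PBPK model, and the parameter values (bioavailability `f`, absorption rate `ka`, clearance `cl`, volume of distribution `v`) are made-up placeholders for what an AI/ML model might predict in step [b]. The sketch simulates a concentration-time curve and reads off the area-under-the-curve (AUC) and the maximum plasma concentration (Cmax).

```python
import math


def concentration(t, dose, f, ka, cl, v):
    """Plasma concentration (mg/L) at time t (h); one-compartment, oral dose.

    Assumes first-order absorption and elimination, and ka != cl / v.
    """
    ke = cl / v  # elimination rate constant (1/h)
    scale = (f * dose * ka) / (v * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))


def pk_summary(dose, f, ka, cl, v, t_end=240.0, dt=0.01):
    """Return (AUC, Cmax, Tmax) from a simulated concentration-time curve."""
    n = int(round(t_end / dt))
    times = [i * dt for i in range(n + 1)]
    conc = [concentration(t, dose, f, ka, cl, v) for t in times]
    # Trapezoidal rule for the area under the curve (mg*h/L).
    auc = sum((conc[i] + conc[i + 1]) * dt / 2 for i in range(n))
    cmax = max(conc)
    tmax = times[conc.index(cmax)]
    return auc, cmax, tmax


# Hypothetical "ML-predicted" ADME parameters (illustrative values only):
# f is unitless, ka in 1/h, cl in L/h, v in L.
predicted = {"f": 0.8, "ka": 1.0, "cl": 5.0, "v": 50.0}
auc, cmax, tmax = pk_summary(dose=100.0, **predicted)  # dose in mg
print(f"AUC = {auc:.2f} mg*h/L, Cmax = {cmax:.3f} mg/L at {tmax:.2f} h")
```

For this simple model the AUC can be checked by hand, because it equals f × dose / clearance. A real PBPK model replaces the single compartment with organ-level compartments linked by blood flows, but the workflow is the same: predicted parameters go in, PK summary statistics come out.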

AI/ML methodology is also starting to be used in population genomics and genome-wide association studies (GWAS). For example, much attention has been paid to the utility of polygenic risk scores (PRS), which represent the genetic burden of a given trait; the long-term (highly optimistic) plan is to develop strategies for risk-based intervention through lifestyle modification, screening, and drug therapy. A PRS for a given trait is typically defined as “a weighted sum of a set of germline SNVs, in which the weight for each SNV corresponds to an estimate of the strength of association between the SNV and the trait.” But these topics will be covered in future email blogs. 😊
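The weighted-sum definition quoted above translates directly into code. The sketch below assumes each SNV is coded as the number of effect alleles an individual carries (0, 1, or 2), which is a common convention but not something stated in the definition; the weights and genotypes are made-up numbers, and `polygenic_risk_score` is an illustrative name.

```python
def polygenic_risk_score(allele_counts, weights):
    """Weighted sum over SNVs: sum of weight_i * allele_count_i.

    allele_counts: effect-allele count per SNV (assumed coding: 0, 1, or 2).
    weights: per-SNV estimate of association strength with the trait.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    return sum(w * g for w, g in zip(weights, allele_counts))


# Hypothetical individual genotyped at three SNVs, with made-up weights.
score = polygenic_risk_score([0, 1, 2], [0.5, -0.2, 0.1])
```

In practice the weights come from GWAS effect-size estimates and the sum runs over thousands to millions of SNVs; the arithmetic is the same.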


Toxicol Sci Jan 2023; 191: 1-14
