Chris Presents Research Seminar

Chris Stubbs recently presented a research seminar, “Advancing Solubility Prediction Through Machine Learning,” at Colorado State University. Chris discussed work from two recent papers from the group: “Predicting homopolymer and copolymer solubility through machine learning” and “Enhancing Predictive Models for Solubility in Multicomponent Solvent Systems using Semi-Supervised Graph Neural Networks.”

Congrats Chris!

Abstract:

Solubility is a fundamental chemical property with wide-ranging applications including reaction optimization, waste recycling, and manufacturing. As measuring solubility can be time- or resource-intensive, predicting solubility through computational methods has received significant attention in recent work. In particular, solubility prediction through machine learning (ML) has been heavily studied due to its speed and accessibility advantages over alternative methods using quantum mechanical or semi-empirical formulations. In this talk, we discuss two recent advances in solubility prediction for polymers and small molecules, respectively. We first discuss our recent work to predict polymer solubility in single solvents, which is of interest for applications in plastic recycling and polymer design. We found that simple tree-based models with low-dimensional features can achieve over 80% prediction accuracy on homopolymer and copolymer solubility, and that these predictions can be rationalized using explainable AI methods such as Shapley Additive Explanations (SHAP). Following our discussion of polymer solubility prediction, we next examine ML predictions of small-molecule solubility in multiple solvents (multicomponent solubility). In comparison to single-solvent solubility, multicomponent solubility has increased complexity but allows for greater control over solute separation and processing, leading to uses in biomass upgrading and recycling. To accelerate these applications, we curated a new multicomponent solubility database (MixSolDB), which we used to train two graph neural network (GNN) models to predict solute solubility in up to three solvents. We find that our novel subgraph architecture for solubility prediction outperforms the more common concatenation architecture, achieving a mean absolute error (MAE) of 0.67 kcal/mol on ΔG_solv prediction.
In summary, we demonstrate that ML-based predictions of solubility are chemically accurate while remaining useful for sustainable applications.

Kim Research Group