Materials for ML Workshops at ACS MARM: June 1-4, 2022
Overview
The materials provided herein are prepared for a trio of ~three-hour workshops that provide a “crash course” on machine learning in the context of chemistry. Within the taxonomy of machine learning algorithms, the content focuses primarily on supervised learning tasks coupled to more chemically oriented examples. A high-level summary of the topics/goals of each workshop “day” is provided below.
You may wish to examine the agenda for each day, which gives a fair sketch of the planned content.
Day 1: Deep Learning in Chemistry
We cover essential elements of machine learning (in general) as well as practical aspects of working with deep learning/neural networks and chemical data. At the conclusion of this workshop, attendees should be able to…
generally characterize ML algorithms in terms of their objectives
understand the basic mechanics of parameter optimization as model “training”
understand and implement simple neural networks using the Keras API
appreciate the importance and implementation of techniques related to feature scaling and cross-validation, with practical implementation facilitated by scikit-learn
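As a rough illustration of the last objective, the sketch below combines feature scaling and cross-validation in scikit-learn. It is not taken from the workshop notebooks: the data are synthetic stand-ins for molecular descriptors, and the choice of a ridge regressor with five folds is an assumption made only to keep the example short.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a descriptor table (hypothetical data):
# five features whose scales differ by orders of magnitude.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 1000.0])
true_w = np.array([2.0, 0.3, 0.01, 5.0, 0.002])
y = X @ true_w + rng.normal(scale=0.1, size=200)

# Keeping the scaler inside the pipeline means it is re-fit on the
# training portion of every fold, so no statistics leak from held-out data.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Five-fold cross-validation, scored with the coefficient of determination.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print("R^2 per fold:", np.round(scores, 3))
print("mean R^2:", scores.mean())
```

Scaling the full dataset before splitting would let the held-out folds influence the scaler's mean and variance, which is why the scaler is placed inside the pipeline here.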
Day 2: Chemical Representations and Modern Architectures
On this day, we cover topics that are essential to understanding and deploying deep learning in chemistry. At the conclusion of this workshop, attendees should be able to…
categorize chemical data
identify and appreciate equivariances/invariances in chemical data
select data transformations and architectures according to equivariances
understand graph neural networks
describe sequence and geometric learning at a high level
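To make the idea of invariance concrete, here is a minimal NumPy sketch, assuming a made-up five-atom geometry. It checks that the sorted list of interatomic distances is unchanged when the atoms are reordered, translated, and passed through a random orthogonal transformation, while the raw coordinates are not. The function name `sorted_pairwise_distances` is hypothetical and only serves this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))  # hypothetical 5-atom geometry, one row per atom


def sorted_pairwise_distances(xyz):
    """Return all unique interatomic distances, sorted in ascending order."""
    diffs = xyz[:, None, :] - xyz[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(xyz), k=1)  # each pair counted once
    return np.sort(dists[upper])


# Random orthogonal matrix from a QR decomposition
# (a rotation, possibly combined with a reflection).
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

# Reorder the atoms, apply the orthogonal transformation, then translate.
perm = rng.permutation(len(coords))
transformed = coords[perm] @ q.T + np.array([1.0, -2.0, 0.5])

raw_unchanged = np.allclose(coords, transformed)
feature_unchanged = np.allclose(
    sorted_pairwise_distances(coords),
    sorted_pairwise_distances(transformed),
)
print("raw coordinates unchanged:", raw_unchanged)
print("sorted distances unchanged:", feature_unchanged)
```

A model fed the raw coordinates would have to learn these symmetries from data, whereas a model fed the distance-based feature respects them by construction.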
Day 3: Applied Machine Learning in Chemistry and Materials Design
On this day, we primarily cover two areas of emerging interest in chemistry and materials design: (i) techniques for active learning/design of experiments and (ii) explainable AI or model interpretability. At the conclusion of this workshop, attendees should be able to…
describe the fundamental components of an active learning paradigm
understand the basic mechanics of Gaussian process regression and its utility in Bayesian optimization
implement and assess a simple virtual active learning experiment
distinguish between justification, interpretation, and explanation
perform some level of feature importance quantification
know when models are interpretable
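The active learning objectives above can be sketched as a small virtual experiment. This is only an illustration under stated assumptions, not the workshop's own example: the `objective` function is a hypothetical stand-in for an expensive measurement, the candidate pool is a one-dimensional grid, the kernel hyperparameters are held fixed (`optimizer=None`) for reproducibility, and the acquisition rule is plain uncertainty sampling rather than a full Bayesian optimization criterion.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def objective(x):
    """Hypothetical stand-in for an expensive experiment or simulation."""
    return np.sin(3.0 * x) + 0.5 * x


def make_gp():
    # Fixed RBF kernel (assumed length scale); alpha adds a small jitter.
    return GaussianProcessRegressor(
        kernel=RBF(length_scale=0.5), alpha=1e-6,
        optimizer=None, normalize_y=True,
    )


# Candidate pool of "experiments" we could run.
pool = np.linspace(0.0, 3.0, 61).reshape(-1, 1)

# Start from three randomly chosen labeled points.
rng = np.random.default_rng(0)
labeled = [int(i) for i in rng.choice(len(pool), size=3, replace=False)]

for _ in range(10):
    gp = make_gp()
    gp.fit(pool[labeled], objective(pool[labeled]).ravel())
    _, std = gp.predict(pool, return_std=True)
    std[labeled] = -np.inf               # never re-select a labeled point
    labeled.append(int(np.argmax(std)))  # query the most uncertain candidate

# Assess the final surrogate against the (normally unknown) ground truth.
gp = make_gp()
gp.fit(pool[labeled], objective(pool[labeled]).ravel())
pred = gp.predict(pool)
rmse = float(np.sqrt(np.mean((pred - objective(pool).ravel()) ** 2)))
print("labeled points:", len(labeled), "RMSE over pool:", rmse)
```

Because the kernel is fixed, the predictive standard deviation depends only on where points have been sampled, so this loop behaves like a space-filling design. Letting the kernel hyperparameters be optimized, or swapping in an acquisition function such as expected improvement, would move it closer to Bayesian optimization.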
Organizers
The organizers of the workshop and associated website are Michael A. Webb (Princeton University) and Andrew D. White (University of Rochester).
Additional References
Workshop content draws heavily upon Andrew’s book “Deep Learning for Molecules and Materials” (https://dmol.pub/intro.html). Some content has been prepared using lecture material from Mike’s course “Machine Learning in Chemical Science and Engineering,” which was first taught in Fall 2021 at Princeton University.