Materials for ML Workshops at ACS MARM: June 1-4, 2022

See course materials here

Overview

These materials were prepared for a trio of workshops, each roughly three hours long, that together provide a “crash course” on machine learning in the context of chemistry. Within the taxonomy of machine learning algorithms, the content focuses primarily on supervised learning tasks, illustrated with chemically oriented examples. A high-level summary of the topics and goals of each workshop “day” is provided below.

You may also wish to examine the agenda for each day, which gives a fair sketch of the planned content.

Day 1: Deep Learning in Chemistry

We cover essential elements of machine learning in general, as well as practical aspects of working with deep learning/neural networks and chemical data. At the conclusion of this workshop, attendees should be able to…

  • generally characterize ML algorithms in terms of their objectives

  • understand the basic mechanics of parameter optimization as model “training”

  • understand and implement simple neural networks using the Keras API

  • appreciate the importance and implementation of techniques related to feature scaling and cross-validation, with practical implementation facilitated by scikit-learn
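As a rough illustration of the second objective above (parameter optimization as model “training”), the following minimal sketch fits a two-parameter linear model by gradient descent on a mean-squared-error loss. It is not taken from the workshop notebooks: the toy data, the function names `mse_loss` and `train`, and the learning rate and step count are all assumptions chosen for illustration, and it deliberately uses plain Python rather than Keras or scikit-learn.

```python
# Toy data generated from y = 2x + 1 (made up for illustration only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mse_loss(w, b):
    """Mean squared error of the linear model y = w*x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(learning_rate=0.05, n_steps=2000):
    """Plain gradient descent on the MSE loss, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Analytical gradients of the MSE loss with respect to w and b
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step each parameter against its gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w_fit, b_fit = train()
print(f"w = {w_fit:.4f}, b = {b_fit:.4f}, loss = {mse_loss(w_fit, b_fit):.2e}")
```

A neural network is trained by the same basic loop; the differences are that the model has many more parameters and that libraries compute the gradients automatically.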

Day 2: Chemical Representations and Modern Architectures

On this day, we cover topics that are essential to understanding and deploying deep learning in chemistry. At the conclusion of this workshop, attendees should be able to…

  • categorize chemical data

  • identify and appreciate equivariances/invariances in chemical data

  • select data transformations and architectures according to equivariances

  • understand graph neural networks

  • convey, at a high level, the ideas behind sequence and geometric learning
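To make the idea of invariances in chemical data concrete, here is a small sketch (not from the workshop materials) that compares two ways of turning a molecule into a feature vector. The water coordinates are approximate and purely illustrative, and the helper names `flat_coordinates` and `sorted_distances` are hypothetical. Flattened Cartesian coordinates change when atoms are reordered or the molecule is translated, whereas a sorted list of pairwise distances does not.

```python
import math

# Approximate water geometry in angstroms (illustrative values only)
water = [
    ("O", (0.000, 0.000, 0.117)),
    ("H", (0.000, 0.757, -0.469)),
    ("H", (0.000, -0.757, -0.469)),
]


def flat_coordinates(atoms):
    """Naive featurization: concatenate x, y, z in the given atom order."""
    return [c for _, xyz in atoms for c in xyz]


def sorted_distances(atoms):
    """Sorted pairwise distances, rounded to suppress floating-point noise."""
    return sorted(
        round(math.dist(a[1], b[1]), 6)
        for i, a in enumerate(atoms)
        for b in atoms[i + 1:]
    )


# Same molecule with the atom order changed, then rigidly translated
permuted = [water[1], water[0], water[2]]
shifted = [(el, tuple(c + 1.5 for c in xyz)) for el, xyz in permuted]

print(flat_coordinates(water) == flat_coordinates(shifted))   # features differ
print(sorted_distances(water) == sorted_distances(shifted))   # features agree
```

A model fed the flattened coordinates would have to learn from data that these inputs describe the same molecule; a representation or architecture that builds in the invariance removes that burden.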

Day 3: Applied Machine Learning in Chemistry and Materials Design

On this day, we primarily cover two areas of emerging interest in chemistry and materials design: (i) techniques for active learning/design of experiments and (ii) explainable AI or model interpretability. At the conclusion of this workshop, attendees should be able to…

  • describe the fundamental components of an active learning paradigm

  • understand the basic mechanics of Gaussian process regression and its utility in Bayesian optimization

  • implement and assess a simple virtual active learning experiment

  • distinguish between justification, interpretation, and explanation

  • perform some level of feature importance quantification

  • know when models are interpretable

Organizers

The organizers of the workshop and associated website are Michael A. Webb (Princeton University) and Andrew D. White (University of Rochester).

Additional References

Workshop content draws heavily upon Andrew’s book “Deep Learning for Molecules and Materials” (https://dmol.pub/intro.html). Some content was prepared using lecture material from Mike’s course “Machine Learning in Chemical Science and Engineering,” which was first taught in Fall 2021 at Princeton University.