Tutorial

Here we’ll show how to explain molecular property predictions without access to gradients or any properties of the molecule. To set up this activity, we need a black-box model. We’ll use something simple: a classifier that says whether a molecule has an alcohol group (1) or not (0). Let’s implement this model first.

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole

# set-up rdkit drawing preferences
IPythonConsole.ipython_useSVG = True
IPythonConsole.drawOptions.drawMolsSameScale = False

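def model(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # invalid SMILES: treat as "no alcohol"
        return 0
    # [O;!H0] matches any oxygen bearing at least one hydrogen
    match = mol.GetSubstructMatches(Chem.MolFromSmarts('[O;!H0]'))
    return 1 if match else 0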

Let’s now try it out on some molecules:

smi = 'CCCCCCO'
print('f(s)', model(smi))
Chem.MolFromSmiles(smi)

f(s) 1

smi = 'OCCCCCCO'
print('f(s)', model(smi))
Chem.MolFromSmiles(smi)

f(s) 1

smi = 'c1ccccc1'
print('f(s)', model(smi))
Chem.MolFromSmiles(smi)

f(s) 0

Counterfactual explanations

Let’s now explain the model, pretending we don’t know how it works, using counterfactuals:

import exmol

instance = 'CCCCCCO'
space = exmol.sample_space(instance, model, batched=False)
cfs = exmol.cf_explain(space, 1)
exmol.plot_cf(cfs)
../_images/395cc6725e68fee3191d2fda5ffc98a0612b81cdd366859f21307491ef91e4a3.png

We can see that removing the alcohol is the smallest change that flips the prediction for this molecule. Let’s plot the sampled space and see where these counterfactuals lie.

exmol.plot_space(space, cfs)
../_images/57ea76bc24946f3866462b22155767afee719d7705acd65797b5f87cc14afc38.png
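
To make the idea concrete, here is a minimal sketch of what picking a counterfactual means: among the sampled molecules, we keep those whose prediction differs from the instance’s and take the most similar one. The sample list, the similarity values, and the closest_counterfactual helper are all hypothetical and only for illustration; this is not how exmol implements cf_explain.

```python
# Toy stand-in for a sampled space: (SMILES, similarity to instance, prediction).
# Similarity values are made up for illustration; exmol computes its own.
instance = ("CCCCCCO", 1.00, 1)
samples = [
    ("CCCCCCCO", 0.80, 1),  # still an alcohol, prediction unchanged
    ("OCCCCCCO", 0.70, 1),  # diol, prediction unchanged
    ("CCCCCC", 0.65, 0),    # OH removed, prediction flips
    ("CCCCCCN", 0.55, 0),   # OH swapped for NH2, prediction flips
]


def closest_counterfactual(instance, samples):
    """Return the most similar sample whose prediction differs from the instance's."""
    _, _, base_pred = instance
    flipped = [s for s in samples if s[2] != base_pred]
    if not flipped:
        return None  # no sample changes the prediction
    return max(flipped, key=lambda s: s[1])


cf = closest_counterfactual(instance, samples)
print(cf)
```

In this toy space the selected counterfactual is the molecule with the hydroxyl group removed, which mirrors what we saw in the plot above.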

Explain using substructures

Now we’ll try to explain our model using substructures.

exmol.lime_explain(space)
exmol.plot_descriptors(space)
../_images/128cff0cf3739ffa7ec8acdf05ee8032c901fd4287ba98bef061f81f121a34f6.png

This seems like a pretty clear explanation. Now let’s restrict the explanation to substructures that are actually present in the molecule:

import skunk

exmol.lime_explain(space, descriptor_type='ECFP')
svg = exmol.plot_descriptors(space, return_svg=True)
# plot_descriptors can return None instead of an SVG string;
# skunk.display(None) fails with AttributeError, so guard the call
if svg is not None:
    skunk.display(svg)
exmol.plot_utils.similarity_map_using_tstats(space[0])
../_images/751270d7f61e8f113acee095dfcf47d80e276e5a7a92f33a3f700b253c2a6dea.png

We can see that most of the model’s prediction is explained by the presence of the alcohol group, as expected.
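
LIME-style explanations generally work by fitting a simple surrogate model to the black-box predictions, with samples weighted by their similarity to the instance. Below is a minimal sketch of that idea with a single binary feature ("has an O-H group"). The toy samples, their weights, and the weighted_slope helper are hypothetical, and this is a simplification for intuition rather than exmol’s lime_explain implementation.

```python
# Toy samples around the instance: (has_OH feature, similarity weight, prediction).
# All values are hypothetical and chosen only to illustrate the fit.
samples = [
    (1, 1.00, 1),
    (1, 0.80, 1),
    (1, 0.70, 1),
    (0, 0.65, 0),
    (0, 0.55, 0),
]


def weighted_slope(samples):
    """Slope b of the similarity-weighted least-squares line y = a + b * x."""
    w_sum = sum(w for _, w, _ in samples)
    x_mean = sum(w * x for x, w, _ in samples) / w_sum
    y_mean = sum(w * y for _, w, y in samples) / w_sum
    cov = sum(w * (x - x_mean) * (y - y_mean) for x, w, y in samples)
    var = sum(w * (x - x_mean) ** 2 for x, w, _ in samples)
    return cov / var


slope = weighted_slope(samples)
print(round(slope, 3))
```

A large positive slope means that the presence of the feature pushes the surrogate toward predicting 1, which is the kind of signal the descriptor plots summarize for many substructures at once.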