ANN assisted TPS on capped alanine dipeptide 2#
In this notebook you will learn
how to perform input importance analyses for models with transformed atomistic coordinates, i.e. how to find which are the important ANN inputs and how to relate them to atomistic coordinates
how to create visualizizations of the transitions colored by gradient
This notebook uses files created in 1_setup_and_TPS.ipynb, please do this notebook first.
%matplotlib inline
import os
import arcd
import torch
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = 9, 6 # make the figures a bit bigger
import openpathsampling as paths
# change to the working directory of choice
# same as for the first notebook
wdir = '/home/think/scratch/arcd_ala/IC_only/'
#wdir = None
if wdir is not None:
os.chdir(wdir)
storage = paths.Storage('ala_low_barrier_IC_only_TPS.nc', 'a')
arcd_store = arcd.Storage('arcd_storage.h5', 'a')
model = arcd_store.rcmodels['most_recent']
trainset = arcd_store.load_trainset()
model = model.complete_from_ops_storage(storage)
sampler = storage.pathsimulators[0]
trainhook = arcd.ops.TrainingHook(model, trainset)
storehook = arcd.ops.ArcdStorageHook(arcd_store, model, trainset)
densityhook = arcd.ops.DensityCollectionHook(model)
sampler.attach_hook(trainhook)
sampler.attach_hook(storehook)
sampler.attach_hook(densityhook)
sampler.restart_at_step(storage.steps[-1], storage=storage)
Restoring RCModelSelector without model.Please take care of resetting the model yourself.
arcd.ops.utils.set_rcmodel_in_all_selectors(model, sampler)
HIPR analysis#
We will have a look at the most important inputs and which atoms contribute to them.
hipr = arcd.analysis.HIPRanalysis(model, trainset)
final_hipr_losses = hipr.do_hipr()
final_hipr_losses_plus = hipr.do_hipr_plus()
# lets load the model after 500 MCsteps to compare
from arcd.base.rcmodel import RCModel
fname = storage.abspath + '_RCmodel_at_step500.pckl'
state, cls = RCModel.load_state(fname, storage)
state = cls.fix_state(state)
model_at_step500 = cls.set_state(state)
# use the complete trainset for HIPR
# this includes point the ANN has never trained on but makes it comparable to the previous HIPR
hipr_at_step500 = arcd.analysis.HIPRanalysis(model_at_step500, trainhook.trainset)
step500_hipr_losses = hipr_at_step500.do_hipr()
step500_hipr_losses_plus = hipr_at_step500.do_hipr_plus()
final_loss_diff = final_hipr_losses[:-1] - final_hipr_losses[-1]
plt.bar(np.arange(len(final_loss_diff)), final_loss_diff)
plt.xlabel('coordinate index', size=15)
plt.ylabel('Loss difference', size=15)
# get the associated coordinates for the maximally important inputs
max_idx = np.argsort(final_loss_diff)[::-1]
# this bit below only works for IC only
# figuring out how to get the info for SF CVs is left as an excersise to the reader :)
ic_parms = trainhook.model.descriptor_transform.kwargs['ic_parms']
print('Reference loss: ', final_hipr_losses[-1])
for idx in max_idx[:5]:
# this little helper function gets you the involved atoms given the input index and the inputparameters to the CV of question
# there is similar functions for the symmetry functions @ arcd.coords.symmetry.get_involved
print()
print('coordinate: ', arcd.coords.internal.get_involved(idx, **ic_parms))
print('with associated loss: ', final_hipr_losses[idx])
Reference loss: 0.5318849262839068
coordinate: ('cos', [6, 8, 14, 15])
with associated loss: 1.0222091827088011
coordinate: ('sin', [6, 8, 14, 15])
with associated loss: 0.5665803623770526
coordinate: ('cos', [1, 4, 6, 8])
with associated loss: 0.5388207342334373
coordinate: ('sin', [8, 14, 16, 18])
with associated loss: 0.5386478797166409
coordinate: ('sin', [1, 4, 6, 8])
with associated loss: 0.5351947312345523
# same for the model after 500 steps
step500_loss_diff = step500_hipr_losses[:-1] - step500_hipr_losses[-1]
plt.bar(np.arange(len(step500_loss_diff)), step500_loss_diff, label='after step 100')
plt.xlabel('coordinate index', size=15)
plt.ylabel('Loss difference', size=15)
max_idx = np.argsort(step500_loss_diff)[::-1]
print('Reference loss: ', step500_hipr_losses[-1])
for idx in max_idx[:5]:
print()
print('coordinate: ', arcd.coords.internal.get_involved(idx, **ic_parms))
print('with associated loss: ', step500_hipr_losses[idx])
Reference loss: 0.5410692020804583
coordinate: ('cos', [6, 8, 14, 15])
with associated loss: 1.028923017916803
coordinate: ('sin', [6, 8, 14, 15])
with associated loss: 0.5620001094307966
coordinate: ('cos', [1, 4, 6, 8])
with associated loss: 0.5549061560107325
coordinate: ('cos', [0, 1, 4, 5])
with associated loss: 0.5476359209376657
coordinate: ('sin', [1, 4, 6, 8])
with associated loss: 0.5457021648536423
Now we know that the dihedral between the atoms 6, 8, 14 and 15 seems to be important…what are these atoms?#
# lets get a snapshot from trajectory, such that we can ask its topology object for the atom names
snap = storage.snapshots[-1]
topol = snap.topology.mdtraj
for at in [6, 8, 14, 15, 16]:
print('atom: ', topol.atom(at), ' with index: ', at)
atom: ALA2-N with index: 6
atom: ALA2-CA with index: 8
atom: ALA2-C with index: 14
atom: ALA2-O with index: 15
atom: NME3-N with index: 16
It is a dihedral which is equivalent to psi (atoms 6, 8, 14 and 16) since atom 15 and atom 16 are always in the same plane, they are the O and the N of a peptide bond.
Can we get a more intuitive feeling for what is happening?#
Lets make a movie of a transition colored by gradient of the learned reaction coordinate!#
# get some TPs from the last steps of the storage
tras = []
for step in storage.steps[-50:]:
if step.change.canonical.accepted:
tras.append(step.change.canonical.trials[0].trajectory)
# instantiate the movie maker
gmovie = arcd.analysis.GradientMovieMaker(trainhook.model, trainhook.model.descriptor_transform, topol)
# lets choose a trajectory that is not too long to save some time
# (especially important if using symmetry functions)
[len(t) for t in tras]
[42, 42, 95, 60, 71, 33, 47, 64, 36, 70, 74, 47, 40, 94, 108]
# Note: works only for internal coordinates only
# choose the atoms we want to calculate gradients for
# just take all the ala atoms, as they are the only ones that can contribute if using internal coordinates only
atoms = topol.select('not resname HOH')
# this will write out two pdb trajectories with the gradient info in the Bfactors
# one with gradient magnitudes normalized per frame and one with just the gradient magnitudes
gmovie.color_by_gradient(tras[0], 'ala_movie_0.pdb', atom_indices=atoms)
look at the movie in VMD#
Unfortunately just loading the trajectory will not work, since VMD reads only the Bfactors of the first frame, the way to get the movie is:
start VMD
source arcd/examples/resources/pdbbfactor.tclin the VMD TK consolepdbbfactor $OUTFILENAMEloads the trajectory and writes the Bfactors of every frame into the VMD user field for every framechoose color by user and be amazed :)
storage.close()
arcd_store.close()