Bayesian Virtual Site Fitting¶
Virtual sites are useful dummy charge-carrying particles that are not constrained to atomic nuclei, allowing for additional positional flexibility. Here, we implement a Bayesian virtual site fitting algorithm using Pyro with the OpenFF Recharge virtual site formalism.
The full runnable script is at examples/tutorials/bayesian_vsite_fitting.py.
1. Molecule, Conformer, and Multipoles¶
As in the previous examples, we generate a GDMA record for a molecule. In this case, chloromethane.
from openff.recharge.utilities.molecule import extract_conformers
from openff.toolkit import Molecule
from pympfit import GDMASettings, MoleculeGDMARecord, Psi4GDMAGenerator
molecule = Molecule.from_smiles("CCl")
molecule.generate_conformers(n_conformers=1)
[conformer] = extract_conformers(molecule)
settings = GDMASettings()
coords, multipoles = Psi4GDMAGenerator.generate(
molecule, conformer, settings, minimize=True
)
record = MoleculeGDMARecord.from_molecule(molecule, coords, multipoles, settings)
2. Virtual Site Collection¶
We instantiate a VirtualSiteCollection object with placeholder
charge_increments and distance values. These will be sampled during
Bayesian inference.
from openff.recharge.charges.vsite import BondChargeSiteParameter, VirtualSiteCollection
vsite_collection = VirtualSiteCollection(parameters=[
BondChargeSiteParameter(
smirks="[#17:1]-[#6:2]", name="EP", distance=0.0,
charge_increments=(0.0, 0.0), sigma=0.0, epsilon=0.0, match="all-permutations",
)
])
3. Objective Term¶
Next, create an objective term that specifies the vsite position being
constrained to somewhere along the C–Cl bond ([#17:1]-[#6:2]). The
BondCharge specification places the virtual site along the bond axis.
from pympfit.optimize import MPFITObjective
[objective_term] = list(MPFITObjective.compute_objective_terms(
multipole_records=[record],
vsite_collection=vsite_collection,
_vsite_charge_parameter_keys=[("[#17:1]-[#6:2]", "BondCharge", "EP", 0)],
_vsite_coordinate_parameter_keys=[("[#17:1]-[#6:2]", "BondCharge", "EP", "distance")],
))
4. Bayesian Model and MCMC Sampling¶
Following standard Pyro recipes, we wrap a forward model function that
sets priors for trainable parameters. The model is then passed to the
MCMC NUTS algorithm for sampling.
import numpy as np
import pyro
import pyro.distributions as dist
import torch
from pyro.infer import MCMC, NUTS
n_atoms = molecule.n_atoms
targets = [
torch.from_numpy(r.astype(np.float64).flatten())
for r in objective_term.reference_values
]
def model():
# Priors
free_charges = pyro.sample("free_charges", dist.Normal(
torch.zeros(n_atoms - 1, 1, dtype=torch.float64), 0.5))
charge_inc = pyro.sample("charge_inc", dist.Normal(
torch.zeros(1, 1, dtype=torch.float64), 0.2))
distance = pyro.sample("distance", dist.Normal(
torch.tensor([[1.5]], dtype=torch.float64), 0.5))
sigma = pyro.sample("sigma", dist.HalfCauchy(
torch.tensor([[0.1]], dtype=torch.float64)))
preds = objective_term.predict_from_free_charges(free_charges, charge_inc, distance)
# Likelihood
for i, (pred, target) in enumerate(zip(preds, targets)):
pyro.sample(f"obs_{i}", dist.Normal(pred.flatten(), sigma), obs=target)
print("\nRunning NUTS (200 warmup, 500 samples)...")
mcmc = MCMC(NUTS(model), num_samples=500, warmup_steps=200, num_chains=1)
mcmc.run()
Example output (click to expand)
Running NUTS (200 warmup, 500 samples)...
Warmup: 0%| | 1/700 [00:00, 8.91it/s, step size=2.25e-01, acc. prob=1
...
Sample: 100%|██| 700/700 [00:24, 28.87it/s, step size=4.11e-01, acc. prob=0.933]
Atom charges:
C1: -0.5736 +/- 0.3150
Cl2: -0.5850 +/- 0.2218
H3: +0.3084 +/- 0.3073
H4: +0.3169 +/- 0.2937
H5: +0.5332 +/- 0.3753
Vsite charge increment: -0.0041 +/- 0.1894
Vsite distance: 1.512 +/- 0.493 Å
5. Visualization¶
The distribution of the vsite’s charge increment and position along the bond can be visualized using ArviZ trace plots. The trace plot shows both the sampling history (left) and posterior distribution (right).
import arviz as az
import matplotlib.pyplot as plt
idata = az.from_pyro(mcmc, log_likelihood=False)
idata.posterior["vsite_charge_increment"] = idata.posterior["charge_inc"].squeeze()
idata.posterior["vsite_distance_A"] = idata.posterior["distance"].squeeze()
az.plot_trace(idata, var_names=["vsite_charge_increment", "vsite_distance_A"])
plt.tight_layout()
plt.savefig("bayesian_vsite_trace.png", dpi=150)

As can be seen, the distribution of the vsite’s charge and position along the bond results in a normally distributed posterior centered near the prior means.