Interfacing with Syndirella

Syndirella is a python package used to enumerate synthetically accessible chemical space around scaffold compounds by first performing retrosynthesis calculations, superstructure searches of reactants, cartesian multiplication of reactant superstructure sets, and placement with Fragmenstein. Syndirella is used extensively for fragment progression at XChem, and HIPPO has been co-developed to interface with it.

Generating Syndirella inputs

Syndirella requires a CSV input with details for each scaffold. Since scaffolds have to be placed relative to inspiration structures, and into a protein template. The HIPPO Pose object is most suitable as it encodes both Pose.inspirations and Pose.reference properties.

Syndirella can be run as a single process which sequentially processes scaffolds, or distributed across SLURM jobs.

Within the specified directory, PoseSet.to_syndirella() will generate:

a templates subdirectory with apo-structures for each reference protein
an input CSV compatible with syndirella with a row for each scaffold
an SDF containing all inspiration ligands

Processing all scaffolds in a single syndirella run

scaffolds = animal.poses[...]
scaffolds.to_syndirella("elabs")

Creating separate inputs for each scaffold

scaffolds = animal.poses[...]
for i,pose in enumerate(scaffolds):
    mrich.h3(f"{i}/{len(scaffolds)} {pose.id}")
    pset = animal.poses[pose.id,]
    pset.to_syndirella(f"elabs/P{pose.id}")

Running Syndirella

Syndirella will need to be run on the files generated in the previous section. An example script which will need to be run for each set of inputs specified as a command-line argument:

#!/bin/bash

set -e

KEY=$1

echo --input $(pwd)/$KEY"_syndirella_input.csv"
echo --hits_path $(pwd)/$KEY"_syndirella_inspiration_hits.sdf"
echo --output $(pwd)/$KEY"_elabs"
echo --metadata METADATA_CSV_PATH
echo --templates "$(pwd)/templates"

/opt/xchem-fragalysis-2/maxwin/conda/bin/syndirella \
    --input $(pwd)/$KEY"_syndirella_input.csv" \
    --hits_path $(pwd)/$KEY"_syndirella_inspiration_hits.sdf" \
    --output $(pwd)/$KEY"_elabs" \
    --templates "$(pwd)/templates" \
    --metadata METADATA_CSV_PATH \
    --no_scaffold_place

The METADATA_CSV_PATH placeholder will need to be replaced.

Loading Syndirella outputs

Syndirella has been developed to produce a HIPPO-friendly output in the syntax {inchikey}_{reaction_uuid}_to_hippo.pkl.gz. This can be parsed using the HIPPO.add_syndirella_elabs() method. In the case where many such files are loaded it is recommended to use a shell script run via SLURM, such as the one below:

Example script for loading syndirella elaborations and their routes

from pathlib import Path
import hippo
import mrich

mrich.var("hippo", hippo.__file__)

animal = hippo.HIPPO(PROJECT_NAME, DATABASE_PATH)
animal.db.backup()

output_root = Path("../syndirella/elabs/")

files = list(output_root.glob("*/*-*-?/*to_hippo*"))

for i,file in enumerate(files):
    mrich.h2(f"{i+1}/{len(files)}")
    try:
        animal.add_syndirella_elabs(file)
    except Exception as e:
        mrich.error(file)
        mrich.error(e)
        continue

animal.db.close()

The above script can be submitted to the DLS / IRIS cluster as follows:

sbatch --job-name load_elabs --exclusive --no-requeue /opt/xchem-fragalysis-2/maxwin/slurm/run_python.sh 4_load_elabs.py