Interfacing with Syndirella

Syndirella is a Python package used to enumerate synthetically accessible chemical space around scaffold compounds. It does so through retrosynthesis calculations, superstructure searches of the reactants, Cartesian multiplication of the reactant superstructure sets, and placement with Fragmenstein. Syndirella is used extensively for fragment progression at XChem, and HIPPO has been co-developed to interface with it.
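The Cartesian multiplication step can be illustrated with a minimal sketch. This is not Syndirella's implementation: the function name and the SMILES strings are hypothetical, and the sketch only shows how two reactant superstructure sets combine into candidate reactant pairs.

```python
from itertools import product

# Hypothetical superstructure sets for the two reactants of one reaction step
# (SMILES strings chosen purely for illustration).
amine_analogues = ["NCc1ccccc1", "NCc1ccc(F)cc1"]
acid_analogues = ["OC(=O)C", "OC(=O)CC", "OC(=O)c1ccccc1"]


def enumerate_reactant_pairs(*reactant_sets):
    """Return every combination that takes one reactant from each set."""
    return list(product(*reactant_sets))


pairs = enumerate_reactant_pairs(amine_analogues, acid_analogues)
print(len(pairs))  # 2 * 3 = 6 candidate reactant pairs
```

The number of combinations is the product of the set sizes, so the enumerated space grows quickly as the superstructure searches return more analogues.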

Generating Syndirella inputs

Syndirella requires a CSV input with details for each scaffold. Since scaffolds have to be placed relative to inspiration structures and into a protein template, the HIPPO Pose object is the most suitable source, as it encodes both the Pose.inspirations and Pose.reference properties.

Syndirella can be run as a single process which sequentially processes scaffolds, or distributed across SLURM jobs.

Within the specified directory, PoseSet.to_syndirella() will generate:

  • a templates subdirectory with apo-structures for each reference protein

  • an input CSV compatible with Syndirella, with one row per scaffold

  • an SDF containing all inspiration ligands
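As a quick sanity check after export, the three kinds of output listed above can be summarised with a small helper. This is a sketch only: summarise_syndirella_inputs is a hypothetical name and not part of HIPPO, and it assumes the CSV and SDF files carry lowercase .csv and .sdf extensions.

```python
from pathlib import Path


def summarise_syndirella_inputs(directory):
    """Group a directory's contents by the three kinds of output listed above."""
    directory = Path(directory)
    templates_dir = directory / "templates"

    # The templates subdirectory may be missing if the export did not run
    templates = []
    if templates_dir.is_dir():
        templates = sorted(p.name for p in templates_dir.iterdir() if p.is_file())

    return {
        "templates": templates,
        "csv": sorted(p.name for p in directory.glob("*.csv")),
        "sdf": sorted(p.name for p in directory.glob("*.sdf")),
    }
```

For example, summarise_syndirella_inputs("elabs") would return a dictionary whose "templates", "csv", and "sdf" entries list the file names found; an empty list for any of them suggests the export is incomplete.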

Processing all scaffolds in a single Syndirella run

scaffolds = animal.poses[...]
scaffolds.to_syndirella("elabs")

Creating separate inputs for each scaffold

scaffolds = animal.poses[...]
for i, pose in enumerate(scaffolds):
    mrich.h3(f"{i+1}/{len(scaffolds)} {pose.id}")
    pset = animal.poses[pose.id,]
    pset.to_syndirella(f"elabs/P{pose.id}")

Running Syndirella

Syndirella must then be run on the files generated in the previous section. The example script below processes one set of inputs, identified by a key passed as the first command-line argument, so it needs to be run once for each set of inputs:

#!/bin/bash

# Stop at the first failing command
set -e

# Prefix identifying one set of Syndirella inputs
KEY=$1

# Print the arguments so they appear in the job log
echo --input "$(pwd)/${KEY}_syndirella_input.csv"
echo --hits_path "$(pwd)/${KEY}_syndirella_inspiration_hits.sdf"
echo --output "$(pwd)/${KEY}_elabs"
echo --metadata METADATA_CSV_PATH
echo --templates "$(pwd)/templates"

/opt/xchem-fragalysis-2/maxwin/conda/bin/syndirella \
    --input "$(pwd)/${KEY}_syndirella_input.csv" \
    --hits_path "$(pwd)/${KEY}_syndirella_inspiration_hits.sdf" \
    --output "$(pwd)/${KEY}_elabs" \
    --templates "$(pwd)/templates" \
    --metadata METADATA_CSV_PATH \
    --no_scaffold_place

Replace the METADATA_CSV_PATH placeholder with the path to the metadata CSV before running the script.
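When many input sets exist, the same command line can be assembled in Python, for instance to print or submit one command per key. The sketch below mirrors the flags and file names of the shell script above; the function name syndirella_command, the keys "P1" and "P2", the "metadata.csv" path, and the default executable name are assumptions made for illustration.

```python
import shlex
from pathlib import Path


def syndirella_command(key, metadata_csv, workdir=".", executable="syndirella"):
    """Build the argument list used in the shell script above for one key."""
    workdir = Path(workdir)
    return [
        executable,
        "--input", str(workdir / f"{key}_syndirella_input.csv"),
        "--hits_path", str(workdir / f"{key}_syndirella_inspiration_hits.sdf"),
        "--output", str(workdir / f"{key}_elabs"),
        "--templates", str(workdir / "templates"),
        "--metadata", str(metadata_csv),
        "--no_scaffold_place",
    ]


# Hypothetical keys and metadata path, for illustration only
for key in ["P1", "P2"]:
    print(shlex.join(syndirella_command(key, "metadata.csv")))
```

Returning a list rather than a single string means the result can be passed directly to subprocess.run without shell quoting concerns, while shlex.join gives a copy-pasteable form for logs.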

Loading Syndirella outputs

Syndirella has been developed to produce HIPPO-friendly output files named with the pattern {inchikey}_{reaction_uuid}_to_hippo.pkl.gz. These can be parsed using the HIPPO.add_syndirella_elabs() method. When many such files need to be loaded, it is recommended to use a Python script submitted via SLURM, such as the one below:

Example script for loading syndirella elaborations and their routes
from pathlib import Path
import hippo
import mrich

mrich.var("hippo", hippo.__file__)

animal = hippo.HIPPO(PROJECT_NAME, DATABASE_PATH)
animal.db.backup()

output_root = Path("../syndirella/elabs/")

files = list(output_root.glob("*/*-*-?/*to_hippo*"))

for i, file in enumerate(files):
    mrich.h2(f"{i+1}/{len(files)}")
    try:
        animal.add_syndirella_elabs(file)
    except Exception as e:
        mrich.error(file)
        mrich.error(e)
        continue

animal.db.close()

The above script can be submitted to the DLS / IRIS cluster as follows:

sbatch --job-name load_elabs --exclusive --no-requeue /opt/xchem-fragalysis-2/maxwin/slurm/run_python.sh 4_load_elabs.py
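When auditing which elaborations have been loaded, it can help to split an output file name back into its parts. The sketch below follows the {inchikey}_{reaction_uuid}_to_hippo.pkl.gz pattern described earlier in this section; parse_elab_filename is a hypothetical helper, not a HIPPO or Syndirella function, and it assumes the InChIKey itself contains no underscores (standard InChIKeys use only uppercase letters and hyphens).

```python
from pathlib import Path

SUFFIX = "_to_hippo.pkl.gz"


def parse_elab_filename(path):
    """Split an output file name into (inchikey, reaction_uuid)."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a HIPPO output file: {name}")

    stem = name[: -len(SUFFIX)]

    # Assumption: the first underscore separates the InChIKey from the UUID
    inchikey, _, reaction_uuid = stem.partition("_")
    if not inchikey or not reaction_uuid:
        raise ValueError(f"unexpected file name: {name}")

    return inchikey, reaction_uuid
```

Splitting on the first underscore only means a reaction identifier that itself contains underscores is still returned whole.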