Interfacing with Syndirella
Syndirella is a Python package that enumerates synthetically accessible chemical space around scaffold compounds. It does so by performing retrosynthesis calculations, followed by superstructure searches of the reactants, Cartesian multiplication of the reactant superstructure sets, and placement with Fragmenstein. Syndirella is used extensively for fragment progression at XChem, and HIPPO has been co-developed to interface with it.
Generating Syndirella inputs
Syndirella requires a CSV input with details for each scaffold. Since scaffolds have to be placed relative to inspiration structures and into a protein template, the HIPPO Pose object is the most suitable starting point, as it encodes both the Pose.inspirations and Pose.reference properties.
Syndirella can be run as a single process which sequentially processes scaffolds, or distributed across SLURM jobs.
Within the specified directory, PoseSet.to_syndirella() will generate:
a templates subdirectory with apo-structures for each reference protein
an input CSV compatible with syndirella with a row for each scaffold
an SDF containing all inspiration ligands
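Before launching Syndirella it can be useful to confirm that the three outputs listed above are present. The following is a minimal sketch, not part of HIPPO: the helper name check_syndirella_inputs is hypothetical, the file-name suffixes are taken from the run script later on this page, and it is an assumption that the files sit directly inside the directory being checked.

```python
from pathlib import Path


def check_syndirella_inputs(directory):
    """Summarise which expected Syndirella inputs exist in `directory`.

    Hypothetical helper. The suffixes below mirror the run script on this
    page; their exact location on disk is an assumption.
    """
    directory = Path(directory)
    return {
        # subdirectory holding apo-structures for each reference protein
        "templates": (directory / "templates").is_dir(),
        # one input CSV per to_syndirella() call
        "input_csvs": sorted(
            p.name for p in directory.glob("*_syndirella_input.csv")
        ),
        # SDF of inspiration ligands
        "inspiration_sdfs": sorted(
            p.name for p in directory.glob("*_syndirella_inspiration_hits.sdf")
        ),
    }
```

An empty list or a False value in the returned dictionary indicates which output is missing from the directory.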
Processing all scaffolds in a single syndirella run
scaffolds = animal.poses[...]
scaffolds.to_syndirella("elabs")
Creating separate inputs for each scaffold
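import mrich

scaffolds = animal.poses[...]
for i, pose in enumerate(scaffolds):
    # progress header, counting from 1
    mrich.h3(f"{i+1}/{len(scaffolds)} {pose.id}")
    # single-pose PoseSet, so each scaffold gets its own set of inputs
    pset = animal.poses[pose.id,]
    pset.to_syndirella(f"elabs/P{pose.id}")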
Running Syndirella
Syndirella then needs to be run on the files generated in the previous section. The example script below runs Syndirella on one set of inputs, identified by a key passed as the first command-line argument, so it has to be run once for each set of inputs:
#!/bin/bash
set -e
KEY=$1
echo --input $(pwd)/$KEY"_syndirella_input.csv"
echo --hits_path $(pwd)/$KEY"_syndirella_inspiration_hits.sdf"
echo --output $(pwd)/$KEY"_elabs"
echo --metadata METADATA_CSV_PATH
echo --templates "$(pwd)/templates"
/opt/xchem-fragalysis-2/maxwin/conda/bin/syndirella \
--input $(pwd)/$KEY"_syndirella_input.csv" \
--hits_path $(pwd)/$KEY"_syndirella_inspiration_hits.sdf" \
--output $(pwd)/$KEY"_elabs" \
--templates "$(pwd)/templates" \
--metadata METADATA_CSV_PATH \
--no_scaffold_place
Both occurrences of the METADATA_CSV_PATH placeholder need to be replaced with the path to the metadata CSV before the script is run.
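When there are many sets of inputs, the same command can be assembled from Python instead of editing the shell script. The sketch below only reuses the flags and file-name suffixes that appear in the script above. The function name syndirella_command and its parameters are hypothetical, and the default executable name assumes syndirella is on the PATH.

```python
from pathlib import Path


def syndirella_command(key, metadata_csv, workdir=".", executable="syndirella"):
    """Build the argument list for one Syndirella run.

    Hypothetical helper mirroring the shell script above. `key` plays the
    role of $KEY and `workdir` the role of $(pwd).
    """
    workdir = Path(workdir).resolve()
    return [
        executable,
        "--input", str(workdir / f"{key}_syndirella_input.csv"),
        "--hits_path", str(workdir / f"{key}_syndirella_inspiration_hits.sdf"),
        "--output", str(workdir / f"{key}_elabs"),
        "--templates", str(workdir / "templates"),
        "--metadata", str(metadata_csv),
        "--no_scaffold_place",
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True), or joined into a single line for a SLURM submission script.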
Loading Syndirella outputs
Syndirella has been developed to produce HIPPO-friendly output files named {inchikey}_{reaction_uuid}_to_hippo.pkl.gz. These can be parsed using the HIPPO.add_syndirella_elabs() method. When many such files need to be loaded, it is recommended to use a Python script run via SLURM, such as the one below:
from pathlib import Path
import hippo
import mrich

mrich.var("hippo", hippo.__file__)

animal = hippo.HIPPO(PROJECT_NAME, DATABASE_PATH)
animal.db.backup()

output_root = Path("../syndirella/elabs/")

files = list(output_root.glob("*/*-*-?/*to_hippo*"))

for i,file in enumerate(files):
    mrich.h2(f"{i+1}/{len(files)}")
    try:
        animal.add_syndirella_elabs(file)
    except Exception as e:
        mrich.error(file)
        mrich.error(e)
        continue

animal.db.close()
The above script can be submitted to the DLS / IRIS cluster as follows:
sbatch --job-name load_elabs --exclusive --no-requeue /opt/xchem-fragalysis-2/maxwin/slurm/run_python.sh 4_load_elabs.py
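To keep track of which scaffold and reaction each output file belongs to, the file-name pattern {inchikey}_{reaction_uuid}_to_hippo.pkl.gz described above can be split into its parts. This is a minimal sketch, not part of HIPPO or Syndirella: the function parse_elab_filename is hypothetical, and it assumes the InChIKey contains no underscore, so the first underscore separates it from the reaction identifier.

```python
from pathlib import Path

SUFFIX = "_to_hippo.pkl.gz"


def parse_elab_filename(path):
    """Split a Syndirella output file name into (inchikey, reaction_uuid).

    Hypothetical helper. Assumes the name follows the pattern
    {inchikey}_{reaction_uuid}_to_hippo.pkl.gz.
    """
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {name}")
    stem = name[: -len(SUFFIX)]
    # InChIKeys use hyphens, not underscores, so split on the first underscore
    inchikey, sep, reaction_uuid = stem.partition("_")
    if not sep or not inchikey or not reaction_uuid:
        raise ValueError(f"unexpected file name: {name}")
    return inchikey, reaction_uuid
```

For example, the result could be logged alongside any error raised in the loading loop so that failed files can be traced back to a scaffold.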