Poses

class hippo.pose.Pose(db: Database, id: int, inchikey: str | None, alias: str | None, smiles: str, reference: int, path: str, compound: int, target: int, mol: Mol | bytes | None, fingerprint: int, energy_score: float | None = None, distance_score: float | None = None, inspiration_score: float | None = None, metadata: dict | None = None)[source]

A Pose is a particular conformer of a Compound within a protein environment. A pose will have its own (stereochemical) smiles string, and must have a path to a coordinate file. Poses can have inspirations that can be used to trace fragment-derived scaffolds in merges and expansions.

Attention

Pose objects should not be created directly. Instead, use HIPPO.register_pose() or HIPPO.poses().

self + other: Pose | PoseSet PoseSet[source]

Combine this pose with another Pose or PoseSet, returning a PoseSet

self == other: Pose bool[source]

Compare this pose with another instance

repr(self) str[source]

ANSI Formatted string representation

str(self) str[source]

Unformatted string representation

add_subsite(name: str, commit: bool = True) SubsiteTag[source]

Tag this pose with a protein subsite

Parameters:
  • name – the name of the subsite

  • commit – commit the insertion to the database

Returns:

SubsiteTag

property alias: str

Returns the pose’s alias

property base_ids: list[int] | None

Get the base Compound IDs

calculate_classic_fingerprint(debug: bool = False) dict[source]

Calculate the pose’s interaction fingerprint

calculate_interactions(resolve: bool = True, distance_padding: float = 0.0, angle_padding: float = 0.0, force: bool = False, debug: bool = False, commit: bool = True, mutation_warnings: bool = True, delete_temp_table: bool = True) None[source]

Enumerate all valid interactions between this ligand and the protein

Parameters:
  • resolve – Cull duplicate / less-significant interactions

  • distance_padding – Apply a padding in Angstrom to all distance cutoffs

  • angle_padding – Apply a padding in degrees to all angle cutoffs

  • force – Force a recalculation even if the pose has already been fingerprinted

  • debug – Increase verbosity for debugging

  • commit – commit the changes to the database (Default value = True)

  • mutation_warnings – warn when there has been a mutation in the protein (Default value = True)

  • delete_temp_table – delete the temporary interaction table created during interaction resolution (Default value = True)

calculate_prolif_interactions(return_all: bool = False, max_retry: int = 5, use_mda: bool = False, force: bool = False, clear_existing: bool = True, debug: bool = False, resolve: bool = True) prolif.Fingerprint[source]

Use ProLIF to populate the interactions table

property classic_fingerprint: dict

Classic HIPPO fingerprint dictionary, mapping protein Feature IDs to the number of corresponding ligand features (from any Pose)

property compound: Compound

Returns the pose’s associated compound

property compound_id: int

Returns the pose’s associated compound ID

property db: Database

Returns a pointer to the parent database

property derivatives: PoseSet

Returns the pose’s derivatives

property dict: dict

Serialised dictionary representing the pose

property distance_score: float | None

Distance score of the Pose (w.r.t. its inspirations), in Angstroms

draw(inspirations: bool = True, protein: bool = False, **kwargs) None[source]

Render this pose (and its inspirations)

Parameters:
  • inspirations – Render the inspirations? (Default value = True)

  • protein – Render the protein? This wraps Pose.render() (Default value = False)

draw2d() None[source]

Draw a 2D drawing of this pose

property energy_score: float | None

Energy score of the Pose (kcal/mol)

property features: list[molparse.rdkit.Feature]

Returns the pose’s features

get_compound() Compound[source]

Get the Compound that this pose is a conformer of

get_derivative_ids() list[int][source]

Get the Pose IDs of this pose’s derivatives

get_derivatives() PoseSet[source]

Get a PoseSet of this pose’s derivatives

get_dict(mol: bool = False, inspirations: bool | str = True, subsites: bool | str = True, reference: bool | str = True, metadata: bool = True, duplicate_name: str | bool = False, sanitise_null_metadata_values: bool = False, skip_metadata: list[str] | None = None, sanitise_tag_list_separator: str | None = None, sanitise_metadata_list_separator: str | None = ';', tags: bool = True) dict[source]

Returns a dictionary representing this Pose.

Parameters:
  • mol – Include a rdkit.Chem.Mol in the output? (Default value = False)

  • inspirations – Include inspirations? [True, False, 'names'] Specify names to format as a comma-separated string (Default value = True)

  • subsites – Include subsites? [True, False, 'names'] Specify names to format as a comma-separated string (Default value = True)

  • reference – Include reference? [True, False, 'name'] Specify name to include the Pose name rather than its ID (Default value = True)

  • metadata – Include metadata? (Default value = True)

  • duplicate_name – Specify the name of a new column duplicating the pose name column (Default value = False)

  • tags – Include tags? (Default value = True)

get_inspiration_ids() list[int][source]

Get the Pose IDs of this pose’s inspirations

get_inspirations() PoseSet[source]

Get a PoseSet of this pose’s inspirations

get_tags() TagSet[source]

Get this Pose’s tags

grid() None[source]

Draw a grid of this pose with its inspirations

property has_complex_pdb_path: bool

Does this pose have a PDB file?

property has_fingerprint: bool

Does the pose have a fingerprint?

property id: int

Returns the pose’s database ID

property inchikey: str

Returns the pose’s inchikey

property inspiration_score: float | None

Inspiration score of the Pose, in the range 0.00-1.00

property inspirations: PoseSet

Returns the pose’s inspirations

property interactions: InteractionSet

Get an InteractionSet for this Pose

property metadata: MetaData

Returns the pose’s metadata

property mol: rdkit.Chem.Mol

Returns a pose’s rdkit.Chem.Mol

property name: str

Returns the pose’s name

property num_atoms_added: int

Calculate the number of atoms added relative to the base or inspirations

property num_atoms_added_wrt_bases: int | list[int] | None

Calculate the number of atoms added relative to the base

property num_atoms_added_wrt_inspirations: int | None

Calculate the number of atoms added relative to its inspirations

property num_bases: int

Get the number of base scaffolds

property num_heavy_atoms: int

Number of heavy atoms

property path: str

Returns the pose’s path

plain_repr() str[source]

Unformatted detailed string representation

plot3d(features: bool = False, **kwargs) plotly.graph_objects.Figure[source]

Use Molparse/Plotly to create a 3d figure of this pose

Parameters:

features – include the features in the figure

Returns:

a plotly Figure object

posebusters(debug: bool = False) bool[source]

Run a posebusters ligand check on this pose’s molecule

property protein_system: molparse.System

Returns the pose’s protein molparse.System

property protonated_mol: rdkit.Chem.Mol

Guess hydrogen positions

property reference: Pose

Returns the pose’s protein reference (another pose)

property reference_id: int

Returns the pose’s protein reference ID

render(protein='cartoon', ligand='stick', protein_color='spectrum', interactions: bool = True, file: str | None = None) None[source]

Render this pose with the protein using py3Dmol

Parameters:
  • protein – protein representation, default = ‘cartoon’

  • ligand – ligand representation, default = ‘stick’

  • protein_color – color of protein representation, default = ‘spectrum’

score_inspiration(debug: bool = False, draw: bool = False, return_all: bool = False) float[source]

Score how well this Pose recapitulates the pharmacophoric features of its inspirations.

Parameters:
  • debug – Increased verbosity for debugging (Default value = False)

  • draw – Render each inspiration pose with its features, the derivative with the combined features of the inspirations, and the derivative with its features. (Default value = False)

set_has_fingerprint(fp: bool, commit: bool = True) None[source]

Update the database to reflect this pose’s has_fingerprint property

showcase() None[source]

Print and render this pose as if you were using PoseSet.interactive()

property smiles: str

Returns the pose’s smiles

summary(metadata: bool = True, tags: bool = True, subsites: bool = True) None[source]

Print a summary of this pose

Parameters:

metadata – include metadata (Default value = True)

property table: str

Get the name of the database table

property tags: TagSet

Returns the pose’s tags

property target: Target

Returns the pose’s associated target

to_syndirella(out_key: str | Path) DataFrame[source]

Create syndirella inputs. See PoseSet.to_syndirella()

class hippo.pset.PoseTable(db: Database)[source]

Class representing all Pose objects in the ‘pose’ table of the Database.

Attention

PoseTable objects should not be created directly. Instead use the HIPPO.poses() property. See Getting started with HIPPO and Adding data into HIPPO.

Use as an iterable

Iterate through Pose objects in the table:

for pose in animal.poses:
    ...

Selecting poses in the table

The PoseTable can be indexed with Pose IDs, names, aliases, or lists/sets/tuples/slices thereof:

ptable = animal.poses

# indexing individual poses
pose = ptable[13]                            # using the ID
pose = ptable["BSYNRYMUTXBXSQ-UHFFFAOYSA-N"] # using the InChIKey
pose = ptable["Ax0310a"]                     # using the alias

# getting a subset of poses
pset = ptable[13,15,18]      # using IDs (tuple)
pset = ptable[[13,15,18]]    # using IDs (list)
pset = ptable[{13,15,18}]    # using IDs (set)
pset = ptable[13:18]         # using a slice

Tags and target IDs can also be used to filter:

pset = animal.poses(tag='hits') # select poses tagged with 'hits'
pset = animal.poses(target=1)   # select poses from the first target

self(*, tag: str | None = None, target: int | None = None, subsite: int | None = None, smiles: str | None = None) PoseSet[source]

Filter poses by a given tag, subsite ID, or target ID. See PoseTable.get_by_tag(), PoseTable.get_by_target(), and PoseTable.get_by_subsite()

self[key: int | str | tuple | list | set | slice] Pose[source]

Get a member Pose object or subset PoseSet thereof.

Parameters:

key – Can be an integer ID, negative integer index, alias or inchikey string, list/set/tuple of IDs, or slice of IDs

iter(self)[source]

Iterate through all poses

len(self) int[source]

Total number of poses

repr(self) str[source]

ANSI Formatted string representation

str(self)[source]

Unformatted string representation

property aliases: list[str]

Returns the aliases of child poses

property db: Database

Returns the associated Database

draw(max_draw: int = 100) None[source]

Render the poses

Parameters:

max_draw – show a warning if trying to draw more than this number of poses (Default value = 100)

get_by_metadata(key: str, value: str | None = None) PoseSet[source]

Get all child poses by their metadata. If no value is passed, then simply containing the key in the metadata dictionary is sufficient

Parameters:
  • key – metadata key to match

  • value – metadata value to match, if None any pose with the key present will be returned (Default value = None)

Returns:

a PoseSet of the subset
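The matching rule can be illustrated without a database. The snippet below is a minimal sketch, assuming each pose's metadata is a plain dictionary; `match_metadata` and the sample records are hypothetical stand-ins, not HIPPO's implementation.

```python
def match_metadata(records, key, value=None):
    """Return IDs of records whose metadata contains `key` (and `value`, if given)."""
    matches = []
    for pose_id, metadata in records.items():
        if key not in metadata:
            continue  # the key must always be present
        if value is None or metadata[key] == value:
            matches.append(pose_id)
    return matches


# hypothetical metadata keyed by pose ID
records = {
    1: {"series": "A", "author": "x"},
    2: {"series": "B"},
    3: {"author": "y"},
}

key_only = match_metadata(records, "series")           # any pose with the key
key_and_value = match_metadata(records, "series", "B")  # key must equal "B"
```

With no value, poses 1 and 2 match because both carry the key; with a value, only pose 2 matches.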

get_by_metadata_substring_match(substring: str) PoseSet[source]

Get PoseSet of poses with metadata JSON containing substring

get_by_smiles(smiles: str) Pose | PoseSet | None[source]

Get a member pose by its SMILES

get_by_subsite(*, id: int) PoseSet[source]

Get all child poses with a certain Subsite ID

Parameters:

id – Subsite ID

Returns:

a PoseSet of the subset

get_by_tag(tag: str, inverse: bool = False) PoseSet[source]

Get all child poses with a certain tag

Parameters:
  • tag – tag to search for

  • inverse – invert the selection

Returns:

a PoseSet of the subset

get_by_target(*, id: int) PoseSet[source]

Get all child poses with a certain Target ID

Parameters:

id – Target ID

Returns:

a PoseSet of the subset

property id_name_dict: dict[int, str]

Return a dictionary mapping pose IDs to their names

property ids: list[int]

Returns the IDs of child poses

property inchikeys: list[str]

Returns the inchikeys of child poses

property interactions: InteractionSet

Get an InteractionSet

interactive() None[source]

Interactive widget to navigate poses in the table

Attention

This method instantiates a PoseSet containing all poses, so selecting a subset for display is recommended instead. It is only intended for use within a Jupyter Notebook.

property name: str | None

Returns the name of set

property names: list[str]

Returns the names of child poses

property num_fingerprinted: int

Count the number of fingerprinted poses

summary() None[source]

Print a summary of this pose set

property table: str

Returns the name of the Database table

property tags: set[str]

Returns the set of unique tags present in this pose set

class hippo.pset.PoseSet(db: Database, indices: list = None, *, sort: bool = True, name: str | None = None)[source]

Object representing a subset of the ‘pose’ table in the Database.

Attention

PoseSet objects should not be created directly. Instead use the HIPPO.poses() property. See Getting started with HIPPO and Adding data into HIPPO.

Use as an iterable

Iterate through Pose objects in the set:

pset = animal.poses[:100]

for pose in pset:
    ...

Check membership

To determine if a Pose is present in the set:

is_member = pose in pset

Selecting poses in the set

The PoseSet can be indexed like a standard Python list, by position:

pset = animal.poses[1:100]

# indexing individual poses
pose = pset[0]  # get the first pose
pose = pset[1]  # get the second pose
pose = pset[-1] # get the last pose

# getting a subset of poses using a slice
pset2 = pset[13:18] # using a slice

self + other: PoseSet PoseSet[source]

Add a PoseSet to this set

self(*, tag: str = None, target: int = None, subsite: int = None) PoseSet[source]

Filter poses by a given tag, Subsite ID, or target ID. See PoseSet.get_by_tag(), PoseSet.get_by_target(), and PoseSet.get_by_subsite()

self[key: int | slice] Pose | PoseSet[source]

Get poses or subsets thereof from this set

Parameters:

key – integer index or slice of indices
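The difference from PoseTable indexing is that the key here is a position in the set, not a Pose ID. A minimal sketch of that behaviour, assuming the set simply wraps a list of pose IDs (`MiniPoseSet` is a hypothetical stand-in, not HIPPO's class):

```python
class MiniPoseSet:
    """Hypothetical stand-in holding pose IDs; indexing is positional."""

    def __init__(self, ids):
        self.ids = list(ids)

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return MiniPoseSet(self.ids[key])  # a slice gives a subset
        if isinstance(key, int):
            return self.ids[key]  # an integer gives a single member
        raise TypeError(f"unsupported key type: {type(key).__name__}")


pset = MiniPoseSet([13, 15, 18, 21])
first = pset[0]     # position 0, which is pose ID 13 here
last = pset[-1]     # negative indices count from the end
subset = pset[1:3]  # positions 1 and 2
```

In this sketch `pset[13]` would raise an IndexError, because 13 is a Pose ID rather than a valid position in a four-member set.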

iter(self)[source]

Iterate through poses in this set

len(self) int[source]

The number of poses in this set

repr(self) str[source]

ANSI Formatted string representation

str(self)[source]

Unformatted string representation

self - other: PoseSet PoseSet[source]

Subtract a PoseSet from this set

add_tag(tag: str) None[source]

Add this tag to every member of the set

property aliases: list[str]

Returns the aliases of child poses

append_to_metadata(key, value) None[source]

Append a specific item to the list-like value associated with a given key in every member’s metadata dictionary

Parameters:
  • key – the Metadata key to match

  • value – the value to append to the list

property avg_distance_score: float

Average distance score of poses in this set

property avg_energy_score: float

Average energy score of poses in this set

property best_placed_pose: Pose

Returns the pose with the best distance_score in this subset

property best_placed_pose_id: int

Get the id of the pose with the best distance_score in this subset

calculate_inspiration_scores(alpha: float = 0.95, beta: float = 0.05, score_type: str = 'combo') pd.DataFrame[source]

Set inspiration_score values using MoCASSIn.calculate_mocassin_tversky

Parameters:
  • alpha – Tversky alpha parameter

  • beta – Tversky beta parameter

  • score_type – Score type to add to database, choose from “combo”, “shape”, “colour”

Returns:

Pandas DataFrame with molecules and scores
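For intuition about the alpha and beta parameters, the Tversky index can be written for plain sets. This is only a sketch: how MoCASSIn computes its shape and colour overlap terms is not shown here, and the feature labels and the `tversky` helper are hypothetical.

```python
def tversky(a, b, alpha=0.95, beta=0.05):
    """Set-based Tversky index: |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|)."""
    a, b = set(a), set(b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


# hypothetical feature labels, for illustration only
reference = {"donor_1", "acceptor_1", "aromatic_1"}
query = {"donor_1", "acceptor_1", "aromatic_1", "hydrophobe_1", "hydrophobe_2"}

# elements of `a` absent from `b` are weighted by alpha,
# elements of `b` absent from `a` are weighted by beta
score_full = tversky(reference, query)           # all of `reference` is covered
score_partial = tversky(reference, {"donor_1"})  # most of `reference` is missed
```

With the defaults alpha = 0.95 and beta = 0.05 the measure is strongly asymmetric: missing elements of the first set is penalised far more than adding extra elements in the second.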

property compounds: CompoundSet

Get the compounds associated to this set of poses

property db: Database

Returns the associated Database

property df: pandas.DataFrame

Get a DataFrame of the poses in this set

draw() None[source]

Render this pose set with py3Dmol

filter(function=None, *, key: str = None, value: str = None, operator='=', inverse: bool = False)[source]

Filter this PoseSet by selecting members where function(pose) is truthy. Alternatively, pass a key, value, and optional operator to search by database values.

Parameters:
  • function – callable object

  • key – database field for ‘pose’ table (’pose_’ prefix not needed)

  • value – value to compare to

  • operator – comparison operator (default = “=”)

  • inverse – invert the selection (Default value = False)
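The two calling styles can be sketched on plain dictionaries. This is an illustrative stand-in only: `filter_records`, the `OPERATORS` table, and the sample records are assumptions, and the real method evaluates key/value queries against the database rather than in Python.

```python
import operator

# comparison operators supported by this sketch (an assumption, not HIPPO's list)
OPERATORS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def filter_records(records, function=None, *, key=None, value=None, op="=", inverse=False):
    """Keep records where function(record) is truthy, or where record[key] <op> value."""
    if function is None:
        compare = OPERATORS[op]
        function = lambda record: compare(record[key], value)
    # with inverse=True the falsy records are kept instead
    return [r for r in records if bool(function(r)) != inverse]


records = [
    {"id": 1, "energy_score": -5.0},
    {"id": 2, "energy_score": 1.5},
    {"id": 3, "energy_score": -0.5},
]

by_function = filter_records(records, lambda r: r["energy_score"] < 0)
by_value = filter_records(records, key="energy_score", value=0, op="<")
inverted = filter_records(records, key="energy_score", value=0, op="<", inverse=True)
```

Both styles select the same records here; the callable form is more flexible, while the key/value form maps naturally onto a database query.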

property fraction_fingerprinted: float

Return the fraction of fingerprinted poses in this set

get_best_placed_poses_per_compound()[source]

Choose the best placed pose (best distance_score) grouped by compound
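The grouping logic can be sketched as follows. The sketch assumes that a lower distance_score is better and that poses without a score are skipped; neither is confirmed by the documentation above, and `best_pose_per_compound` is a hypothetical helper.

```python
def best_pose_per_compound(poses):
    """Map compound ID to the ID of its pose with the lowest distance_score."""
    best = {}
    for pose_id, compound_id, distance_score in poses:
        if distance_score is None:
            continue  # assumption: unscored poses cannot be ranked
        current = best.get(compound_id)
        if current is None or distance_score < current[1]:
            best[compound_id] = (pose_id, distance_score)
    return {compound_id: pose_id for compound_id, (pose_id, _) in best.items()}


# hypothetical (pose_id, compound_id, distance_score) rows
poses = [(1, 10, 0.8), (2, 10, 0.3), (3, 11, None), (4, 11, 1.2)]
best = best_pose_per_compound(poses)
```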

get_by_compound(*, compound: int | Compound) PoseSet | None[source]

Select a subset of this PoseSet by the associated Compound.

Parameters:

compound – Compound object or ID

Returns:

a PoseSet of the selection

get_by_inspiration(inspiration: int | Pose, inverse: bool = False)[source]

Get all child poses with this inspiration.

Parameters:
  • inspiration – inspiration Pose ID or object

  • inverse – invert the selection (Default value = False)

get_by_metadata(key: str, value: str | None = None, debug: bool = False) PoseSet[source]

Get all child poses by their metadata. If no value is passed, then simply containing the key in the metadata dictionary is sufficient

Parameters:
  • key – metadata key to search for

  • value – metadata value, if None return poses with the metadata key regardless of value (Default value = None)

get_by_reference(ref_id: int) PoseSet | None[source]

Get poses with a certain reference id

Parameters:

ref_id – reference Pose ID

get_by_subsite(*, id: int) PoseSet | None[source]

Select a subset of this PoseSet by the associated Subsite.

Parameters:

id – Subsite ID

Returns:

a PoseSet of the selection

get_by_tag(tag: str, inverse: bool = False) PoseSet[source]

Get all child poses with a certain tag

Parameters:
  • tag – tag to filter by

  • inverse – return all poses not tagged with tag (Default value = False)

get_by_target(*, id: int) PoseSet | None[source]

Select a subset of this PoseSet by the associated Target.

Parameters:

id – Target ID

Returns:

a PoseSet of the selection

get_df(smiles: bool = True, inchikey: bool = True, alias: bool = True, name: bool = True, compound_id: bool = False, target_id: bool = False, reference_id: bool = False, path: bool = False, mol: bool = False, energy_score: bool = False, distance_score: bool = False, inspiration_score: bool = False, metadata: bool = False, expand_metadata: bool = True, debug: bool = True, inspiration_ids: bool = False, inspiration_aliases: bool = False, derivative_ids: bool = False, tags: bool = False, subsites: bool = False) pandas.DataFrame[source]

Get a DataFrame of the poses in this set.

Parameters:
  • smiles – include SMILES column (Default value = True)

  • inchikey – include InChIKey column (Default value = True)

  • alias – include alias column (Default value = True)

  • name – include name column (Default value = True)

  • compound_id – include Compound ID column (Default value = False)

  • reference_id – include reference Pose ID column (Default value = False)

  • target_id – include reference Target ID column (Default value = False)

  • path – include path column (Default value = False)

  • mol – include rdkit.Chem.Mol in output (Default value = False)

  • energy_score – include energy_score column (Default value = False)

  • distance_score – include distance_score column (Default value = False)

  • inspiration_score – include inspiration_score column (Default value = False)

  • metadata – include metadata in output (Default value = False)

  • expand_metadata – create separate column for each metadata key (Default value = True)

  • inspiration_ids – include inspiration Pose ID column

  • inspiration_aliases – include inspiration Pose alias column

  • derivative_ids – include derivative Pose ID column

  • tags – include tags column

  • subsites – include subsites column

get_interaction_clusters() dict[int, PoseSet][source]

Cluster poses based on shared interactions.

grid() None[source]

Draw a grid of all contained molecules

property id_name_dict: dict

Return a dictionary mapping pose IDs to their names

property ids: list[int]

Returns the ids of poses in this set

property inchikeys: list[str]

Returns the inchikeys of child poses

property indices: list[int]

Returns the ids of poses in this set

property inspiration_sets: list[set[int]]

Return a list of unique sets of inspiration Pose IDs

property inspirations: int

Return the number of unique inspirations for poses in this set

property interaction_overlap_score: int

Count the number of member pose pairs which share at least one but not all interactions
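One reading of this definition is: a pair counts when the two poses have at least one interaction in common but their interaction sets are not identical. The sketch below implements that reading on plain sets; the interpretation, the `overlap_score` helper, and the residue labels are assumptions.

```python
from itertools import combinations


def overlap_score(interactions_by_pose):
    """Count pose pairs with a non-empty but non-identical interaction overlap."""
    count = 0
    for a, b in combinations(interactions_by_pose.values(), 2):
        if a & b and a != b:
            count += 1
    return count


# hypothetical interaction labels per pose ID
interactions = {
    1: {"H41", "E166"},
    2: {"E166", "Q189"},
    3: {"H41", "E166"},
    4: {"T25"},
}
score = overlap_score(interactions)
```

Poses 1 and 3 have identical sets and so do not count, whereas pairs (1, 2) and (2, 3) share only E166 and do.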

property interactions: InteractionSet

Get an InteractionSet for the poses in this set

interactive(print_name: str = True, method: str | None = None, function: Callable | None = None, **kwargs)[source]

Interactive widget to navigate poses in the set

Parameters:
  • print_name – print the Pose name (Default value = True)

  • method – pass the name of a Pose method to interactively display. Keyword arguments to interactive() will be passed through (Default value = None)

  • function – pass a callable which will be called as function(pose)

property mols: list[rdkit.Chem.mol]

Get the rdkit Molecules contained in this set

property name: str | None

Returns the name of set

property names: list[str]

Returns the names of poses in this set

property num_compounds: int

Count the compounds associated to this set of poses

property num_fingerprinted: int

Count the number of fingerprinted poses in this set

property num_inspiration_sets: int

Return the number of unique sets of inspirations

property num_inspirations: int

Return the number of unique inspirations for poses in this set

property num_subsites: int

Count the number of subsites that poses in this set come into contact with

property reference

Bulk set the references for poses in this set

property reference_ids: set[int]

Return the set of Pose IDs of all the distinct references in this PoseSet

property references: PoseSet

Return a PoseSet of all the distinct references in this PoseSet

set_subsites_from_metadata_field(field='CanonSites alias') None[source]

Create and assign subsite entries from a metadata field

Parameters:

field – the metadata field to use

property smiles: list[str]

Returns the smiles of poses in this set

split_by_inspirations(single_set: bool = False) dict[int, PoseSet] | PoseSet[source]

Split this PoseSet into subsets grouped by inspirations

Parameters:

single_set – Return a single PoseSet with members sorted by inspirations (Default value = False)

Returns:

a dictionary with tuples of inspiration Pose IDs as keys and PoseSet subsets as values
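Using tuples as keys means that poses sharing the same combination of inspirations land in the same group. A minimal sketch of that grouping, assuming the key is the sorted tuple of inspiration IDs (the helper and the sample IDs are hypothetical, and lists of pose IDs stand in for PoseSet objects):

```python
from collections import defaultdict


def split_by_inspirations(inspirations_by_pose):
    """Group pose IDs by their combination of inspiration pose IDs."""
    groups = defaultdict(list)
    for pose_id, inspiration_ids in inspirations_by_pose.items():
        key = tuple(sorted(inspiration_ids))  # order-independent, hashable key
        groups[key].append(pose_id)
    return dict(groups)


# hypothetical pose ID -> inspiration pose IDs
inspirations_by_pose = {201: [1, 2], 202: [2, 1], 203: [3]}
groups = split_by_inspirations(inspirations_by_pose)
```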

split_by_reference() dict[int, PoseSet][source]

Split this PoseSet into subsets grouped by reference ID

Returns:

a dictionary with reference Pose IDs as keys and PoseSet subsets as values

property str_ids: str

Return an SQL formatted tuple string of the Pose IDs

property subsite_balance: float

Measure of how evenly subsite counts are distributed across poses in this set

property subsite_ids: set[int]

Return the set of subsite IDs of member poses

subsite_summary() pd.DataFrame[source]

Print a table counting poses by subsite

summary() None[source]

Print a summary of this pose set

property table: str

Returns the name of the Database table

property tags: set[str]

Returns the set of unique tags present in this pose set

property target_ids: list[int]

Returns the Target IDs of poses in this set

property target_names: list[str]

Returns the Target names of poses in this set

property targets: list[Target]

Returns the Target objects of poses in this set

to_fragalysis(out_path: str, *, method: str, ref_url: str = 'https://hippo.winokan.com', submitter_name: str, submitter_email: str, submitter_institution: str, metadata: bool = True, sort_by: str | None = None, sort_reverse: bool = False, generate_pdbs: bool = False, copy_reference_pdbs: bool = False, skip_no_reference: bool = True, skip_no_inspirations: bool = True, skip_metadata: list[str] | None = None, tags: bool = True, subsites: bool = True, extra_cols: dict[str, list] = None, **kwargs)[source]

Prepare an SDF for upload to the RHS of Fragalysis.

Parameters:
  • out_path – the file path to write to

  • method – method used to generate the compounds

  • ref_url – reference URL for the method

  • submitter_name – name of the person submitting the compounds

  • submitter_email – email of the person submitting the compounds

  • submitter_institution – institution name of the person submitting the compounds

  • metadata – include metadata in the output? (Default value = True)

  • skip_metadata – exclude metadata keys from output

  • sort_by – if set will sort the SDF by this column/field (Default value = None)

  • sort_reverse – reverse the sorting (Default value = False)

  • generate_pdbs – generate accompanying protein-ligand complex PDBs (Default value = False)

  • ingredients – get procurement and amount information from this IngredientSet (Default value = None)

  • tags – include a column for tags in the output (Default value = True)

  • subsites – include a column for subsites in the output (Default value = True)

  • extra_cols – a dictionary with one key per column name. Each value is a list whose first element is the field description and whose remaining elements are the values for each pose.

  • name – How to determine the molecule name, see PoseSet.get_df()
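The extra_cols layout is easy to get wrong by one element, so a small check before export can help. This is a sketch under stated assumptions: `check_extra_cols` is a hypothetical helper, the column names and values are made up, and requiring the description to be a string is an assumption.

```python
def check_extra_cols(extra_cols, num_poses):
    """Validate the layout: [description, value_for_pose_1, value_for_pose_2, ...]."""
    for column, values in extra_cols.items():
        if not values or not isinstance(values[0], str):
            raise TypeError(f"{column}: first element must be a description string")
        if len(values) - 1 != num_poses:
            raise ValueError(f"{column}: expected {num_poses} values, got {len(values) - 1}")
    return True


pose_ids = [13, 15, 18]  # hypothetical members of the set being exported

extra_cols = {
    "docking_score": ["Score from a hypothetical docking run", -7.1, -6.4, -8.0],
    "batch": ["Hypothetical synthesis batch label", "b1", "b1", "b2"],
}

is_valid = check_extra_cols(extra_cols, len(pose_ids))
```

Each list therefore has one more element than there are poses: the description first, then one value per pose in order.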

to_knitwork(out_path: str, path_root: str = '.', aligned_files_dir: str | None = None) None[source]

Knitwork takes a CSV input with:

  • observation shortcode

  • smiles

  • path_to_ligand_mol

  • path_to_pdb

Parameters:
  • out_path – path to output CSV

  • path_root – paths in CSV will be relative to here
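The shape of such a CSV can be sketched with the standard library. Everything specific here is an assumption: the record keys, the example paths, the column order following the list above, and the absence of a header row are illustrative and not taken from Knitwork or HIPPO.

```python
import csv
import io
import posixpath


def knitwork_rows(poses, path_root):
    """Yield one row per pose, with file paths made relative to path_root."""
    for pose in poses:
        yield [
            pose["shortcode"],
            pose["smiles"],
            posixpath.relpath(pose["mol_path"], path_root),
            posixpath.relpath(pose["pdb_path"], path_root),
        ]


# hypothetical pose record
poses = [
    {
        "shortcode": "x0001a",
        "smiles": "CCO",
        "mol_path": "/data/aligned/x0001a/x0001a.mol",
        "pdb_path": "/data/aligned/x0001a/x0001a.pdb",
    }
]

buffer = io.StringIO()  # stands in for the file at out_path
csv.writer(buffer).writerows(knitwork_rows(poses, "/data"))
csv_text = buffer.getvalue()
```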

to_pymol(prefix: str | None = None) None[source]

Group the poses by reference protein and inspirations and output relevant PDBs and SDFs.

Parameters:

prefix – prefix to give all output subdirectories (Default value = None)

to_syndirella(out_key: str | Path, separate: bool = False) DataFrame[source]

Create syndirella inputs

write_sdf(out_path: str, name_col: str = 'alias', inspiration_ids: bool = False, inspiration_aliases: bool = False, **kwargs) None[source]

Write an SDF

Parameters:
  • out_path – filepath of the output

  • name_col – pose property to use as the name column, can be ["name", "alias", "inchikey", "id"] (Default value = ‘alias’)

  • inspiration_ids – include inspiration Pose ID column

  • inspiration_aliases – include inspiration Pose alias column

  • fragalysis_inspirations – create inspirations column “ref_mols”