Poses
- class hippo.pose.Pose(db: Database, id: int, inchikey: str | None, alias: str | None, smiles: str, reference: int, path: str, compound: int, target: int, mol: Mol | bytes | None, fingerprint: int, energy_score: float | None = None, distance_score: float | None = None, inspiration_score: float | None = None, metadata: dict | None = None)[source]
A Pose is a particular conformer of a Compound within a protein environment. A pose will have its own (stereochemical) smiles string, and must have a path to a coordinate file. Poses can have inspirations that can be used to trace fragment-derived scaffolds in merges and expansions.
Attention
Pose objects should not be created directly. Instead use HIPPO.register_pose() or HIPPO.poses()
- add_subsite(name: str, commit: bool = True) SubsiteTag[source]
Tag this pose with a protein subsite
- Parameters:
name – the name of the subsite
commit – commit the insertion to the database
- Returns:
SubsiteTag
- property alias: str
Returns the pose’s alias
- calculate_classic_fingerprint(debug: bool = False)[source]
Calculate the pose’s interaction fingerprint
- calculate_interactions(resolve: bool = True, distance_padding: float = 0.0, angle_padding: float = 0.0, force: bool = False, debug: bool = False, commit: bool = True, mutation_warnings: bool = True, delete_temp_table: bool = True) None[source]
Enumerate all valid interactions between this ligand and the protein
- Parameters:
resolve – Cull duplicate / less-significant interactions
distance_padding – Apply a padding in Angstrom to all distance cutoffs
angle_padding – Apply a padding in degrees to all angle cutoffs
force – Force a recalculation even if the pose has already been fingerprinted
debug – Increase verbosity for debugging
commit – commit the changes to the database (Default value = True)
mutation_warnings – warn when there has been a mutation in the protein (Default value = True)
delete_temp_table – delete the temporary interaction table created during interaction resolution (Default value = True)
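The effect of the two padding parameters can be pictured with a minimal sketch. It assumes that the padding is simply added to every cutoff before a measurement is compared against it; the cutoff names and values below are hypothetical and are not taken from HIPPO.

```python
# Hypothetical cutoffs for illustration only; HIPPO's real values may differ.
DISTANCE_CUTOFFS = {"hydrogen_bond": 3.5, "pi_stacking": 6.0}  # Angstrom


def padded_cutoffs(cutoffs: dict[str, float], padding: float = 0.0) -> dict[str, float]:
    """Return a copy of the cutoffs with the padding added to each one."""
    return {name: value + padding for name, value in cutoffs.items()}


def within_cutoff(distance: float, cutoff: float) -> bool:
    """Keep an interaction when the measured distance does not exceed the cutoff."""
    return distance <= cutoff


# A padding of 0.5 Angstrom loosens every distance cutoff by the same amount.
loose = padded_cutoffs(DISTANCE_CUTOFFS, padding=0.5)
```

Under this assumption, a contact measured at 3.8 Angstrom would be rejected with the unpadded hydrogen-bond cutoff and accepted with the padded one.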
- calculate_prolif_interactions(return_all: bool = False, max_retry: int = 5, use_mda: bool = False, force: bool = False, clear_existing: bool = True, debug: bool = False, resolve: bool = True) prolif.Fingerprint[source]
Use ProLIF to populate the interactions table
- property classic_fingerprint: dict
Classic HIPPO fingerprint dictionary, mapping protein Feature IDs to the number of corresponding ligand features (from any Pose)
- property compound_id: int
Returns the pose’s associated compound ID
- property dict: dict
Serialised dictionary representing the pose
- property distance_score: float | None
Distance score of the Pose (w.r.t. its inspirations), in Angstroms
- draw(inspirations: bool = True, protein: bool = False, **kwargs) None[source]
Render this pose (and its inspirations)
- Parameters:
inspirations – Render the inspirations? (Default value = True)
protein – Render the protein? This wraps Pose.render() (Default value = False)
- property energy_score: float | None
Energy score of the Pose (kcal/mol)
- property features: list[molparse.rdkit.Feature]
Returns the pose’s features
- get_dict(mol: bool = False, inspirations: bool | str = True, subsites: bool | str = True, reference: bool | str = True, metadata: bool = True, duplicate_name: str | bool = False, sanitise_null_metadata_values: bool = False, skip_metadata: list[str] | None = None, sanitise_tag_list_separator: str | None = None, sanitise_metadata_list_separator: str | None = ';', tags: bool = True) dict[source]
Returns a dictionary representing this Pose.
- Parameters:
mol – Include a rdkit.Chem.Mol in the output? (Default value = False)
inspirations – Include inspirations? [True, False, 'names'] Specify names to format as a comma-separated string (Default value = True)
subsites – Include subsites? [True, False, 'names'] Specify names to format as a comma-separated string (Default value = True)
reference – Include reference? [True, False, 'name'] Specify name to include the Pose name rather than its ID (Default value = True)
metadata – Include metadata? (Default value = True)
duplicate_name – Specify the name of a new column duplicating the pose name column (Default value = False)
tags – Include tags? (Default value = True)
- property has_complex_pdb_path: bool
Does this pose have a PDB file?
- property has_fingerprint: bool
Does the pose have a fingerprint?
- property id: int
Returns the pose’s database ID
- property inchikey: str
Returns the pose’s inchikey
- property inspiration_score: float | None
Inspiration score of the Pose, in the range 0.00-1.00
- property interactions: InteractionSet
Get an InteractionSet for this Pose
- property mol: rdkit.Chem.Mol
Returns a pose’s rdkit.Chem.Mol
- property name: str
Returns the pose’s name
- property num_atoms_added: int
Calculate the number of atoms added relative to the base or inspirations
- property num_atoms_added_wrt_bases: int | list[int] | None
Calculate the number of atoms added relative to the base
- property num_atoms_added_wrt_inspirations: int | None
Calculate the number of atoms added relative to its inspirations
- property num_bases: int
Get the number of base scaffolds
- property num_heavy_atoms: int
Number of heavy atoms
- property path: str
Returns the pose’s path
- plot3d(features: bool = False, **kwargs) plotly.graph_objects.Figure[source]
Use Molparse/Plotly to create a 3d figure of this pose
- Parameters:
features – include the features in the figure
- Returns:
a plotly Figure object
- posebusters(debug: bool = False) bool[source]
Run a posebusters ligand check on this pose’s molecule
- property protein_system: molparse.System
Returns the pose’s protein molparse.System
- property protonated_mol: rdkit.Chem.Mol
Guess hydrogen positions
- property reference_id: int
Returns the pose’s protein reference ID
- render(protein='cartoon', ligand='stick', protein_color='spectrum', interactions: bool = True, file: str | None = None) None[source]
Render this pose with the protein using py3Dmol
- Parameters:
protein – protein representation, default = ‘cartoon’
ligand – ligand representation, default = ‘stick’
protein_color – color of protein representation, default = ‘spectrum’
- score_inspiration(debug: bool = False, draw: bool = False, return_all: bool = False) float[source]
Score how well this Pose recapitulates the pharmacophoric features of its inspirations.
- Parameters:
debug – Increased verbosity for debugging (Default value = False)
draw – Render each inspiration pose with its features, the derivative with the combined features of the inspirations, and the derivative with its features. (Default value = False)
- set_has_fingerprint(fp: bool, commit: bool = True) None[source]
Update the database to reflect this pose’s has_fingerprint property
- showcase() None[source]
Print and render this pose as if you were using PoseSet.interactive()
- property smiles: str
Returns the pose’s smiles
- summary(metadata: bool = True, tags: bool = True, subsites: bool = True) None[source]
Print a summary of this pose
- Parameters:
metadata – include metadata (Default value = True)
- property table: str
Get the name of the database table
- to_syndirella(out_key: str | Path) DataFrame[source]
Create syndirella inputs. See PoseSet.to_syndirella()
- class hippo.pset.PoseTable(db: Database)[source]
Class representing all Pose objects in the ‘pose’ table of the Database.
Attention
PoseTable objects should not be created directly. Instead use the HIPPO.poses() property. See Getting started with HIPPO and Adding data into HIPPO.
Use as an iterable
Iterate through Pose objects in the table:
    for pose in animal.poses:
        ...
Selecting poses in the table
The PoseTable can be indexed with Pose IDs, names, aliases, or lists/sets/tuples/slices thereof:
    ptable = animal.poses
    # indexing individual poses
    pose = ptable[13]  # using the ID
    pose = ptable["BSYNRYMUTXBXSQ-UHFFFAOYSA-N"]  # using the InChIKey
    pose = ptable["Ax0310a"]  # using the alias
    # getting a subset of poses
    pset = ptable[13,15,18]  # using IDs (tuple)
    pset = ptable[[13,15,18]]  # using IDs (list)
    pset = ptable[{13,15,18}]  # using IDs (set)
    pset = ptable[13:18]  # using a slice
Tags and target IDs can also be used to filter:
    pset = animal.poses(tag='hits')  # select poses tagged with 'hits'
    pset = animal.poses(target=1)  # select poses from the first target
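The way one indexing operator can accept so many key types can be sketched with a small stand-in class. This is not HIPPO's implementation: ToyTable and resolve_ids are hypothetical names, all aliases other than "Ax0310a" are made up, negative indices are not handled, and whether a slice includes its end ID in HIPPO is not stated here (this sketch excludes it).

```python
def resolve_ids(key, ids: list[int], aliases: dict[str, int]):
    """Map an index key onto one ID or a sorted list of IDs."""
    if isinstance(key, int):
        return key
    if isinstance(key, str):
        return aliases[key]
    if isinstance(key, (tuple, list, set)):
        return sorted(key)
    if isinstance(key, slice):
        # slice.indices() fills in missing start/stop/step values
        start, stop, step = key.indices(max(ids) + 1)
        return [i for i in range(start, stop, step) if i in ids]
    raise TypeError(f"unsupported key type: {type(key).__name__}")


class ToyTable:
    """Hypothetical stand-in that only reports how each key is resolved."""

    def __init__(self, aliases: dict[str, int]):
        self.aliases = aliases
        self.ids = sorted(aliases.values())

    def __getitem__(self, key):
        return resolve_ids(key, self.ids, self.aliases)


table = ToyTable({"Ax0310a": 13, "Ax0450a": 15, "Ax0556a": 18})
```

Note that Python packs `table[13, 15, 18]` into a single tuple key, and that a set key must be written as a literal such as `{13, 15, 18}`, because `set(13, 15, 18)` raises a TypeError.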
- self(*, tag: str | None = None, target: int | None = None, subsite: int | None = None, smiles: str | None = None) PoseSet[source]
Filter poses by a given tag, subsite ID, or target ID. See PoseTable.get_by_tag(), PoseTable.get_by_target(), and PoseTable.get_by_subsite()
- self[key: int | str | tuple | list | set | slice] Pose[source]
Get a member Pose object or subset PoseSet thereof.
- Parameters:
key – Can be an integer ID, negative integer index, alias or inchikey string, list/set/tuple of IDs, or slice of IDs
- property aliases: list[str]
Returns the aliases of child poses
- draw(max_draw: int = 100) None[source]
Render the poses
- Parameters:
max_draw – show a warning if trying to draw more than this number of poses (Default value = 100)
- get_by_metadata(key: str, value: str | None = None) PoseSet[source]
Get all child poses by their metadata. If no value is passed, then simply containing the key in the metadata dictionary is sufficient
- Parameters:
key – metadata key to match
value – metadata value to match, if None any pose with the key present will be returned (Default value = None)
- Returns:
a PoseSet of the subset
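The matching rule described above (key only, or key and value) can be sketched on plain dictionaries. This is an illustration under the assumption that values are compared by simple equality; matches_metadata is a hypothetical helper, not part of HIPPO.

```python
def matches_metadata(metadata: dict, key: str, value=None) -> bool:
    """True if the key is present and, when a value is given, equal to it."""
    if key not in metadata:
        return False
    return value is None or metadata[key] == value


# Hypothetical metadata dictionaries keyed by pose ID
metadata_by_pose = {
    1: {"series": "A", "docked": True},
    2: {"series": "B"},
    3: {},
}

with_series = [i for i, m in metadata_by_pose.items() if matches_metadata(m, "series")]
series_a = [i for i, m in metadata_by_pose.items() if matches_metadata(m, "series", "A")]
```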
- get_by_metadata_substring_match(substring: str) PoseSet[source]
Get a PoseSet of poses whose metadata JSON contains substring
- get_by_subsite(*, id: int) PoseSet[source]
Get all child poses with a certain Subsite ID:
- Parameters:
id – Subsite ID
- Returns:
a PoseSet of the subset
- get_by_tag(tag: str, inverse: bool = False) PoseSet[source]
Get all child poses with a certain tag
- Parameters:
tag – tag to search for
inverse – invert the selection
- Returns:
a PoseSet of the subset
- property id_name_dict: dict[int, str]
Return a dictionary mapping pose IDs to their names
- property ids: list[int]
Returns the IDs of child poses
- property inchikeys: list[str]
Returns the inchikeys of child poses
- property interactions: InteractionSet
Get an InteractionSet
- interactive() None[source]
Interactive widget to navigate poses in the table
Attention
This method instantiates a PoseSet containing all poses; it is recommended to select a subset for display instead. This method is only intended for use within a Jupyter Notebook.
- property name: str | None
Returns the name of set
- property names: list[str]
Returns the aliases of child poses
- property num_fingerprinted: int
Count the number of fingerprinted poses
- property tags: set[str]
Returns the set of unique tags present in this pose set
- class hippo.pset.PoseSet(db: Database, indices: list = None, *, sort: bool = True, name: str | None = None)[source]
Object representing a subset of the ‘pose’ table in the Database.
Attention
PoseSet objects should not be created directly. Instead use the HIPPO.poses() property. See Getting started with HIPPO and Adding data into HIPPO.
Use as an iterable
Iterate through Pose objects in the set:
    pset = animal.poses[:100]
    for pose in pset:
        ...
Check membership
To determine if a Pose is present in the set:
    is_member = pose in pset
Selecting poses in the set
The PoseSet can be indexed like standard Python lists by their indices:
    pset = animal.poses[1:100]
    # indexing individual poses
    pose = pset[0]  # get the first pose
    pose = pset[1]  # get the second pose
    pose = pset[-1]  # get the last pose
    # getting a subset of poses using a slice
    pset2 = pset[13:18]  # using a slice
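Because a PoseSet is indexed by position rather than by database ID, the same integer can refer to different poses in a set and in the table. A minimal sketch with a plain list of hypothetical IDs shows the list-like behaviour described above:

```python
# Hypothetical database IDs held by a set, in order
ids_in_set = [2, 5, 7, 11]

first = ids_in_set[0]     # position 0, which is the pose with ID 2
last = ids_in_set[-1]     # the last position, which is the pose with ID 11
subset = ids_in_set[1:3]  # positions 1 and 2, end position excluded
```

Here position 2 holds the pose with ID 7, whereas indexing a table with the integer 2 would look up the ID instead.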
- self(*, tag: str = None, target: int = None, subsite: int = None) PoseSet[source]
Filter poses by a given tag, Subsite ID, or target ID. See PoseSet.get_by_tag(), PoseSet.get_by_target(), and PoseSet.get_by_subsite()
- self[key: int | slice] Pose | PoseSet[source]
Get poses or subsets thereof from this set
- Parameters:
key – integer index or slice of indices
- property aliases: list[str]
Returns the aliases of child poses
- append_to_metadata(key, value) None[source]
Append a specific item to list-like values associated with a given key for all member’s metadata dictionaries
- Parameters:
key – the Metadata key to match
value – the value to append to the list
- property avg_distance_score: float
Average distance score of poses in this set
- property avg_energy_score: float
Average energy score of poses in this set
- property best_placed_pose_id: int
Get the id of the pose with the best distance_score in this subset
- calculate_inspiration_scores(alpha: float = 0.95, beta: float = 0.05, score_type: str = 'combo') pd.DataFrame[source]
Set inspiration_score values using MoCASSIn.calculate_mocassin_tversky
- Parameters:
alpha – Tversky alpha parameter
beta – Tversky beta parameter
score_type – Score type to add to database, choose from “combo”, “shape”, “colour”
- Returns:
Pandas DataFrame with molecules and scores
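The roles of alpha and beta can be illustrated with the general Tversky index, which weights the features unique to each side differently. The sketch below uses the set-based form purely for illustration; MoCASSIn scores shape and colour overlap, which this toy function does not reproduce.

```python
def tversky(a: set, b: set, alpha: float = 0.95, beta: float = 0.05) -> float:
    """Tversky index: |A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    return common / denominator if denominator else 0.0


# With alpha >> beta, features missing from b are penalised far more
# heavily than extra features present only in b.
subset_in_superset = tversky({1, 2}, {1, 2, 3, 4})
superset_vs_subset = tversky({1, 2, 3, 4}, {1, 2})
```

With the default weights the score is strongly asymmetric: a query fully contained in the other set scores close to 1, while the reverse comparison scores much lower.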
- property compounds: CompoundSet
Get the compounds associated to this set of poses
- property df: pandas.DataFrame
Get a DataFrame of the poses in this set
- filter(function=None, *, key: str = None, value: str = None, operator='=', inverse: bool = False)[source]
Filter this PoseSet by selecting members where function(pose) is truthy, or pass a key, value, and optional operator to search by database values
- Parameters:
function – callable object
key – database field for ‘pose’ table (’pose_’ prefix not needed)
value – value to compare to
operator – comparison operator (default = “=”)
inverse – invert the selection (Default value = False)
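The two calling styles can be sketched on a list of dictionaries. This is a stand-in, not HIPPO's code: filter_rows is a hypothetical name, and the set of supported operators is an assumption, since only "=" is named above.

```python
import operator as op

# Assumed operator table; only "=" is confirmed by the documentation above.
OPERATORS = {"=": op.eq, "<": op.lt, ">": op.gt, "<=": op.le, ">=": op.ge}


def filter_rows(rows, function=None, *, key=None, value=None, operator="=", inverse=False):
    """Keep rows where function(row) is truthy, or where row[key] <operator> value."""
    if function is None:
        compare = OPERATORS[operator]

        def function(row):
            return compare(row[key], value)

    # inverse=True flips the selection
    return [row for row in rows if bool(function(row)) != inverse]


rows = [{"id": 1, "energy_score": -5.0}, {"id": 2, "energy_score": -1.0}]
by_callable = filter_rows(rows, lambda row: row["energy_score"] < -2)
by_key = filter_rows(rows, key="energy_score", value=-2, operator="<")
inverted = filter_rows(rows, key="energy_score", value=-2, operator="<", inverse=True)
```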
- property fraction_fingerprinted: float
Return the fraction of fingerprinted poses in this set
- get_best_placed_poses_per_compound()[source]
Choose the best placed pose (best distance_score) grouped by compound
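The grouping can be sketched in plain Python. The sketch assumes that a lower distance_score is better (it is a distance to the inspirations, in Angstroms) and that poses without a score are skipped; both are assumptions, and best_pose_per_compound is a hypothetical helper.

```python
def best_pose_per_compound(poses: list[dict]) -> dict[int, int]:
    """Return {compound_id: pose_id} for the lowest distance_score per compound."""
    best: dict[int, dict] = {}
    for pose in poses:
        score = pose["distance_score"]
        if score is None:
            continue  # assumption: unscored poses are ignored
        current = best.get(pose["compound_id"])
        if current is None or score < current["distance_score"]:
            best[pose["compound_id"]] = pose
    return {compound: pose["id"] for compound, pose in best.items()}


poses = [
    {"id": 1, "compound_id": 10, "distance_score": 0.8},
    {"id": 2, "compound_id": 10, "distance_score": 0.3},
    {"id": 3, "compound_id": 11, "distance_score": None},
    {"id": 4, "compound_id": 11, "distance_score": 1.2},
]
best = best_pose_per_compound(poses)
```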
- get_by_inspiration(inspiration: int | Pose, inverse: bool = False)[source]
Get all child poses with this inspiration.
- Parameters:
inspiration – inspiration Pose ID or object
inverse – invert the selection (Default value = False)
- get_by_metadata(key: str, value: str | None = None, debug: bool = False) PoseSet[source]
Get all child poses by their metadata. If no value is passed, then simply containing the key in the metadata dictionary is sufficient
- Parameters:
key – metadata key to search for
value – metadata value, if None return poses with the metadata key regardless of value (Default value = None)
- get_by_reference(ref_id: int) PoseSet | None[source]
Get poses with a certain reference id
- Parameters:
ref_id – reference Pose ID
- get_by_subsite(*, id: int) PoseSet | None[source]
Select a subset of this PoseSet by the associated Subsite.
- Parameters:
id – Subsite ID
- Returns:
a PoseSet of the selection
- get_by_tag(tag: str, inverse: bool = False) PoseSet[source]
Get all child poses with a certain tag
- Parameters:
tag – tag to filter by
inverse – return all poses not tagged with tag (Default value = False)
- get_df(smiles: bool = True, inchikey: bool = True, alias: bool = True, name: bool = True, compound_id: bool = False, target_id: bool = False, reference_id: bool = False, path: bool = False, mol: bool = False, energy_score: bool = False, distance_score: bool = False, inspiration_score: bool = False, metadata: bool = False, expand_metadata: bool = True, debug: bool = True, inspiration_ids: bool = False, inspiration_aliases: bool = False, derivative_ids: bool = False, tags: bool = False, subsites: bool = False) pandas.DataFrame[source]
Get a DataFrame of the poses in this set.
- Parameters:
smiles – include SMILES column (Default value = True)
inchikey – include InChIKey column (Default value = True)
alias – include alias column (Default value = True)
name – include name column (Default value = True)
compound_id – include Compound ID column (Default value = False)
reference_id – include reference Pose ID column (Default value = False)
target_id – include reference Target ID column (Default value = False)
path – include path column (Default value = False)
mol – include rdkit.Chem.Mol in output (Default value = False)
energy_score – include energy_score column (Default value = False)
distance_score – include distance_score column (Default value = False)
inspiration_score – include inspiration_score column (Default value = False)
metadata – include metadata in output (Default value = False)
expand_metadata – create separate column for each metadata key (Default value = True)
inspiration_ids – include inspiration Pose ID column
inspiration_aliases – include inspiration Pose alias column
derivative_ids – include derivative Pose ID column
tags – include tags column
subsites – include subsites column
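What "a separate column for each metadata key" means for expand_metadata can be sketched without a DataFrame. The function below is a hypothetical illustration on plain dictionaries, assuming a missing or null metadata entry contributes no extra columns.

```python
def expand_metadata(rows: list[dict]) -> list[dict]:
    """Replace each row's 'metadata' dict with one top-level entry per key."""
    expanded = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != "metadata"}
        flat.update(row.get("metadata") or {})
        expanded.append(flat)
    return expanded


rows = [
    {"id": 1, "alias": "Ax0310a", "metadata": {"series": "A", "batch": 3}},
    {"id": 2, "alias": None, "metadata": None},
]
flat_rows = expand_metadata(rows)
```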
- property id_name_dict: dict
Return a dictionary mapping pose IDs to their names
- property ids: list[int]
Returns the ids of poses in this set
- property inchikeys: list[str]
Returns the inchikeys of child poses
- property indices: list[int]
Returns the ids of poses in this set
- property inspirations: int
Return the number of unique inspirations for poses in this set
- property interaction_overlap_score: int
Count the number of member pose pairs which share at least one but not all interactions
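One reading of this definition can be sketched with sets: a pair counts when the two poses have at least one interaction in common but their interaction sets are not identical. That interpretation is an assumption, and the function below is a hypothetical illustration rather than HIPPO's implementation.

```python
from itertools import combinations


def interaction_overlap_score(interactions_by_pose: dict[int, set]) -> int:
    """Count pose pairs that share some, but not all, of their interactions."""
    count = 0
    for (_, a), (_, b) in combinations(interactions_by_pose.items(), 2):
        shared = a & b
        if shared and a != b:
            count += 1
    return count


# Hypothetical interaction labels keyed by pose ID
example = {
    1: {"x", "y"},
    2: {"y", "z"},
    3: {"q"},
    4: {"x", "y"},
}
score = interaction_overlap_score(example)
```

In this example poses 1 and 4 have identical interactions and so do not count, while the pairs (1, 2) and (2, 4) overlap partially.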
- property interactions: InteractionSet
Get an InteractionSet for this PoseSet
- interactive(print_name: str = True, method: str | None = None, function: Callable | None = None, **kwargs)[source]
Interactive widget to navigate poses in the set
- property mols: list[rdkit.Chem.Mol]
Get the rdkit Molecules contained in this set
- property name: str | None
Returns the name of set
- property names: list[str]
Returns the aliases of poses in this set
- property num_compounds: int
Count the compounds associated to this set of poses
- property num_fingerprinted: int
Count the number of fingerprinted poses in this set
- property num_inspiration_sets: int
Return the number of unique sets of inspirations
- property num_inspirations: int
Return the number of unique inspirations for poses in this set
- property num_subsites: int
Count the number of subsites that poses in this set come into contact with
- property reference
Bulk set the references for poses in this set
- property reference_ids: set[int]
Return a set of Pose IDs of all the distinct references in this PoseSet
- set_subsites_from_metadata_field(field='CanonSites alias') None[source]
Create and assign subsite entries from a metadata field
- Parameters:
field – the metadata field to use
- property smiles: list[str]
Returns the smiles of poses in this set
- split_by_inspirations(single_set: bool = False) dict[int, PoseSet] | PoseSet[source]
Split this PoseSet into subsets grouped by inspirations
- split_by_reference() dict[int, PoseSet][source]
Split this PoseSet into subsets grouped by reference ID
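The shape of the returned mapping (reference ID to subset) can be sketched with a plain grouping step. The helper below is hypothetical and groups pose IDs in lists instead of building PoseSet objects.

```python
from collections import defaultdict


def split_by_reference(poses: list[dict]) -> dict[int, list[int]]:
    """Group pose IDs by their reference ID."""
    groups: dict[int, list[int]] = defaultdict(list)
    for pose in poses:
        groups[pose["reference_id"]].append(pose["id"])
    return dict(groups)


poses = [
    {"id": 1, "reference_id": 100},
    {"id": 2, "reference_id": 101},
    {"id": 3, "reference_id": 100},
]
groups = split_by_reference(poses)
```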
- property subsite_balance: float
Measure of how evenly subsite counts are distributed across poses in this set
- property subsite_ids: set[int]
Return the set of subsite IDs of member poses
- property tags: set[str]
Returns the set of unique tags present in this pose set
- to_fragalysis(out_path: str, *, method: str, ref_url: str = 'https://hippo.winokan.com', submitter_name: str, submitter_email: str, submitter_institution: str, metadata: bool = True, sort_by: str | None = None, sort_reverse: bool = False, generate_pdbs: bool = False, copy_reference_pdbs: bool = False, skip_no_reference: bool = True, skip_no_inspirations: bool = True, skip_metadata: list[str] | None = None, tags: bool = True, subsites: bool = True, extra_cols: dict[str, list] = None, **kwargs)[source]
Prepare an SDF for upload to the RHS of Fragalysis.
- Parameters:
out_path – the file path to write to
method – method used to generate the compounds
ref_url – reference URL for the method
submitter_name – name of the person submitting the compounds
submitter_email – email of the person submitting the compounds
submitter_institution – institution name of the person submitting the compounds
metadata – include metadata in the output? (Default value = True)
skip_metadata – exclude metadata keys from output
sort_by – if set will sort the SDF by this column/field (Default value = None)
sort_reverse – reverse the sorting (Default value = False)
generate_pdbs – generate accompanying protein-ligand complex PDBs (Default value = False)
ingredients – get procurement and amount information from this IngredientSet (Default value = None)
tags – include a column for tags in the output (Default value = True)
subsites – include a column for subsites in the output (Default value = True)
extra_cols – extra_cols should be a dictionary with a key for each column name, and list values where the first element is the field description, and all subsequent elements are values for each pose.
name – How to determine the molecule name, see PoseSet.get_df()
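The layout expected for extra_cols (a description followed by one value per pose) is easy to get wrong, so a small check can help before exporting. The sketch below is a hypothetical helper; the column name, description, and values are made up for illustration.

```python
def check_extra_cols(extra_cols: dict[str, list], num_poses: int) -> None:
    """Raise if any column is not [description, value_1, ..., value_n]."""
    for column, items in extra_cols.items():
        description, *values = items
        if not isinstance(description, str):
            raise TypeError(f"{column}: first element must be a description string")
        if len(values) != num_poses:
            raise ValueError(
                f"{column}: expected {num_poses} values, got {len(values)}"
            )


# One description followed by one value for each of three poses
extra_cols = {
    "docking_score": ["hypothetical docking score", -7.1, -6.4, -5.9],
}
check_extra_cols(extra_cols, num_poses=3)
```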
- to_knitwork(out_path: str, path_root: str = '.', aligned_files_dir: str | None = None) None[source]
Knitwork takes a CSV input with:
observation shortcode
smiles
path_to_ligand_mol
path_to_pdb
- Parameters:
out_path – path to output CSV
path_root – paths in CSV will be relative to here
- to_pymol(prefix: str | None = None) None[source]
Group the poses by reference protein and inspirations and output relevant PDBs and SDFs.
- Parameters:
prefix – prefix to give all output subdirectories (Default value = None)
- to_syndirella(out_key: str | Path, separate: bool = False) DataFrame[source]
Create syndirella inputs
- write_sdf(out_path: str, name_col: str = 'alias', inspiration_ids: bool = False, inspiration_aliases: bool = False, **kwargs) None[source]
Write an SDF
- Parameters:
out_path – filepath of the output
name_col – pose property to use as the name column, can be ["name", "alias", "inchikey", "id"] (Default value = ‘alias’)
inspiration_ids – include inspiration Pose ID column
inspiration_aliases – include inspiration Pose alias column
fragalysis_inspirations – create inspirations column “ref_mols”