HIPPO “animal” object
- class hippo.animal.HIPPO(name: str, db_path: str | Path, copy_from: str | Path | None = None, overwrite_existing: bool = False, update_legacy: bool = False)[source]
The HIPPO animal class. Instantiating a HIPPO object will create or link a HIPPO Database.
    from hippo import HIPPO
    animal = HIPPO(project_name, db_path)
Attention
In addition to this API reference please see the tutorial pages Getting started with HIPPO and insert_elaborations.
- Parameters:
- Returns:
HIPPO object
HIPPO initialisation
- getattr(self, key: str)[source]
Get a Compound, Pose, or Reaction by its ID. See HIPPO.get_by_shorthand()
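As a rough illustration of attribute-style lookup, the sketch below shows how a `__getattr__` hook can turn a shorthand key into an ID lookup. The `ToyAnimal` class, the one-letter prefixes (`C`, `P`, `R`) and the in-memory tables are assumptions made for this example and are not taken from the HIPPO source; see HIPPO.get_by_shorthand() for the actual behaviour.

```python
import re


class ToyAnimal:
    """Hypothetical stand-in for HIPPO, used only to illustrate shorthand lookup."""

    # Assumed prefix convention: C = Compound, P = Pose, R = Reaction
    _TABLES = {"C": "compounds", "P": "poses", "R": "reactions"}

    def __init__(self):
        # Plain dictionaries stand in for the database tables
        self.compounds = {1: "Compound 1"}
        self.poses = {7: "Pose 7"}
        self.reactions = {3: "Reaction 3"}

    def get_by_shorthand(self, key: str):
        match = re.fullmatch(r"([CPR])(\d+)", key)
        if match is None:
            raise ValueError(f"Unrecognised shorthand: {key!r}")
        prefix, number = match.groups()
        table = getattr(self, self._TABLES[prefix])
        return table[int(number)]

    def __getattr__(self, key: str):
        # Only called when normal attribute lookup fails
        try:
            return self.get_by_shorthand(key)
        except (ValueError, KeyError):
            raise AttributeError(key) from None


animal = ToyAnimal()
print(animal.C1, animal.P7, animal.R3)
```

Because `__getattr__` is only consulted after ordinary attribute lookup fails, regular attributes and properties are unaffected by the hook.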
- add_enamine_quote(path: str | Path, *, orig_name_col: str = 'Customer Code', price_col: str | None = None, fixed_amount: float | None = None, fixed_lead_time: float | None = False, fixed_purity: float | None = False, entry_col: str = 'Catalog ID', catalogue_col: str = 'Collection', smiles_col: str = 'SMILES', amount_col: str = 'Amount, mg', purity_col: str = 'Purity, %', lead_time_col: str | None = 'Lead time', stop_after: None | int = None, orig_name_is_hippo_id: bool = False, allow_no_catalogue_col: bool = False, delete_unavailable: bool = True, overwrite_existing_quotes: bool = False, supplier_name: str = 'Enamine', currency: str = None, dry_run: bool = False)[source]
Load an Enamine quote provided as an Excel file
- Parameters:
path – Path to the Excel file
orig_name_col – Column name of the original alias, defaults to ‘Customer Code’
entry_col – Column name of the catalogue ID/entry, defaults to ‘Catalog ID’
price_col – Column name of the price, defaults to ‘Price, EUR’ or ‘Price, USD’ if present
catalogue_col – Column name of the catalogue/collection, defaults to ‘Collection’
fixed_amount – Optionally use a fixed amount for all quotes (in mg)
fixed_lead_time – Optionally use a fixed lead time for all quotes (in days)
stop_after – Stop after given number of rows, defaults to None
orig_name_is_hippo_id – Set to True if orig_name_col is the original HIPPO Compound ID, defaults to False
delete_unavailable – Delete existing Enamine database quotes for compounds that are unavailable in the quote being loaded
overwrite_existing_quotes – Delete existing Enamine database quotes for compounds that are available in the quote being loaded
dry_run – Stop before any database modification, return first quote data to be inserted
currency – Specify currency if non-standard price column
- Returns:
An IngredientSet of the quoted molecules
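The interplay between price_col and currency described above can be sketched in plain Python. The helper name `resolve_price_column`, the EUR-before-USD fallback order and the error messages are assumptions for illustration; this is not HIPPO's implementation.

```python
STANDARD_PRICE_COLUMNS = ("Price, EUR", "Price, USD")


def resolve_price_column(columns, price_col=None, currency=None):
    """Return (price_col, currency) for a quote table (hypothetical helper)."""
    if price_col is None:
        # Fall back to a standard column if one is present (order is assumed)
        for candidate in STANDARD_PRICE_COLUMNS:
            if candidate in columns:
                price_col = candidate
                break
        else:
            raise ValueError("No standard price column found, pass price_col")
    elif price_col not in columns:
        raise ValueError(f"Column {price_col!r} not found")

    if currency is None:
        if price_col in STANDARD_PRICE_COLUMNS:
            # Standard columns end in the currency code, e.g. "Price, EUR"
            currency = price_col.split(", ")[1]
        else:
            raise ValueError("Non-standard price column, pass currency")
    return price_col, currency


columns = ["Customer Code", "Catalog ID", "SMILES", "Amount, mg", "Price, USD"]
standard = resolve_price_column(columns)
custom = resolve_price_column(columns + ["Cost"], price_col="Cost", currency="GBP")
print(standard, custom)
```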
- add_hits(target_name: str, metadata_csv: str | Path, aligned_directory: str | Path, tags: list | None = None, skip: list | None = None, debug: bool = False, load_pose_mols: bool = False) DataFrame[source]
Load in crystallographic hits from a Fragalysis download or XChemAlign alignment.
For a Fragalysis download aligned_directory and metadata_csv should point to the aligned_files and metadata.csv at the root of the extracted download. For an XChemAlign dataset the aligned_directory should point to the aligned_files.
- Parameters:
target_name – Name of this protein Target
metadata_csv – Path to the metadata.csv from the Fragalysis download
aligned_directory – Path to the aligned_files directory from the Fragalysis download
skip – optional list of observation names to skip
debug – bool: (Default value = False)
- Returns:
a DataFrame of metadata
- add_mcule_quote(path: str | Path)[source]
Load an MCule quote provided as an Excel file
- Parameters:
path – Path to the Excel file
- Returns:
An IngredientSet of the quoted molecules
- add_soakdb_compounds(path: str | Path, smiles_col: str = 'CompoundSMILES', alias_col: str = 'CompoundCode', update_aliases: bool = True, soak_count_to_metadata: bool = True, sanitisation_verbosity: bool = False, stop_after: int | None = None) CompoundSet[source]
Registers compounds with aliases and metadata from a SoakDB file
- Parameters:
path – Path to SoakDB CSV or SQLite file
- Returns:
CompoundSet of registered/matched compounds
- add_syndirella_elabs(df_path: str | Path, max_energy_score: float | None = 0.0, max_distance_score: float | None = 2.0, require_intra_geometry_pass: bool = True, reject_flags: list[str] | None = None, register_reactions: bool = True, dry_run: bool = False, scaffold_route: Route | None = None, scaffold_compound: Compound | None = None, pose_tags: list[str] | None = None, product_tags: list[str] | None = None) pd.DataFrame[source]
Load Syndirella elaboration compounds and poses from a pickled DataFrame
- Parameters:
df_path – Path to the pickled DataFrame
max_energy_score – Filter out poses with ∆∆G above this value
max_distance_score – Filter out poses with comRMSD above this value
require_intra_geometry_pass – Filter out poses with falsy intra_geometry_pass values
reject_flags – Filter out rows flagged with strings from this list (default = [“one_of_multiple_products”, “selectivity_issue_contains_reaction_atoms_of_both_reactants”])
scaffold_route – Supply a known single-step route to the scaffold product to use if scaffold placements are missing
scaffold_compound – Supply a Compound for the scaffold product to use if scaffold placements are missing
dry_run – Don’t insert new records into the database (for debugging/testing)
pose_tags – Add these tags to all inserted poses, defaults to [“syndirella_product”, “syndirella_placed”]
product_tags – Add these tags to all inserted product compounds, defaults to [“syndirella_product”]
- Returns:
annotated DataFrame
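The filtering parameters above can be read as a sequence of row-level checks. The following is a minimal sketch over plain dictionaries, assuming that passing None disables a threshold and using hypothetical keys `ddG`, `comRMSD` and `flags`; the actual column names in a Syndirella DataFrame may differ.

```python
DEFAULT_REJECT_FLAGS = [
    "one_of_multiple_products",
    "selectivity_issue_contains_reaction_atoms_of_both_reactants",
]


def keep_row(row, max_energy_score=0.0, max_distance_score=2.0,
             require_intra_geometry_pass=True, reject_flags=None):
    """Return True if a row passes every filter (illustrative only)."""
    if reject_flags is None:
        reject_flags = DEFAULT_REJECT_FLAGS
    # Thresholds are upper bounds; None is assumed to disable the check
    if max_energy_score is not None and row["ddG"] > max_energy_score:
        return False
    if max_distance_score is not None and row["comRMSD"] > max_distance_score:
        return False
    if require_intra_geometry_pass and not row["intra_geometry_pass"]:
        return False
    if any(flag in reject_flags for flag in row["flags"]):
        return False
    return True


rows = [
    {"name": "a", "ddG": -3.2, "comRMSD": 0.8, "intra_geometry_pass": True, "flags": []},
    {"name": "b", "ddG": 1.5, "comRMSD": 0.5, "intra_geometry_pass": True, "flags": []},
    {"name": "c", "ddG": -1.0, "comRMSD": 2.7, "intra_geometry_pass": True, "flags": []},
    {"name": "d", "ddG": -2.0, "comRMSD": 1.0, "intra_geometry_pass": False, "flags": []},
    {"name": "e", "ddG": -2.0, "comRMSD": 1.0, "intra_geometry_pass": True,
     "flags": ["one_of_multiple_products"]},
]
kept = [row["name"] for row in rows if keep_row(row)]
kept_no_energy_filter = [
    row["name"] for row in rows if keep_row(row, max_energy_score=None)
]
print(kept, kept_no_energy_filter)
```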
- add_syndirella_routes(pickle_path: str | Path, CAR_only: bool = True, pick_first: bool = True, check_chemistry: bool = True, register_routes: bool = True) DataFrame[source]
Add routes found from a syndirella --just_retro query
- add_syndirella_scaffolds(output_directory: str | Path, *, pattern: str = '*-*-?-scaffold-check/scaffold-*', tags: None | list[str] = None, target: int | str = 1, debug: bool = False) None[source]
Load Poses from Syndirella “scaffold-check” outputs
- Parameters:
output_directory – Path to the directory containing the Syndirella “scaffold-check” outputs
tags – list of tags to assign to compounds and poses, defaults to None
target – Target ID or name
pattern – UNIX pattern by which to search for subdirectories
debug – Increase verbosity of output, defaults to False
- Returns:
None
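To see which subdirectories the default pattern would select, it can be tried with `pathlib.Path.glob` on a throwaway directory tree. The directory names and the `find_scaffold_dirs` helper below are hypothetical, and whether HIPPO itself uses `Path.glob` is an assumption.

```python
import tempfile
from pathlib import Path

PATTERN = "*-*-?-scaffold-check/scaffold-*"


def find_scaffold_dirs(output_directory, pattern=PATTERN):
    """List paths (relative to output_directory) that match the pattern."""
    root = Path(output_directory)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # "?" matches exactly one character, so "-1-" matches but "-12-" does not
    (root / "targetA-x0001-1-scaffold-check" / "scaffold-0").mkdir(parents=True)
    (root / "targetA-x0001-12-scaffold-check" / "scaffold-0").mkdir(parents=True)
    (root / "notes").mkdir()
    matches = find_scaffold_dirs(root)

print(matches)
```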
- property compounds: CompoundTable
Access compounds in the Database
- property db_path: str
Returns the database path
- property elabs: CompoundSet
Returns compounds that are elaborations of another compound
- get_scaffold_network(compounds: CompoundSet | None = None, scaffolds: CompoundSet | None = None, notebook: bool = True, depth: int = 5, scaffold_tag: str | None = None, exclude_tag: str | None = None, physics: bool = True, arrows: bool = True) pyvis.network.Network[source]
Use PyVis to display a network of molecules connected by scaffold relationships in the database
- property interactions: InteractionTable
Access Interactions in the Database
- property intermediates: CompoundSet
Returns all compounds that are both products and reactants of Reactions
- load_sdf(*, target: str, path: str | Path, reference: int | Pose | None = None, inspirations: list[int] | PoseSet | None = None, compound_tags: None | list[str] = None, pose_tags: None | list[str] = None, mol_col: str = 'ROMol', name_col: str | None = 'ID', inspiration_col: str | None = 'ref_mols', reference_col: str = 'ref_pdb', energy_score_col: str = 'energy_score', distance_score_col: str = 'distance_score', inspiration_map: None | dict = None, convert_floats: bool = True, skip_equal_dict: dict | None = None, skip_not_equal_dict: dict | None = None) None[source]
Add posed virtual hits from an SDF into the database.
- Parameters:
target – Name of the protein Target
path – Path to the SDF
reference – Optional single reference Pose to use as the protein conformation for all poses, defaults to None
reference_col – Column that contains reference Pose aliases or IDs
compound_tags – List of string Tags to assign to all created compounds, defaults to None
pose_tags – List of string Tags to assign to all created poses, defaults to None
mol_col – Name of the column containing the rdkit.ROMol ligands, defaults to "ROMol"
name_col – Name of the column containing the ligand name/alias, defaults to "ID"
inspirations – Optional single set of inspirations, as a PoseSet object or list of IDs, to assign to all inserted poses, defaults to None
inspiration_col – Name of the column containing the list of inspiration Pose names or IDs, defaults to "ref_mols"
inspiration_map – Optional dictionary or callable mapping between inspiration strings found in inspiration_col and Pose IDs
energy_score_col – Name of the column containing the list of energy scores, defaults to "energy_score"
distance_score_col – Name of the column containing the list of distance scores, defaults to "distance_score"
convert_floats – Try to convert all values to float, defaults to True
skip_equal_dict – Skip rows where any(row[key] == value for key, value in skip_equal_dict.items()), defaults to None
skip_not_equal_dict – Skip rows where any(row[key] != value for key, value in skip_not_equal_dict.items()), defaults to None
All non-name columns are added to the Pose metadata. N.B. separate .mol files are not created. The molecule binary will only be stored in the .sqlite file and fake paths are added to the database.
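The two skip dictionaries combine as follows: a row is skipped if any key in skip_equal_dict matches its value, or if any key in skip_not_equal_dict differs from its value. A small stand-alone sketch using the expressions quoted in the parameter list; the `should_skip` helper and the row contents are hypothetical.

```python
def should_skip(row, skip_equal_dict=None, skip_not_equal_dict=None):
    """Apply the two skip rules to a single row (illustrative only)."""
    if skip_equal_dict and any(
        row[key] == value for key, value in skip_equal_dict.items()
    ):
        return True
    if skip_not_equal_dict and any(
        row[key] != value for key, value in skip_not_equal_dict.items()
    ):
        return True
    return False


rows = [
    {"ID": "lig-1", "method": "A", "outcome": "acceptable"},
    {"ID": "lig-2", "method": "A", "outcome": "too distant"},
    {"ID": "lig-3", "method": "B", "outcome": "acceptable"},
]
kept = [
    row["ID"]
    for row in rows
    if not should_skip(
        row,
        skip_equal_dict={"outcome": "too distant"},  # drops lig-2
        skip_not_equal_dict={"method": "A"},         # drops lig-3
    )
]
print(kept)
```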
- property name: str
Returns the project name
- Returns:
project name
- property num_compounds: int
Total number of Compounds in the Database
- property num_elabs: int
Number of compounds that are an elaboration of an existing scaffold
- property num_intermediates: int
Returns the number of intermediates (see intermediates())
- property num_poses: int
Total number of Poses in the Database
- property num_products: int
Returns the number of products (see products())
- property num_reactants: int
Returns the number of reactants (see reactants())
- property num_reactions: int
Total number of Reactions in the Database
- property num_scaffolds: int
Number of compounds that are the basis for elaborations
- property num_tags: int
Number of unique Tags in the Database
- plot_compound_availability(compounds=None, **kwargs) plotly.graph_objects.Figure[source]
Plot a bar chart of compound availability by supplier/catalogue, see hippo.plotting.plot_compound_availability()
- plot_compound_availability_venn(compounds, **kwargs) plotly.graph_objects.Figure[source]
Plot a Venn diagram of compound availability by supplier/catalogue, see hippo.plotting.plot_compound_availability()
- plot_compound_price(min_amount, compounds=None, plot_lead_time=False, style='histogram', **kwargs) plotly.graph_objects.Figure[source]
Plot a bar chart of minimum compound price for a given minimum amount, see hippo.plotting.plot_compound_price()
- plot_compound_property(prop, **kwargs) plotly.graph_objects.Figure[source]
Plot an arbitrary compound property across the whole dataset, see hippo.plotting.plot_compound_property()
- plot_interaction_punchcard(poses=None, subtitle=None, opacity=1.0, **kwargs) plotly.graph_objects.Figure[source]
Plot an interaction punchcard for a set of poses, see hippo.plotting.plot_interaction_punchcard()
- plot_interaction_punchcard_by_tags(tags: dict[str, str] | list[str], **kwargs) plotly.graph_objects.Figure[source]
Plot an interaction punchcard for a set of poses associated with the given tags, see hippo.plotting.plot_interaction_punchcard_by_tags()
- plot_pose_interactions(pose: Pose, **kwargs) plotly.graph_objects.Figure[source]
3D figure showing the interactions between a Pose and the protein, see hippo.plotting.plot_pose_interactions()
- plot_pose_property(prop, **kwargs) plotly.graph_objects.Figure[source]
Plot an arbitrary pose property across the whole dataset, see hippo.plotting.plot_pose_property()
- plot_reaction_funnel(**kwargs) plotly.graph_objects.Figure[source]
Plot a funnel chart of the reactants, intermediates, and products across the whole dataset, see hippo.plotting.plot_reaction_funnel()
- plot_residue_interactions(residue_number: int, poses: str | None = None, **kwargs) plotly.graph_objects.Figure[source]
Plot an interaction punchcard for a set of poses, see hippo.plotting.plot_residue_interactions()
- plot_tag_statistics(*args, **kwargs) plotly.graph_objects.Figure[source]
Plot an overview of the number of compounds and poses for each tag, see hippo.plotting.plot_tag_statistics()
- property products: CompoundSet
Returns all compounds that are products of at least one Reaction (and not reactants of others)
- quote_compounds(ref_animal: HIPPO, compounds: CompoundSet | None = None, *, debug: bool = False) CompoundSet, CompoundSet[source]
Transfer quotes from another reference HIPPO animal object (e.g. the one from https://github.com/mwinokan/EnamineCatalogs)
- Parameters:
ref_animal – The reference HIPPO animal to fetch quotes from
compounds – A CompoundSet containing the compounds to be quoted
- quote_intermediates(ref_animal: HIPPO) None[source]
Get batch quotes for all intermediates in the database
- Parameters:
ref_animal – The reference HIPPO animal to fetch quotes from (e.g. the one from https://github.com/mwinokan/EnamineCatalogs)
unquoted_only – Only request quotes for unquoted compounds, defaults to False
- quote_reactants(ref_animal: HIPPO, *, unquoted_only: bool = False, supplier: str = 'any', debug: bool = False) None[source]
Get batch quotes for all reactants in the database
- Parameters:
ref_animal – The reference HIPPO animal to fetch quotes from (e.g. the one from https://github.com/mwinokan/EnamineCatalogs)
unquoted_only – Only request quotes for unquoted compounds, defaults to False
- property reactants: CompoundSet
Returns all compounds that are reactants for at least one Reaction (and not products of others)
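The definitions of the products, reactants and intermediates properties amount to simple set algebra over the reaction records. The sketch below uses integer IDs and a hypothetical two-step route; it illustrates the definitions only, not how HIPPO queries its database.

```python
# Two hypothetical reactions: 1 + 2 -> 3, then 3 + 4 -> 5
reactions = [
    {"reactants": {1, 2}, "product": 3},
    {"reactants": {3, 4}, "product": 5},
]

all_reactants = set().union(*(r["reactants"] for r in reactions))
all_products = {r["product"] for r in reactions}

# Both a product of one reaction and a reactant of another
intermediates = all_products & all_reactants
# Products that are never consumed by another reaction
products = all_products - all_reactants
# Reactants that are never produced by another reaction
reactants = all_reactants - all_products

print(reactants, intermediates, products)
```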
- property reactions: ReactionTable
Access Reactions in the Database
- register_compound(*, smiles: str, scaffolds: list[Compound] | list[int] | None = None, tags: None | list = None, metadata: None | dict = None, return_compound: bool = True, commit: bool = True, alias: str | None = None, return_duplicate: bool = False, register_scaffold_if_duplicate: bool = True, radical: str = 'warning', debug: bool = False) Compound[source]
Use a SMILES string to add a compound to the database. If it already exists, return the existing compound
- Parameters:
smiles – The SMILES string of the compound
scaffolds – A list of Compound objects or IDs that this compound is based on, defaults to None
tags – A list of tags to assign to this compound, defaults to None
metadata – A dictionary of metadata to assign to this compound, defaults to None
return_compound – Return the Compound object instead of the integer ID, defaults to True
commit – Commit the changes to the Database, defaults to True
alias – The string alias of this compound, defaults to None
return_duplicate – If True, also return a boolean indicating whether this compound previously existed, defaults to False
register_scaffold_if_duplicate – If this compound exists in the Database, modify its base property, defaults to True
radical – Define the behaviour for dealing with radical atoms in the SMILES. See sanitise_smiles. Defaults to 'warning'
debug – Increase verbosity of output, defaults to False
- Returns:
The registered/existing Compound object or its ID (depending on return_compound), and optionally a boolean to indicate duplication (see return_duplicate)
- register_compounds(*, smiles: list[str], radical: str = 'warning', sanitisation_verbosity: bool = True, debug: bool = False) list[tuple[str, str]][source]
Insert many compounds at once
- Parameters:
smiles – list of SMILES strings
- Returns:
list of sanitised InChIKey and SMILES string pairs
- register_pose(*, compound: Compound | int, target: str, path: str, inchikey: str | None = None, alias: str | None = None, reference: int | None = None, tags: None | list = None, metadata: None | dict = None, inspirations: None | list[int | Pose] = None, return_pose: bool = True, energy_score: float | None = None, distance_score: float | None = None, commit: bool = True, overwrite_metadata: bool = True, warn_duplicate: bool = True, check_RMSD: bool = False, RMSD_tolerance: float = 1.0, split_PDB: bool = False, duplicate_alias: str = 'modify', resolve_path: bool = True, load_mol: bool = False) Pose[source]
Add a Pose to the Database. If it already exists, return the existing pose
- Parameters:
compound – The Compound object or ID that this Pose is a conformer of
target – The Target name or ID
path – Path to the Pose’s conformer file (.pdb or .mol)
alias – The string alias of this Pose, defaults to None
reference – Reference Pose to use as the protein conformation for all poses, defaults to None
tags – A list of tags to assign to this compound, defaults to None
metadata – A dictionary of metadata to assign to this compound, defaults to None
inspirations – A list of inspiration Pose objects or IDs, defaults to None
energy_score – Assign an energy score to this Pose, defaults to None
distance_score – Assign a distance score to this Pose, defaults to None
commit – Commit the changes to the Database, defaults to True
overwrite_metadata – If a duplicate is found, overwrite its metadata, defaults to True
warn_duplicate – Warn if a duplicate Pose exists, defaults to True
check_RMSD – Check the RMSD against existing Pose, defaults to False
RMSD_tolerance – Tolerance for check_RMSD in Angstrom, defaults to 1.0
split_PDB – Register a Pose for every ligand residue in the PDB, defaults to False
duplicate_alias – In the case of a duplicate, define the behaviour for the alias property, defaults to 'modify' which appends _copy to the alias. Set to 'error' to raise an Exception.
resolve_path – Resolve to an absolute path, defaults to True
load_mol – Parse the input file and load the ligand rdkit.Chem.Mol
- Returns:
The registered/existing Pose object or its ID (depending on return_pose)
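The duplicate_alias behaviour described in the parameter list can be sketched as a small stand-alone function. The name `resolve_duplicate_alias`, the exception type and the handling of unknown modes are assumptions; only the 'modify' and 'error' behaviours come from the description above.

```python
def resolve_duplicate_alias(alias, existing_aliases, duplicate_alias="modify"):
    """Return the alias to store for a pose (illustrative only)."""
    if alias not in existing_aliases:
        return alias
    if duplicate_alias == "modify":
        # Single append only; repeated duplicates are not handled in this sketch
        return alias + "_copy"
    if duplicate_alias == "error":
        raise ValueError(f"Duplicate alias: {alias!r}")
    raise ValueError(f"Unknown duplicate_alias mode: {duplicate_alias!r}")


existing = {"x0100a"}
print(resolve_duplicate_alias("x0100a", existing))  # modified alias
print(resolve_duplicate_alias("x0200a", existing))  # unchanged alias
```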
- register_reaction(*, type: str, product: Compound | int, reactants: list[Compound | int], commit: bool = True, product_yield: float = 1.0, check_chemistry: bool = False) Reaction[source]
Add a Reaction to the Database. If it already exists, return the existing one
- Parameters:
type – String indicating the type of reaction
product – The Compound object or ID of the product
reactants – A list of Compound objects or IDs of the reactants
commit – Commit the changes to the Database, defaults to True
product_yield – The fraction of product yielded from this reaction, 0 < product_yield <= 1.0, defaults to 1.0
check_chemistry – Check the reaction chemistry, defaults to False
- Returns:
The registered Reaction
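The constraint on product_yield can be checked before registration, and per-step yields can be combined to estimate a route's overall yield. Treating the overall yield of a linear route as the product of its step yields is an assumption of this sketch, as are the function names; neither is taken from HIPPO.

```python
import math


def validate_product_yield(product_yield: float) -> float:
    """Enforce 0 < product_yield <= 1.0, as stated for register_reaction."""
    if not 0 < product_yield <= 1.0:
        raise ValueError(f"Invalid product_yield: {product_yield}")
    return product_yield


def overall_yield(step_yields):
    """Multiply validated per-step yields for a linear route (assumption)."""
    return math.prod(validate_product_yield(y) for y in step_yields)


route_yield = overall_yield([0.5, 0.8])
print(route_yield)
```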
- register_reactions(*, types: list[str], product_ids: list[list[int]], reactant_id_lists: list[list[int]])[source]
Insert many reactions at once
- Parameters:
types – list of reaction type strings
reactant_id_lists – list of reactant compound id lists
product_ids – list of product compound ids
- Returns:
list of reaction ids
- register_target(name: str, warn_duplicate: bool = True) Target[source]
Register a new protein Target to the Database
- property scaffolds: CompoundSet
Returns compounds that are the basis for one or more elaborations