HIPPO “animal” object

class hippo.animal.HIPPO(name: str, db_path: str | Path, copy_from: str | Path | None = None, overwrite_existing: bool = False, update_legacy: bool = False)[source]

The HIPPO animal class. Instantiating a HIPPO object will create or link a HIPPO Database.

from hippo import HIPPO
animal = HIPPO(project_name, db_path)

Attention

In addition to this API reference please see the tutorial pages Getting started with HIPPO and insert_elaborations.

Parameters:
  • name – give this HIPPO a name

  • db_path – path where the Database will be stored

  • copy_from – optionally initialise this animal by copying the Database at this given path, defaults to None

Returns:

HIPPO object

HIPPO initialisation

getattr(self, key: str)[source]

Get a Compound, Pose, or Reaction by its shorthand ID (e.g. C100). See HIPPO.get_by_shorthand().

self[key: str][source]

Get a Compound, Pose, or Reaction by its shorthand ID (e.g. C100). See HIPPO.get_by_shorthand().

repr(self) str[source]

Returns a command line representation of this HIPPO

str(self) str[source]

Unformatted string representation of this HIPPO

add_enamine_quote(path: str | Path, *, orig_name_col: str = 'Customer Code', price_col: str | None = None, fixed_amount: float | None = None, fixed_lead_time: float | None = False, fixed_purity: float | None = False, entry_col: str = 'Catalog ID', catalogue_col: str = 'Collection', smiles_col: str = 'SMILES', amount_col: str = 'Amount, mg', purity_col: str = 'Purity, %', lead_time_col: str | None = 'Lead time', stop_after: None | int = None, orig_name_is_hippo_id: bool = False, allow_no_catalogue_col: bool = False, delete_unavailable: bool = True, overwrite_existing_quotes: bool = False, supplier_name: str = 'Enamine', currency: str = None, dry_run: bool = False)[source]

Load an Enamine quote provided as an Excel file

Parameters:
  • path – Path to the Excel file

  • orig_name_col – Column name of the original alias, defaults to ‘Customer Code’

  • entry_col – Column name of the catalogue ID/entry, defaults to ‘Catalog ID’

  • price_col – Column name of the price, defaults to ‘Price, EUR’ or ‘Price, USD’ if present

  • catalogue_col – Column name of the catalogue, defaults to ‘Collection’

  • fixed_amount – Optionally use a fixed amount for all quotes (in mg)

  • fixed_lead_time – Optionally use a fixed lead time for all quotes (in days)

  • stop_after – Stop after given number of rows, defaults to None

  • orig_name_is_hippo_id – Set to True if orig_name_col contains the original HIPPO Compound ID, defaults to False

  • delete_unavailable – Delete existing Enamine quotes in the Database for compounds that are unavailable in the quote being loaded, defaults to True

  • overwrite_existing_quotes – Delete existing Enamine quotes in the Database for compounds that are available in the quote being loaded, defaults to False

  • dry_run – Stop before any Database modification and return the first quote data to be inserted, defaults to False

  • currency – Specify currency if non-standard price column

Returns:

An IngredientSet of the quoted molecules
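The price_col and currency defaults above interact: a standard ‘Price, EUR’ or ‘Price, USD’ column implies its currency, while any other column needs currency spelled out. The standalone sketch below illustrates that rule on a header row; pick_price_column and STANDARD_PRICE_COLUMNS are hypothetical names, and this is not HIPPO's implementation.

```python
# Hypothetical helper illustrating the documented price_col/currency defaults;
# not HIPPO's implementation.
STANDARD_PRICE_COLUMNS = {"Price, EUR": "EUR", "Price, USD": "USD"}


def pick_price_column(columns, price_col=None, currency=None):
    """Return (price column name, currency) for a quote's header row."""
    if price_col is None:
        # Use the first standard price column present in the header
        for candidate in STANDARD_PRICE_COLUMNS:
            if candidate in columns:
                price_col = candidate
                break
        else:
            raise ValueError("no standard price column found; pass price_col")
    if price_col not in columns:
        raise ValueError(f"price column {price_col!r} not found in quote")
    if currency is None:
        # Standard columns imply their currency; others need it given explicitly
        currency = STANDARD_PRICE_COLUMNS.get(price_col)
        if currency is None:
            raise ValueError("pass currency for a non-standard price column")
    return price_col, currency


header = ["Customer Code", "Catalog ID", "SMILES", "Amount, mg", "Price, USD"]
print(pick_price_column(header))  # ('Price, USD', 'USD')
```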

add_hits(target_name: str, metadata_csv: str | Path, aligned_directory: str | Path, tags: list | None = None, skip: list | None = None, debug: bool = False, load_pose_mols: bool = False) DataFrame[source]

Load in crystallographic hits from a Fragalysis download or XChemAlign alignment.

For a Fragalysis download, aligned_directory and metadata_csv should point to the aligned_files directory and the metadata.csv file at the root of the extracted download. For an XChemAlign dataset, aligned_directory should point to the aligned_files directory.

Parameters:
  • target_name – Name of this protein Target

  • metadata_csv – Path to the metadata.csv from the Fragalysis download

  • aligned_directory – Path to the aligned_files directory from the Fragalysis download

  • skip – optional list of observation names to skip

  • debug – Increase verbosity of output, defaults to False

Returns:

a DataFrame of metadata

add_mcule_quote(path: str | Path)[source]

Load an Mcule quote provided as an Excel file

Parameters:

path – Path to the Excel file

Returns:

An IngredientSet of the quoted molecules

add_soakdb_compounds(path: str | Path, smiles_col: str = 'CompoundSMILES', alias_col: str = 'CompoundCode', update_aliases: bool = True, soak_count_to_metadata: bool = True, sanitisation_verbosity: bool = False, stop_after: int | None = None) CompoundSet[source]

Registers compounds with aliases and metadata from a SoakDB file

Parameters:

path – Path to SoakDB CSV or SQLite file

Returns:

CompoundSet of registered/matched compounds

add_syndirella_elabs(df_path: str | Path, max_energy_score: float | None = 0.0, max_distance_score: float | None = 2.0, require_intra_geometry_pass: bool = True, reject_flags: list[str] | None = None, register_reactions: bool = True, dry_run: bool = False, scaffold_route: Route | None = None, scaffold_compound: Compound | None = None, pose_tags: list[str] | None = None, product_tags: list[str] | None = None) pd.DataFrame[source]

Load Syndirella elaboration compounds and poses from a pickled DataFrame

Parameters:
  • df_path – Path to the pickled DataFrame

  • max_energy_score – Filter out poses with ∆∆G above this value

  • max_distance_score – Filter out poses with comRMSD above this value

  • require_intra_geometry_pass – Filter out poses with falsy intra_geometry_pass values

  • reject_flags – Filter out rows flagged with strings from this list (default = [“one_of_multiple_products”, “selectivity_issue_contains_reaction_atoms_of_both_reactants”])

  • scaffold_route – Supply a known single-step route to the scaffold product to use if scaffold placements are missing

  • scaffold_compound – Supply a Compound for the scaffold product to use if scaffold placements are missing

  • dry_run – Don’t insert new records into the database (for debugging/testing)

  • pose_tags – Add these tags to all inserted poses, defaults to [“syndirella_product”, “syndirella_placed”]

  • product_tags – Add these tags to all inserted product compounds, defaults to [“syndirella_product”]

Returns:

annotated DataFrame
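The four row filters described above can be read as a single keep/reject predicate. The sketch below applies them to plain dictionaries; the row keys ("energy_score", "distance_score", "intra_geometry_pass", "flags") and the function name keep_row are assumptions for illustration, not the actual Syndirella DataFrame columns or HIPPO's code.

```python
# Default reject_flags as listed in the parameter description
DEFAULT_REJECT_FLAGS = [
    "one_of_multiple_products",
    "selectivity_issue_contains_reaction_atoms_of_both_reactants",
]


def keep_row(row, max_energy_score=0.0, max_distance_score=2.0,
             require_intra_geometry_pass=True, reject_flags=None):
    """Hypothetical filter: True if a row passes all elaboration filters."""
    if reject_flags is None:
        reject_flags = DEFAULT_REJECT_FLAGS
    # A threshold of None disables that filter
    if max_energy_score is not None and row["energy_score"] > max_energy_score:
        return False
    if max_distance_score is not None and row["distance_score"] > max_distance_score:
        return False
    if require_intra_geometry_pass and not row["intra_geometry_pass"]:
        return False
    # Reject the row if any of its flags appears in reject_flags
    if any(flag in reject_flags for flag in row.get("flags", [])):
        return False
    return True


row = {"energy_score": -1.5, "distance_score": 0.8,
       "intra_geometry_pass": True, "flags": []}
print(keep_row(row))  # True
```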

add_syndirella_routes(pickle_path: str | Path, CAR_only: bool = True, pick_first: bool = True, check_chemistry: bool = True, register_routes: bool = True) DataFrame[source]

Add routes found by a Syndirella --just_retro query

add_syndirella_scaffolds(output_directory: str | Path, *, pattern: str = '*-*-?-scaffold-check/scaffold-*', tags: None | list[str] = None, target: int | str = 1, debug: bool = False) None[source]

Load Poses from Syndirella “scaffold-check” outputs

Parameters:
  • output_directory – Path to the directory containing the Syndirella outputs

  • tags – list of tags to assign to compounds and poses, defaults to None

  • target – Target ID or name, defaults to 1

  • pattern – Unix glob pattern used to search for subdirectories, defaults to '*-*-?-scaffold-check/scaffold-*'

  • debug – Increase verbosity of output, defaults to False

Returns:

None

property compounds: CompoundTable

Access compounds in the Database

property db: Database

Returns the Database object

property db_path: str

Returns the database path

property elabs: CompoundSet

Returns compounds that are based on another compound (i.e. elaborations of a scaffold)

get_by_shorthand(key) Compound | Pose | Reaction[source]

Get a Compound, Pose, or Reaction by its ID

Parameters:

key – shorthand name of the object, e.g. C100 for the Compound with id=100

Returns:

Compound, Pose, or Reaction object
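A shorthand key pairs a one-letter type prefix with an integer ID. The sketch below shows one way to split such a key; only the "C" prefix is confirmed by the example above, while "P" for Pose and "R" for Reaction are assumptions, and parse_shorthand is a hypothetical name rather than part of HIPPO.

```python
import re

# Only "C" is confirmed by the docstring (C100 -> Compound 100);
# "P" and "R" are assumed here for Pose and Reaction.
SHORTHAND_PREFIXES = {"C": "Compound", "P": "Pose", "R": "Reaction"}


def parse_shorthand(key: str) -> tuple[str, int]:
    """Split a shorthand key such as 'C100' into (record type, integer ID)."""
    match = re.fullmatch(r"([A-Z])(\d+)", key)
    if match is None:
        raise KeyError(f"not a shorthand key: {key!r}")
    prefix, number = match.groups()
    if prefix not in SHORTHAND_PREFIXES:
        raise KeyError(f"unknown shorthand prefix: {prefix!r}")
    return SHORTHAND_PREFIXES[prefix], int(number)


print(parse_shorthand("C100"))  # ('Compound', 100)
```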

get_scaffold_network(compounds: CompoundSet | None = None, scaffolds: CompoundSet | None = None, notebook: bool = True, depth: int = 5, scaffold_tag: str | None = None, exclude_tag: str | None = None, physics: bool = True, arrows: bool = True) pyvis.network.Network[source]

Use PyVis to display a network of molecules connected by scaffold relationships in the database
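The depth parameter limits how far the network extends from its starting scaffolds. As an illustration only, the sketch below performs a depth-limited breadth-first search over (scaffold_id, elaboration_id) pairs, following edges from scaffold to elaboration. The edge format, the direction of traversal, and the name scaffold_neighbourhood are assumptions; HIPPO's own traversal and its use of PyVis are not reproduced here.

```python
from collections import deque


def scaffold_neighbourhood(edges, start, depth=5):
    """Map each compound ID reachable from start to its hop count (<= depth).

    edges: iterable of (scaffold_id, elaboration_id) pairs (hypothetical format).
    """
    graph = {}
    for scaffold, elab in edges:
        graph.setdefault(scaffold, set()).add(elab)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue  # do not expand beyond the depth limit
        for child in graph.get(node, ()):
            if child not in hops:
                hops[child] = hops[node] + 1
                queue.append(child)
    return hops


edges = [(1, 2), (2, 3), (3, 4), (1, 5)]
print(sorted(scaffold_neighbourhood(edges, 1, depth=2).items()))
# [(1, 0), (2, 1), (3, 2), (5, 1)]
```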

property interactions: InteractionTable

Access Interactions in the Database

property intermediates: CompoundSet

Returns all compounds that are both products and reactants of Reactions

load_sdf(*, target: str, path: str | Path, reference: int | Pose | None = None, inspirations: list[int] | PoseSet | None = None, compound_tags: None | list[str] = None, pose_tags: None | list[str] = None, mol_col: str = 'ROMol', name_col: str | None = 'ID', inspiration_col: str | None = 'ref_mols', reference_col: str = 'ref_pdb', energy_score_col: str = 'energy_score', distance_score_col: str = 'distance_score', inspiration_map: None | dict = None, convert_floats: bool = True, skip_equal_dict: dict | None = None, skip_not_equal_dict: dict | None = None) None[source]

Add posed virtual hits from an SDF into the database.

Parameters:
  • target – Name of the protein Target

  • path – Path to the SDF

  • reference – Optional single reference Pose to use as the protein conformation for all poses, defaults to None

  • reference_col – Name of the column containing reference Pose aliases or IDs, defaults to "ref_pdb"

  • compound_tags – List of string Tags to assign to all created compounds, defaults to None

  • pose_tags – List of string Tags to assign to all created poses, defaults to None

  • mol_col – Name of the column containing the rdkit.ROMol ligands, defaults to "ROMol"

  • name_col – Name of the column containing the ligand name/alias, defaults to "ID"

  • inspirations – Optional PoseSet or list of Pose IDs to assign as inspirations to all inserted poses, defaults to None

  • inspiration_col – Name of the column containing the list of inspiration Pose names or IDs, defaults to "ref_mols"

  • inspiration_map – Optional dictionary or callable mapping between inspiration strings found in inspiration_col and Pose ids

  • energy_score_col – Name of the column containing the energy scores, defaults to "energy_score"

  • distance_score_col – Name of the column containing the distance scores, defaults to "distance_score"

  • convert_floats – Try to convert all values to float, defaults to True

  • skip_equal_dict – Skip rows where any(row[key] == value for key, value in skip_equal_dict.items()), defaults to None

  • skip_not_equal_dict – Skip rows where any(row[key] != value for key, value in skip_not_equal_dict.items()), defaults to None

All non-name columns are added to the Pose metadata. N.B. separate .mol files are not created. The molecule binary will only be stored in the .sqlite file and fake paths are added to the database.
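The skip_equal_dict and skip_not_equal_dict conditions above are stated as Python expressions. The standalone predicate below evaluates exactly those expressions on a row dictionary; should_skip is a hypothetical name, and rows are plain dicts here rather than the DataFrame rows HIPPO reads from the SDF.

```python
def should_skip(row, skip_equal_dict=None, skip_not_equal_dict=None):
    """Return True if the row matches either documented skip condition."""
    # Skip if ANY listed column equals its given value
    if skip_equal_dict and any(row[key] == value for key, value in skip_equal_dict.items()):
        return True
    # Skip if ANY listed column differs from its given value
    if skip_not_equal_dict and any(row[key] != value for key, value in skip_not_equal_dict.items()):
        return True
    return False


row = {"ID": "x1", "status": "ok"}
print(should_skip(row, skip_equal_dict={"status": "failed"}))  # False
```

Because both conditions use any(), listing several keys in one dictionary combines them with OR: a row is skipped as soon as a single key matches (or, for skip_not_equal_dict, fails to match).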

property name: str

Returns the project name

Returns:

project name

property num_compounds: int

Total number of Compounds in the Database

property num_elabs: int

Number of compounds that are an elaboration of an existing scaffold

property num_intermediates: int

Returns the number of intermediates (see HIPPO.intermediates)

property num_poses: int

Total number of Poses in the Database

property num_products: int

Returns the number of products (see HIPPO.products)

property num_reactants: int

Returns the number of reactants (see HIPPO.reactants)

property num_reactions: int

Total number of Reactions in the Database

property num_scaffolds: int

Number of compounds that are the basis for elaborations

property num_tags: int

Number of unique Tags in the Database

plot_compound_availability(compounds=None, **kwargs) plotly.graph_objects.Figure[source]

Plot a bar chart of compound availability by supplier/catalogue, see hippo.plotting.plot_compound_availability()

plot_compound_availability_venn(compounds, **kwargs) plotly.graph_objects.Figure[source]

Plot a Venn diagram of compound availability by supplier/catalogue, see hippo.plotting.plot_compound_availability()

plot_compound_price(min_amount, compounds=None, plot_lead_time=False, style='histogram', **kwargs) plotly.graph_objects.Figure[source]

Plot a bar chart of minimum compound price for a given minimum amount, see hippo.plotting.plot_compound_price()
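"Minimum compound price for a given minimum amount" can be read as: for each compound, take the cheapest quote whose amount covers min_amount. The sketch below computes that from hypothetical (compound_id, amount_mg, price) tuples; the quote format and the function name are assumptions, and HIPPO's actual pricing logic (for example, how it treats currencies or combines smaller packs) may differ.

```python
def min_price_for_amount(quotes, min_amount):
    """Cheapest price per compound among quotes of at least min_amount (mg)."""
    best = {}
    for compound_id, amount_mg, price in quotes:
        if amount_mg < min_amount:
            continue  # this quote does not cover the requested amount
        if compound_id not in best or price < best[compound_id]:
            best[compound_id] = price
    return best


quotes = [(1, 1, 50.0), (1, 10, 120.0), (1, 50, 300.0),
          (2, 5, 80.0), (2, 20, 90.0), (3, 1, 10.0)]
print(min_price_for_amount(quotes, 10))  # {1: 120.0, 2: 90.0}
```

Compound 3 drops out because none of its quotes reaches 10 mg.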

plot_compound_property(prop, **kwargs) plotly.graph_objects.Figure[source]

Plot an arbitrary compound property across the whole dataset, see hippo.plotting.plot_compound_property()

plot_interaction_punchcard(poses=None, subtitle=None, opacity=1.0, **kwargs) plotly.graph_objects.Figure[source]

Plot an interaction punchcard for a set of poses, see hippo.plotting.plot_interaction_punchcard()

plot_interaction_punchcard_by_tags(tags: dict[str, str] | list[str], **kwargs) plotly.graph_objects.Figure[source]

Plot an interaction punchcard for a set of poses associated to given tags, see hippo.plotting.plot_interaction_punchcard_by_tags()

plot_pose_interactions(pose: Pose, **kwargs) plotly.graph_objects.Figure[source]

3D figure showing the interactions between a Pose and the protein, see hippo.plotting.plot_pose_interactions()

plot_pose_property(prop, **kwargs) plotly.graph_objects.Figure[source]

Plot an arbitrary pose property across the whole dataset, see hippo.plotting.plot_pose_property()

plot_reaction_funnel(**kwargs) plotly.graph_objects.Figure[source]

Plot a funnel chart of the reactants, intermediates, and products across the whole dataset, see hippo.plotting.plot_reaction_funnel()

plot_residue_interactions(residue_number: int, poses: str | None = None, **kwargs) plotly.graph_objects.Figure[source]

Plot an interaction punchcard for a set of poses, see hippo.plotting.plot_residue_interactions()

plot_tag_statistics(*args, **kwargs) plotly.graph_objects.Figure[source]

Plot an overview of the number of compounds and poses for each tag, see hippo.plotting.plot_tag_statistics()

property poses: PoseTable

Access Poses in the Database

property products: CompoundSet

Returns all compounds that are products of at least one Reaction (and not reactants of others)

quote_compounds(ref_animal: HIPPO, compounds: CompoundSet | None = None, *, debug: bool = False) CompoundSet, CompoundSet[source]

Transfer quotes from another reference HIPPO animal object (e.g. the one from https://github.com/mwinokan/EnamineCatalogs)

Parameters:
  • ref_animal – The reference HIPPO animal to fetch quotes from

  • compounds – A CompoundSet containing the compounds to be quoted

quote_intermediates(ref_animal: HIPPO) None[source]

Get batch quotes for all intermediates in the Database

Parameters:
  • ref_animal – The reference HIPPO animal to fetch quotes from

quote_reactants(ref_animal: HIPPO, *, unquoted_only: bool = False, supplier: str = 'any', debug: bool = False) None[source]

Get batch quotes for all reactants in the Database

Parameters:
  • ref_animal – The reference HIPPO animal to fetch quotes from

property reactants: CompoundSet

Returns all compounds that are reactants for at least one Reaction (and not products of others)
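The products, reactants, and intermediates properties partition compounds by their role across all Reactions. The sketch below expresses those three definitions as set operations over hypothetical (product_id, reactant_ids) pairs; it illustrates the definitions only and does not query a HIPPO Database.

```python
def classify_compounds(reactions):
    """Split compound IDs by role, given (product_id, reactant_ids) pairs."""
    made = {product for product, _ in reactions}
    used = {reactant for _, reactants in reactions for reactant in reactants}
    return {
        "products": made - used,        # made by some reaction, never consumed
        "reactants": used - made,       # consumed by some reaction, never made
        "intermediates": made & used,   # both made and consumed
    }


reactions = [(3, [1, 2]), (5, [3, 4])]  # 1 + 2 -> 3, then 3 + 4 -> 5
groups = classify_compounds(reactions)
print({name: sorted(ids) for name, ids in groups.items()})
# {'products': [5], 'reactants': [1, 2, 4], 'intermediates': [3]}
```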

property reactions: ReactionTable

Access Reactions in the Database

register_compound(*, smiles: str, scaffolds: list[Compound] | list[int] | None = None, tags: None | list = None, metadata: None | dict = None, return_compound: bool = True, commit: bool = True, alias: str | None = None, return_duplicate: bool = False, register_scaffold_if_duplicate: bool = True, radical: str = 'warning', debug: bool = False) Compound[source]

Use a SMILES string to add a compound to the Database. If it already exists, return the existing compound.

Parameters:
  • smiles – The SMILES string of the compound

  • scaffolds – A list of Compound objects or IDs that this compound is based on, defaults to None

  • tags – A list of tags to assign to this compound, defaults to None

  • metadata – A dictionary of metadata to assign to this compound, defaults to None

  • return_compound – return the Compound object instead of the integer ID, defaults to True

  • commit – Commit the changes to the Database, defaults to True

  • alias – The string alias of this compound, defaults to None

  • return_duplicate – If True returns a boolean indicating if this compound previously existed, defaults to False

  • register_scaffold_if_duplicate – If this compound already exists in the Database, register the given scaffolds for it, defaults to True

  • radical – Define the behaviour for dealing with radical atoms in the SMILES. See sanitise_smiles. Defaults to 'warning'

  • debug – Increase verbosity of output, defaults to False

Returns:

The registered/existing Compound object or its ID (depending on return_compound) and, optionally, a boolean indicating whether the compound already existed (see return_duplicate)
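The get-or-create behaviour and the optional duplicate flag can be illustrated with a toy in-memory registry. CompoundRegistry is hypothetical and keys on the raw SMILES string; HIPPO sanitises SMILES before registering (see the radical parameter), so two spellings of the same molecule would be treated differently here than in the real Database.

```python
class CompoundRegistry:
    """Toy stand-in for the Database's compound table (hypothetical)."""

    def __init__(self):
        self._ids = {}

    def register(self, smiles, return_duplicate=False):
        # Keyed on the raw SMILES string: no sanitisation or canonicalisation
        duplicate = smiles in self._ids
        if not duplicate:
            self._ids[smiles] = len(self._ids) + 1
        compound_id = self._ids[smiles]
        if return_duplicate:
            return compound_id, duplicate
        return compound_id


registry = CompoundRegistry()
registry.register("CCO")
print(registry.register("CCO", return_duplicate=True))  # (1, True)
```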

register_compounds(*, smiles: list[str], radical: str = 'warning', sanitisation_verbosity: bool = True, debug: bool = False) list[tuple[str, str]][source]

Insert many compounds at once

Parameters:

smiles – list of SMILES strings

Returns:

list of sanitised InChIKey and SMILES string pairs

register_pose(*, compound: Compound | int, target: str, path: str, inchikey: str | None = None, alias: str | None = None, reference: int | None = None, tags: None | list = None, metadata: None | dict = None, inspirations: None | list[int | Pose] = None, return_pose: bool = True, energy_score: float | None = None, distance_score: float | None = None, commit: bool = True, overwrite_metadata: bool = True, warn_duplicate: bool = True, check_RMSD: bool = False, RMSD_tolerance: float = 1.0, split_PDB: bool = False, duplicate_alias: str = 'modify', resolve_path: bool = True, load_mol: bool = False) Pose[source]

Add a Pose to the Database. If it already exists, return the existing Pose.

Parameters:
  • compound – The Compound object or ID that this Pose is a conformer of

  • target – The Target name or ID

  • path – Path to the Pose’s conformer file (.pdb or .mol)

  • alias – The string alias of this Pose, defaults to None

  • reference – Reference Pose to use as the protein conformation for this Pose, defaults to None

  • tags – A list of tags to assign to this Pose, defaults to None

  • metadata – A dictionary of metadata to assign to this Pose, defaults to None

  • inspirations – a list of inspiration Pose objects or IDs, defaults to None

  • energy_score – assign an energy score to this Pose, defaults to None

  • distance_score – assign a distance score to this Pose, defaults to None

  • commit – Commit the changes to the Database, defaults to True

  • overwrite_metadata – If a duplicate is found, overwrite its metadata, defaults to True

  • warn_duplicate – Warn if a duplicate Pose exists, defaults to True

  • check_RMSD – Check the RMSD against existing Poses, defaults to False

  • RMSD_tolerance – Tolerance for check_RMSD in Angstrom, defaults to 1.0

  • split_PDB – Register a Pose for every ligand residue in the PDB, defaults to False

  • duplicate_alias – Behaviour for the alias property when a duplicate is found, defaults to 'modify', which appends _copy to the alias. Set to 'error' to raise an Exception.

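  • resolve_path – Resolve to an absolute path, defaults to True

  • load_mol – Parse the input file and load the ligand rdkit.Chem.Mol, defaults to False

  • return_pose – Return the Pose object instead of the integer ID, defaults to True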

Returns:

The registered/existing Pose object or its ID (depending on return_pose)
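The check_RMSD and RMSD_tolerance options compare a new Pose against existing ones. The sketch below shows the plain root-mean-square deviation between two equally ordered coordinate lists (in Angstrom) and a tolerance test. It assumes matching atom order and no alignment or symmetry handling; HIPPO's actual comparison may use a different RMSD routine and a strict rather than inclusive threshold.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally ordered (x, y, z) lists, no alignment."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def within_tolerance(coords_a, coords_b, tolerance=1.0):
    """Treat two conformers as the same pose if their RMSD <= tolerance."""
    return rmsd(coords_a, coords_b) <= tolerance


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]  # every atom shifted 0.5 A along z
print(rmsd(a, b))  # 0.5
```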

register_reaction(*, type: str, product: Compound | int, reactants: list[Compound | int], commit: bool = True, product_yield: float = 1.0, check_chemistry: bool = False) Reaction[source]

Add a Reaction to the Database. If it already exists, return the existing one.

Parameters:
  • type – string indicating the type of reaction

  • product – The Compound object or ID of the product

  • reactants – A list of Compound objects or IDs of the reactants

  • commit – Commit the changes to the Database, defaults to True

  • product_yield – The fraction of product yielded from this reaction 0 < product_yield <= 1.0, defaults to 1.0

  • check_chemistry – check the reaction chemistry, defaults to False

Returns:

The registered Reaction

register_reactions(*, types: list[str], product_ids: list[list[int]], reactant_id_lists: list[list[int]])[source]

Insert many reactions at once

Parameters:
  • types – list of reaction type strings

  • reactant_id_lists – list of reactant compound id lists

  • product_ids – list of product compound ids

Returns:

list of reaction ids
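register_reactions takes parallel lists, so the i-th reaction is described by types[i], product_ids[i], and reactant_id_lists[i]. The sketch below zips such lists into one record per reaction and rejects mismatched lengths. It follows the parameter descriptions (one product ID per reaction); build_reaction_rows and the reaction type strings are hypothetical, and the sketch does not touch a Database.

```python
def build_reaction_rows(types, product_ids, reactant_id_lists):
    """Combine parallel lists into one dict per reaction (hypothetical helper)."""
    if not (len(types) == len(product_ids) == len(reactant_id_lists)):
        raise ValueError("types, product_ids and reactant_id_lists must have equal length")
    return [
        {"type": reaction_type, "product": product_id, "reactants": list(reactant_ids)}
        for reaction_type, product_id, reactant_ids in zip(types, product_ids, reactant_id_lists)
    ]


rows = build_reaction_rows(["amidation", "suzuki"], [10, 11], [[1, 2], [3, 4]])
print(rows[0])  # {'type': 'amidation', 'product': 10, 'reactants': [1, 2]}
```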

register_route(*, recipe: Recipe, commit: bool = True) int[source]

Insert a single-product Recipe into the Database.

Parameters:
  • recipe – The Recipe object to be registered

  • commit – Commit the changes to the Database, defaults to True

Returns:

The Route ID

register_target(name: str, warn_duplicate: bool = True) Target[source]

Register a new protein Target in the Database

Parameters:
  • name – Name of the protein Target

  • warn_duplicate – Warn if a Target with this name already exists, defaults to True

Returns:

The registered Target

property scaffolds: CompoundSet

Returns compounds that are the basis for one or more elaborations
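The scaffolds and elabs properties (and num_scaffolds and num_elabs) describe the two ends of scaffold relationships. The sketch below derives both sets from hypothetical (scaffold_id, elaboration_id) pairs to show that one compound can be both a scaffold and an elaboration; it is an illustration, not a Database query.

```python
def scaffolds_and_elabs(pairs):
    """Return (scaffold IDs, elaboration IDs) from (scaffold, elab) pairs."""
    scaffolds = {scaffold for scaffold, _ in pairs}
    elabs = {elab for _, elab in pairs}
    return scaffolds, elabs


pairs = [(1, 2), (1, 3), (2, 4)]  # compound 2 elaborates 1 and is elaborated by 4
scaffolds, elabs = scaffolds_and_elabs(pairs)
print(sorted(scaffolds), sorted(elabs))  # [1, 2] [2, 3, 4]
```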

summary() None[source]

Print a text summary of this HIPPO

property tags: TagTable

Access Tags in the Database

property targets: list[Target]

Access Targets in the Database