Database

HIPPO stores data in a SQLite database file that structures information across multiple cross-referenced tables. For general use, it is not necessary to understand the database schema in detail.

HIPPO database schema
class hippo.db.Database(path: Path, animal: HIPPO, update_legacy: bool = False, auto_compute_bfps: bool = True)[source]

Wrapper to connect to the HIPPO sqlite database.

Attention

Database objects should not be created directly. Instead use the methods in HIPPO to interact with data in the database. See Getting started with HIPPO and insert_elaborations.

Database initialisation

repr(self)[source]

ANSI Formatted string representation

str(self)[source]

Unformatted string representation

property auto_compute_bfps: bool

Automatically compute compound binary fingerprints on insertion

backup(destination: Path | str | None = None, pages: int = 10000) None[source]

Create a backup of the database
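The pages argument resembles the one accepted by sqlite3.Connection.backup in the Python standard library, which copies a database that many pages at a time. Whether HIPPO delegates to that call is an assumption here. The sketch below uses only the standard library; the helper name backup_sqlite and the in-memory databases are hypothetical.

```python
import sqlite3


def backup_sqlite(source: sqlite3.Connection, destination: sqlite3.Connection, pages: int = 10000) -> None:
    """Copy `source` into `destination`, `pages` database pages per step."""
    source.backup(destination, pages=pages)


# hypothetical in-memory databases, for illustration only
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE compound (compound_id INTEGER PRIMARY KEY, compound_smiles TEXT)")
src.execute("INSERT INTO compound (compound_smiles) VALUES ('CCO')")
src.commit()

dst = sqlite3.connect(":memory:")
backup_sqlite(src, dst)
rows = dst.execute("SELECT compound_smiles FROM compound").fetchall()
```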

calculate_all_murcko_scaffolds(generic: bool = True)[source]

Determine Murcko and optionally generic Murcko scaffolds for all Compounds in the Database and add relevant records.

Parameters:

generic – Calculate generic (single bonds and all carbon) scaffolds as well

calculate_all_scaffolds() None[source]

Determine and insert records for all substructure/superstructure relationships in the Compound table

close() None[source]

Close the connection

column_names(table: str) list[str][source]

Get the column names of the given table
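One way to list a table's columns in SQLite is the PRAGMA table_info statement, whose second field is the column name. The sketch below is not necessarily how HIPPO implements column_names; the helper list_column_names and the example table are hypothetical.

```python
import sqlite3


def list_column_names(connection: sqlite3.Connection, table: str) -> list:
    """Return column names; each PRAGMA row is (cid, name, type, notnull, dflt_value, pk)."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


# hypothetical table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (target_id INTEGER PRIMARY KEY, target_name TEXT)")
names = list_column_names(conn, "target")
```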

commit(*, retry: float | None = 1) None[source]

Commit changes to the database

Parameters:

retry – If truthy, keep trying to execute every retry seconds if the Database is locked
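The retry behaviour described above can be sketched as a loop that sleeps between attempts while SQLite reports a lock. This is an illustration under assumptions, not HIPPO's implementation: the helper commit_with_retry, the max_attempts cap, and the FlakyConnection stand-in are all hypothetical.

```python
import sqlite3
import time
from typing import Optional


def commit_with_retry(connection, retry: Optional[float] = 1, max_attempts: int = 5) -> int:
    """Commit, sleeping `retry` seconds between attempts while the database is locked.

    Returns the number of attempts used. `max_attempts` is an assumed safety cap.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            connection.commit()
            return attempt
        except sqlite3.OperationalError as error:
            locked = "database is locked" in str(error)
            if not retry or not locked or attempt == max_attempts:
                raise
            time.sleep(retry)
    raise ValueError("max_attempts must be at least 1")


class FlakyConnection:
    """Hypothetical stand-in that reports a lock on its first two commits."""

    def __init__(self):
        self.calls = 0

    def commit(self):
        self.calls += 1
        if self.calls <= 2:
            raise sqlite3.OperationalError("database is locked")


attempts = commit_with_retry(FlakyConnection(), retry=0.01)
```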

connect() None[source]

Connect to the database

property connection: sqlite3.connection

Returns a sqlite3.connection to the database

classmethod copy_from(source: Path, destination: Path, animal: HIPPO, update_legacy: bool = False, overwrite_existing: bool = False, pages: int = 10000) None[source]

Create a Database from an existing one

copy_interactions_to_temp(pose_id: int) int[source]

Copy the records for the given Pose from the ‘interaction’ table to the ‘temp_interaction’ table

Returns:

ID of the last inserted Interaction

copy_temp_interactions() int[source]

Copy the records from the ‘temp_interaction’ table to the ‘interaction’ table

Returns:

ID of the last inserted Interaction

count(table: str) int[source]

Count all entries in a table

Parameters:

table – table to count entries from

count_where(table: str, key: str, value=None)[source]

Count all entries in a table where key==value

Parameters:
  • table – table to count entries from

  • key – the key to match as {table}_{key} = {value} or the SQL string if value == None

  • value – the value to match (Default value = None)

create_blank_db() None[source]

Create a blank database

create_metadata_id_map(*, table: str, key: str) dict[str, int][source]

Create a mapping from metadata[key] values to their respective parent record IDs

Returns:

dictionary mapping metadata[key] values to integer ID’s
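Conceptually, this inverts per-record metadata into a lookup keyed by one metadata field. The sketch below works on a plain dictionary rather than the database; the function invert_metadata and the key name supplier_code are hypothetical.

```python
def invert_metadata(metadata_by_id: dict, key: str) -> dict:
    """Map metadata[key] values to record IDs, skipping records that lack the key."""
    return {
        metadata[key]: record_id
        for record_id, metadata in metadata_by_id.items()
        if key in metadata
    }


# hypothetical metadata, keyed by record ID
example_metadata = {
    1: {"supplier_code": "Z1"},
    2: {},
    3: {"supplier_code": "Z3"},
}
id_map = invert_metadata(example_metadata, "supplier_code")
```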

create_table_component() None[source]

Create the component table

create_table_compound() None[source]

Create the compound table

create_table_feature() None[source]

Create the feature table

create_table_inspiration() None[source]

Create the inspiration table

create_table_interaction(table: str = 'interaction', debug: bool = True) None[source]

Create an interaction table

create_table_pattern_bfp() None[source]

Create the pattern_bfp table

create_table_pose() None[source]

Create the pose table

create_table_quote() None[source]

Create the quote table

create_table_reactant() None[source]

Create the reactant table

create_table_reaction() None[source]

Create the reaction table

create_table_route() None[source]

Create the route table

create_table_scaffold() None[source]

Create the scaffold table

create_table_subsite() None[source]

Create the subsite table

create_table_subsite_tag() None[source]

Create the subsite_tag table

create_table_tag() None[source]

Create the tag table

create_table_target() None[source]

Create the target table

property cursor: sqlite3.cursor

Returns a sqlite3.cursor

delete_features() None[source]

Delete all protein features

delete_interactions() None[source]

Delete all calculated interactions and set pose_fingerprint appropriately

delete_reactions() None[source]

Delete all reaction data

delete_subsites() None[source]

Delete all protein subsites

delete_tag(tag: str) None[source]

Delete all tag entries with the matching name

Parameters:

tag – tag name to match

delete_where(table: str, key: str, value: str | None = None, commit: bool = True) None[source]

Delete entries where key==value

Parameters:
  • table – the table from which to delete

  • key – column name to match to value, if no value is provided the key argument should contain a SQL string selecting the entries to delete

  • value – the value to match (Default value = None)

  • commit – commit the changes (Default value = True)

execute(sql, payload=None, *, retry: float | None = 1, debug: bool = False)[source]

Execute arbitrary SQL with retry if database is locked.

executemany(sql, payload, *, retry: float | None = 1) None[source]

Execute arbitrary SQL

Parameters:
  • sql – SQL query

  • retry – If truthy, keep trying to executemany every retry seconds if the Database is locked

  • payload – Payload for insertion, etc. (Default value = None)

fix_incorrect_pose_compound_assignments()[source]

Fix pose_compound values that reference incorrect chemical structures

get_compound(*, id: int | None = None, inchikey: str | None = None, alias: str | None = None, smiles: str | None = None, none: str = 'error', **kwargs) Compound[source]

Get a Compound using one of the following fields: [‘id’, ‘inchikey’, ‘alias’, ‘smiles’]

Parameters:
  • id – the ID to search for (Default value = None)

  • inchikey – the InChi-Key to search for (Default value = None)

  • alias – the alias to search for (Default value = None)

  • smiles – the smiles to search for (Default value = None)

Returns:

the Compound object

get_compound_cluster_dict(cset: CompoundSet | None = None, *, fractions: bool = False, max_scaffolds: int | None = None, fraction_reference: CompoundSet | None = None) dict[tuple, set][source]

Create a dictionary grouping compounds by their scaffold/base cluster.

Parameters:
  • cset – CompoundSet subset to query, defaults to all compounds

  • fractions – Calculate fractional populations for each cluster

  • max_scaffolds – Define the maximum number of compounds to use as cluster keys

  • fraction_reference – Use cset to build the cluster map and use fraction_reference to determine the fractional populations

Returns:

A dictionary mapping a tuple of scaffold Compound IDs to a set of superstructure Compound ID’s.

get_compound_computed_property(prop: str, compound_id: int) int | str[source]

Use chemicalite to calculate a property from the stored binary molecule

Parameters:
  • prop – the property to calculate [num_heavy_atoms, formula, num_rings]

  • compound_id – the compound ID to query

Returns:

the value of the computed property

get_compound_id(*, inchikey: str | None = None, alias: str | None = None, smiles: str | None = None, **kwargs) int[source]

Get a compound’s ID using one of the following fields: [‘inchikey’, ‘alias’, ‘smiles’]

Parameters:
  • inchikey – the InChi-Key to search for (Default value = None)

  • alias – the alias to search for (Default value = None)

  • smiles – the smiles to search for (Default value = None)

Returns:

the Compound ID

get_compound_id_inchikey_dict(cset: CompoundSet | None = None) dict[int, str][source]

Get a dictionary mapping Compound IDs to their inchikeys

get_compound_id_inspiration_ids_dict() dict[int, set][source]

Get a dictionary mapping Compound ID’s to a set of Pose ID’s for the inspirations for the whole database

get_compound_id_obj_dict(cset: CompoundSet) dict[id, Compound][source]

Get a dictionary mapping Compound ID’s to their objects

get_compound_id_pose_ids_dict(cset: CompoundSet) dict[int, set][source]

Get a dictionary mapping Compound ID’s to their associated Pose ID’s

get_compound_id_smiles_dict(cset: CompoundSet | None = None) dict[int, set[str]][source]

Get a dictionary mapping Compound IDs to their SMILES

get_compound_id_suppliers_dict(cset: CompoundSet) dict[int, set[str]][source]

Get a dictionary mapping Compound ID’s to suppliers which stock it

get_compound_inchikey_id_dict(inchikeys: list[str]) dict[str, int][source]

Get a dictionary mapping Compound inchikeys to their ID’s

get_compound_scaffold_dict() dict[int, set[int]][source]

Get a dictionary mapping scaffold_base compound ID’s to a set of their superstructure IDs
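A mapping of this shape can be built by grouping (scaffold, superstructure) ID pairs by their first element. The sketch below assumes such pairs are available as plain tuples; the function group_by_scaffold and the example IDs are hypothetical.

```python
def group_by_scaffold(pairs) -> dict:
    """Group (scaffold_id, superstructure_id) pairs into {scaffold_id: {superstructure_id, ...}}."""
    grouped = {}
    for scaffold_id, superstructure_id in pairs:
        grouped.setdefault(scaffold_id, set()).add(superstructure_id)
    return grouped


# hypothetical (scaffold, superstructure) Compound ID pairs
scaffold_map = group_by_scaffold([(1, 10), (1, 11), (2, 11)])
```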

get_compound_tag_dict(cset: CompoundSet | None = None) dict[int, set[str]][source]

Get a dictionary mapping compound ID’s to their tags

get_feature(*, id: int) Feature[source]

Get a protein interaction Feature with a given ID

Parameters:

id – the protein interaction Feature ID to be retrieved

Returns:

Feature object

get_id_metadata_dict(*, table: str, ids: list[int]) dict[int, dict][source]

Get a dictionary mapping IDs to metadata dictionaries

get_inspiration_tuples() list[int, int][source]

Get a list of Pose ID pairs, one for each inspiration record in the whole database

get_interaction(*, id: int, table: str = 'interaction') Interaction[source]

Fetch the Interaction object with given ID

Parameters:

id – the ID of the Interaction to retrieve

Returns:

Interaction object

get_metadata(*, table: str, id: int) dict[source]

Get metadata dictionary from a specific table and ID

Parameters:
  • table – the table from which to get the entry

  • id – the ID to search for

Returns:

a dictionary of metadata

get_pose(*, id: int | None = None, inchikey: str = None, alias: str = None) Pose[source]

Get a pose using one of the following fields: [‘id’, ‘inchikey’, ‘alias’]

Parameters:
  • id – the ID to search for (Default value = None)

  • inchikey – the InChi-Key to search for (Default value = None)

  • alias – the alias to search for (Default value = None)

Returns:

the Pose object

get_pose_alias_id_dict(pset: PoseSet | None = None) dict[str, int][source]

Get a dictionary mapping Pose aliases to ID’s

get_pose_alias_path_dict(pset: PoseSet | None = None) dict[str, str][source]

Get a dictionary mapping Pose aliases to paths

get_pose_id(*, inchikey: str | None = None, alias: str | None = None) int[source]

Get a pose’s ID using one of the following fields: [‘inchikey’, ‘alias’]

Parameters:
  • inchikey – the InChi-Key to search for (Default value = None)

  • alias – the alias to search for (Default value = None)

Returns:

the Pose ID

get_pose_id_alias_dict(pset: PoseSet | None = None) dict[str, int][source]

Get a dictionary mapping Pose IDs to aliases

get_pose_id_inspiration_ids_dict(pset: PoseSet = None) dict[int, set][source]

Get a dictionary mapping Pose ID’s to a set of Pose ID’s for the inspirations for the whole database

get_pose_id_interaction_ids_dict(pset: PoseSet) dict[int, set][source]

Get a dictionary mapping Pose ID’s to their associated Interaction ID’s

get_pose_id_interaction_tuples_dict(pset: PoseSet) dict[int, set][source]

Get a dictionary mapping Pose ID’s to lists of (interaction_type, feature_id) tuples describing their interactions

get_pose_id_obj_dict(pset: PoseSet) dict[id, Pose][source]

Get a dictionary mapping Pose ID’s to their objects

get_pose_path_id_dict(pset: PoseSet | None = None) dict[str, int][source]

Get a dictionary mapping Pose paths to IDs

get_pose_subsite_names_dict() dict[int, set[str]][source]

Get a dictionary mapping pose ID’s to their subsite names

get_pose_tag_dict(pset: PoseSet | None = None) dict[int, set[str]][source]

Get a dictionary mapping pose ID’s to their tags

get_possible_reaction_ids(*, compound_ids: list[int]) list[int][source]

Given a set of reactant Compound ID’s, compute which Reaction objects are possible (all reactants present).

Parameters:

compound_ids – the list of reactant Compound IDs

Returns:

a list of Reaction ID’s that are possible with the given reactants
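The "all reactants present" condition is a subset test: a reaction is possible when its set of reactants is contained in the set of available compounds. The sketch below works on an in-memory mapping rather than the reactant table; the function possible_reactions and the example IDs are hypothetical.

```python
def possible_reactions(reactants_by_reaction: dict, compound_ids: list) -> list:
    """Return IDs of reactions whose reactants are all among `compound_ids`."""
    available = set(compound_ids)
    return sorted(
        reaction_id
        for reaction_id, reactants in reactants_by_reaction.items()
        if reactants <= available  # subset test: every reactant is available
    )


# hypothetical data: reaction ID -> set of reactant Compound IDs
example_reactants = {1: {10, 11}, 2: {11, 12}, 3: {10}}
possible = possible_reactions(example_reactants, [10, 11])
```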

get_possible_reaction_product_ids(*, reaction_ids: list[int]) list[int][source]

Given a set of Reaction IDs return the Compound IDs of their synthesis products

Parameters:

reaction_ids – Reaction IDs

Returns:

list of Compound IDs

get_product_id_routes_dict() dict[int, set[int]][source]

Get a dictionary mapping product Compound IDs to their Route IDs

get_quote(*, id: int | None = None, none: str | None = None) Quote[source]

Get a quote using its ID

Parameters:
  • id – the ID to search for (Default value = None)

  • none – define the behaviour for no matches, any value other than 'error' will silently return empty data (Default value = ‘error’)

Returns:

the Quote object

get_quote_df(ids: list[int]) pd.DataFrame[source]

Get a pandas DataFrame representing quotes with given IDs

get_reactant_product_tuples(compound_ids: list | None = None, deduplicated: bool = True) set[tuple[int, int]][source]

Get tuples of (reactant, product) Compound IDs

get_reaction(*, id: int | None = None, none: str | None = None) Reaction[source]

Get a reaction using its ID

Parameters:
  • id – the ID to search for (Default value = None)

  • none – define the behaviour for no matches, any value other than 'error' will silently return empty data (Default value = ‘error’)

Returns:

the Reaction object

get_reaction_map_from_products(product_ids: list[int]) dict[tuple[str, int], set[int]][source]

Get a dictionary mapping (reaction_type, product_id) tuples to sets of reactant_ids

get_reaction_price_estimate(*, reaction: Reaction) float[source]

Estimate the price of a Reaction

Parameters:

reaction – Reaction object

Returns:

price estimate

get_route(*, id: int, debug: bool = False) Route[source]

Fetch a Route object stored in the Database.

Parameters:
  • id – the ID of the Route to be retrieved

  • debug – increase verbosity for debugging, defaults to False

Returns:

Route object

get_route_id_product_dict() dict[int, int][source]

Get a dictionary mapping route ID’s to their product Compound

get_route_products() CompoundSet | None[source]

Get a CompoundSet of all route products

get_scaffold_similarity_dict(scaffolds: CompoundSet | None = None) list[dict][source]

Get a dictionary mapping scaffold Compound IDs to their superstructure’s IDs

get_scaffold_tuples(compound_ids: list | None = None) set[tuple[int, int]][source]

Get tuples of (scaffold, superstructure) Compound IDs

get_subsite(*, id) Subsite[source]

Get protein subsite with a given ID

Parameters:

id – the subsite ID

Returns:

Subsite object

get_subsite_id(*, name: str, **kwargs) int | None[source]

Get protein Subsite ID with a given name

Parameters:

name – the protein Subsite name

Returns:

the Subsite ID

get_subsite_name(*, id: str, **kwargs) int | None[source]

Get protein Subsite name with a given ID

Parameters:

id – the protein Subsite ID

Returns:

the Subsite name

get_subsite_tag(*, id) SubsiteTag[source]

Get subsite_tag with a given ID

Parameters:

id – the subsite_tag ID

Returns:

SubsiteTag object

get_target(*, id: int) Target[source]

Get target with specific ID

Parameters:

id – the ID of the target to retrieve

Returns:

Target object

get_target_id(*, name: str) int | None[source]

Get target ID with a given name

Parameters:

name – the protein target name

Returns:

the Target ID

get_target_name(*, id: int) str[source]

Get the name of a target with given ID

Parameters:

id – the ID of the target to retrieve

Returns:

target name

get_unsolved_reaction_tree(*, product_ids: list[int], debug: bool = False)[source]

Given a set of product Compound IDs, recursively solve for all the reactants (CompoundSet) and reactions (ReactionSet) that could be involved in their synthesis. N.B. This evaluates all synthesis branches.

Parameters:
  • product_ids – list of product Compound IDs

  • debug – increase verbosity for debugging, defaults to False

Returns:

a tuple of (reactants, reactions)
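The traversal can be pictured as a walk backwards from the products through every reaction that makes them, collecting all reactants and reactions on the way. The sketch below uses an explicit stack instead of recursion and plain sets instead of CompoundSet and ReactionSet objects; the function collect_reaction_tree and its input format are hypothetical.

```python
def collect_reaction_tree(reactions: dict, product_ids: list):
    """Collect every reactant and reaction reachable from `product_ids`.

    `reactions` maps reaction ID -> (product ID, set of reactant IDs).
    """
    reactions_by_product = {}
    for reaction_id, (product_id, _) in reactions.items():
        reactions_by_product.setdefault(product_id, []).append(reaction_id)

    all_reactants, all_reactions = set(), set()
    visited = set()
    stack = list(product_ids)
    while stack:
        product_id = stack.pop()
        if product_id in visited:
            continue  # guards against revisiting shared intermediates
        visited.add(product_id)
        for reaction_id in reactions_by_product.get(product_id, []):
            reactants = reactions[reaction_id][1]
            all_reactions.add(reaction_id)
            all_reactants |= reactants
            stack.extend(reactants)  # each reactant may itself be a product
    return all_reactants, all_reactions


# hypothetical data: reaction 4 is unrelated to product 100
example_reactions = {1: (100, {50, 51}), 2: (50, {10, 11}), 3: (100, {60}), 4: (999, {1})}
tree_reactants, tree_reactions = collect_reaction_tree(example_reactions, [100])
```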

insert_component(*, route: int, ref: int, component_type: int, amount: float = 1.0, commit: bool = True) int[source]

component_type    table       ref
1                 reaction    reaction
2                 compound    reactant
3                 compound    intermediate

Parameters:
  • route – associated Route ID

  • ref – ID of the Reaction or Compound

  • component_type – integer specifying the type of the component

  • commit – commit the changes to the database (Default value = True)

Returns:

the component ID

insert_compound(*, smiles: str, alias: str | None = None, tags: None | list[str] = None, warn_duplicate: bool = True, commit: bool = True, metadata: None | dict = None, inchikey: str = None) int[source]

Insert an entry into the compound table

Parameters:
  • smiles – SMILES string

  • alias – optional alias for the compound (Default value = None)

  • tags – list of string tags, (Default value = None)

  • warn_duplicate – print a warning if the compound already exists (Default value = True)

  • commit – commit the changes to the database (Default value = True)

  • metadata – dictionary of metadata (Default value = None)

  • inchikey – provide an InChI-key, otherwise it’s calculated from the SMILES, (Default value = None)

Returns:

compound ID

insert_compound_pattern_bfp(compound_id: int, commit: bool = True) int[source]

Insert a compound_pattern_bfp

Parameters:
  • compound_id – ID of the associated compound

  • commit – commit the changes to the database (Default value = True)

Returns:

binary fingerprint ID

insert_feature(*, family: str, target: int, chain_name: str, residue_name: str, residue_number: int, atom_names: list[str], warn_duplicate: bool = False, commit: bool = True) int[source]

Insert an entry into the feature table

Parameters:
  • family – feature type string

  • target – associated Target ID

  • chain_name – single character name of the chain

  • residue_name – 3-4 character string name of the residue

  • residue_number – integer residue number

  • atom_names – list of atom names

  • commit – commit the changes to the database (Default value = True)

Returns:

feature ID

insert_inspiration(*, original: Pose | int, derivative: Pose | int, warn_duplicate: bool = True, commit: bool = True) int[source]

Insert an entry into the inspiration table

Parameters:
  • original – Pose object or ID of the original hit

  • derivative – Pose object or ID of the derivative hit

  • warn_duplicate – print a warning if the inspiration already exists (Default value = True)

  • commit – commit the changes to the database (Default value = True)

Returns:

the inspiration ID

insert_interaction(*, feature: Feature | int, pose: Pose | int, type: str, family: str, atom_ids: list[int], prot_coord: list[float], lig_coord: list[float], distance: float, angle: float | None = None, energy: float | None = None, warn_duplicate: bool = True, commit: bool = True, table: str = 'interaction') int[source]

Insert an entry into the interaction table

Parameters:
  • feature – associated Feature object or ID

  • pose – associated Pose object or ID

  • type – interaction type

  • family – ligand feature type

  • atom_ids – atom indices of ligand feature

  • prot_coord – [x,y,z] coordinate of protein feature

  • lig_coord – [x,y,z] coordinate of ligand feature

  • distance – interaction distance in Angstrom

  • angle – optional interaction angle in degrees

  • energy – energy score in kcal/mol, defaults to None

  • warn_duplicate – print a warning if the interaction already exists (Default value = True)

  • commit – commit the changes to the database (Default value = True)

  • table – the name of the table to insert into (Default value = ‘interaction’)

Returns:

the interaction ID

insert_metadata(*, table: str, id: int, payload: dict, commit: bool = True) None[source]

Insert metadata into an existing entry in the compound or pose tables

Parameters:
  • table – table for insertions ['pose', 'compound', 'subsite', 'subsite_tag']

  • id – associated entry ID

  • payload – metadata dictionary

  • commit – commit the changes to the database (Default value = True)

insert_pose(*, compound: Compound | int, target: Target | int | str, path: str, inchikey: str | None = None, alias: str | None = None, reference: int | Pose | None = None, tags: None | list = None, energy_score: float | None = None, distance_score: float | None = None, metadata: None | dict = None, commit: bool = True, warn_duplicate: bool = True, resolve_path: bool = True) int[source]

Insert an entry into the pose table

Parameters:
  • compound – associated Compound object or ID

  • target – protein Target name or ID

  • path – path to the molecular structure (.pdb/.mol)

  • inchikey – provide an InChI-key if available, (Default value = None)

  • alias – optional alias for the pose (Default value = None)

  • reference – reference Pose object or ID to use for the protein conformation (Default value = None)

  • tags – list of string tags, (Default value = None)

  • energy_score – optional score of the ligand’s binding energy (Default value = None)

  • distance_score – optional score of the ligand’s binding position (Default value = None)

  • metadata – dictionary of metadata (Default value = None)

  • commit – commit the changes to the database (Default value = True)

  • warn_duplicate – print a warning if the pose already exists (Default value = True)

  • resolve_path – try resolving the path (Default value = True)

Returns:

the pose ID

insert_quote(*, compound: Compound | int, supplier: str, catalogue: str | None = None, entry: str | None = None, amount: float, price: float, currency: str | None = None, purity: float | None = None, lead_time: float, smiles: str | None = None, date: str | None = None, commit: bool = True) int | None[source]

Insert an entry into the quote table

Parameters:
  • compound – associated Compound object or ID

  • supplier – name of the supplier

  • catalogue – optional catalogue name

  • entry – name of the catalogue entry

  • amount – amount in mg

  • price – price of the compound

  • currency – currency string ['GBP', 'EUR', 'USD', None]

  • purity – compound purity fraction

  • lead_time – lead time in days

  • smiles – quoted SMILES string (Default value = None)

  • commit – commit the changes to the database (Default value = True)

Returns:

the quote ID

insert_reactant(*, compound: Compound | int, reaction: Reaction | int, amount: float = 1.0, commit: bool = True) int[source]

Insert an entry into the reactant table

Parameters:
  • compound – Compound object or ID of the reactant

  • reaction – Reaction object or ID of the reaction

  • amount – amount (in mg) needed for each unit of product (Default value = 1.0)

  • commit – commit the changes to the database (Default value = True)

Returns:

the reactant ID

insert_reaction(*, type: str, product: Compound | int, product_yield: float = 1.0, commit: bool = True) int[source]

Insert an entry into the reaction table

Parameters:
  • type – string to indicate the reaction type

  • product – Compound object or ID of the reaction product

  • product_yield – yield fraction of the reaction product (Default value = 1.0)

  • commit – commit the changes to the database (Default value = True)

Returns:

the reaction ID

insert_route(*, product_id: int, commit: bool = True) int[source]

Insert an entry into the route table

Parameters:
  • product_id – Compound ID of the product

  • commit – commit the changes to the database (Default value = True)

Returns:

route ID

insert_scaffold(*, scaffold: Compound | int, superstructure: Compound | int, warn_duplicate: bool = True, commit: bool = True) int[source]

Insert an entry into the scaffold table

Parameters:
  • scaffold – Compound object or ID of the scaffold hit

  • superstructure – Compound object or ID of the superstructure hit

  • warn_duplicate – print a warning if the scaffold record already exists (Default value = True)

  • commit – commit the changes to the database (Default value = True)

Returns:

the scaffold row ID

insert_subsite(target: int, name: str, commit: bool = True) int[source]

Insert an entry into the subsite table

Parameters:
  • target – protein Target ID

  • name – name of the protein subsite/pocket

Returns:

the subsite ID

insert_subsite_tag(*, pose_id: int, name: str | None, target: int | None = None, subsite_id: int | None = None, commit: bool = True) int[source]

Insert an entry into the subsite_tag table

Parameters:
  • pose_id – Pose ID

  • name – name of the protein subsite/pocket

  • target – protein Target ID, defaults to querying pose table

  • subsite_id – protein Subsite ID, defaults to querying Subsite table

Returns:

the SubsiteTag ID

insert_tag(*, name: str, compound: int = None, pose: int = None, commit: bool = True) int[source]

Insert an entry into the tag table.

Attention

Exactly one of compound or pose arguments must have a value

Parameters:
  • name – name of the tag

  • compound – associated Compound ID

  • pose – associated Pose ID

  • commit – commit the changes to the database (Default value = True)

Returns:

the tag ID
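The "exactly one of compound or pose" rule can be checked by comparing whether each argument is None. The sketch below shows one such check; the helper resolve_tag_target and its return format are hypothetical, and HIPPO may report the error differently.

```python
def resolve_tag_target(compound=None, pose=None):
    """Return ('compound', id) or ('pose', id); reject zero or two targets."""
    if (compound is None) == (pose is None):
        raise ValueError("Exactly one of compound or pose must have a value")
    if compound is not None:
        return ("compound", compound)
    return ("pose", pose)


target = resolve_tag_target(pose=7)
```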

insert_target(*, name: str, warn_duplicate: bool = True) int[source]

Insert an entry into the target table

Parameters:

name – name of the protein target

Returns:

the target ID

max_id(table: str) int[source]

Get the maximal entry ID from a given table

Parameters:

table – the database table to query

Returns:

the largest entry ID

migrate_legacy_scaffolds() int[source]

Migrate legacy compound_scaffold records from the ‘compound’ table to the ‘scaffold’ table

Returns:

ID of the last inserted scaffold record

min_id(table: str) int[source]

Get the minimal entry ID from a given table

Parameters:

table – the database table to query

Returns:

the smallest entry ID

property path: Path

Returns the path to the database file

print_table(table: str) None[source]

Print a table’s entries

Parameters:

table – the table to print

prune_duplicate_routes() None[source]

Remove duplicate routes from the database

prune_reactions(reactions: ReactionSet) list[Reaction][source]

Remove duplicate reactions

Parameters:

reactions – ReactionSet

Returns:

list of pruned Reaction objects

query_exact(query: str, threshold: float = 0.989) CompoundSet[source]

Search for exact match compounds (default similarity > 0.989)

Parameters:
  • query – SMILES string

  • threshold – similarity threshold to exceed

query_most_similar(query: str, subset: CompoundSet, return_similarity: bool = False, none='error')[source]

Search for the most similar compound by tanimoto similarity of binary pattern fingerprints using the chemicalite function mol_pattern_bfp

Parameters:
  • query – SMILES string

  • return_similarity – return a list of similarity values together with the CompoundSet (Default value = False)

  • none – define the behaviour for no matches, any value other than 'error' will silently return empty data (Default value = ‘error’)

  • subset – optional subset of compounds to search

Returns:

Compound and optionally a similarity value

query_similarity(query: str, threshold: float, return_similarity: bool = False, subset: CompoundSet = None, none='error')[source]

Search compounds by tanimoto similarity of binary pattern fingerprints using the chemicalite function mol_pattern_bfp

Parameters:
  • query – SMILES string

  • threshold – similarity threshold to exceed

  • return_similarity – return a list of similarity values together with the CompoundSet (Default value = False)

  • none – define the behaviour for no matches, any value other than 'error' will silently return empty data (Default value = ‘error’)

  • subset – optional subset of compounds to search

Returns:

CompoundSet and optionally a list of similarity values
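Tanimoto similarity of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below illustrates the thresholded search using Python integers as toy bit vectors. It does not use chemicalite or real pattern fingerprints; the functions tanimoto and filter_by_similarity and the example values are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit vectors stored as integers."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def filter_by_similarity(fingerprints: dict, query: int, threshold: float) -> dict:
    """Return {compound_id: similarity} for similarities strictly above `threshold`."""
    scores = {cid: tanimoto(fp, query) for cid, fp in fingerprints.items()}
    return {cid: score for cid, score in scores.items() if score > threshold}


# toy fingerprints keyed by hypothetical Compound IDs
toy_fingerprints = {1: 0b1111, 2: 0b1100, 3: 0b0001}
hits = filter_by_similarity(toy_fingerprints, query=0b1110, threshold=0.7)
```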

query_substructure(query: str, *, fast: bool = True, none: str = 'error', smarts: bool = False) CompoundSet[source]

Search for compounds by substructure

Parameters:
  • query – SMILES string of the substructure

  • fast – Use pattern binary fingerprint table to improve performance (Default value = True)

  • none – define the behaviour for no matches, any value other than 'error' will silently return empty data (Default value = ‘error’)

Returns:

CompoundSet object

register_compounds(*, smiles: list[str], radical: str = 'warning', sanitisation_verbosity: bool = True, sanitise: bool = True, debug: bool = False) list[tuple[str, str]][source]

Bulk register compounds

register_poses(dicts: list[dict]) set[int][source]

Insert or ignore a batch of poses; returns a set of Pose IDs

Parameters:

dicts – a list of dictionaries describing the poses to be inserted. See the expected format below:

dicts = [
    dict(
        alias=…,           # string, can be None
        reference_id=…,    # reference pose id
        inchikey=…,        # pre-computed inchikey
        smiles=…,          # SMILES
        path=…,            # path to mol-file on disk, used for uniqueness check, can be a fake path
        compound_id=…,     # Compound database ID
        target_id=…,       # Target database ID
        mol=…,             # rdkit.Chem.Mol
        energy_score=…,    # float, can be None
        distance_score=…,  # float, can be None
        metadata=…,        # dictionary, can be empty
    )
]

reinitialise_molecules()[source]

In the case where the Mol binaries in a database are throwing unpickling errors, run this to reinitialise them all from their smiles.

remove_metadata_list_item(*, table: str, key: str, value, remove_empty: bool = True) None[source]

Remove a specific item from list-like values associated with a given key from all metadata entries in a given table

Parameters:
  • table – the database table to query

  • key – the Metadata key to match

  • value – the value to remove from the list

  • remove_empty – remove the key from the metadata if the list is empty (Default value = True)
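For a single metadata dictionary, the described behaviour amounts to filtering one list and optionally dropping the key once the list is empty. The sketch below applies that to a plain dict, whereas the method applies it to every metadata entry in a table; the function remove_list_item and the key name warnings are hypothetical.

```python
def remove_list_item(metadata: dict, key: str, value, remove_empty: bool = True) -> dict:
    """Return a copy of `metadata` with `value` removed from the list stored under `key`."""
    items = metadata.get(key)
    if not isinstance(items, list) or value not in items:
        return dict(metadata)  # nothing to remove
    remaining = [item for item in items if item != value]
    updated = dict(metadata)
    if remaining or not remove_empty:
        updated[key] = remaining
    else:
        del updated[key]  # list became empty and remove_empty is set
    return updated


cleaned = remove_list_item({"warnings": ["a", "b"], "score": 1}, "warnings", "a")
```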

select(query: str, table: str, multiple: bool = False) tuple | list[tuple][source]

Wrapper for the SQL SELECT query, in the following syntax:

'SELECT {query} FROM {table}'

Parameters:
  • query – the columns to return

  • table – the table from which to select

  • multiple – fetch all results (Default value = False)

Returns:

the result of the query

select_all_where(table: str, key: str, value: str | None = None, multiple: bool = False, none: str | None = 'error') tuple | list[tuple][source]

Select entries where key==value. Similar to select_where() except the query argument is always *.

Parameters:
  • table – the table from which to select

  • key – column name to match to value, if no value is provided the key argument should contain a SQL string to select entries

  • value – the value to match (Default value = None)

  • multiple – fetch all results (Default value = False)

  • none – define the behaviour for no matches, any value other than 'error' will silently return empty data (Default value = ‘error’)

Returns:

the result of the query

select_id_where(table: str, key: str, value: str | None = None, multiple: bool = False, none: str | None = 'error') tuple | list[tuple][source]

Select ID’s where key==value. Similar to select_where() except the query argument is always {table}_id.

Parameters:
  • table – the table from which to select

  • key – column name to match to value, if no value is provided the key argument should contain a SQL string to select entries

  • value – the value to match (Default value = None)

  • multiple – fetch all results (Default value = False)

  • none – define the behaviour for no matches, any value other than 'error' will silently return empty data (Default value = ‘error’)

Returns:

the result of the query

select_where(query: str, table: str, key: str, value: str | None = None, multiple: bool = False, none: str | None = 'error', sort: str = None) tuple | list[tuple][source]

Select entries where key == value

Examples

Find compound alias with matching ID:

animal.db.select_where(
        query='compound_alias',
        table='compound',
        key='id',
        value='123',
)

# the above evaluates to:
'SELECT compound_alias FROM compound WHERE compound_id = 123'

Find compound aliases with ID below 10 and order alphabetically:

animal.db.select_where(
        query='compound_alias',
        table='compound',
        key='compound_id < 10',
        multiple=True,
        sort='compound_alias',
)

# the above evaluates to:
'SELECT compound_alias FROM compound WHERE compound_id < 10 ORDER BY compound_alias'

Parameters:
  • query – the columns to return

  • table – the table from which to select

  • key – column name to match to value, if no value is provided the key argument should contain a SQL string to select entries

  • value – the value to match (Default value = None)

  • multiple – fetch all results (Default value = False)

  • none – define the behaviour for no matches, any value other than 'error' will silently return empty data (Default value = ‘error’)

  • sort – optionally sort the output (Default value = None)

Returns:

the result of the query
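How the arguments combine into a SQL string can be sketched with a small string builder. It follows the conventions stated in this reference: the selected columns come from query, the key is prefixed with the table name when a value is given, and it is used verbatim as the condition otherwise. The function build_select_where is hypothetical and only produces the string. Real code should prefer bound parameters over interpolating values.

```python
def build_select_where(query: str, table: str, key: str, value=None, sort=None) -> str:
    """Assemble an illustrative SELECT statement from select_where-style arguments."""
    if value is None:
        condition = key  # key already holds a SQL condition
    else:
        condition = f"{table}_{key} = {value}"
    sql = f"SELECT {query} FROM {table} WHERE {condition}"
    if sort:
        sql += f" ORDER BY {sort}"
    return sql


single = build_select_where("compound_alias", "compound", "id", "123")
ordered = build_select_where("compound_alias", "compound", "compound_id < 10", sort="compound_alias")
```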

set_derivative_subsites(commit: bool = True) None[source]

Propagate all subsite assignments from inspirations to their derivatives

slice_ids(*, table: str, start: int | None, stop: int | None, step: int = 1, name: bool = False) list[int][source]

Retrieve ID’s matching a slice

Parameters:
  • table – the database table to query

  • start – return IDs equal to or larger than this value

  • stop – return IDs smaller than this value

  • step – return IDs in increments of this value (Default value = 1)

Returns:

matching IDs
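The start/stop semantics mirror a Python slice: start is inclusive and stop is exclusive. The sketch below applies them to a plain list of IDs. It assumes that step keeps every step-th matching ID, which is one possible reading of "in increments of this value"; the function slice_of_ids and the example IDs are hypothetical.

```python
def slice_of_ids(ids, start=None, stop=None, step: int = 1) -> list:
    """Keep IDs with start <= id < stop, then take every `step`-th one (assumed reading)."""
    matching = [
        i for i in sorted(ids)
        if (start is None or i >= start) and (stop is None or i < stop)
    ]
    return matching[::step]


# hypothetical IDs with gaps, as left behind by deleted records
sliced = slice_of_ids([1, 2, 3, 5, 8, 9], start=2, stop=9, step=2)
```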

table_info(table: str) list[tuple][source]

Print a table’s schema

Parameters:

table – the table to print

property table_names: list[str]

List of all the table names in the database

property total_changes: int

Return the total number of database rows that have been modified, inserted, or deleted since the database connection was opened.
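The wording matches the total_changes attribute of sqlite3.Connection in the Python standard library, so the property plausibly exposes that counter; this is an assumption. The sketch below shows how the counter behaves on a plain sqlite3 connection with a hypothetical table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag_name TEXT)")

before = conn.total_changes
conn.executemany("INSERT INTO tag (tag_name) VALUES (?)", [("hits",), ("bases",)])
conn.execute("DELETE FROM tag WHERE tag_name = 'hits'")

# two inserted rows plus one deleted row
changed = conn.total_changes - before
```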

update(*, table: str, id: int, key: str, value, commit: bool = True) int[source]

Update a field in a database entry with given ID

Parameters:
  • table – the table which to update

  • id – the ID of the entry to update

  • key – column name to update

  • value – the value to insert

  • commit – commit the changes to the database (Default value = True)

Returns:

the ID of the modified entry

update_all(*, table: str, key: str, value, commit: bool = True) None[source]

Update all fields in a table column at once

Parameters:
  • table – the table which to update

  • key – column name to update

  • value – the value to insert

  • commit – commit the changes to the database (Default value = True)

update_compound_pattern_bfp_table()[source]

Update the compound pattern BFP table

update_legacy_pose_inspiration_score() None[source]

Add pose_inspiration_score column

update_legacy_reaction_metadata() None[source]

Add reaction_metadata column

update_legacy_routes() None[source]

Update legacy component entries