DESCRIPTION

usage: rdkit2fps [-h] [--fpSize INT] [--RDK] [--minPath INT] [--maxPath INT]
                 [--nBitsPerHash INT] [--useHs 0|1] [--morgan] [--radius INT]
                 [--useFeatures 0|1] [--useChirality 0|1] [--useBondTypes 0|1]
                 [--torsions] [--targetSize INT] [--pairs] [--minLength INT]
                 [--maxLength INT] [--maccs166] [--substruct] [--rdmaccs]
                 [--id-tag NAME] [--in FORMAT] [-o FILENAME]
                 [--errors {strict,report,ignore}]
                 [filenames [filenames ...]]

Generate FPS fingerprints from a structure file using RDKit

positional arguments:

filenames

input structure files (default is stdin)

optional arguments:

-h, --help

show this help message and exit

--fpSize INT

number of bits in the fingerprint; applies to RDK, Morgan, topological torsion, and atom pair fingerprints (default=2048)

--id-tag NAME

tag name containing the record id (SD files only)

--in FORMAT

input structure format (default guesses from filename)

-o FILENAME, --output FILENAME

save the fingerprints to FILENAME (default=stdout)

--errors {strict,report,ignore}

how should structure parse errors be handled? (default=strict)
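
The help text names the three --errors modes without describing them. The sketch below shows one plausible reading, assuming that "strict" stops at the first bad record, "report" logs the problem and continues, and "ignore" skips the record silently. The function handle_parse_error is hypothetical and is not part of rdkit2fps.

```python
import sys

def handle_parse_error(message, errors="strict", log=sys.stderr):
    """Hypothetical dispatcher for a structure parse error.

    Assumed semantics (not confirmed by the help text):
      strict - raise an exception, which ends processing
      report - write the message to the log and keep going
      ignore - skip the record without any message
    """
    if errors == "strict":
        raise ValueError(message)
    if errors == "report":
        print("ERROR:", message, file=log)
        return
    if errors == "ignore":
        return
    raise ValueError("unknown --errors mode: %r" % (errors,))
```

Under this reading, "report" is the mode to choose when a large input file may contain a few unparseable records and the skipped ones should still be visible in the log.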

RDKit topological fingerprints:

--RDK

generate RDK fingerprints (default)

--minPath INT

minimum number of bonds to include in the subgraph (default=1)

--maxPath INT

maximum number of bonds to include in the subgraph (default=7)

--nBitsPerHash INT

number of bits to set per path (default=4)

--useHs 0|1

include information about the number of hydrogens on each atom (default=1)

RDKit Morgan fingerprints:

--morgan

generate Morgan fingerprints

--radius INT

radius for the Morgan algorithm (default=2)

--useFeatures 0|1

use chemical-feature invariants (default=0)

--useChirality 0|1

include chirality information (default=0)

--useBondTypes 0|1

include bond type information (default=1)

RDKit Topological Torsion fingerprints:

--torsions

generate Topological Torsion fingerprints

--targetSize INT

number of atoms in each torsion path (default=4); the fingerprint length in bits is set with --fpSize

RDKit Atom Pair fingerprints:

--pairs

generate Atom Pair fingerprints

--minLength INT

minimum bond count for a pair (default=1)

--maxLength INT

maximum bond count for a pair (default=30)

166 bit MACCS substructure keys:

--maccs166

generate MACCS fingerprints

881 bit substructure keys:

--substruct

generate ChemFP substructure fingerprints

ChemFP version of the 166 bit RDKit/MACCS keys:

--rdmaccs

generate 166 bit RDKit/MACCS fingerprints
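
As an illustration of how the options above combine, here is a minimal sketch that assembles an rdkit2fps argument list in Python. The helper rdkit2fps_command and the filenames are hypothetical; only flags listed in this help text are used, and the command is built but not executed.

```python
import shlex

def rdkit2fps_command(fp_flag, options=None, filenames=(), output=None):
    """Build an argument list for rdkit2fps (hypothetical helper).

    fp_flag   - fingerprint type flag, e.g. "--morgan" or "--maccs166"
    options   - mapping of option name (without "--") to its value
    filenames - input structure files; empty means read from stdin
    output    - value for -o, or None to write to stdout
    """
    argv = ["rdkit2fps", fp_flag]
    for name, value in (options or {}).items():
        argv += ["--" + name, str(value)]
    if output is not None:
        argv += ["-o", output]
    argv += list(filenames)
    return argv

# Morgan fingerprints with a non-default radius and size (example filenames).
morgan = rdkit2fps_command("--morgan", {"radius": 3, "fpSize": 1024},
                           ["drugs.sdf.gz"], output="drugs.fps")
print(shlex.join(morgan))

# Reading SMILES from stdin: no filename to guess from, so pass --in.
from_stdin = rdkit2fps_command("--maccs166", {"in": "smi"})
print(shlex.join(from_stdin))
```

The resulting list can be passed to subprocess.run on a system where rdkit2fps is installed.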

This program guesses the input structure format from the filename extension. If the data comes from stdin, or the extension is unknown, use "--in" to specify the input format. The supported format extensions are:

File Type   Valid FORMATs (use gz if compressed)
---------   ------------------------------------
SMILES      smi, ism, can, smi.gz, ism.gz, can.gz
SDF         sdf, mol, sd, mdl, sdf.gz, mol.gz, sd.gz, mdl.gz
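
The extension-based guess can be pictured with the following sketch. It is an illustration built from the extension list above, not the program's actual code, and guess_format is a hypothetical name. It returns None when the extension is not recognized, which corresponds to the case where "--in" is needed.

```python
import os

# Extensions taken from the list of supported formats above.
SMILES_FORMATS = {"smi", "ism", "can"}
SDF_FORMATS = {"sdf", "mol", "sd", "mdl"}

def guess_format(filename):
    """Return the FORMAT implied by the filename extension, or None.

    Hypothetical helper: the comparison is case-insensitive here,
    which is an assumption, not documented behavior.
    """
    parts = os.path.basename(filename).lower().split(".")
    if len(parts) < 2:
        return None  # no extension at all
    compressed = parts[-1] == "gz"
    if compressed:
        parts = parts[:-1]
        if len(parts) < 2:
            return None  # only ".gz", nothing to identify the format
    ext = parts[-1]
    if ext in SMILES_FORMATS or ext in SDF_FORMATS:
        return ext + (".gz" if compressed else "")
    return None
```

For example, guess_format("drugs.sdf.gz") gives "sdf.gz", while guess_format("drugs.txt") gives None.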