Compound statistics

Distribution, overlap, and identifier coverage for unique compounds (ns_id).

Unique compounds: 143,920
Shared compounds (≥2 lists): 80,031
With SMILES: 133,618
With InChIKey: 133,181

Top 10 compounds by list count

Most shared compounds across lists.

Compound overlap distribution

How many compounds appear in 1, 2, 3–5, … lists.
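
A minimal sketch of how such a distribution could be tallied, assuming the number of lists each compound appears in is already known. The function name `overlap_distribution` and the open-ended `6+` bucket are illustrative assumptions; the caption only names the 1, 2, and 3–5 buckets.

```python
from collections import Counter

def overlap_distribution(list_counts):
    """Tally compounds into list-count buckets.

    list_counts: iterable of ints, the number of lists each compound is in.
    The "6+" bucket is an assumed catch-all; the dashboard's actual upper
    buckets are not specified in the text.
    """
    def bucket(n):
        if n <= 0:
            raise ValueError("list count must be positive")
        if n == 1:
            return "1"
        if n == 2:
            return "2"
        if n <= 5:
            return "3-5"
        return "6+"

    return Counter(bucket(n) for n in list_counts)
```

Under this bucketing, the "Shared compounds (≥2 lists)" figure would correspond to the sum of every bucket except `"1"`.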

Identifier coverage (unique compounds)

Counts of compounds having each identifier in mv_compound_cards.

Missing values in compounds table (%)

Percent of rows where each column is NULL/blank (based on public.compounds row count).
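
As a rough illustration, the per-column percentage could be computed as below. This is a sketch over in-memory rows, not the dashboard's actual query; the function name `missing_percent` and the column names in the usage example are hypothetical, and "blank" is assumed to mean an empty or whitespace-only string.

```python
def missing_percent(rows, columns):
    """Return {column: percent of rows where the value is None or blank}.

    rows: list of dicts, one per table row.
    The denominator is the total row count, matching the caption.
    """
    total = len(rows)
    result = {}
    for col in columns:
        missing = sum(
            1
            for row in rows
            if row.get(col) is None or str(row.get(col)).strip() == ""
        )
        result[col] = 100.0 * missing / total if total else 0.0
    return result
```

For example, `missing_percent(rows, ["smiles", "inchikey"])` returns one percentage per requested column.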

DTX vs PubChem match coverage

Unique compounds grouped by whether we have DTX (dtx_id) and/or PubChem (pc_id) enrichment.
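
The grouping can be sketched as a four-way split on the presence of `dtx_id` and `pc_id`. The function name and the group labels (`both`, `dtx_only`, `pubchem_only`, `neither`) are assumptions chosen for illustration; the dashboard's own labels are not given.

```python
from collections import Counter

def match_coverage(compounds):
    """Count unique compounds by DTX / PubChem enrichment status.

    compounds: iterable of dicts that may carry 'dtx_id' and 'pc_id'.
    A value of None or an empty string is treated as "no match".
    """
    def present(value):
        return value is not None and str(value).strip() != ""

    counts = Counter()
    for c in compounds:
        has_dtx = present(c.get("dtx_id"))
        has_pc = present(c.get("pc_id"))
        if has_dtx and has_pc:
            counts["both"] += 1
        elif has_dtx:
            counts["dtx_only"] += 1
        elif has_pc:
            counts["pubchem_only"] += 1
        else:
            counts["neither"] += 1
    return counts
```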

Identifier coverage (radar)

Same data as the identifier coverage donut, shown as percentages of unique compounds.

Identifier completeness score

Distribution of how many identifiers are available per compound (0–6).
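
One way to picture the score is a simple count of non-empty identifier fields per compound. The sketch below assumes six identifier fields; the names in `ID_FIELDS` are hypothetical, since the text does not say which six identifiers are scored.

```python
from collections import Counter

# Hypothetical field names: the source only says there are six identifiers.
ID_FIELDS = ("smiles", "inchikey", "inchi", "cas", "dtx_id", "pc_id")

def completeness_score(compound, fields=ID_FIELDS):
    """Number of identifier fields that are present (not None, not empty)."""
    return sum(1 for f in fields if compound.get(f) not in (None, ""))

def score_distribution(compounds, fields=ID_FIELDS):
    """Return {score: compound count} for every score from 0 to len(fields)."""
    dist = Counter(completeness_score(c, fields) for c in compounds)
    return {score: dist.get(score, 0) for score in range(len(fields) + 1)}
```

Returning every score from 0 to 6, including those with zero compounds, keeps the chart's x-axis complete.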

Mass distribution (unique compounds)

Histogram of numeric mass values from mv_compound_cards (0–1000 Da, 50-Da bins).
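
A 0–1000 Da range with 50-Da bins gives 20 bins. The following sketch shows one way to do the binning; the function name is hypothetical, and dropping out-of-range values while folding exactly 1000 Da into the last bin are assumptions, since the caption does not say how edge cases are handled.

```python
def mass_histogram(masses, lo=0.0, hi=1000.0, width=50.0):
    """Count masses into fixed-width bins over [lo, hi].

    Returns a list of counts; bin i covers [lo + i*width, lo + (i+1)*width),
    except the last bin, which also includes the upper edge `hi`.
    None values and masses outside [lo, hi] are skipped (assumed behavior).
    """
    n_bins = int((hi - lo) / width)
    counts = [0] * n_bins
    for m in masses:
        if m is None or not (lo <= m <= hi):
            continue
        idx = min(int((m - lo) // width), n_bins - 1)
        counts[idx] += 1
    return counts
```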

Top lists by unique compounds

Largest lists measured as DISTINCT ns_id per list (Top 20).
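
The ranking amounts to counting distinct `ns_id` values per list and keeping the largest 20. A minimal sketch, assuming list membership is available as `(list_name, ns_id)` pairs; the function name and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict

def top_lists(memberships, n=20):
    """Rank lists by number of distinct ns_id values.

    memberships: iterable of (list_name, ns_id) pairs; duplicates are allowed
    and are collapsed by the per-list set.
    Returns up to n (list_name, distinct_count) tuples, largest first.
    """
    members = defaultdict(set)
    for list_name, ns_id in memberships:
        members[list_name].add(ns_id)
    ranked = sorted(members.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(name, len(ids)) for name, ids in ranked[:n]]
```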

H-bond donors vs monoisotopic mass

Bubble bins built from PubChem enrichment (pc_chem): size ∝ √count, color ∝ log(1+count).
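
The two encodings in the caption can be written out directly. In this sketch the function name and the `size_scale` factor are hypothetical; only the √count and log(1+count) relationships come from the caption.

```python
import math

def bubble_encoding(count, size_scale=1.0):
    """Map a bin count to (size, color_value).

    size is proportional to sqrt(count); color_value is log(1 + count),
    which is 0 for an empty bin and compresses very large counts.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    size = size_scale * math.sqrt(count)
    color_value = math.log1p(count)
    return size, color_value
```

If the size is interpreted as a radius, the bubble's area is then proportional to the count itself, which keeps dense bins from visually overwhelming sparse ones less than a linear radius would.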

Heaviest compounds

Top mass values from mv_compound_cards (numeric).

Compound                                        Mass (Da)
Eptotermin alfa                                 15664.52
n.a.                                            11002.28
CID 168266388                                   10417.74
Mekasermin                                       7643.59
LEPIRUDIN                                        6981.00
ISIS 2302                                        6363.61
Inulin                                           6176.02
MURODERMIN                                       6035.54
insulin (human)                                  5803.64
Insulin (ox), 8A-l-threonine-10A-l-isoleucine-   5773.63