This Week In Cheminformatics: Issue #010
Novel NP-Inspired Chemotypes for a Kinase Target, GPU-Accelerated Virtual Screening with UniDock-Pro, Benchmarking Count Fingerprint Performance, and a Long List of Papers from the Last Week of February.
Highlights
Natural Product-like Fragments Unlock Novel Chemotypes for a Kinase Target: Exploring Options beyond the Flatland
The paper by Santura et al. caught my eye because it brings the "escape from flatland" idea into fragment-based (kinase) drug discovery. They screened an 87-member natural product-like fragment library against protein kinase A and achieved a surprisingly high (41%) crystallographic hit rate. What I found most compelling from a cheminformatics perspective is that these hits yielded 32 novel (Bemis-Murcko) scaffolds. These scaffolds populate an underexplored, three-dimensional chemical space, which the authors demonstrate through significantly higher Fsp3, FC_Stereo, and nPBF values compared to established ChEMBL and PDB kinase ligands. It is a well-grounded, data-rich read that provides a clear structural rationale for integrating spatially complex, sp3-enriched building blocks into targeted screening libraries without sacrificing hit rates.
Count your bits: fingerprint benchmarking to assess broad chemical space representation
We often default to standard folded binary fingerprints without second-guessing the parameterization, but Huber and Pollmann's recent benchmarking preprint strongly challenges this habit when dealing with large, heterogeneous chemical spaces. They empirically demonstrate that folding high-occupancy fingerprints, specifically RDKit path-based and MAP4, into fixed-length vectors introduces severe bit-collision artifacts that systematically distort Tanimoto similarities and artificially inflate scores for larger molecules. I found their evaluation against graph-based MCES baselines particularly compelling; it shows that using count or log-count representations alongside unfolded sparse arrays consistently mitigates these collision penalties and improves structural specificity. To make this practical rather than purely theoretical, they released chemap, an open-source Python library that standardizes the generation of these unfolded, folded, and frequency-scaled fingerprint variants. If you are building predictive ML pipelines or running similarity searches at scale, you should definitely review their quantitative breakdown of exactly when standard vector folding falls apart. Good read!
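To make the bit-collision point concrete, here is a toy sketch in plain Python. It does not use chemap or RDKit: the feature IDs, the two "molecules", and the modulo-based fold() helper are all made up for illustration, and real fingerprint generators hash substructures rather than take hand-picked integers. It only shows how folding can merge unrelated features into one bin and push the Tanimoto score up.

```python
from collections import Counter


def fold(sparse_counts, n_bits):
    """Fold a sparse {feature_id: count} fingerprint into n_bits bins (modulo folding)."""
    folded = Counter()
    for feature_id, count in sparse_counts.items():
        # Distinct features that share feature_id % n_bits collide in one bin.
        folded[feature_id % n_bits] += count
    return dict(folded)


def tanimoto_binary(a, b):
    """Tanimoto on presence/absence of features (counts are ignored)."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def tanimoto_count(a, b):
    """Count-based (min/max) Tanimoto on {feature_id: count} fingerprints."""
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    denominator = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return numerator / denominator if denominator else 0.0


# Hypothetical sparse count fingerprints; only feature 3 is truly shared.
mol_a = {3: 1, 1027: 2, 5000: 1}
mol_b = {3: 1, 2051: 1, 9000: 4}

# With 1024 bins, features 1027 and 2051 both land in bin 3.
folded_a = fold(mol_a, 1024)
folded_b = fold(mol_b, 1024)

unfolded_binary = tanimoto_binary(mol_a, mol_b)
folded_binary = tanimoto_binary(folded_a, folded_b)
unfolded_count = tanimoto_count(mol_a, mol_b)
folded_count = tanimoto_count(folded_a, folded_b)

print(f"binary: unfolded={unfolded_binary:.3f} folded={folded_binary:.3f}")
print(f"count:  unfolded={unfolded_count:.3f} folded={folded_count:.3f}")
```

In this toy case both folded scores come out higher than their unfolded counterparts, purely because unrelated features were merged into the same bin. The more features a molecule has, the more such collisions you should expect at a fixed vector length.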
UniDock-Pro: A Unified GPU-Accelerated Platform for High-Throughput Structure-Based, Ligand-Based, and Synergistic Hybrid Virtual Screening
Boyang Ni and Douglas R. Houston introduce UniDock-Pro, which brings inter-ligand batch parallelism to a unified GPU framework for structure-based, ligand-based, and hybrid virtual screening. What stands out is that they chose a continuously differentiable energy landscape that is well conditioned for their Monte Carlo and BFGS local optimization pipeline. This targeted adjustment yields a ~2.5-fold boost in early enrichment on the DUDE-Z benchmark. Furthermore, their Hybrid mode merges receptor and ligand grid maps on the fly, supported by a newly introduced "Force Field Complementarity Analysis" that spatially quantifies where these two force fields cooperate or conflict during the conformational search. Code is here.
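For readers less familiar with "early enrichment", here is a minimal sketch of one common way to quantify it, the enrichment factor at a top fraction of the ranked list. This is not taken from the UniDock-Pro code, and the paper may report a different early-recognition metric; the function name, the 5% cutoff, and the toy ranking below are my own assumptions.

```python
def enrichment_factor(scores, is_active, fraction=0.01, lower_is_better=True):
    """Enrichment factor: active rate in the top fraction vs. in the whole set."""
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(bool(flag) for flag in is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds in the top slice (at least one).
    n_top = max(1, int(round(fraction * n_total)))

    # Rank best-first; docking scores are usually "more negative is better".
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0],
                    reverse=not lower_is_better)
    hits_top = sum(bool(flag) for _, flag in ranked[:n_top])

    # (hits_top / n_top) / (n_actives / n_total), rearranged to one division.
    return (hits_top * n_total) / (n_top * n_actives)


# Toy screen (made-up data): 100 compounds, 10 actives, 3 of them ranked in the top 5.
scores = list(range(100))  # lower score = better rank
active_positions = {0, 2, 3, 50, 51, 52, 53, 54, 55, 56}
is_active = [i in active_positions for i in range(100)]

ef_5_percent = enrichment_factor(scores, is_active, fraction=0.05)
print(f"EF at 5%: {ef_5_percent:.1f}")
```

A value of 1 means the top slice is no better than random picking, so a 2.5-fold improvement in a metric like this means 2.5 times as many actives recovered in the same small slice of the library.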
Long List
Cheminformatics
ProMol_Func: A Structure-Free Deep Learning Model for Virtual Screening
Enhancing molecular structure elucidation with reasoning-capable LLMs
Scientific knowledge graph and ontology generation using open large language models
Can We Automate Scientific Reasoning in Closed-Loop Experiments using Large Language Models?
KLSD: A Curated Kinase–Ligand Database Mapping Selectivity Landscapes and Polypharmacology
A feature-aligned diffusion model for controllable generation of 3D drug-like molecules
Synthesis Planning in Reaction Space: A Study on Success, Robustness and Diversity
Precision fragment addition: domain-specific DeepFrag2 models for smarter lead optimization
VFMol: A Discrete Flow Matching Variational Autoencoder for Molecular Graph Generation
Generative AI Uncovers Novel Chrebp/Txnip Axis Inhibitors with Potential Anti-inflammatory Activity
A Computational Community Blind Challenge on Pan-Coronavirus Drug Discovery Data
Query Matters: How Selection Strategies Influence Active Learning in Drug Discovery
SPECTRE: A Multimodal Spectral Transformer for Small Molecule Annotation
Toward Generalizable Data-Driven Pharmacokinetics with Interpretable Neural ODEs
Cycle-MS: A Closed-Loop End-to-End Framework for Mass Spectrometry Structure Elucidation
Mapping Allosteric Communication in the Nucleosome with Conditional Activity
Graph-based transformer to predict the octanol–water partition coefficient
Statistics and Ontology of Published Small Molecule Ring Systems
Enhancing ADMET property predictions using cross-aligned multimodal attention mechanisms
LigandExplorer: An Automated Tool for Ligand Extraction from PDB Structures
IRIS: A Machine Learning-Based Pose Reranking Tool for RNA-Ligand Docking
MedChem
Other
Palate Cleanser
https://github.com/burghoff/Scientific-Inkscape
https://x.com/docmilanfar/status/2027593076156604705
https://x.com/damnGruz/status/2027372644245385381
https://x.com/lilfloo/status/2027153279021633826
https://x.com/ylecun/status/2027402804772446243
https://x.com/TheLincoln/status/2027215235103207693
https://x.com/Vinny_Daniel0/status/2027137725124546571
https://x.com/docmilanfar/status/2026889273207857567
https://x.com/stupidtechtakes/status/2026166788518822374
https://x.com/JustJake/status/2026373209248174162
https://x.com/NoContextDutch1/status/2026207945633984862
https://x.com/personofswag/status/2026372733446271412
https://x.com/feikcel/status/2026067975241920769
https://x.com/anaumghori/status/2026039714382569662
https://x.com/boldleonidas/status/2025893154365518001
https://x.com/fkadev/status/2026145372318425259
for shits and giggles,
Manas



