This Week In Cheminformatics: Issue #024
Covalent Binders in PDB, Fast RMSD Calculations, Better Activity Cliff Prediction using Assay Information and a long list of (rather interesting chemistry) papers
Highlights
Chemical Space Exploration of a Database of Covalent Binders in the PDB
Velasco-Saavedra et al. published an extensively curated database of 3,585 unique Covalent Binders extracted from the Protein Data Bank (PDB), updated through Jan 2026. The authors provide a structural similarity analysis, using ECFP4 fingerprints and t-SNE projections to compare Covalent Binders directly against Ro3 fragments, FDA-approved drugs, and COCONUT natural products. They also break down the biological target landscape, noting that 86.77% of PDB-resolved Covalent Binders target enzymes. These enzymes are predominantly hydrolases (~60%) and transferases (~15%), and specific enzyme classes show distinct preferences for warheads such as Michael acceptors, aldehydes, and halohydrocarbons. Good read (and really nice figures; Mexican pink is my new favorite accent color now :)
Dataset is available here.
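For readers who want a feel for what a fingerprint-based similarity comparison involves, here is a minimal sketch. It assumes each molecule is already represented as a set of "on" fingerprint bits (in practice these could come from a toolkit such as RDKit, where Morgan fingerprints with radius 2 are the usual ECFP4 analogue). The bit sets, the function names `tanimoto` and `nearest_neighbor_similarity`, and the choice of Tanimoto as the metric are my assumptions for illustration, not details taken from the paper.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def nearest_neighbor_similarity(query, reference):
    """Highest Tanimoto similarity of one query against a reference library."""
    return max(tanimoto(query, ref) for ref in reference)


# Toy, made-up bit sets standing in for real ECFP4 fingerprints.
covalent_binder = {1, 5, 9, 42}
approved_drugs = [{1, 5, 9, 40}, {2, 3}]

nn_sim = nearest_neighbor_similarity(covalent_binder, approved_drugs)
print(f"nearest-neighbor similarity to the drug set: {nn_sim:.2f}")
```

Running the same nearest-neighbor query against each reference library (fragments, drugs, natural products) gives one rough way to see which chemical space a compound sits closest to.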
Fast Computation of Exact Symmetry-Corrected RMSD of Conformers
José Manuel Vásquez-Pérez introduced “Hierarchical Neighborhood of Atoms partitioning”, an efficient algorithmic improvement for symmetry-corrected RMSD calculations. Topology-unaware linear assignment algorithms produce chemically invalid mappings in ~90% of protein conformer pairs, and standard graph isomorphism searches frequently time out on complex molecules (story of my life). The method resolves both issues by recursively grouping topologically equivalent atoms into an assignment tree. By independently evaluating partial RMSD contributions across these branches, the algorithm reduces the combinatorial search space from the product of branch possibilities to their sum!! That’s a reduction of up to 16 orders of magnitude for systems like myoglobin, and it achieves correct assignments without timeouts across 1.4 million benchmarked pairs at millisecond speeds. Very cool implementation, see code here.
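To make the "product versus sum" point concrete, here is a toy sketch. It is not the author's algorithm: it assumes the two conformers are already superimposed in a fixed frame (no re-alignment), that the groups of interchangeable atoms are handed in by the caller, and that groups are independent of each other. Under those assumptions the squared deviation splits into one term per group, so each group can be minimized on its own. All function names and coordinates below are hypothetical.

```python
from itertools import permutations, product
from math import factorial, prod, sqrt


def sq_dist(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def best_group_cost(ref, mov, group):
    """Smallest summed squared deviation over relabelings within one group."""
    return min(
        sum(sq_dist(ref[i], mov[j]) for i, j in zip(group, perm))
        for perm in permutations(group)
    )


def rmsd_per_group(ref, mov, groups):
    """Minimize each group independently: work grows with the SUM of group sizes' factorials."""
    total = sum(best_group_cost(ref, mov, g) for g in groups)
    return sqrt(total / len(ref))


def rmsd_brute_force(ref, mov, groups):
    """Try every combination of group relabelings: work grows with the PRODUCT."""
    best = float("inf")
    for combo in product(*(permutations(g) for g in groups)):
        cost = sum(
            sq_dist(ref[i], mov[j])
            for g, perm in zip(groups, combo)
            for i, j in zip(g, perm)
        )
        best = min(best, cost)
    return sqrt(best / len(ref))


def search_sizes(groups):
    """Number of relabelings examined by the per-group and brute-force searches."""
    sizes = [factorial(len(g)) for g in groups]
    return sum(sizes), prod(sizes)


# Made-up coordinates: atom 0 is unique, atoms 1-3 and 4-6 are interchangeable.
ref = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
# Same geometry, but with labels shuffled inside each equivalent group.
mov = [ref[0], ref[2], ref[3], ref[1], ref[5], ref[4], ref[6]]
groups = [[0], [1, 2, 3], [4, 5, 6]]

naive = sqrt(sum(sq_dist(a, b) for a, b in zip(ref, mov)) / len(ref))
print("naive RMSD:", naive)
print("per-group RMSD:", rmsd_per_group(ref, mov, groups))
print("brute-force RMSD:", rmsd_brute_force(ref, mov, groups))
print("relabelings examined (sum, product):", search_sizes(groups))
```

With only two three-atom groups the gap is small, but the product grows multiplicatively with every extra group while the sum grows additively, which is where the large savings come from. A real implementation also has to handle nested equivalences and the optimal superposition, which this sketch deliberately ignores.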
Accurate prediction of activity cliff compounds based on bioactivity profiles depends on assay nearest neighbor relationships
This paper from Abe et al. takes a pragmatic approach: predicting activity cliff compounds from assay bioactivity profiles. Using data partitioning schemes like intra-series and series-unit splits, the authors show that while ECFP4 fingerprints fail, bioactivity-based machine learning models can give highly accurate predictions. (I have a poster coming up on something similar; stay tuned for the TWIC on 29th June, from the Strasbourg Summer School in Chemoinformatics (CS3-2026).) This makes sense, and they further explore when this happens with respect to 1NN performance, etc. This all ties back to the Modelability, SALI, and similar papers, and I think it’s becoming more and more obvious that your next model should be validated specifically against activity cliffs.
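Since SALI comes up here, a quick sketch of how it flags cliff-like pairs may help. SALI for a pair of compounds is the absolute activity difference divided by one minus their structural similarity, so similar pairs with very different potency score highest. The activities, similarities, and helper names below are made up for illustration, and the handling of identical structures is my own convention rather than anything from the paper.

```python
from itertools import combinations


def sali(act_i, act_j, similarity):
    """Structure-Activity Landscape Index: |A_i - A_j| / (1 - sim_ij)."""
    if similarity >= 1.0:
        # Assumed convention: identical structures give infinity unless
        # the activities are also identical.
        return float("inf") if act_i != act_j else 0.0
    return abs(act_i - act_j) / (1.0 - similarity)


def rank_pairs(activities, similarities):
    """Return (SALI, id_i, id_j) tuples sorted from most to least cliff-like."""
    scored = [
        (sali(activities[i], activities[j], similarities[(i, j)]), i, j)
        for i, j in combinations(sorted(activities), 2)
    ]
    return sorted(scored, reverse=True)


# Hypothetical pIC50 values and pairwise fingerprint similarities.
activities = {"A": 8.5, "B": 5.5, "C": 8.3}
similarities = {("A", "B"): 0.9, ("A", "C"): 0.85, ("B", "C"): 0.3}

ranked = rank_pairs(activities, similarities)
for score, i, j in ranked:
    print(f"{i}-{j}: SALI = {score:.2f}")
```

Pairs at the top of this ranking are the ones worth holding out when checking whether a model copes with activity cliffs.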
Long List
Cheminformatics
Bayesian Active Learning to Accelerate High Throughput Phase Diagram Exploration
Computational Analysis of ELOVL6 Structure and Inhibition for Rational Drug Design
Measuring Differences in Protein Allosteric Graphs Constructed via Molecular Dynamics Simulations
Adaptive Vibrational Coordinates via Symmetry-Aware Normalizing Flows
Generation of Molecules Near the Applicability Domain Boundaries of Property Prediction Models
Geometric Structure-Aware Diffusion Model with Self-Optimization Strategy for Molecular Generation
AdsorPy: A Python Package for Lattice-Based Random Sequential Adsorption Simulations
Integrating multimodal features with deep learning for protein solubility prediction
Classification of Thyroid Peroxidase (TPO) Inhibitors Using Transfer Learning with SMILES Embeddings
Flow matching for fast, multi-purpose structure-based ligand generation for drug discovery
Chemical space visualization at scale: a survey of end-to-end pipelines and dataset-size archetypes
CM-MTL-DTI: Drug-Target Interaction Prediction via Cross-Modal Alignment and Multi-Task Learning
MedChem
Other
Palate Cleanser
YET ANOTHER WEEK HUH,
Manas






























