This Week In Cheminformatics: Issue #013
Saturn for sample-efficient molecular generation, a fingerprint based on minimum message length, AF3 for covalent virtual screening, and a long list of papers.
Highlights
Compressing Chemistry Reveals Functional Groups
Sharma and King’s paper applies the Minimum Message Length principle to validate conventional functional groups. They developed an unsupervised algorithm, FGCOMPRESS, to identify valid SMILES substrings that maximally compress nearly three million ChEMBL molecules. What makes this compelling is that their results align almost perfectly with what most chemists have in mind when they talk about functional groups. Beyond validation, when the discovered substructures are used to generate unhashed, count-based vectors, they significantly outperform MACCS, Morgan fingerprints, and even MolFormer-XL continuous embeddings in ridge regression across 24 bioactivity prediction datasets!
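To make the "unhashed, count-based vector" idea concrete, here is a minimal sketch in plain Python. It assumes a tiny hand-picked vocabulary of SMILES substrings (`VOCAB` is hypothetical and is not the FGCOMPRESS output) and counts occurrences by naive string matching, which is a simplification of whatever matching the authors actually use. "Unhashed" here means each vector position corresponds to exactly one substring, so there are no bit collisions as in folded fingerprints.

```python
# Hypothetical vocabulary of SMILES substrings, chosen for illustration only.
# It is NOT the set of substructures discovered by FGCOMPRESS.
VOCAB = ["C(=O)O", "C(=O)N", "c1ccccc1", "N", "O"]


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, 0
    while True:
        idx = text.find(pattern, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1  # step by one character so overlapping hits are kept


def count_vector(smiles, vocab=VOCAB):
    """One dimension per vocabulary entry (no hashing, no folding)."""
    return [count_overlapping(smiles, pattern) for pattern in vocab]


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
paracetamol = "CC(=O)Nc1ccc(O)cc1"
print(count_vector(aspirin))      # [2, 0, 1, 0, 4]
print(count_vector(paracetamol))  # [0, 1, 0, 1, 2]
```

Vectors like these could then be stacked into a matrix and passed to any ridge regression implementation. Note that the substituted ring in paracetamol does not match the `c1ccccc1` entry, which shows how sensitive plain substring matching is to the way a SMILES string happens to be written.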
Discovery of Covalent Ligands with AlphaFold3
Shamir et al. evaluate AlphaFold3 for covalent virtual screening, demonstrating that it significantly outperforms classical docking algorithms like DOCKovalent and DOCK6. To quantify enrichment, the authors introduce COValid, a benchmark dataset comprising 874 active acrylamide protomers and 37,919 topologically dissimilar, property-matched decoys across ten target cysteine sites. Ranking AF3 cofolding predictions strictly by the minimum predicted aligned error (mPAE) yielded near-optimal classification of true binders, surpassing both traditional docking outputs and Rosetta-rescored AF3 models. This work is highly relevant for virtual screening workflows because it establishes mPAE as an effective empirical ranking function. Note, however, that the authors found mPAE unsuitable for precise cross-target selectivity prediction.
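As a rough illustration of what ranking by mPAE means in practice, the sketch below scores a toy screen in which a lower value is treated as a more confident prediction. The mPAE values and labels are made up, and ROC AUC and enrichment factor are generic virtual screening metrics used here as an assumption; they are not necessarily the exact metrics reported in the paper.

```python
def roc_auc_lower_is_better(scores, labels):
    """Probability that a random active scores lower than a random decoy.

    Ties count as half a win. labels use 1 for actives and 0 for decoys.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """Hit rate in the best-scoring fraction divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])  # lowest first
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / n)


# Hypothetical mPAE values for eight compounds (1 = active, 0 = decoy).
mpae = [1.2, 3.5, 0.8, 4.1, 2.0, 5.0, 3.9, 4.4]
is_active = [1, 0, 1, 0, 0, 1, 0, 0]

print(round(roc_auc_lower_is_better(mpae, is_active), 3))  # 0.667
print(round(enrichment_factor(mpae, is_active, 0.25), 3))  # 2.667
```

The pairwise AUC loop is quadratic in the number of compounds, which is fine for a toy example but would be replaced by a rank-based computation on a benchmark the size of COValid.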
Sample-efficient generative molecular design using memory manipulation
Saturn by Guo and Schwaller is a Mamba state space model for goal-directed molecular generation, with a focus on sample efficiency. The authors demonstrate that combining Mamba with Augmented Memory’s experience replay and SMILES augmentation induces a “hop-and-locally-explore” sampling behavior. Under fixed multi-parameter optimization budgets, Saturn consistently outperforms strong baselines like GEAM, generating molecules with strict QED and synthetic accessibility profiles in significantly fewer oracle calls. Good read!
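For readers unfamiliar with experience replay in this setting, here is a minimal sketch of a bounded top-k replay buffer that keeps the highest-reward SMILES seen so far. It is a generic illustration under stated assumptions: the class name `TopKReplayBuffer` and its methods are hypothetical, rewards are assumed to be plain floats from some oracle, and the SMILES augmentation step that Augmented Memory applies to replayed molecules is deliberately omitted. It is not Saturn's implementation.

```python
import heapq


class TopKReplayBuffer:
    """Keep the k highest-reward unique SMILES (hypothetical helper)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._heap = []     # min-heap of (reward, smiles); worst entry at index 0
        self._seen = set()  # SMILES currently stored, used to skip duplicates

    def add(self, smiles, reward):
        """Try to store a molecule; return True if it was kept."""
        if smiles in self._seen:
            return False
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (reward, smiles))
            self._seen.add(smiles)
            return True
        if reward <= self._heap[0][0]:
            return False  # not better than the worst stored molecule
        _, evicted = heapq.heapreplace(self._heap, (reward, smiles))
        self._seen.discard(evicted)
        self._seen.add(smiles)
        return True

    def best(self):
        """Stored (reward, smiles) pairs, highest reward first."""
        return sorted(self._heap, reverse=True)


buffer = TopKReplayBuffer(capacity=2)
buffer.add("CCO", 0.3)
buffer.add("c1ccccc1", 0.9)
buffer.add("CCN", 0.5)  # evicts "CCO", the lowest-reward entry
print(buffer.best())    # [(0.9, 'c1ccccc1'), (0.5, 'CCN')]
```

In a replay-based training loop, the stored molecules would be fed back to the generator between oracle calls, so each expensive evaluation contributes to more than one update.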
Long List
Cheminformatics
Development of Reaction-Centered Encoders and Benchmarking of Enzyme-Reaction Pair Models
Enhancing Retrosynthesis Prediction with Distillation Learning
Looking back and to the future after four-plus years of language in chemistry
Assessment of molecular dynamics time series descriptors in protein-ligand affinity prediction
Reaction Optimization through Mechanistic Insight and Predictive Modelling
Enhancing High-Dimensional Neural Network Potentials Accuracy in OLED Systems via Element-Relabeling
Graph Neural Networks Model Based on Atomic Hybridization for Predicting Drug Targets
A light-weight Graph Neural Network for the prediction of 31P Nuclear Magnetic Resonance signals
Critical Assessment of a Structure-Based Pipeline for Targeting the Long Noncoding RNA MALAT1
Hybrid Graph–Machine Learning Framework for Accurate and Interpretable Band Gap Prediction
LiBRe: A Ligand-Aware Sequence-Based Binding Residue Prediction Model for Virtual Screening
Rapid Machine Learning-Driven Detection of Pesticides and Dyes Using Raman Spectroscopy
Computational Mapping and Targeting of BK Channel Protein–Protein Interactions in Breast Cancer
SurfSol: A Multimodal Surface-Based Deep Learning Framework for Protein Solubility Prediction
KOC-WebPredictor: An Open-Access Tool for Prediction and Insights into Soil Sorption
Multiscale-Aware Graph Embedding Approach Uncovers LC-61, a Potent Anti-Leishmania infantum Compound
MedChem
Structural and Kinetic Basis for the Rational Design of Next-Generation β-Lactamase Inhibitors
Normalizing Covalent Potency for Electrophilicity with Ligand Reactivity Efficiency
Other
Palate Cleanser
all quiet on the frontal lobe,
Manas


















