This Week In Cheminformatics: Issue #012
AF2BIND, Automated R Group Recognition, Cyclovoltammetry KG and a long list of papers
Highlights
AF2BIND: predicting small-molecule binding sites using the pair representation of AlphaFold2
An interesting read this week by Gazizov et al. proposes AF2BIND, a method that creatively repurposes AlphaFold2's pair representation to predict small-molecule binding sites de novo. Instead of relying on homology modeling, MSAs, or probe spheres, the authors feed the target structure alongside 20 disconnected "bait" amino acids to extract pairwise attention features. They then train a straightforward logistic regression classifier on these embeddings to compute a per-residue binding probability. The activation of specific baits correlates neatly with the physicochemical properties of compatible ligands. While the model still predictably struggles with highly cryptic, collapsed pockets in unbound states, it successfully identified thousands of novel, druggable sites across the human proteome missed by standard structural tools like P2Rank. It's a computationally inexpensive and pragmatic way to squeeze pocket discovery utility out of existing structure prediction weights. Good read!
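For readers who want a feel for the final classification step, here is a minimal sketch of a logistic regression that turns per-residue feature vectors into binding probabilities. It is not the authors' code: the toy features (one scalar per bait), the synthetic labels, and the names `fit_logistic`, `predict_proba`, and `toy_residue` are all assumptions made for illustration, and the real features would come from AlphaFold2's pair representation rather than random numbers.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """Binding probability for one residue from its feature vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def fit_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    """Batch gradient-descent logistic regression with L2 regularization."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            grad_b += err
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy stand-in data (assumption): 20 features per residue, one per bait.
random.seed(0)
N_BAITS = 20

def toy_residue(is_binding):
    shift = 1.0 if is_binding else -1.0
    return [random.gauss(shift, 1.0) for _ in range(N_BAITS)]

labels = [i % 2 for i in range(200)]          # 1 = binding, 0 = non-binding
X = [toy_residue(lbl) for lbl in labels]
weights, bias = fit_logistic(X, labels)
probs = [predict_proba(weights, bias, x) for x in X]
```

In this toy setup the learned weights play the role of per-bait contributions, which is loosely the kind of readout that lets bait activations be compared with ligand properties.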
RGReco: a unified framework for automated R-group recognition in chemical publications
Extracting Markush structures and variable R-groups from literature has always been a bottleneck, but this new paper introducing RGReco offers a highly pragmatic, multistage pipeline to handle this. What makes this framework interesting is how it systematically tackles the heterogeneity of R-group representations, specifically tables, text lists, and graphical substituent structures, by chaining together specialized open-source tools. It uses DECIMER for segmentation, Surya for layout parsing, and MolScribe for optical chemical structure recognition. They use a fine-tuned YOLOv11-OBB model to detect tricky attachment points and superscript/subscript identifiers. The pipeline achieves an 82.9% F1 score on a newly curated benchmark dataset. It is a well-engineered, rule-augmented deep learning solution that I highly recommend reading if you are building automated data extraction workflows for chemical databases.
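As a quick reminder of what an F1 score summarizes, the sketch below computes precision, recall, and F1 from raw match counts. The counts are hypothetical and chosen only to show the arithmetic; how RGReco defines a correct match and how it aggregates across documents is described in the paper, not here.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw match counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, not taken from the RGReco benchmark.
precision, recall, f1 = precision_recall_f1(
    true_positives=80, false_positives=20, false_negatives=10
)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a pipeline cannot reach a high score by being strong on only one of the two.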
Database Utility for Cyclovoltammetry Knowledge (DUCK): Unified Platform for Electrochemical Data
Garay-Ruiz et al. introduce DUCK (Database Utility for Cyclovoltammetry Knowledge). The authors standardize cyclic voltammetry measurements by mapping them to an EMMO-based ontology and converting the metadata into queryable knowledge graphs. DUCK is open source and comes with automated graph generation and a SPARQL-enabled visualization interface. They demonstrate its utility across 130 traditional lab measurements and 79 automated runs from self-driving labs performing Bayesian optimization. I recommend reading this if you are interested in FAIR data implementation and electrochemistry.
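To make the "queryable knowledge graph" idea concrete, here is a tiny sketch of metadata stored as subject-predicate-object triples and filtered with a conjunctive pattern, which is roughly what a SPARQL WHERE clause does. Everything in it is an assumption for illustration: the identifiers (`cv:run1`, `ex:hasSolvent`, and so on) are invented and are not DUCK's or EMMO's actual terms, and a real deployment would use an RDF store with SPARQL rather than a Python list.

```python
# Hypothetical triples describing two cyclic voltammetry runs.
TRIPLES = [
    ("cv:run1", "ex:hasScanRate_mV_s", 100),
    ("cv:run1", "ex:hasSolvent", "acetonitrile"),
    ("cv:run1", "ex:measuredBy", "ex:manualLab"),
    ("cv:run2", "ex:hasScanRate_mV_s", 50),
    ("cv:run2", "ex:hasSolvent", "acetonitrile"),
    ("cv:run2", "ex:measuredBy", "ex:selfDrivingLab"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def subjects_where(triples, conditions):
    """Subjects satisfying every (predicate, object) condition."""
    result = None
    for p, o in conditions:
        subjects = {s for s, _, _ in match(triples, p=p, o=o)}
        result = subjects if result is None else result & subjects
    return sorted(result or [])

# Which acetonitrile runs came from the (hypothetical) self-driving lab?
automated_runs = subjects_where(
    TRIPLES,
    [("ex:hasSolvent", "acetonitrile"), ("ex:measuredBy", "ex:selfDrivingLab")],
)
print(automated_runs)
```

The appeal of the ontology mapping is that manual and automated measurements share the same predicates, so one query covers both sources.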
Long List
Cheminformatics
SynFrag: Synthetic Accessibility Predictor Based on Fragment Assembly Generation in Drug Discovery
Assessment of quantum chemical predictors for anti-colorectal cancer agents using QSAR modeling
Advances in computational prediction of RNA-small molecule binding affinity
Automated Force Field Developer and Optimizer Platform: Torsion Reparameterization
Synthesis planning in reaction space: a study on success, robustness and diversity
ncProFormer: A CNN-enhanced Transformer for ncRNA Coding-Potential Prediction
Siamese graph neural networks for melting temperature prediction of molten salt eutectics
MAESD: A Unified Multi-Agent Evolutionary Framework for Protein Sequence Design
An Active Learning Algorithm for Identifying Transition States on a Potential Energy Surface
Molecular embedding-based algorithm selection in protein-ligand docking
MedChem
Other
What is the Diatomic Molecule with the Largest Dipole Moment?
Jeweler-in-the-loop: personalized alloy color optimization via preference-based BO
Palate Cleanser
Wish me luck for my asymmetric organic chemistry exam,
Manas































