This Week In Cheminformatics: Issue #022
Oral drug-likeness prediction, reaction discovery using AI-HTE and a long list of papers
A huge thank you to the now 200+ of you reading this every week :)
Highlights
Machine Learning-Based Soft Voting Ensemble Model for the Prediction of Oral Drug-Likeness of Chemical Structures
I wanted to highlight this paper by Petrosyan et al. introducing HADES, a soft-voting ensemble of five machine learning algorithms that quantifies oral drug-likeness using a combination of physicochemical and ADMET descriptors. To address the inherently limited size of approved-drug datasets, the authors used Code Llama to extract structured information from clinical trial records and then mapped it to PubChem SMILES to build an enriched, phase-annotated dataset. From a feature perspective, the whole-molecule descriptor BCUTd-1l and predicted CYP2C9 metabolic liability consistently emerged as primary drivers for the model (which is quite interesting, to say the least). So maybe next time, let’s use this alongside, or instead of, Ro5. Good read!
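For anyone who has not met soft voting before, here is a minimal sketch of the idea in plain Python. It is not the HADES code: the function names and the five probability rows are made up for illustration, and I am assuming equal weights across models. Each classifier outputs class probabilities for a molecule, the ensemble averages them, and the class with the highest mean probability wins.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the (weighted) mean class-probability vector across models."""
    if not model_probs:
        raise ValueError("need at least one model's probabilities")
    if weights is None:
        # Assumption: every model gets the same say in the vote.
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict_label(avg_probs: Sequence[float]) -> int:
    """Pick the class index with the highest averaged probability."""
    return max(range(len(avg_probs)), key=lambda c: avg_probs[c])


# Hypothetical outputs of five classifiers for a single molecule.
# Each row is [P(not drug-like), P(drug-like)]; the numbers are placeholders.
example = [
    [0.20, 0.80],
    [0.40, 0.60],
    [0.70, 0.30],
    [0.10, 0.90],
    [0.35, 0.65],
]

avg = soft_vote(example)    # mean probability per class
label = predict_label(avg)  # index of the winning class
print(avg, label)
```

The averaged P(drug-like) doubles as a continuous score, which is what lets an ensemble like this quantify drug-likeness rather than only emit a yes/no label. In practice you would likely reach for scikit-learn's `VotingClassifier` with `voting="soft"` instead of rolling your own.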
Accelerating Reaction Discovery through AI-HTE Integration: Nitrene-Mediated C–O Coupling as a Validated Case Study
This paper by Liu et al. caught my eye because it takes a "validation-forward" approach to combining machine learning with high-throughput experimentation. Rather than using ML for post-hoc reaction optimization, the team built an XGBoost model on RDKit descriptors, trained on a defined microscale matrix of phenolic substrates, to prospectively predict yields for a ruthenium-catalyzed, nitrene-mediated C–O coupling. They controlled for data leakage and evaluated the model on a component-disjoint phenolic set and a class-shift set of aliphatic alcohols, demonstrating that the model captured _transferable_ steric and electronic features rather than simply interpolating shared reagents. By deploying the model to rank unseen substrates under a standardized protocol, they reduced experimental effort and translated top predictions directly to preparative-scale carbamate synthesis without any substrate-specific re-optimization. Pretty cool!
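To make "component-disjoint" concrete, here is a rough sketch of how such a split could look. It is my own illustration, not the authors' pipeline: the record layout, the `substrate` and `condition` keys, and the placeholder names are all assumptions. The point is that every reaction involving a held-out substrate goes to the test set, so the model never sees that component during training.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def component_disjoint_split(
    records: List[Record],
    key: str,
    held_out: Set[str],
) -> Tuple[List[Record], List[Record]]:
    """Split so that no held-out component value appears in the training set."""
    train = [r for r in records if r[key] not in held_out]
    test = [r for r in records if r[key] in held_out]
    return train, test


def shared_components(train: List[Record], test: List[Record], key: str) -> Set[str]:
    """Return component values present on both sides (empty set = no leakage)."""
    return {r[key] for r in train} & {r[key] for r in test}


# Hypothetical reaction records: placeholder substrate and condition labels only.
records = [
    {"substrate": "phenol_A", "condition": "cond_1"},
    {"substrate": "phenol_A", "condition": "cond_2"},
    {"substrate": "phenol_B", "condition": "cond_1"},
    {"substrate": "phenol_B", "condition": "cond_2"},
    {"substrate": "phenol_C", "condition": "cond_1"},
    {"substrate": "phenol_C", "condition": "cond_2"},
]

train, test = component_disjoint_split(records, key="substrate", held_out={"phenol_C"})
leak = shared_components(train, test, key="substrate")
print(len(train), len(test), leak)
```

A plain random split of these rows would almost certainly put the same substrate on both sides, which is exactly the kind of leakage that inflates yield-prediction metrics. scikit-learn's `GroupShuffleSplit` and `GroupKFold` do the same thing at scale if you pass the substrate ID as the group.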
Long List
Cheminformatics
Machine Learning for Superconductor Discovery: From Data-Driven Insights to Accelerated Design
Degree-Based Topological Indices and Machine Learning for QSPR Modeling of Arthritis Drugs
Adaptive Disorder as the Hallmark of Nanobodies Antigen-Binding Loops
Machine Learning for Raman Spectroscopy Glioblastoma Classification
Large-Scale Collaborative Assessment of Binding Free Energy Calculations for Drug Discovery Using OpenFE
Structural Hotspot Conditioning for Diffusion-Based Molecular Design
Fragment, Entangle, and Consolidate: Strong Correlation through Bifold Quantum Circuits
NMR-Challenge for LLMs: Evaluating Chemical Reasoning in Humans and AI
Quantifying the Uncertainty of Molecular Dynamics Simulations: Good–Turing Statistics Revisited
ExPO: an exposure-conditioned neural operator for L1000 signature prediction
Comparative Evaluation of Explicit Solvent Models for RNA-Ligand Docking
MotifLeadDB: A Hierarchical Structural Data Set for Congeneric Ligand Binding Activity Change
A User-Tunable Machine Learning Framework for Step-Wise Synthesis Planning
ConGen: Targeted Molecule Generation Through Contrastive Learning and Latent Optimization
Emerging Insights into the Distinct Pharmacological Mechanisms of Buprenorphine
Wide and Cross GNNs: Cross‐Interactions and Parallel Scaling for Robust Chemical Property Prediction
Phenotypic AI-based design of cell-specific small molecule cytotoxics
PyRMD Studio: A Unified Suite for Next-Generation, AI-Powered Virtual Screening
3DOpt: Benchmark for Automated Design of 3D Molecular Structures across the Periodic Table
Pep2MARS: Automated Cyclic Peptide Parameterization for Molecular Dynamics and Compound Design
Novel molecular design via a scaffold-aware transformer with multi-scale attention mechanisms
CataCon: a contrastive graph representation learning framework for catalyst prediction
Conditional Diffusion Model-Based Method for Annotation of Antibiotic Resistance Gene Properties
CSCAN: Conformational Analysis of Macrocyclic Peptides through NMR Chemical Shifts
ProtSATT: An Advanced Protein Solubility Predictor Based on Attention Mechanism
B–Z DNA Transitions under Z–DNA-Favoring Conditions: Benchmarking the OL21-vdW7 Force Field
Revisiting ADMET prediction reliability under real-world challenges in the foundation model era
MedChem
Dual Metabolic Blockade in Pancreatic Cancer: Potent Anticancer Activity of Mitochondria-Targeted Glycolysis and OXPHOS Inhibitors
Synchronous Rotation Dynamics in a Molecular Motor
Other
Strategic Applications of Single-Atom Skeletal Editing in Natural Product Synthesis
Emerging Roles of Photoredox Catalysis in Biomedical Research
Minimum-Excess-Work Guidance: Score-Based Sampling with Experimental Data or Sparse Restraints
Palate Cleanser
Best,
Manas

































