This Week In Cheminformatics: Issue #001
Holiday highlights you might have skipped. Plus, why I’m starting this.
Yet Another Newsletter... Why?
Science Twitter has fragmented (you know why), and our feeds on LinkedIn and Bluesky have become unreliable for discovery. Finding high-quality research shouldn’t require navigating social media or sifting through humblebrags. This newsletter is simply an effort to keep track of relevant work without the noise of algorithmic feeds.
This Week in Cheminformatics is (supposed to be) a weekly curation of publications, preprints, code, and datasets. The scope covers cheminformatics, molecular modeling, and machine learning for drug discovery. The focus here is utility over engagement, so do with these links what your heart desires. Every Monday, I will share the previous week’s technical updates alongside any specific items that catch my eye.
Links
Generative inverse design of RNA structure and function with gRNAde
Chemprop v2: An Efficient, Modular Machine Learning Package for Chemical Property Prediction
ChemFM as a scaling law guided foundation model pre-trained on informative chemicals
SynGFN: learning across chemical space with generative flow-based molecular discovery
Site-Selective Protein Modification via Peptide-Directed Proximity Catalysis
Generalized DeepONets for viscosity prediction using learned entropy scaling references
Predicting PROTAC-mediated ternary complexes with AlphaFold3 and Boltz-1
MolEncoder: towards optimal masked language modeling for molecules
Leveraging large language models for enzymatic reaction prediction and characterization
Database mining of ZINC15 natural compounds reveals potential thyroid receptor β agonists for NAFLD management: an in silico study
Multi-modal contrastive learning for chemical structure elucidation with VibraCLIP
Retrosynformer: planning multi-step chemical synthesis routes via a decision transformer
Mol2Raman: a graph neural network model for predicting Raman spectra from SMILES representations
SLAB: simultaneous labeling and binding affinity prediction for protein–ligand structures
Hierarchical Attention Graph Learning with LLMs Enhancement for Molecular Solubility Prediction
Multi-agentic AI framework for end-to-end atomistic simulations
Explainable Active Learning Framework for Ligand Binding Affinity Prediction
Computer vision for high-throughput materials synthesis: a tutorial for experimentalists
Kinetic predictions for SN2 reactions using the BERT architecture: Comparison and interpretation
An exploration of dataset bias in single-step retrosynthesis prediction
One-Class Genetic Algorithm for Authentication Analysis of Spectrochemical Data
Fractional Kinetic Modelling of the Adsorption and Desorption Processes From Experimental SPR Curves
Peptide-Tools–Web Server for Calculating Physicochemical Properties of Peptides
Leveraging Consensus Docking Approaches for Human Mitochondrial Complexes I and III
Path-Based Graph Neural Network for Drug Synergy Prediction and Interpretation
Deciphering DNA’s Sequence-Dependent Structure and Deformability with Normalizing Flows
Gaussian process emulation for exploring complex infectious disease models
Cell-DINO: Self-supervised image-based embeddings for cell fluorescent microscopy
A data-driven biology-based network model reproduces C. elegans premotor neural dynamics
ViVo: A Temporal Modeling Framework That Boosts Statistical Power and Minimizes Animal Usage
CalVSP: a program for analyzing the molecular surface areas, volumes, and polar surface areas
ChemTSv3: Generalizing Molecular Design via Flexible Search Space Control
StereoMolGraph: Stereochemistry-Aware Molecular and Reaction Graphs
EDWARD: E(3)-Equivariant Dual-Way Attentive Reduction for Peptide-to-Small-Molecule Design
MethylMSI: Prediction of microsatellite instability based on DNA methylation profile and SVM model
RxnNet: An AI Framework for Reaction Mechanism Discovery - A Case Study of Carbocations
Guiding Ligand Selection in Copper-Catalyzed Cross Couplings via Principal Component Analysis
Graph-Based Internal Coordinate Analysis for Transition State Characterisation
Machine Learning-Guided Scope Selection to Balance Performance and Substrate Similarity
ASPEN: Robust detection of allelic dynamics in single cell RNA-seq
RetroScore: graph edit distance-guided retrosynthesis for accessibility scoring with route metrics
qMol: A Web Server for Efficient Molecular Queries Using Fragment-Based Reduced Graphs
Molecules in Wikipedia: Analysis of Their Chemical Diversity, Functional Roles, and Popularity
Archive
I’m quite interested in meaningful benchmarks. So, here are some standout papers on model validation and modellability I’ve read this year. This isn’t an exhaustive list, so please send over any of your own favorites that I missed.
Roughness of Molecular Property Landscapes and Its Impact on Modellability
k Nearest Neighbors QSAR Modeling as a Variational Problem: Theory and Applications
Study of Data Set Modelability: Modelability, Rivality, and Weighted Modelability Indexes
Kernel Target Alignment Parameter: A New Modelability Measure for Regression Tasks
Shallow Representation Learning via Kernel PCA Improves QSAR Modelability
Comparative Studies on Some Metrics for External Validation of QSPR Models
Comments on the Definition of the Q2 Parameter for QSAR Validation
Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning
Metric Validation and the Receptor-Relevant Subspace Concept
Statistical Confidence for Variable Selection in QSAR Models via Monte Carlo Cross-Validation
Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction
Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data
Palate Cleanser
Have a wonderful New Year,
Manas










