This Week In Cheminformatics: Issue #025
BigSMILES canonicalization, Cleavage Rules Using SMIRKS Heuristics, EasyDock 1.3, and a long list of papers
Highlights
A Canonical Text Representation for Polymers via BigSMILES and Tree Automata
Handling stochastic polymer representations in databases just got significantly more manageable thanks to a new paper from the Olsen lab on BigSMILES canonicalization. BigSMILES encodes the structural connectivity of complex polymer ensembles, but its inherent string degeneracy (many valid strings for the same polymer) makes routine graph-based searches computationally expensive. The authors present an algorithm that maps linear and branched BigSMILES strings onto tree automata, state machines that recognize the allowed branching patterns. It then minimizes these automata into a unique form and translates the transition rules back into human-readable, canonicalized strings. Fun read!
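To see why degeneracy hurts and what canonicalization buys you, here is a toy sketch. It is not the paper's tree-automata method. It swaps in a much simpler, well-known trick for rooted, labeled trees: encode every child recursively, then sort the children's encodings at each node. The nested-tuple tree format and the `canonical` function are hypothetical, and the sketch ignores the stochastic and bonding-descriptor parts of BigSMILES entirely.

```python
from typing import List, Tuple

# Hypothetical toy format: (label, [children]).
# Labels must not contain "(", ")" or "," or the encoding becomes ambiguous.
Tree = Tuple[str, List["Tree"]]


def canonical(tree: Tree) -> str:
    """Return an order-independent string for a rooted, labeled tree.

    Children are encoded recursively and then sorted, so every writing order
    of the same branches yields the same string (AHU-style canonicalization).
    """
    label, children = tree
    if not children:
        return label
    encoded = sorted(canonical(child) for child in children)
    return label + "(" + ",".join(encoded) + ")"


# Two writings of the same branched structure, branches listed in different orders
written_one_way = ("A", [("B", []), ("C", [("D", [])])])
written_another_way = ("A", [("C", [("D", [])]), ("B", [])])

print(canonical(written_one_way))      # A(B,C(D))
print(canonical(written_another_way))  # A(B,C(D))
```

Once every equivalent writing collapses to one key, lookup is an exact string match instead of a graph comparison. The hard part the paper tackles is getting that property for full BigSMILES ensembles, where a simple sort like this is not enough.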
EasyDock 1.3: An Automated Pipeline for Molecular Docking
Minibaeva et al. present EasyDock 1.3 as a useful tool for large-scale virtual screening campaigns. What makes this release particularly relevant is its complete transition to an open-source ligand preparation pipeline, replacing commercial dependencies with tools like Uni-pKa and MolGpKa for thermodynamic-aware protonation state assignment. Crucially, the pipeline natively integrates PoseBusters and ProLIF for automated post-docking validation and protein-ligand interaction fingerprinting, storing the geometric pass/fail flags and interaction data directly in a SQLite database. I think I'm going to switch to this. Goodbye DockingPie, you had a good run.
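Having validity flags sitting next to scores in SQLite means "best hits that are also physically sensible" becomes a single query. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `docking_results` table with `mol_id`, `docking_score` and `posebusters_pass` columns. This is not EasyDock's actual schema (check its docs for the real table and column names), and the rows are made-up placeholders.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a HYPOTHETICAL schema (not EasyDock's real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docking_results ("
        " mol_id TEXT PRIMARY KEY,"
        " docking_score REAL,"       # lower (more negative) is better, Vina-style
        " posebusters_pass INTEGER"  # 1 if the pose passed all geometry checks
        ")"
    )
    # Made-up placeholder rows, for illustration only
    rows = [("mol1", -9.1, 0), ("mol2", -8.7, 1), ("mol3", -7.2, 1), ("mol4", -10.3, 0)]
    conn.executemany("INSERT INTO docking_results VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def top_valid_hits(conn: sqlite3.Connection, limit: int = 5) -> List[Tuple[str, float]]:
    """Best-scoring poses that also passed the geometric validity checks."""
    query = (
        "SELECT mol_id, docking_score FROM docking_results "
        "WHERE posebusters_pass = 1 "
        "ORDER BY docking_score ASC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


conn = build_demo_db()
print(top_valid_hits(conn))  # [('mol2', -8.7), ('mol3', -7.2)]
```

Note that the two best raw scores (mol4, mol1) drop out because their poses fail validation, which is exactly the kind of false positive you want filtered before anyone looks at the list.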
CRUSH—Cleavage Rules Using SMIRKS Heuristics: an enhanced molecular fragmentation algorithm
López-López et al. introduce CRUSH (Cleavage Rules Using SMIRKS Heuristics) as a comprehensive, chemistry-aware bond disconnection strategy. The method applies 33 curated SMIRKS-based rules across all eligible bonds at each fragmentation step. The authors benchmarked CRUSH on five diverse datasets, including natural products and macrocycles, and showed that it consistently produces smaller, Rule of Three-compliant fragments that cover regions of chemical space untouched by existing methods.
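"All eligible bonds at each step" is essentially an exhaustive search over cut sites rather than a greedy one. Here is a minimal stdlib sketch of that search pattern, assuming a toy molecule given as atom indices plus bonds, and assuming the set of cleavable bonds has already been found. In CRUSH that is the job of the SMIRKS rules; here `cleavable` is just a hand-written placeholder. Only acyclic cuts are handled and no attachment-point dummy atoms are added, so treat this as an illustration, not CRUSH itself.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, Set, Tuple

Bond = Tuple[int, int]


def split(atoms: FrozenSet[int], bonds: Set[Bond]) -> List[FrozenSet[int]]:
    """Connected components of `atoms`, using only bonds with both ends inside."""
    adjacency = {a: set() for a in atoms}
    for i, j in bonds:
        if i in atoms and j in atoms:
            adjacency[i].add(j)
            adjacency[j].add(i)
    seen: Set[int] = set()
    pieces = []
    for start in sorted(atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, piece = [start], set()
        while stack:  # depth-first flood fill of one component
            atom = stack.pop()
            piece.add(atom)
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        pieces.append(frozenset(piece))
    return pieces


def enumerate_fragments(atoms: Iterable[int], bonds: Set[Bond],
                        cleavable: Set[Bond]) -> Set[FrozenSet[int]]:
    """Cut every eligible bond in every fragment, step after step (breadth-first).

    Assumes all cleavable bonds are acyclic, so each cut splits a fragment in two.
    Returns every fragment reached, excluding the intact molecule.
    """
    whole = frozenset(atoms)
    seen = {whole}
    queue = deque([whole])
    while queue:
        fragment = queue.popleft()
        for bond in cleavable:
            if bond[0] in fragment and bond[1] in fragment:
                for piece in split(fragment, bonds - {bond}):
                    if piece not in seen:
                        seen.add(piece)
                        queue.append(piece)
    seen.discard(whole)
    return seen


# Toy chain 0-1-2-3-4; (1, 2) and (3, 4) stand in for rule-matched bonds
bonds = {(0, 1), (1, 2), (2, 3), (3, 4)}
cleavable = {(1, 2), (3, 4)}
frags = enumerate_fragments(range(5), bonds, cleavable)
print(sorted(sorted(f) for f in frags))
# [[0, 1], [0, 1, 2, 3], [2, 3], [2, 3, 4], [4]]
```

Keeping both intermediate and terminal pieces is what lets an exhaustive scheme reach more of fragment space than one that stops at the first match. A real pipeline would then filter the pieces, for example against Rule of Three cutoffs.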
Long List
Cheminformatics
Developing a Machine-Learning Interatomic Potential for Non-Covalent Interactions in Proteins
SpaceExpander: An Automated System for Drafting Markush Claims to Expand Chemical Space
Back to the Future of Lead Optimization: Benchmarking Compound Prioritization Strategies
Generative flow model on distance geometry for predicting transition states of chemical reactions
Unified Topological Framework for Representation and Construction of Generalized Carbon Nanobelts
Atomic-level protein–ligand recognition with PBCNet2.0 for probe discovery
AI-Enforced Ultra-Large Virtual Screening Discovers Potent CD28 Binders
Band Gap Prediction of Two-Dimensional Materials Using a Gradient-Boosted Feature Selection Approach
Predicting enantiomer migration order of levobunolol via sequential computational modeling
Accelerated Sampling of Protein Dynamics Using BioEmu-Augmented Molecular Simulation
Discovering CO2–Reactive Carbanions via Property-Guided Generative AI
HQMol: Hierarchical Fusion and Query-Guided Alignment for Molecular Graph-Language Modeling
Determination of bonding radii from small-molecule crystal structures
Synergistic Protein–Protein and Protein–Lipid Interactions Drive SARS-CoV-2 Envelope Assembly
Librarian of Alexandria: A Modular Chemical Data Extraction Pipeline to Compare LLM Performance
KNexPHENIX: A PHENIX-Based Workflow for Improving Cryo-EM and Crystallographic Structural Models
Automating Computational Chemistry Workflows via OpenClaw and Domain-Specific Skills
The Systematic Study of Spatially Conserved Salt Bridges in Protein
Generative pretraining for drug molecule design with bidirectional structure-property optimization
Implementation and Validation of Titratable Cysteine in GROMACS-Based Constant-pH Molecular Dynamics
MedChem
Structure-Based Discovery of Potent BCL-XL Inhibitors through Rescaffolding
Structure‐Based Design of Isoxazolidine RIPK1 Inhibitors for Neuroinflammation
Other
Tracking Gene Expression of Single Mitochondria in Live Neurons Using Nanotweezers
Burst-Mode Near-Infrared Chemiluminescent Probes for In Vivo Imaging
Palate Cleanser
stay hydrated,
Manas




















