This Week In Cheminformatics: Issue #021

Conformal prediction for molecular retrieval from MS2, how expensive your conformer generation workflows should be, and a long list of papers

May 18, 2026

Highlights

Conformer Generation Workflows for COSMO-RS Calculations: Are They All the Same?

Gomes et al. systematically evaluate whether expensive conformer generation is strictly necessary for COSMO-RS thermodynamic predictions. The authors tested four pipelines, ranging from RDKit ETKDG/MMFF94 to compute-heavy DFT and semiempirical GOAT workflows, across 926 experimental solubility data points. Surprisingly, for general high-throughput screening of typical drug-like molecules, the pipeline the authors call RDKit-Direct (ETKDGv3) performed comparably to the complex pipelines, with mean absolute errors around 0.90 log units. The authors note, however, that for highly flexible molecules with more than five rotatable bonds, or for systems prone to solvent-dependent intramolecular hydrogen bonding, the choice of method strongly affects the result.
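
For readers who want a feel for what the cheap end of that spectrum looks like, here is a minimal sketch of an ETKDGv3 embedding followed by MMFF94 relaxation in RDKit. It is not the authors' pipeline: the function names `cheap_conformers` and `needs_closer_look`, the number of conformers, the seed, and the iteration limit are all assumptions made for illustration. The only detail taken from the summary above is the five-rotatable-bond threshold.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolDescriptors


def cheap_conformers(smiles, num_confs=10, seed=42):
    """Embed conformers with ETKDGv3 and relax them with MMFF94 (illustrative settings)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry

    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed so the embedding is reproducible
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, params=params))
    if not conf_ids:
        raise RuntimeError("embedding produced no conformers")

    # Returns one (not_converged, energy) pair per conformer; energies in kcal/mol.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)
    energies = {cid: energy for cid, (_, energy) in zip(conf_ids, results)}
    return mol, energies


def needs_closer_look(mol, max_rotatable=5):
    """Flag molecules above the rotatable-bond threshold mentioned in the summary."""
    # Count on the hydrogen-suppressed graph so methyl groups are not counted as rotors.
    heavy = Chem.RemoveHs(mol)
    return rdMolDescriptors.CalcNumRotatableBonds(heavy) > max_rotatable
```

Calling `cheap_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O")` returns the embedded molecule and a dictionary of MMFF94 energies keyed by conformer ID. A molecule flagged by `needs_closer_look` is one where, per the paper's caveat, a more expensive workflow may be worth the compute.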

Reliable Molecular Retrieval from Mass Spectra Using Conformal Prediction

This paper by Rakhshaninejad et al. applies conformal prediction to candidate-based molecular retrieval from LC-MS/MS data. Usually, retrieval pipelines are evaluated using metrics like top-k accuracy, which summarize performance at the dataset level but fail to provide a spectrum-specific reliability statement. To address this, the authors construct prediction sets that contain the true molecule with a user-specified probability, effectively quantifying uncertainty for individual spectra. They evaluated marginal and conditional conformal prediction across in-distribution, partially shifted, and fully out-of-distribution scenarios on the MassSpecGym benchmark. As this framework operates directly on the output scores of a retrieval model without requiring any architectural modifications, it is a good read for anyone looking to implement uncertainty quantification in their annotation pipelines.
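
To make the idea concrete, below is a minimal sketch of split (marginal) conformal prediction applied to retrieval scores. It is an assumption-laden illustration, not the paper's method: the nonconformity score (the negated retrieval score of the true candidate) and the helper names `conformal_threshold` and `prediction_set` are hypothetical, and the coverage guarantee only holds when calibration and test spectra are exchangeable.

```python
import math


def conformal_threshold(cal_true_scores, alpha):
    """Score cutoff targeting (1 - alpha) marginal coverage.

    cal_true_scores: retrieval score the model gave to the correct molecule
    for each calibration spectrum (higher means a better match).
    """
    n = len(cal_true_scores)
    # Nonconformity = negated score, so a poorly scored true molecule looks "strange".
    nonconformity = sorted(-s for s in cal_true_scores)
    # Finite-sample corrected quantile, expressed as a 1-based rank.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every candidate.
        return float("-inf")
    return -nonconformity[rank - 1]


def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose score reaches the calibrated cutoff."""
    return [name for name, score in candidate_scores.items() if score >= threshold]


# Hypothetical calibration scores and one query spectrum's candidate list.
calibration = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
cutoff = conformal_threshold(calibration, alpha=0.5)
candidates = prediction_set({"mol_A": 0.95, "mol_B": 0.55, "mol_C": 0.05}, cutoff)
```

Because the procedure only needs the scores, it can sit on top of an existing retrieval model unchanged. The size of the returned set then acts as a per-spectrum reliability signal: a confident spectrum yields a short list, an ambiguous one a long list.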

Long List

Cheminformatics

MedChem

Other

Host–Guest Complexation through Geometric Self-Optimization in [12]Cycloparaphenylene

Palate Cleanser

The Story of the Woodward–Hoffmann Rules — Very cool !!!

Luca Dellanna @DellAnnaLuca

The reactions of many researchers on finally being held responsible for having read the very paper they submitted are... something.

12:13 PM · May 16, 2026 · 299K Views

François Fleuret @francoisfleuret

Awesome. Seriously, people are harsh with this platform, but if you are careful with whom you follow, it is a constant stream of awesomness.

Eric Jang @ericjang11

For the last few months I've been working on a from-scratch implementation of AlphaGo, a 2016 AI breakthrough that inspired me to get into deep learning. My casual understanding of AlphaGo was "search-augmented deep neural networks trained with self-play", but I wanted to go

9:41 AM · May 16, 2026 · 138K Views

Jamie Birch @birch_js

Yeah good luck searching for "c" in your codebase a month from now

Jamon @jamonholmgren

Unpopular opinion: 1- or 2-letter variable names in focused, obvious contexts are totally fine.

12:44 AM · May 15, 2026 · 616K Views

LaurieWired @lauriewired

30.9% of genetics papers data are kind of trash because of Excel’s aggressive auto-formatting. Until 2023, there was no global option to disable data conversion. For example, the human SEPT family (1-14) of genes is directly related to cell division and cancer research. I’ll

5:20 PM · May 13, 2026 · 105K Views

andrew arruda @andrewarruda

“E-mails 'hurt IQ more than pot'” - CNN, Friday, April 22, 2005

unusual_whales @unusual_whales

"Prolonged AI use may make it harder to think critically and creatively," per the Economist

2:29 PM · May 13, 2026 · 285K Views

Jack Fields @OrdinaryInds

The AI bros are discovering CSS stylesheets. What a fucking time to be alive.

Nicolas Bustamante @nicbstme

A lot of people are arguing that HTML burns more tokens than markdown. It's true but you can save at least 40% by externalizing the CSS to a template with <link rel="stylesheet" href="./styles.css">. This style.css is your formatting so the LLM will never output CSS again. I

4:48 PM · May 12, 2026 · 348K Views

Sofia Sanchez @SofiasBio

Chat, they're already working on better animations !!! it's all good now😅

1:32 PM · May 13, 2026 · 813 Views

Imran S. Haque (@ihaque@{bsky,genomic}.social) @ImranSHaque

me whenever i dip my toes back into quantum chemistry

7:18 PM · May 11, 2026 · 367 Views

spidey @lochan_twt

“We used to memorize the whole syntax”

4:57 AM · May 11, 2026 · 103K Views

Nic Barker @nicbarkeragain

One of the biggest problems with using LLMs as a google replacement for programming, is that getting zero relevant results on google used to be a signal that you had the wrong idea about the root cause. Whereas LLMs will happily indulge any terrible idea you suggest.

4:20 AM · May 11, 2026 · 194K Views

Best,
Manas
