Abstract
Thousands of short open reading frames (sORFs) are translated outside of annotated coding sequences. Recent studies have pioneered searching for sORF-encoded microproteins in mass spectrometry (MS)-based proteomics and peptidomics datasets. Here, we assessed literature-reported MS-based identifications of unannotated human proteins. We found that studies vary by three orders of magnitude in the number of unannotated proteins they report. Of nearly 10,000 reported sORF-encoded peptides, 96% were unique to a single study, and 12% mapped to annotated proteins or proteoforms. Manual curation of a benchmark dataset of 406 spectra from 204 sORF-encoded proteins revealed large variation in peptide-spectrum match (PSM) quality between studies, with immunopeptidomics studies generally reporting higher-quality PSMs than studies of conventional enzymatic digests of whole-cell lysates. We estimate that 65% of predicted sORF-encoded protein detections in immunopeptidomics studies were supported by high-quality PSMs, versus 7.8% in non-immunopeptidomics datasets. Our work stresses the need for standardized protocols and analysis workflows to guide future advancements in microprotein detection by MS towards uncovering how many human microproteins exist.
| Original language | English |
|---|---|
| Pages (from-to) | 1241 |
| Journal | Nature Communications |
| Volume | 17 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 21 Jan 2026 |
Keywords
- Humans
- Proteomics/methods
- Benchmarking
- Mass Spectrometry/methods
- Open Reading Frames/genetics
- Peptides
- Molecular Sequence Annotation
- Proteins/genetics
- Databases, Protein
Fingerprint
Dive into the research topics of 'Community benchmarking and evaluation of human unannotated microprotein detection by mass spectrometry based proteomics'. Together they form a unique fingerprint.