Hugh Kim & Soo Yeon Chae
Proteogenomics uses genomic and proteomic data together to diagnose diseases and predict prognoses. More broadly, the term covers any study that analyzes and interprets proteomic data using genomic information. But how is proteome analysis conducted? Where and how are the proteins for analysis obtained? To answer these questions, we need to look at the overall experimental workflow of proteogenomics research. In this post, we discuss protein preprocessing and separation, and the advances that have been made in these techniques.
If the protein of interest is inside a cell, the cell must first be broken open (cell lysis) to obtain it. Breaking a cell means disrupting the cell membrane, and there are two main ways to do so: physical and chemical. Physical methods damage the cell membrane using ultrasonic homogenizers, French presses, or bead mills. Chemical methods use lysis buffers to chemically degrade the cell membrane and the nucleus. However, if the target protein is attached to the cell membrane or to DNA, or is embedded in bone tissue, special extraction conditions are required. For example, to obtain transcription factor proteins that bind to DNA, deoxyribonuclease (DNase) is used to degrade the DNA and release the proteins. To extract proteins from bone tissue, acids such as hydrochloric acid can be used to break down the bone. Research on efficiently extracting proteins from such varied environments is ongoing, and it continues to broaden the range of proteins that can be analyzed.
Most proteome analyses are conducted using mass spectrometry. Analyzing intact proteins, after they have been extracted and separated from one another, is known as the “top-down” approach. This approach is used for purposes such as measuring post-translational modifications (PTMs) and characterizing proteoforms. Studies have analyzed proteins of up to 200 kDa (for scale, a single carbon-12 atom has a mass of 12 Da) and entire proteomes consisting of over 1,000 proteins. Although the top-down approach is attractive, it presents technical challenges: individual proteins are difficult to separate efficiently, and large proteins are difficult to analyze by mass spectrometry. Because of these difficulties, many proteomics studies adopt the “bottom-up” approach, in which proteins are digested into peptides with proteolytic enzymes before analysis. As explained in the previous chapter, proteomics identifies peptides using liquid chromatography-mass spectrometry (LC-MS), pieces them together like a puzzle to reconstruct the proteins, and ultimately completes the proteome map.

Since the proteome is a collection of many different proteins, it is inherently complex. When each protein is broken down into multiple peptides, this complexity increases even further. Fortunately, proteolytic enzymes that cleave only at particular amino acid residues generate peptides from proteins in a predictable way. The mixture is still complex, but the analysis becomes more manageable. One of the most commonly used enzymes in proteome analysis is trypsin, which cleaves on the C-terminal side of the basic residues arginine and lysine. Other enzymes, such as Lys-C, which cleaves at lysine, and Glu-C, which cleaves at acidic residues such as glutamic acid and aspartic acid, can be used alongside trypsin to improve protein digestion efficiency.
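The predictable cleavage described above can be sketched in a few lines of Python. This is a simplified illustration, not a full digestion model: the function name `digest`, its `cleave_after` parameter, and the example sequence are all hypothetical, and the sketch ignores missed cleavages and the commonly applied rule that trypsin cleaves poorly before proline.

```python
def digest(sequence, cleave_after="KR"):
    """Split a protein sequence after every residue listed in cleave_after.

    Simplified assumption: every cleavage site is cut exactly once
    (no missed cleavages, no proline exception).
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that does not end in a cleavage site.
        peptides.append(sequence[start:])
    return peptides


# Made-up sequence used only for illustration.
protein = "MAGKTLLRAEDKSAWR"

print("trypsin-like (after K/R):", digest(protein, cleave_after="KR"))
print("Lys-C-like (after K):    ", digest(protein, cleave_after="K"))
```

Changing `cleave_after` is enough to compare enzymes: cutting only after lysine gives fewer, longer peptides than cutting after both lysine and arginine.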
Peptides must be separated before they can be analyzed. In the early days of proteomics research, separation relied on two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), which separates proteins in a polyacrylamide gel according to their charge (isoelectric point) and molecular weight. While this method is intuitive and allows PTMs to be observed, it struggles to detect low-mass proteins or peptides and lacks sensitivity and resolution. These limitations have been overcome through advances in liquid chromatography (LC).
In LC, a liquid mobile phase containing the analytes is passed through a separation column packed with porous stationary phase particles, and the analytes are separated according to differences in their adsorption characteristics or affinities. For example, in C18 chromatography, the stationary phase consists of 18-carbon hydrocarbon chains, which interact strongly with hydrophobic molecules. As a result, more hydrophobic peptides are retained on the stationary phase longer and elute later in the liquid mobile phase (typically an aqueous solution). This enables separation based on peptide hydrophobicity. LC-based peptide separation minimizes sample loss and offers better resolution, making it the most widely used separation method today. The resolution of LC improves as the column’s internal diameter becomes smaller and the stationary phase particles become finer. However, this also makes it harder for the liquid mobile phase to flow through the column. To address this, high-performance liquid chromatography (HPLC) was developed, using columns with diameters of 1–5 mm and stationary phase particles of 1.8–5 μm, with the mobile phase pushed through at pressures of tens to hundreds of bar. More recently, ultra-performance liquid chromatography (UPLC) has been developed, using particles as small as 1–1.5 μm and pressures approaching 1,000 bar, enabling high-resolution peptide separation.
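To get a feel for why finer particles demand so much more pressure, here is a rough sketch. It assumes the textbook packed-bed relation in which the pressure drop scales with the inverse square of the particle diameter when column length, flow velocity, and solvent viscosity are held fixed. That scaling is an assumption brought in for illustration rather than something stated in this post, and the function name `relative_backpressure` is hypothetical.

```python
def relative_backpressure(particle_um, reference_um=5.0):
    """Back-pressure relative to a column packed with reference_um particles.

    Assumes pressure drop is proportional to 1 / (particle diameter)^2,
    with column length, flow velocity, and solvent viscosity unchanged.
    """
    if particle_um <= 0 or reference_um <= 0:
        raise ValueError("particle diameters must be positive")
    return (reference_um / particle_um) ** 2


# Particle sizes taken from the HPLC and UPLC ranges quoted in the text.
for diameter in (5.0, 3.0, 1.8, 1.5, 1.0):
    factor = relative_backpressure(diameter)
    print(f"{diameter:.1f} um particles: {factor:.1f}x the pressure of 5.0 um particles")
```

Under this assumption, moving from 5 μm particles to particles of around 1.5 μm raises the required pressure by roughly an order of magnitude, which is consistent with the jump from hundreds of bar in HPLC to nearly 1,000 bar in UPLC.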
At this point, the peptides derived from proteins extracted from cells or tissue have been separated. The next step is mass spectrometry (MS) analysis. Mass spectrometry is a technique that analyzes ionized analytes in the gas phase according to their mass-to-charge ratio (m/z). It provides mass information about the peptides, which is then used to determine their amino acid sequences.
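As a small illustration of what m/z means for a peptide, the sketch below computes the monoisotopic mass of an unmodified peptide from standard residue masses and converts it to m/z for a few charge states. It assumes positive-mode ionization in which each charge comes from one added proton. The function names `peptide_mass` and `mz` and the test peptide "PEPTIDE" are illustrative choices, not part of the workflow described here.

```python
# Standard monoisotopic residue masses in Da, rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # mass of one proton


def peptide_mass(sequence):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence, charge):
    """m/z of the peptide carrying `charge` extra protons (assumed positive mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


for z in (1, 2, 3):
    print(f"PEPTIDE, charge {z}+: m/z = {mz('PEPTIDE', z):.4f}")
```

The same peptide appears at a different m/z for each charge state, which is why the charge has to be known before a measured m/z can be turned back into a mass.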
To be continued in the next post.
Please visit the Hugh Kim Research Group homepage.
