Hugh Kim
In my first semester of college, the most challenging part was getting used to scientific terminology. There’s a famous joke in the STEM community about the specialized jargon used by scientists and engineers. Over time you naturally get used to it, but at first it does require some memorization, and being tested on these terms can be a very effective way to learn them. That’s why I often open quizzes in introductory chemistry courses with a terminology question. Back in the 1990s, when I was an undergraduate, the first question on quizzes and exams was often about defining terms. Among the many terms I encountered back then, one that felt particularly odd was “protein folding.” After more than 20 years in science, though, I now see it as the perfect way to capture the distinctive character of proteins.
Protein folding is the process by which each protein acquires the unique three-dimensional structure it needs to perform its specific function in the body. That structure is described at four levels: primary, secondary, tertiary, and quaternary. The primary structure is the sequence of amino acids in the polypeptide chain. The secondary structure refers to the local three-dimensional conformations that the chain adopts depending on its amino acid sequence, such as alpha-helices, beta-sheets, and random coils. The tertiary structure is the overall geometric shape that results from interactions among the secondary structure elements within one protein; this is where each protein’s distinctive shape emerges. The quaternary structure forms when proteins that have reached their tertiary structure interact with one another. Many proteins become functional only after forming a quaternary structure.
Insulin, discovered over a century ago, is a critical protein regarded as humanity’s savior from diabetes. Human insulin consists of an A-chain peptide of 21 amino acids and a B-chain peptide of 30 amino acids (primary structure). The A-chain is 47% alpha-helix and 49% random coil, while the B-chain is 46% alpha-helix and 51% random coil (secondary structure). The two chains are connected by two disulfide bonds, and a third disulfide bond within the A-chain helps give the protein its globular shape (tertiary structure). In aqueous solution, insulin can form dimers or tetramers and, in the presence of Zn²⁺ ions, assemble into hexamers (quaternary structure).
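To make the insulin example concrete, here is a minimal Python sketch that writes down the two chains and checks the numbers quoted above. The one-letter sequences and the disulfide positions (A6–A11, A7–B7, A20–B19) are not given in this article; they are the commonly cited values for human insulin and are included here as an assumption for illustration. The function names are hypothetical.

```python
# Human insulin chains in one-letter amino acid code (primary structure).
# Sequences are the commonly cited ones, not taken from this article.
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
CHAINS = {"A": A_CHAIN, "B": B_CHAIN}

# Disulfide bonds as pairs of (chain, 1-based position).
DISULFIDES = [
    (("A", 6), ("A", 11)),   # within the A-chain
    (("A", 7), ("B", 7)),    # between the A- and B-chains
    (("A", 20), ("B", 19)),  # between the A- and B-chains
]


def residue_at(chain, position):
    """Return the residue at a 1-based position of the named chain."""
    return CHAINS[chain][position - 1]


def check_disulfides(bonds):
    """Return True if every bonded position is a cysteine (C)."""
    return all(residue_at(c, p) == "C" for bond in bonds for (c, p) in bond)


def count_interchain(bonds):
    """Count the bonds whose two ends sit on different chains."""
    return sum(1 for (c1, _), (c2, _) in bonds if c1 != c2)


if __name__ == "__main__":
    print("A-chain length:", len(A_CHAIN))
    print("B-chain length:", len(B_CHAIN))
    print("All bonded residues are cysteine:", check_disulfides(DISULFIDES))
    print("Inter-chain disulfide bonds:", count_interchain(DISULFIDES))
```

The sketch covers only the primary structure and the covalent links that constrain the fold; the helix and coil percentages come from experiment and cannot be read off the sequence this way.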

Research on the complex phenomenon of protein folding, from primary to quaternary structure, is carried out using a variety of techniques. How the primary structure is analyzed depends on the type of sequence information available: DNA, RNA, or protein. Researchers first search extensive databases for similar proteins and then decide on a strategy for experimental analysis. Software tools and web services play a major role here, analyzing DNA, RNA, and protein sequences to reveal the connections between proteins and genes. The combination of high-performance liquid chromatography (HPLC), electrospray ionization (ESI), and mass spectrometry (MS), known as LC-MS, revolutionized protein amino acid sequence analysis in the 1990s. LC-MS identifies the amino acid sequence of a peptide from the characteristic mass differences among the 20 amino acids. The protein is first hydrolyzed into smaller peptides with proteolytic enzymes; the peptides are then separated by HPLC and converted to gas-phase ions by ESI. From the mass data obtained on these peptides, the amino acid sequence is deduced, which lets us elucidate the primary structure of the protein. Nowadays, research extends beyond sequencing to quantitative proteomics within cells and to informatics analysis of protein interactions, work that plays a central role in systems biology.
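The idea of reading a sequence from mass differences can be sketched in a few lines of Python. This is an idealized illustration, not how any particular instrument software works: it assumes a clean ladder of cumulative peptide-fragment masses with no terminal-group or charge offsets, and the function names and the 0.01 Da tolerance are hypothetical choices. The residue masses are standard monoisotopic values. Note that leucine and isoleucine have the same mass, so mass alone cannot distinguish them.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_masses(peptide):
    """Build an idealized mass ladder: cumulative residue masses from 0."""
    masses = [0.0]
    for aa in peptide:
        masses.append(masses[-1] + RESIDUE_MASS[aa])
    return masses


def read_ladder(masses, tol=0.01):
    """Assign a residue to each step between consecutive ladder masses.

    Steps matching several residues are reported as 'I/L'-style strings;
    steps matching none are reported as '?'.
    """
    residues = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        matches = sorted(
            aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol
        )
        residues.append("/".join(matches) if matches else "?")
    return residues


if __name__ == "__main__":
    ladder = prefix_masses("PEPTIDE")
    print(read_ladder(ladder))
```

Glutamine (Q) and lysine (K) differ by only about 0.036 Da, so whether they can be told apart depends on the mass accuracy of the measurement; the tolerance used here is tight enough to separate them in this toy ladder.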
The secondary structure of proteins is mainly determined through spectroscopic techniques. When a protein is exposed to light, the molecular transitions that occur depend on how its amino acids interact, so the resulting spectrum reflects the structure. The most common techniques are infrared (IR) spectroscopy and circular dichroism (CD) spectroscopy. IR spectroscopy probes the peptide backbone, chiefly through the absorption of the amide groups, which differs from one structure to another. A challenge, however, is that water molecules also absorb IR light and interfere with protein spectra. Most IR research is therefore conducted in deuterium oxide (D₂O) to avoid this interference. CD spectroscopy is widely used to study protein structures in solution. Each structural element in the protein produces a characteristic CD spectrum, depending on how the amide chromophores are aligned in the peptide backbone. However, CD spectroscopy may lack precision for proteins with a mix of alpha-helices and beta-sheets or for proteins rich in beta structures.
The tertiary and quaternary structures of proteins are analyzed with diffraction, spectroscopic, and imaging techniques. Traditional methods such as X-ray crystallography and nuclear magnetic resonance (NMR) are frequently used. X-ray crystallography provides atomic-level resolution but requires the protein to be crystallized and does not reveal dynamic information about the protein. NMR, on the other hand, offers dynamic information but has limited resolution and is generally suitable only for relatively small proteins. Small-angle X-ray scattering is also commonly used to analyze tertiary and quaternary structures and dynamics, but it is limited by its lower resolution. Cryo-electron microscopy (cryo-EM) offers the advantage of observing tertiary and quaternary structures at very high resolution. Because the sample is cooled rapidly enough to keep water from crystallizing and disrupting the structure, cryo-EM makes it possible to observe proteins in various monomeric and complex forms.
Experimentally determining the tertiary and quaternary structures of proteins is still very expensive and challenging. It often involves trial and error and the use of complex methods. If we could predict a protein’s 3D structure from its amino acid sequence alone, much of that trial and error could be avoided. To this end, scientists have been developing computational methods to predict protein 3D structures. Notably, the AI-based protein structure prediction method recognized by the 2024 Nobel Prize in Chemistry has made significant contributions to protein research.
After years of effort, we now have methods to study the complex process of protein folding, allowing us to unravel the structures of many proteins. These achievements have brought direct benefits to humanity. For example, when the COVID-19 pandemic started in December 2019, scientists decoded the virus’s genetic information within a month, revealing the primary structure of its spike protein. Using this information, they applied cryo-EM to determine its tertiary and quaternary structures, which were rapidly shared with the academic community. Based on this precise structural information, pharmaceutical companies and research institutes worldwide quickly began vaccine development, and the first vaccines were completed within a year of the pandemic’s onset. Although various mutations have posed ongoing challenges, we continue to adapt using these advanced techniques.
Please visit the Hugh Kim Research Group homepage.
References
1. Dill, K. A., & MacCallum, J. L. Science 2012, 338 (6110), 1042-1046
2. Hua et al. Nat Struct Biol 1995, 2 (2), 129-138; Timofeev et al. Acta Cryst F 2010, 66 (3), 259-263
3. Hjorth et al. J Pharm Sci 2016, 105 (4), 1376-1386
4. Nettleton et al. Biophys J 2000, 79 (2), 1053-1065
5. Chang et al. Biochemistry 1997, 36 (31), 9409-9422; Mukherjee et al. J Phys Chem B 2018, 122 (5), 1631-1637
6. Aebersold, R., & Mann, M. Nature 2003, 422 (6928), 198-207
7. Barth, A. Biochim Biophys Acta Bioenerg 2007, 1767 (9), 1073-1101
8. Greenfield et al. Nat Protoc 2006, 1 (6), 2876-2890; Micsonai et al. Proc Natl Acad Sci USA 2015, 112 (24), E3095-E3103
9. Seffernick et al. J Chem Phys 2020, 153 (24), 240901
10. Walls et al. Cell 2020, 181 (2), 281-292
11. https://www.nobelprize.org/prizes/chemistry/2024/press-release/
