
Assignment of 1H-NMR spectra

This page explains how to interpret an NMR spectrum. Assignment, as used in the title, means attributing each peak to a proton in the molecule under investigation. The examples here are 1D proton assignments. For more complex examples, see the 2D assignments of 12,14-di-t-butylbenzo[g]chrysene and cholesteryl acetate.

In the example in fig. 1 of isopropyl-β-D-thiogalactopyranoside (shown without the hydrogens for simplicity; each carbon has four bonds), all the hydroxyls have been exchanged with the deuterium oxide solvent to deuteroxyls. Therefore, the hydroxyl signals do not appear in the spectrum and do not couple with the other signals, making the spectrum simpler.

Fig. 1. 1H-NMR spectrum of isopropyl-β-D-thiogalactopyranoside in D2O

From the integrals, we see that two of them are multiples of three, one of which has tall sharp signals and so very likely corresponds to the two methyl (CH3) groups. The remaining signals are each expected to integrate to one, so the signals integrating to three and four must be overlapping resonances. The two H6 protons are expected to give separate signals because they are diastereotopic (if one of them were replaced by another group, the attached carbon would become a chiral center). This affects their chemical shifts, so they differ magnetically. If you don't understand this, don't worry; just take it for granted for now.

From the chemical shifts we see that what we suspect are methyls have the appropriate chemical shift and the remaining signals fall in the overlapping CH and CH 2 regions as expected. If you are an experienced sugar chemist you will know that the signal with the highest chemical shift is usually the anomeric signal (H1 – the hydrogen connected to the carbon next to the sugar ring-closing oxygen).

The coupling patterns can be used to continue the analysis. You could be forgiven for thinking that the methyl signals display an AXY coupling pattern. However, they couple only with the single iPr proton, so each should yield an AX pattern. The apparent extra splitting arises because the methyls (labeled MeA and MeB) are diastereotopic and so have different chemical shifts (they are not magnetically identical, just like the H6 protons). The result is two overlapping AX patterns (fig. 2).

Fig. 2. The methyl doublets of isopropyl-β-D-thiogalactopyranoside in D2O

The iPr proton is coupled to six methyl protons, yielding a septet (fig. 3).

Fig. 3. The iPr septet of isopropyl-β-D-thiogalactopyranoside in D2O
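The septet follows from the first-order n+1 rule: six equivalent spin-1/2 neighbors give seven lines whose relative intensities are the binomial coefficients. A minimal Python sketch of that rule (the function name `multiplet_intensities` is illustrative and does not come from this page):

```python
from math import comb

def multiplet_intensities(n_neighbors: int) -> list[int]:
    """Relative first-order line intensities for coupling to
    n equivalent spin-1/2 nuclei (binomial coefficients)."""
    return [comb(n_neighbors, k) for k in range(n_neighbors + 1)]

# Six methyl protons split the iPr proton into a septet.
septet = multiplet_intensities(6)
print(septet)  # [1, 6, 15, 20, 15, 6, 1]
```

The outermost lines are only one twentieth as intense as the central line, which is why the edges of a septet are easily lost in the noise.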

The anomeric H1 is coupled to H2 yielding an AX doublet (fig. 4).

Fig. 4. The anomeric H1 doublet of isopropyl-β-D-thiogalactopyranoside in D2O

H4 has an unusually small coupling to H5 (this occurs when the two CH bonds are approximately at right angles to each other), so small that it is not resolved in a normal spectrum. H4 therefore displays an AX pattern instead of the expected AXY pattern, although the peaks are slightly broad, which betrays the unresolved coupling (fig. 5).

Fig. 5. The H4 multiplet of isopropyl-β-D-thiogalactopyranoside in D2O

H3 couples with both H2 and H4 and yields the expected AXY pattern. H5, H6A and H6B, by contrast, have very similar chemical shifts and strong coupling, which combine to give a strongly second-order ABC pattern that is difficult to analyze (fig. 6).

Fig. 6. The H3 and the overlapping H5, H6A and H6B multiplets of isopropyl-β-D-thiogalactopyranoside in D2O

In the example of trans-geraniol in fig. 7 (shown without the hydrogens for simplicity; each carbon has four bonds), proton-5 (H5) is coupled to, and therefore split by, proton-4 (H4); H8 and H9 each represent two protons that split each other into triplet AX2 patterns; and H2 is split into a quartet by the three protons at H1, and each line of the quartet is split again by H3. However, second-order coupling distorts the multiplets, making the assignment more difficult.

Fig. 7. Part of the 1H-NMR spectrum of trans-geraniol in CDCl3
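A quartet that is split again into doublets can be built up by applying one splitting after another. The sketch below does this in the first-order limit only, so it ignores the second-order distortions mentioned above. The function name and the coupling constants (7.0 Hz and 1.2 Hz) are hypothetical values chosen for illustration; they are not measured values for geraniol.

```python
def first_order_multiplet(center_hz, couplings):
    """Return sorted (position_hz, relative_intensity) pairs.

    couplings is a list of (J_hz, n) tuples: n equivalent spin-1/2
    neighbors, each coupled with constant J_hz.
    """
    lines = {float(center_hz): 1.0}
    for j_hz, n in couplings:
        for _ in range(n):  # each neighbor splits every line in two
            split = {}
            for position, intensity in lines.items():
                for offset in (-j_hz / 2, j_hz / 2):
                    key = round(position + offset, 6)
                    split[key] = split.get(key, 0.0) + intensity / 2
            lines = split
    return sorted(lines.items())

# Hypothetical couplings: 7.0 Hz to three equivalent protons,
# 1.2 Hz to one further proton (a quartet of doublets).
quartet_of_doublets = first_order_multiplet(0.0, [(7.0, 3), (1.2, 1)])
for position, intensity in quartet_of_doublets:
    print(f"{position:+7.2f} Hz  {intensity:.4f}")
```

Lines that coincide are merged, so equivalent neighbors reproduce the binomial intensities while a distinct second coupling doubles the number of lines.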


NMR Spectroscopy Principles, Interpreting an NMR Spectrum and Common Problems

In this article, we consider how nuclear magnetic resonance spectroscopy works, what it tells you and how it is used.

Héctor Zamora Carreras, PhD


Nuclear magnetic resonance (NMR) spectroscopy is a physicochemical technique used to obtain structural information about molecules. It is based on the physical phenomenon of magnetic resonance, which was first demonstrated by Isidor I. Rabi in 1938. In the 1940s, two research groups independently obtained the first successful measurements of NMR in condensed matter. The principal investigators of these groups, Felix Bloch of Stanford University and Edward M. Purcell of Harvard University, were jointly awarded the Nobel Prize in Physics in 1952 for their contributions to the field of magnetic resonance. 1, 2, 3

Since those early days, NMR spectroscopy has progressed alongside advances in many other fields, such as mathematics, physics and informatics. In the 1960s, the introduction of superconducting magnets and computers into NMR equipment greatly improved sensitivity and made it possible to design new types of NMR experiments. As a consequence, scientists have developed a myriad of novel methodologies to study complex systems, such as membrane proteins, metabolically complex samples, or even biological tissues. NMR spectroscopy has become one of the most powerful techniques for structural determination of chemical species, as well as for the study of molecular dynamics and interactions. 4, 5

What is NMR?

NMR spectroscopy is a physicochemical analysis technique that is based on the interaction of an externally applied radiofrequency radiation with atomic nuclei. During this interaction there is a net exchange of energy which leads to a change in an intrinsic property of the atomic nuclei called nuclear spin .

The nuclear spin is defined by a quantum number (I), which varies depending on the isotope considered. Only atomic nuclei with I ≠ 0 are detectable by NMR spectroscopy (NMR-active nuclei, such as 1H, 2H, 13C and 15N). These NMR-active nuclei behave as tiny magnets (magnetic dipoles) capable of aligning with external magnetic fields (a process called magnetization). The strength of these tiny magnets is defined by a constant known as the magnetogyric ratio (γ), whose value depends on the isotope. 6,7

The nuclear spins of NMR-active nuclei with I = ½ can adopt two different orientations when they align with an external magnetic field (B0). One orientation corresponds to the lowest energy level of the nucleus (parallel to the external magnetic field), and the other is associated with the highest energy level (antiparallel to the external magnetic field) (Figure 1, left panel). The difference between the energy levels (ΔE) depends on the magnetic field and the magnetogyric ratio (Eq. 1) and affects the sensitivity of the technique (Figure 1, right panel). 6,7

Nuclear spin orientations of a sample aligned (parallel and antiparallel) with the direction of an external magnetic field B0 (left panel). Distribution of nuclear spin populations in the two possible energy levels in nuclei with I = ½ (right panel).
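To see why ΔE limits sensitivity, it helps to put numbers on the population difference. The sketch below assumes the standard relation ΔE = γhB0/2π for an I = ½ nucleus and a Boltzmann distribution between the two levels; neither formula is reproduced in this text, so both are stated here as assumptions. The field of 14.0954 T is the value quoted in the footnote to Table 1, and the temperature of 298 K is an illustrative choice.

```python
import math

H_PLANCK = 6.62607015e-34    # J s (exact SI value)
K_BOLTZMANN = 1.380649e-23   # J/K (exact SI value)
GAMMA_1H = 2.6752218744e8    # rad s^-1 T^-1, proton magnetogyric ratio

def energy_gap(gamma, b0):
    """Energy difference (J) between the two spin states of an I = 1/2 nucleus."""
    return gamma * H_PLANCK * b0 / (2 * math.pi)

def population_excess(gamma, b0, temperature):
    """Fractional excess of the lower level,
    (N_low - N_high) / (N_low + N_high), for a Boltzmann distribution."""
    return math.tanh(energy_gap(gamma, b0) / (2 * K_BOLTZMANN * temperature))

excess = population_excess(GAMMA_1H, 14.0954, 298.0)
print(f"{excess:.2e}")
```

Under these assumptions the excess is roughly 5 nuclei in every 100,000, which is why NMR is an inherently insensitive technique and why stronger magnets help.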

Magnetic resonance is achieved when nuclei are irradiated with radiofrequency. This causes transitions between energy levels, which involves changes in the orientation of nuclear spins.

When atomic nuclei are under the effect of a magnetic field, the nuclear magnetic dipoles are not statically aligned with B0 but instead precess, like a spinning top, around an axis parallel to the direction of the field (Figure 2, left panel). The frequency of this precession, called the Larmor frequency (νL), is determined by the magnetogyric ratio and the magnetic field (Eq. 2). 6,7
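The equation itself is not reproduced in this text. In its standard form, with γ in rad s⁻¹ T⁻¹ and B0 in tesla, the Larmor frequency in Hz is:

```latex
\nu_L = \frac{\gamma}{2\pi}\, B_0
```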

As a consequence of this precession movement , the magnetic vector (μ) associated with the nuclear magnetic dipoles possesses a component parallel to the magnetic field (μ z ) and another component perpendicular to the magnetic field (μ xy ), with this last one having a net value of zero in the absence of external perturbations. In an NMR experiment, it is not possible to measure the signal in the z direction, as the magnetic field is too intense in that direction. Therefore, it is necessary to transfer the magnetization of the z component to the xy plane. For this purpose, a magnetic pulse containing frequencies close to the Larmor frequency is applied perpendicular to B 0 to reach the resonance of nuclear spins, which generates a non-zero μ xy component. After this pulse, a relaxation process  takes place and the μ xy component gradually recovers its net value of zero (Figure 2, right panel) . As a consequence of this relaxation, energy is emitted as radiofrequency, producing a characteristic signal called free induction decay (FID) which is registered by the detector. This FID is subsequently transformed into a plot of intensities versus frequencies known as an NMR spectrum . 6,7

Nuclear spin behavior under the influence of an external magnetic field (left panel). Scheme of a basic NMR experiment in which the magnetization is transferred to the xy plane upon the application of a magnetic pulse (right panel).


How does NMR work?

NMR spectrometers consist of three main components: a superconducting magnet, a probe and a complex electronic system (console) controlled by a workstation (Figure 3).

General design of an NMR spectrometer with its principal components - workstation, NMR console, magnet and probe section.

The magnet generates the strong magnetic field that aligns the nuclear spins of the atoms present in the sample. Nowadays, the magnets used in NMR spectroscopy are based on superconducting materials and therefore require very low temperatures to work (around 4 K). For this reason, NMR spectrometers contain a cooling system composed of an inner jacket filled with liquid helium, which is in turn cooled by an additional jacket filled with liquid nitrogen, and many layers of thermally insulating materials (Figure 4). 6,8

The superconducting magnet surrounds a cylindrical chamber known as the “probe”, which is a crucial component of the instrument. The sample is introduced into the probe and thus placed under the influence of the magnetic field. Additionally, the probe contains a series of magnetic coils that are also located around the sample (Figure 4). These coils have multiple purposes. On one hand, they are used to irradiate the radiofrequency pulses and to detect and collect the NMR signal emitted by the sample. On the other hand, they also enable control of the magnetic field homogeneity and the application of pulse gradients that are used in some NMR experiments. 6,8

Internal components of an NMR spectrometer, including a detailed view of the probe. The location of the probe and sample can be seen in relation to the cooling outer liquid helium and nitrogen coats.

Finally, the electronic system of the spectrometer controls all the experimental conditions and enables the set up and modification of every parameter of the NMR experiment through the workstation. This system is also responsible for data acquisition and subsequent mathematical transformation into an NMR spectrum. The spectrum contains a series of peaks of different intensities as a function of a magnitude known as the chemical shift that is derived from the Larmor frequency of the different atomic nuclei present in the sample. 6,8

How to read an NMR spectrum and what it tells you

The signal detected by an NMR spectrometer (the FID) must be transformed prior to analysis. As the Larmor frequency is dependent upon the intensity of the magnetic field, it varies from instrument to instrument. For this reason, a mathematical transformation is performed to provide a relative magnitude called chemical shift (δ) (see Eq. 3). Unlike the Larmor frequency, this magnitude is independent of the magnetic field and the value can be compared across instruments. 6,7,8
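The equation cited as Eq. 3 is missing from this text. The standard definition of the chemical shift, written with the symbols defined in the next paragraph, is:

```latex
\delta\,(\text{ppm}) = \frac{\nu_L - \nu_L^{0}}{\nu_L^{0}} \times 10^{6}
```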

Where ν L is the observed Larmor frequency of a nucleus and ν L 0 is the Larmor frequency of a reference nucleus, both in Hz. By convention, chemical shift is always expressed in parts per million (ppm). The zero value of the chemical shift scale is set using a reference compound (such as tetramethylsilane (TMS) or sodium trimethylsilylpropanesulfonate (DSS) for 1 H).
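The field independence of δ can be checked numerically. The sketch below assumes the standard definition δ = (νL − νL0)/νL0 × 10⁶; the function name and the two instrument frequencies (400 MHz and 600 MHz) are hypothetical choices for illustration.

```python
def chemical_shift_ppm(nu_hz: float, nu_ref_hz: float) -> float:
    """Chemical shift (ppm) of a signal at nu_hz relative to a reference at nu_ref_hz."""
    return (nu_hz - nu_ref_hz) / nu_ref_hz * 1e6

# Hypothetical example: the same signal observed on two instruments whose
# 1H reference frequencies are 400 MHz and 600 MHz.
for nu_ref in (400.0e6, 600.0e6):
    offset_hz = 2.0e-6 * nu_ref  # 800 Hz and 1200 Hz respectively
    delta = chemical_shift_ppm(nu_ref + offset_hz, nu_ref)
    print(f"{nu_ref / 1e6:.0f} MHz: offset {offset_hz:.0f} Hz -> {delta:.2f} ppm")
```

The offset in Hz grows with the field, but the ratio to the reference frequency, and hence the value in ppm, stays the same.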

Figure 5 provides an example of a proton ( 1 H) NMR spectrum, meaning that only the protons of the molecule are detected.

1H solution NMR spectrum of acetic acid. The signals correspond to the two different 1H nuclei present in the molecule and their areas are proportional to the number of nuclei contributing to the signal.

An NMR spectrum provides a lot of information about the molecules present in the sample. First, chemical groups within a molecule can be identified from chemical shift values. In the example provided in Figure 5, acetic acid (H 3 C-COOH) has four protons so you could be forgiven for expecting to see four signals in the spectrum. However, the three protons of the methyl group (CH 3 ) are magnetically equivalent and therefore have the same chemical shift. This means that one signal comes from the CH 3 group and the other one, from the proton in the carboxylic acid group (COOH). Secondly, in 1 H-NMR spectra, signal area is proportional to the number of atomic nuclei producing that signal (this does not apply to 13 C-NMR spectra). In this example, if the areas of both signals were to be calculated, the most intense signal will be three times larger than the other. This is in accordance with the fact that one signal represents the three protons from the CH 3 group (signal at δ = 2.0 ppm) and the other one the proton from the COOH group (signal at δ = 11.5 ppm). 9,10

The spins of two nuclei that are connected through a few chemical bonds can interact, causing a phenomenon known as scalar coupling, which splits the signals. Typically, this coupling is only observable when the number of chemical bonds separating the two nuclei does not exceed four. The splitting follows a pattern that depends on the number of coupled nuclei and on a coupling constant (J) defined by the type of nuclei and the distance (in chemical bonds) between them. The characteristic shape of a split signal is called its multiplicity and provides additional information about the molecule. The multiplicity can be predicted using the N+1 rule, which states that if a proton shows scalar coupling with N protons attached to contiguous carbon nuclei, its signal will split into N+1 peaks with relative intensities given by Pascal's triangle (Figure 6). Because splitting distributes the signal over several lines, scalar coupling reduces the peak intensity. Finally, the observation of signals arising from the nuclear Overhauser effect (NOE) is essential for the structural determination of macromolecules, since this effect emerges from the interaction of nuclear spins of atoms that are spatially close but distant in the molecular sequence. 6,7,8,9,10

Example of scalar coupling. If there is no scalar coupling (top), the NMR signals from HA and HB appear as single peaks. However, if the two nearby protons HA and HB show scalar coupling with a constant J (bottom), the signals will split. Both HA and HB are coupled with one proton attached to a contiguous carbon nucleus, so, following the N+1 rule, each proton signal splits into two lines, forming a doublet whose line separation is equal to the coupling constant, J.

In this context, to interpret an NMR spectrum it is necessary to use all that information to assign each observed signal to the corresponding atomic nucleus of the molecule(s) in the sample. This process is called spectral assignment and it can be difficult to achieve with complex molecules. For this reason, many types of NMR experiments providing different and complementary information are used to characterize a sample. 11

Upfield vs downfield NMR

As seen in Figure 5, the same kind of nuclei can generate signals with different chemical shift values. These chemical shifts differ as the magnetic field sensed by a particular nucleus strongly depends on its local chemical environment. The circulation of electrons in the surroundings of a nucleus creates small magnetic fields that oppose the applied external field. This “shielding” effect (σ) is directly proportional to the electronic density around the nucleus. As a result, the effective magnetic field acting on the nucleus is lower and the Larmor frequency is affected (Eq. 4). When there is a high electronic density around the considered nucleus, the shielding effect is high, the Larmor frequency decreases and so does the chemical shift (it moves upfield). On the contrary, when the electronic density is low in the vicinity of the nucleus, the shielding effect is low, the Larmor frequency takes higher values and so does the chemical shift (it moves downfield). 6,7,8,12

Therefore, in NMR spectroscopy, upfield and downfield are terms that refer to the regions of lower and higher values, respectively, within the chemical shift scale (Figure 7). 6,7,8, 12
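For reference, the shielding relation cited above as Eq. 4 is not reproduced in this text. Assuming the usual definition of the shielding constant σ, the nucleus experiences an effective field reduced by the fraction σ, and its Larmor frequency becomes:

```latex
B_{\text{eff}} = B_0\,(1 - \sigma), \qquad
\nu_L = \frac{\gamma}{2\pi}\, B_0\,(1 - \sigma)
```

A larger σ (more shielding) therefore gives a lower Larmor frequency and a lower chemical shift.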

1H NMR chemical shift scale indicating the downfield (left), deshielded and upfield (right), shielded regions.

Hydrogen nuclei from methyl groups or aliphatic molecules are strongly shielded and their typical chemical shift values are located upfield. On the other hand, hydrogen nuclei attached to electronegative atoms (such as oxygen or nitrogen) or close to electronegative groups (such as carboxylic acids or aldehydes) are deshielded and show chemical shift values located downfield. This is illustrated and discussed further in a later section, NMR charts.

Proton NMR vs carbon NMR

The principal constituent elements of organic and biological molecules are hydrogen and carbon. As described above, NMR spectroscopy can only be applied to NMR-active nuclei (that is, nuclei with I ≠ 0). In the case of hydrogen, the most abundant isotope is NMR-active (1H, 99.98%, I = ½). In the case of carbon, the most abundant isotope is not NMR-active (12C, 98.89%, I = 0), so NMR spectrometers can only detect the isotope 13C, which has an abundance of 1.11%. Moreover, the magnetogyric ratio of 13C is lower than that of 1H by a factor of about four (see Table 1). Both factors make 13C-NMR significantly less sensitive than 1H-NMR. This difference in sensitivity leads to longer experimental times in the case of 13C (hours) compared to 1H (seconds or minutes). 13,14
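The two factors can be combined into a rough number. The sketch below computes the Larmor frequencies of both nuclei at 14.0954 T (the field quoted in the footnote to Table 1) and a relative receptivity. It assumes the common approximation that, for spin-½ nuclei at the same field, receptivity scales with natural abundance times γ³; that approximation and the function names are not taken from this article.

```python
import math

GAMMA = {"1H": 2.6752218744e8, "13C": 6.728284e7}  # rad s^-1 T^-1
ABUNDANCE = {"1H": 0.9998, "13C": 0.0111}          # natural abundances from the text

def larmor_mhz(nucleus: str, b0_tesla: float) -> float:
    """Larmor frequency in MHz: gamma * B0 / (2 * pi)."""
    return GAMMA[nucleus] * b0_tesla / (2 * math.pi) / 1e6

def relative_receptivity(nucleus: str, reference: str = "1H") -> float:
    """Receptivity relative to `reference`, assuming abundance * gamma**3
    scaling for spin-1/2 nuclei at the same field."""
    gamma_ratio = GAMMA[nucleus] / GAMMA[reference]
    return (ABUNDANCE[nucleus] / ABUNDANCE[reference]) * gamma_ratio ** 3

B0 = 14.0954  # T
print(f"1H:  {larmor_mhz('1H', B0):.1f} MHz")
print(f"13C: {larmor_mhz('13C', B0):.1f} MHz")
print(f"13C receptivity relative to 1H: {relative_receptivity('13C'):.2e}")
```

Under these assumptions the frequencies come out near 600 MHz and 151 MHz, and a 13C signal at natural abundance is several thousand times weaker than a 1H signal.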

The chemical shift of 1H typically falls in the range 0 to 14 ppm, whereas 13C chemical shifts span a much larger range, typically 10 to 220 ppm. This dependence of the chemical shift values on the type of nucleus arises from the fact that different nuclei possess different Larmor frequencies (as they depend on the magnetogyric ratio, as mentioned before). The wider shift range of 13C-NMR results in better resolution compared to 1H-NMR, as the signals are normally more dispersed.

Additionally, scalar coupling between 13 C is rarely observed because, due to its low natural occurrence, two 13 C atoms are unlikely to be found close enough to establish interactions between their nuclear spins. However, the coupling of 13 C atoms with other nuclei is possible and it can further reduce the sensitivity of the technique. The reason for this is that 13 C coupling constants are large and the reduction of signal intensity upon splitting (see section 3) is more marked when coupling constants are large. For this reason, 13 C-NMR experiments are usually performed using special pulse sequences capable of removing the scalar coupling between 13 C and 1 H. 14

There are a number of ways in which the sensitivity of 13C-NMR can be improved. These include:

  • 13 C-enrichment of the sample
  • Increasing the number of accumulated spectra, thereby increasing the signal-to-noise ratio
  • Using NMR pulses to increase the population difference between nuclear spin energy levels 14
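The benefit of accumulating spectra can be estimated with a simple rule. The sketch below assumes that the signal adds coherently while the noise is random and uncorrelated between scans, so the signal-to-noise ratio grows with the square root of the number of scans; this is a standard result that the article does not state explicitly, and the function names are illustrative.

```python
import math

def snr_gain(n_scans: int) -> float:
    """SNR improvement from averaging n_scans, relative to a single scan."""
    return math.sqrt(n_scans)

def scans_for_gain(target_gain: float) -> int:
    """Smallest number of scans giving at least target_gain in SNR."""
    return math.ceil(target_gain ** 2)

print(snr_gain(16))        # 4.0
print(scans_for_gain(10))  # 100
```

The square-root dependence means that each further doubling of the signal-to-noise ratio costs four times as much measurement time, which is why 13C experiments can take hours.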

Despite the limitations of 13C-NMR, it offers valuable information that is not accessible using only 1H-NMR, for example the identification of primary, secondary, tertiary and quaternary carbons. For this reason, 13C-NMR and 1H-NMR are often used jointly in NMR laboratories as a basic approach for molecular structure determination. 14

Table 1: Comparison of 1 H and 13 C NMR properties. 15

b Considering a magnetic field with a flux density of 14.0954 T.

By convention, the chemical shift scale in an NMR spectrum is drawn with values increasing from right to left. As described above, the zero value is established using a standard compound whose carbon and hydrogen atoms are strongly shielded, so that their signals appear in the furthest upfield region (as seen in Figure 7). The assignment of NMR spectra is usually performed with the help of NMR charts or diagrams that facilitate the identification of the NMR signals.

Hydrogens or carbons that are highly shielded, such as the ones of methyl groups, have low chemical shift values. However, hydrogens attached to very electronegative groups (e.g., carboxylic acids, ketones or aldehydes) have high chemical shift values (Figures 8 and 9).

These charts represent typical chemical shifts, but the values can sometimes be displaced to other regions of the scale. 16 For instance, in large macromolecules, a distant chemical group can be relocated as a result of spatial rearrangements of the three-dimensional structure. This relocation can alter the chemical environment of the measured nucleus, leading to a change in its chemical shift value.

In order to facilitate NMR spectra assignment, there are public NMR libraries or databases (such as the Biological Magnetic Resonance Data Bank or the Spectral Database for Organic Compounds ) containing NMR spectra and chemical shift values for thousands of biochemical molecules and chemical compounds.

1H-NMR chart showing the typical chemical shift values for different types of hydrogen atoms.

NMR variants

Macromolecules, such as proteins, have a large number of NMR-active nuclei and, consequently, their NMR spectra are complex with many overlapping peaks. In addition, relaxation is faster in large molecules, which causes peak broadening and loss of resolution. To address these limitations, 2D NMR experiments generate spectra defined by two chemical shift axes (instead of one, as in 1D spectra), with signals that correlate pairs of different nuclei. Three examples of 2D NMR experiments are COSY, TOCSY and NOESY. 6,7,8,9  

  • COSY (COrrelation SpectroscopY) spectra display peaks that correlate pairs of nuclei that are separated by a maximum of three chemical bonds. This correlation arises from the interaction between nuclear spins through scalar coupling (Figure 10).
  • TOCSY (TOtal Correlation SpectroscopY) spectra show signals correlating pairs of nuclei that are part of the same spin system (a spin system is a set of nuclei whose spins interact with one another, that is, they are coupled) (Figure 10).
  • NOESY (Nuclear Overhauser Effect SpectroscopY) experiments are very important in the structural elucidation of macromolecules because they provide information about their spatial organization. NOESY spectra contain peaks that correlate pairs of nearby nuclei (typically, separated by less than 5-6 Å). Unlike in COSY, the NOESY correlation arises from the nuclear Overhauser effect, in which interaction occurs when two nuclei are spatially close, no matter the number of chemical bonds separating them. 6,7,8,9

Examples of 1H,1H COSY and 1H,1H TOCSY spectra for a molecule with the structure A-B-C-D and a hydrogen atom attached to each of A, B, C and D. The COSY spectrum only shows the peaks arising from the correlation between two hydrogens separated by a maximum of three chemical bonds (that is, H-A-B-H, H-B-C-H and H-C-D-H). The TOCSY spectrum shows the peaks arising from the correlation between two hydrogens belonging to the same spin system (in this case, all the possible H-H correlations).

Solid-state NMR

Solution NMR and solid-state NMR show remarkable differences, mainly because molecules in solution are able to move freely, so nuclear spin interactions are averaged. In solid samples, however, there is little or no molecular motion and, as a result, nuclear spin interactions depend on the spatial direction (these are termed anisotropic interactions). This anisotropy causes the broadening of the NMR spectral signals (Figure 11, bottom spectrum). To address this issue, scientists using solid-state NMR have developed special techniques that prevent the loss of sensitivity and resolution. 17

The best-known solid-state NMR technique is magic angle spinning (MAS). In this technique, the sample is placed inside a rotor that is spun at high speed about an axis inclined at a particular angle (the magic angle, ≈ 54.74°) to the direction of the external magnetic field. The effect of this rotation is the cancelation of all the anisotropic spin interactions (including dipolar, chemical shift anisotropy and quadrupolar interactions) (Figure 11). 17

Effect of MAS on the spectral line shape. When the solid sample is not rotating, the spectrum shows a broad signal (bottom spectrum). When the rotor is spinning, anisotropic effects are averaged and thus resolution and sensitivity increase. When the spinning frequency is sufficiently high, all anisotropic effects are cancelled.
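The value of the magic angle can be checked directly. The sketch below assumes the standard result, not derived in this article, that the anisotropic interactions averaged by MAS scale with the factor (3cos²θ − 1)/2, so the magic angle is the angle at which this factor vanishes.

```python
import math

# The magic angle is the root of the second Legendre polynomial
# P2(cos theta) = (3 * cos(theta)**2 - 1) / 2.
magic_angle_rad = math.acos(1 / math.sqrt(3))
magic_angle_deg = math.degrees(magic_angle_rad)

def p2(theta_rad: float) -> float:
    """Angular scaling factor (3 cos^2(theta) - 1) / 2."""
    return (3 * math.cos(theta_rad) ** 2 - 1) / 2

print(f"{magic_angle_deg:.2f} degrees")
print(abs(p2(magic_angle_rad)) < 1e-12)
```

At 0° the factor is 1 and at 90° it is −1/2, so only spinning about an axis at the magic angle removes it completely.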

Strengths and weaknesses of NMR and common problems

NMR spectroscopy is a powerful technique that has many advantages over other techniques, but it has some limitations. These are summarized in Table 2: 18

Table 2: Summary of the principal strengths and weaknesses of NMR spectroscopy.

NMR impurities

One of the most common problems encountered in NMR spectroscopy, as mentioned in Table 2, is the presence of interfering substances, such as traces of impurities or solvents, that give rise to the appearance of non-desired peaks in the spectrum.

The analytes contained in NMR samples are usually obtained by a synthesis and/or purification process in which many substances are involved. For this reason, it is not infrequent to find that certain amounts of those substances remain in the final sample as impurities. Occasionally, the analyte undergoes transformations or degradation, which can lead to the appearance of undesirable chemical species in the sample. If these impurities have NMR-active nuclei, they can hamper the correct assignment of the NMR spectrum. Generally, NMR impurities are found in trace concentration and therefore they are relatively easy to identify, as their NMR peaks show very low intensities compared to those of the analyte. To make it easier to characterize impurities, laboratories often make use of tables that summarize the chemical shifts of the most common impurities. 19 , 20 , 21

NMR solvent peaks

Solvents used in NMR spectroscopy usually contain NMR-active nuclei, especially 1H, and can therefore interfere with the NMR spectra. Unlike trace impurities, the solvent is present in the sample at a very high concentration, and the peaks arising from it are usually huge. It is therefore important to know the exact chemical shifts of the solvent peaks in order to judge to what extent they may be concealing analyte signals. The customary strategy for reducing the effect of hydrogen-containing solvent peaks is to use deuterated solvents (some deuterated solvent must always be present in the sample, as the lock system of NMR spectrometers uses the 2H signal to monitor the homogeneity of the magnetic field). These solvents have their 1H nuclei substituted by 2H, which considerably reduces the intensity of the solvent peaks. However, even though the degree of deuteration is usually close to 100%, the high solvent concentration means that the residual peaks can still be intense enough to hinder the visualization of the analyte signals. For this reason, NMR pulse sequences are available that reduce the solvent peak perturbations in NMR spectra, especially for aqueous samples (for instance, presaturation or gradient-based suppression). 22, 23, 24

Applications of NMR

NMR applications in chemistry

In chemistry, the main application of NMR spectroscopy is the identification and structural elucidation of organic, organometallic and biochemical molecules. Generally, the identification of compounds is complemented with data obtained with other techniques, such as mass spectrometry, infrared spectroscopy and elemental analysis. Moreover, the proportionality between the area of a signal and the number of nuclei that generate it allows NMR spectroscopy to be used as a quantitative analysis tool. 15 Some examples of NMR applications in chemistry-related fields are:

  • Chemistry: structural determination of new compounds, quality control of products and purity determination 25
  • Pharmaceutics: study of structure, dynamics and molecular interactions for drug discovery, quality control and purity determination of drugs 26
  • Petrochemistry: analysis of rock materials to check the suitability of an oil reservoir to be exploited, solid-state NMR composition analysis of petroleum derivatives, quality control of products 27
  • Materials: characterization of new materials by solid-state NMR 28

NMR applications in the life sciences

Within the life sciences, NMR spectroscopy has been widely applied to the structural resolution of biological macromolecules, including peptides, proteins, lipids, carbohydrates and nucleic acids. These systems are highly complex, and it is therefore necessary to employ a special approach. This includes:

  • Isotopic labeling to enrich the sample in 13 C and 15 N, or even in 2 H
  • The use of special NMR pulses to reduce signal overlapping and gain resolution
  • Employing NMR experiments of high dimensionality (2D, 3D or even higher)

Once the NMR assignment is achieved, the data obtained are processed to obtain information about chemical shifts, torsion angles and distance restraints between atoms. This information is then used to calculate the molecular structure using a methodology that employs computer software developed for that purpose. The software generates molecular structures that fulfill the imposed restraints, minimizing their energy (since the lowest energy structures are the most stable and thus, the most probable). 29 , 30

Alongside the structural elucidation, NMR spectroscopy can also be used to extract information about molecular dynamics such as relaxation times, structural rigidity and chemical exchange as well as interactions between molecules (chemical shift perturbations, intermolecular magnetization transfer). 31 , 32 In this context, solid state NMR is useful to study proteins interacting with lipid structures or other biological systems that behave like a condensed phase. 33

Some examples of NMR applications in life science-related fields are:

  • Molecular biology and biophysics: study of structure, dynamics and molecular interactions of peptides, proteins, nucleic acids, carbohydrates and other biomolecules 31, 32
  • Health sciences: analysis of biological fluids to obtain metabolic profiles related to diseases (metabolomics), use of NMR imaging techniques for medical diagnosis 34, 35
  • Food science: NMR fingerprint analysis to check the quality or authenticity of food samples 36, wine and cannabis

NMR abbreviations

COSY      Correlation spectroscopy

DSS       Sodium trimethylsilylpropanesulfonate

FID       Free induction decay

MAS       Magic-angle spinning

NMR       Nuclear magnetic resonance

NOE       Nuclear Overhauser effect

NOESY     Nuclear Overhauser effect spectroscopy

TMS       Tetramethylsilane

TOCSY     Total correlation spectroscopy

1. Rabi II, Millman S, Kusch P, and Zacharias JR. The molecular beam resonance method for measuring nuclear magnetic moments. The magnetic moments of 3Li6, 3Li7 and 9F19. Phys Rev. 1939. 55(6): 526. doi: 10.1103/PhysRev.55.526
2. Bloch F, Hansen W, and Packard M. Nuclear induction. Phys Rev. 69: 127L. doi: 10.1103/PhysRev.70.460
3. Purcell EM, Torrey HC, and Pound RV. Resonance absorption by nuclear magnetic moments in a solid. Phys Rev. 1946. 69: 37L. doi: 10.1103/PhysRev.69.37
4. Emsley JW and Feeney J. Milestones in the first fifty years of NMR. Progr Nucl Mag Res Sp. 1995. 28(1): 1-9. doi: 10.1016/0079-6565(95)01023-8
5. Marion D. An introduction to biological NMR spectroscopy. Mol Cell Proteomics. 2013. 12(11): 3006-3025. doi: 10.1074/mcp.O113.030239
6. Friebolin H and Becconsall JK. Basic one- and two-dimensional NMR spectroscopy (Vol. 7). Weinheim: Wiley-VCH. 2005.
7. Hore PJ. Nuclear magnetic resonance. USA: Oxford University Press. 2015.
8. Derome AE. Modern NMR techniques for chemistry research. Elsevier. 2013.
9. Jacobsen NE. NMR data interpretation explained: understanding 1D and 2D NMR spectra of organic compounds and natural products. John Wiley & Sons. 2016.
10. Bible RH. Interpretation of NMR spectra: an empirical approach. Springer Science & Business Media. 2013.
11. Wüthrich K. Protein structure determination in solution by NMR spectroscopy. J Biol Chem. 1990. 265(36): 22059-22062. doi: 10.1016/S0021-9258(18)45665-7
12. Xu S. 1D and 2D NMR experiment methods. 2011. Available online. Accessed on 28/09/2021.
13. Silverstein RM. Spectrometric identification of organic compounds. New York: Wiley. 1991.
14. Balci M. Basic 1H- and 13C-NMR spectroscopy. Elsevier. 2005.
15. Skoog DA, Holler FJ, and Crouch SR. Principles of instrumental analysis. Cengage Learning. 2017.
16. Du Vernet R and Boekelheide V. Nuclear magnetic resonance spectroscopy. Ring-current effects on carbon-13 chemical shifts. P Natl Acad Sci USA. 1974. 71(8): 2961-2964. doi: 10.1073/pnas.71.8.2961
17. Brown SP and Emsley L. Solid-state NMR. In: Vo-Dinh T (Ed.). Handbook of spectroscopy. John Wiley & Sons. 2006.
18. Emwas AHM. The strengths and weaknesses of NMR spectroscopy and mass spectrometry with particular focus on metabolomics research. In: Metabonomics (pp. 161-193). Humana Press, New York, NY. 2015. doi: 10.1007/978-1-4939-2377-9_13
19. Fulmer GR, Miller AJ, Sherden NH et al. NMR chemical shifts of trace impurities: common laboratory solvents, organics, and gases in deuterated solvents relevant to the organometallic chemist. Organometallics. 2010. 29(9): 2176-2179. doi: 10.1021/om100106e
20. Maggio RM, Calvo NL, Vignaduzzo SE, and Kaufman TS. Pharmaceutical impurities and degradation products: Uses and applications of NMR techniques. J Pharmaceut Biomed. 2014. 101: 102-122. doi: 10.1016/j.jpba.2014.04.016
21. Babij NR, McCusker EO, Whiteker GT, et al. NMR chemical shifts of trace impurities: Industrially preferred solvents used in process and green chemistry. Org Process Res Dev. 2016. 20(3): 661-667. doi: 10.1021/acs.oprd.5b00417
22. Guéron M, Plateau P, and Decorps M. Solvent signal suppression in NMR. Prog Nucl Mag Res Sp. 1991. 23(2): 135-209. doi: 10.1016/0079-6565(91)80007-O
23. Simpson AJ and Brown SA. Purge NMR: effective and easy solvent suppression. J Magn Reson. 2005. 175(2): 340-346. doi: 10.1016/j.jmr.2005.05.008
24. Adams RW, Holroyd CM, Aguilar JA, Nilsson M, and Morris GA. “Perfecting” WATERGATE: clean proton NMR spectra from aqueous solution. Chem Commun. 2013. 49(4): 358-360. doi: 10.1039/C2CC37579F
25. Maciel GE. NMR in industrial process control and quality control. In: Nuclear Magnetic Resonance in Modern Technology (pp. 225-275). Springer, Dordrecht.
26. Diercks T, Coles M, and Kessler H. Applications of NMR in drug discovery. Curr Opin Chem Biol. 2001. 5(3): 285-291. doi: 10.1016/S1367-5931(00)00204-0
27. Olaide AJ, Olugbenga E, and Abimbola D. A review of the application of nuclear magnetic resonance in petroleum industry. Int J Geosci. 2020. 11(04): 145. doi: 10.4236/ijg.2020.114009
28. Brown SP. Applications of high-resolution 1H solid-state NMR. Solid State Nucl Mag. 2012. 41: 1-27. doi: 10.1016/j.ssnmr.2011.11.006
29. Fan TWM and Lane AN. Applications of NMR spectroscopy to systems biochemistry. Prog Nucl Mag Res Sp. 2016. 92: 18-53. doi: 10.1016/j.pnmrs.2016.01.005
30. Wüthrich K. The way to NMR structures of proteins. Nat Struct Biol. 2001. 8(11): 923-925. doi: 10.1038/nsb1101-923
31. Ishima R and Torchia DA. Protein dynamics from NMR. Nat Struct Biol. 2000. 7(9): 740-743. doi: 10.1038/78963
32. Takeuchi K and Wagner G. NMR studies of protein interactions. Curr Opin Struct Biol. 2006. 16(1): 109-117. doi: 10.1016/j.sbi.2006.01.006
33. Ladizhansky V. Applications of solid-state NMR to membrane proteins. BBA-Proteins Proteom. 2017. 1865(11): 1577-1586. doi: 10.1016/j.bbapap.2017.07.004
34. Markley JL, Brüschweiler R, Edison AS et al. The future of NMR-based metabolomics. Curr Opin Biotech. 2017. 43: 34-40. doi: 10.1016/j.copbio.2016.08.001
35. Vlaardingerbroek MT and Boer JA. Magnetic resonance imaging: theory and practice. Springer Science & Business Media. 2013.
36. Hatzakis E. Nuclear magnetic resonance (NMR) spectroscopy in food science: A comprehensive review. Compr Rev Food Sci F. 2019. 18(1): 189-220. doi: 10.1111/1541-4337.12408

Héctor Zamora Carreras, PhD

NMR spectra processing for everybody

Unrestricted access to first-class online software for NMR spectra processing. It is free and you can get started right away from your browser.

Process directly online

You don't have to go through the hassle of installing any software or applications. Click here to start.

1D and 2D spectra

NMRium accepts 1D and 2D spectra. A 1D spectrum can be supplied either as an FID or as a Fourier-transformed spectrum. Currently, only Fourier-transformed (FT) 2D spectra are accepted.
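The difference between an FID and a Fourier-transformed spectrum can be illustrated with a short, self-contained sketch. It builds a synthetic one-resonance FID and transforms it with a naive discrete Fourier transform; all parameter values are made up for the example, and this is not how NMRium itself processes data.

```python
import cmath
import math

def synthetic_fid(freq_hz, t2_s, dwell_s, n_points):
    """Complex FID of one resonance: an oscillation that decays exponentially."""
    return [
        cmath.exp(2j * math.pi * freq_hz * k * dwell_s) * math.exp(-k * dwell_s / t2_s)
        for k in range(n_points)
    ]

def dft(signal):
    """Naive discrete Fourier transform, O(n^2); adequate for a short demo."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]

dwell_s = 0.001   # 1 ms between points, i.e. a 1000 Hz spectral width
n_points = 128
fid = synthetic_fid(freq_hz=250.0, t2_s=0.05, dwell_s=dwell_s, n_points=n_points)
spectrum = dft(fid)

# Locate the tallest point of the magnitude spectrum and convert its index to Hz.
magnitudes = [abs(value) for value in spectrum]
peak_bin = magnitudes.index(max(magnitudes))
peak_hz = peak_bin / (n_points * dwell_s)
```

The time-domain signal contains the same information as the spectrum; the transform simply turns the decaying oscillation into a peak at its frequency.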

Smart peak picking

NMRium includes advanced peak picking for 1D and 2D spectra and can generate the NMR string required for a publication or patent.
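For readers unfamiliar with the "NMR string", the sketch below formats a list of picked signals into a typical publication-style listing. The function name, the input layout and the exact output format are assumptions made for illustration (journals differ in their conventions), and the code is not NMRium's own implementation. The signal values are illustrative.

```python
def format_nmr_string(nucleus, frequency_mhz, solvent, signals):
    """Build a listing from (shift_ppm, multiplicity, couplings_hz, n_protons) tuples."""
    parts = []
    # Signals are conventionally listed from high to low chemical shift.
    for shift, multiplicity, couplings, n_protons in sorted(signals, key=lambda s: -s[0]):
        details = [multiplicity]
        if couplings:
            j_values = ", ".join(f"{j:.1f}" for j in couplings)
            details.append(f"J = {j_values} Hz")
        details.append(f"{n_protons}H")
        parts.append(f"{shift:.2f} ({', '.join(details)})")
    return f"{nucleus} NMR ({frequency_mhz:.0f} MHz, {solvent}) δ " + ", ".join(parts) + "."

# Illustrative signals: a triplet, a quartet and a singlet.
signals = [
    (1.26, "t", [7.1], 3),
    (4.12, "q", [7.1], 2),
    (2.05, "s", [], 3),
]
text = format_nmr_string("1H", 400, "CDCl3", signals)
```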

All the processing and assignments can be stored as a “.nmrium” file. This file contains the original data as well as all the processing that was applied to the spectrum. The assignments of the molecule are also saved in the file.

Not just signal processing

NMRium also handles chemical structures. They can be imported from an MDL Molfile, pasted directly into the molecule panel, or drawn.

Perfect for teaching

Try out our structure elucidation exercises or create your own exercises! They are great for students.

Great user experience

To provide an optimal user experience, the spectra processing is efficiently performed within the web browser.

Opens multiple file formats

Just drag and drop a JCAMP-DX file, a Bruker folder or a JEOL file.

Assignment of Protein NMR Spectra Using Heteronuclear NMR—A Tutorial

  • First Online: 01 January 2015

  • Christina Redfield

Part of the book series: Biological Magnetic Resonance (BIMR, volume 32)

The assignment of resonances in the complex nuclear magnetic resonance (NMR) spectrum of a protein is the first step in any NMR study of protein structure, function or dynamics. This chapter aims to provide a tutorial on protein NMR resonance assignment. Two approaches to the assignment are commonly used: the triple resonance methodology, which uses a suite of three-dimensional (3D) 13 C/ 15 N/ 1 H NMR experiments, relies on through-bond 1 J and 2 J intra- and interresidue spin–spin couplings that are observed in 13 C– 15 N double-labelled proteins; and the sequential assignment methodology, which can be applied to unlabelled or 15 N single-labelled proteins, relies on through-bond total correlation spectroscopy (TOCSY) data to identify spin systems and through-space nuclear Overhauser effect (NOE) data to establish connections between neighbouring amino acid residues. This chapter describes both methodologies for protein NMR resonance assignment. Examples of how sequence-specific resonance assignments can be obtained using a suite of 2D and 3D NMR experiments are presented, and suggestions on how overlap problems can be overcome are included.
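The linking step of the triple resonance approach can be sketched in a few lines. The example assumes that each amide spin system already carries two Cα shifts, its own and that of the preceding residue (as one would obtain, for instance, from a pair of experiments such as HNCA and HN(CO)CA), and links systems whose shifts match within a tolerance. The data, the tolerance and the function names are hypothetical; real assignment software also uses Cβ and C′ shifts and handles ambiguous matches.

```python
def link_spin_systems(spin_systems, tolerance_ppm=0.15):
    """Return {system_id: [ids of candidate preceding systems]}.

    spin_systems: dict mapping id -> {"ca_i": own C-alpha shift,
                                      "ca_prev": C-alpha shift of residue i-1}
    """
    links = {}
    for current_id, current in spin_systems.items():
        links[current_id] = [
            other_id
            for other_id, other in spin_systems.items()
            if other_id != current_id
            and abs(other["ca_i"] - current["ca_prev"]) <= tolerance_ppm
        ]
    return links

def build_chain(start_id, links):
    """Walk backwards from start_id while exactly one predecessor matches."""
    chain = [start_id]
    while len(links[chain[0]]) == 1 and links[chain[0]][0] not in chain:
        chain.insert(0, links[chain[0]][0])
    return chain

# Hypothetical spin systems with C-alpha shifts in ppm.
spin_systems = {
    "A": {"ca_i": 58.2, "ca_prev": 45.1},
    "B": {"ca_i": 45.1, "ca_prev": 62.7},
    "C": {"ca_i": 62.7, "ca_prev": 53.9},
    "D": {"ca_i": 55.0, "ca_prev": 58.3},
}
links = link_spin_systems(spin_systems)
chain = build_chain("D", links)
```

The resulting ordered stretch of spin systems would then be mapped onto the protein sequence, for example by using the characteristic shifts of glycine, alanine, serine and threonine residues.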

Author information

Authors and affiliations.

Department of Biochemistry, University of Oxford, OX1 3QU, Oxford, UK

Christina Redfield

Corresponding author

Correspondence to Christina Redfield.

Editor information

Editors and affiliations.

Department of Chemistry and Biochemistry, University of Denver, Denver, Colorado, USA

Lawrence Berliner

Copyright information

© 2015 Springer Science+Business Media, LLC

About this chapter

Redfield, C. (2015). Assignment of Protein NMR Spectra Using Heteronuclear NMR—A Tutorial. In: Berliner, L. (eds) Protein NMR. Biological Magnetic Resonance, vol 32. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7621-5_1

DOI: https://doi.org/10.1007/978-1-4899-7621-5_1

Published: 25 August 2015

Publisher Name: Springer, Boston, MA

Print ISBN: 978-1-4899-7620-8

Online ISBN: 978-1-4899-7621-5

eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)

About NMRtist

NMRtist is a cloud computing service for the fully automated analysis of protein NMR spectra (e.g. peak picking, chemical shift assignment, structure determination) using deep learning-based approaches. Each project created in NMRtist receives 30 GB of private storage, which can be filled with experimental data and analyzed using the available applications. You don't need any hardware resources of your own and you don't have to follow complex software configuration processes. NMRtist applications can be run with just a few mouse clicks in your web browser. All calculations are executed on NMRtist computational nodes, and the results are made available for download from the NMRtist website.

ARTINA is a deep learning-based application for end-to-end protein structure determination by NMR spectroscopy. Using NMR spectra and the protein sequence as input, the method automatically identifies (strictly without any human intervention) cross-peak positions, chemical shift assignments, upper limit distance restraints, and the protein structure. ARTINA deep learning models have been trained with over 600 000 cross-peak examples from more than 1300 2D/3D/4D spectra. The method has solved structures with a median backbone RMSD of 1.44 Å to the PDB reference and correctly identified 91.36% of the chemical shift assignments. View our short video tutorial to learn how to get started with ARTINA.
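The two quality measures quoted above can be illustrated with a small sketch. It assumes that the two coordinate sets are already superimposed and that an assignment counts as correct when it lies within a fixed tolerance of the reference shift; both simplifications, and all numbers, are assumptions made for this example and are not the evaluation protocol used for ARTINA.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def assignment_accuracy(predicted, reference, tolerance_ppm):
    """Percentage of reference shifts reproduced within the tolerance."""
    correct = sum(
        1
        for atom, shift in reference.items()
        if atom in predicted and abs(predicted[atom] - shift) <= tolerance_ppm
    )
    return 100.0 * correct / len(reference)

# Hypothetical backbone atoms: every atom of the model is displaced by 1 Å.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
reference_structure = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 1.0, 1.0)]
backbone_rmsd = rmsd(model, reference_structure)

# Hypothetical chemical shifts in ppm.
predicted_shifts = {"H_5": 8.21, "N_5": 120.4, "CA_5": 56.0, "H_6": 7.90}
reference_shifts = {"H_5": 8.20, "N_5": 120.4, "CA_5": 57.5, "H_6": 7.91}
accuracy = assignment_accuracy(predicted_shifts, reference_shifts, tolerance_ppm=0.05)
```

In practice the structures are superimposed before the RMSD is computed, and different tolerances are normally applied to 1H, 13C and 15N shifts.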

New Update (27.03.2023): We've added support for additional file types: (a) manual peak lists (.list, .peaks), (b) chemical shift lists (.prot), (c) chemical shift statistics (.stats), (d) lower/upper distance restraints (.lol/.upl), (e) Talos angle restraints (.aco), and (f) protein structure (.pdb). With this latest release, all of these files can now be uploaded to the project storage and utilized as inputs for applications. The new data files enable users to perform structure-based assignment and chemical shift transfer, and to use manually refined ARTINA output files in application runs. Learn more about the supported file formats in our blog article.

Academic users can use the NMRtist platform free of charge to perform automated peak picking, shift assignment, or full structure determination. Create a free account to use all functions of the service, or start an anonymous project by pressing the button below.

Recommended articles

Getting started tutorial.

This tutorial presents the first steps in the NMRtist system. It guides you through account and project creation, data upload, and submission of an example structure calculation job. You can go through the tutorial with your own data, or use one of our example datasets. We highly recommend doing this tutorial before making your first application call.

Artificial Intelligence for NMR Applications (ARTINA) is a deep learning-based approach to fully automated NMR protein structure determination. The method takes as input only NMR spectra and the protein sequence, and delivers automatically: peak lists, shift assignments, distance restraints, and the structure.

Video tutorial

This video tutorial introduces beginners to the NMRtist system, guiding them through the process of submitting an automated protein structure determination job, and showcasing representative results from such a job.

Examples of automatically determined structures

Jan. 15, 2024, 8:14 p.m.

[Manuscript] The 100-protein NMR spectra dataset: A resource for biomolecular NMR data analysis

Open dataset containing 1329 2D-4D NMR spectra that allow the reproduction of 100 protein structures from original measurements. This dataset was originally compiled for the development of the ARTINA deep learning-based spectra analysis method (see https://nmrdb.ethz.ch and the manuscript).

Nov. 30, 2023, 8:15 p.m.

[Manuscript] Time-optimized protein NMR assignment with an integrative deep learning approach using AlphaFold and chemical shift prediction

Our new study, recently accepted in Science Advances (https://www.science.org/doi/full/10.1126/sciadv.adi9323), explores the integration of in-silico predictions such as AlphaFold with ARTINA to improve the efficiency and accuracy of NMR data analysis. This research is a significant step towards data-efficient use of our system for protein studies.

Feb. 2, 2023, 8:39 p.m.

[Manuscript] NMRtist: an online platform for automated biomolecular NMR spectra analysis

Our manuscript (application note), presenting the NMRtist platform, has been accepted for publication in Bioinformatics (https://doi.org/10.1093/bioinformatics/btad066).

Dec. 21, 2022, midnight

NMRtist usage

Since the release of the platform in February 2022, NMRtist has analysed 4 368 2D/3D/4D NMR spectra and completed 1 100 automated chemical shift assignment jobs and 444 automated structure determination jobs.

Dec. 20, 2022, midnight

ARTINA and NMRtist presented to a broader audience

Between 06.2022 and 01.2023, we presented ARTINA and NMRtist at several NMR events, including: Chianti Workshop (Principina Terra, Italy), EUROMAR (Utrecht, The Netherlands), EMBO Practical Course (Basel, Switzerland), EMBO Lecture Course (Berhampur, India), Biomolecular NMR: Advanced Tools, Machine Learning (Gothenburg, Sweden), and ICMRBS (Boston, USA).

Oct. 19, 2022, midnight

[Manuscript] Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA

Our manuscript, presenting the ARTINA workflow for rapid assignment and structure determination, has been published in Nature Communications (https://doi.org/10.1038/s41467-022-33879-5).

Oct. 2, 2021, midnight

Biomolecular NMR: Advanced Tools workshop

NMRtist was presented at the Biomolecular NMR: Advanced Tools workshop (29.09-01.10 2021). All participants of the training, supervised by Prof. Peter Güntert and Dr. Piotr Klukowski, submitted datasets to the platform, obtaining automatically determined structures and/or assignments.

Assigning NMR spectra of RNA, peptides and small organic molecules using molecular network visualization software

Jan Marchant

1 Department of Chemistry and Biochemistry, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 USA

Michael F. Summers

2 Howard Hughes Medical Institute, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 USA

Bruce A. Johnson

3 Structural Biology Initiative, CUNY Advanced Science Research Center, 85 St. Nicholas Terrace, New York, NY 10031 USA

NMR assignment typically involves analysis of peaks across multiple NMR spectra. Chemical shifts of peaks are measured before being assigned to atoms using a variety of methods. These approaches quickly become complicated by overlap, ambiguity, and the complexity of correlating assignments among multiple spectra. Here we propose an alternative approach in which a network of linked peak-boxes is generated at the predicted positions of peaks across all spectra. These peak-boxes correlate known relationships and can be matched to the observed spectra. The method is illustrated with RNA, but a variety of molecular types should be readily tractable with this approach.

Electronic supplementary material

The online version of this article (10.1007/s10858-019-00271-3) contains supplementary material, which is available to authorized users.

The power of NMR spectroscopy relative to other molecular spectroscopies lies in the ability to detect spectral signals and interactions associated with specific atoms. Requisite assignment of NMR signals typically follows a paradigm of measuring the chemical shifts of local maxima (peaks) within each spectrum followed by correlating signals within and among different types of NMR spectra and associating those peak positions with specific atoms, either by automated methods or interactive analysis. Although automated assignment methods are desirable and work toward this goal is ongoing, interactive methods remain the standard for much NMR assignment. Although interactive analysis is aided by the display of peak-boxes associated with each measured peak, which can display known assignments or other annotations, the tracking of thousands of scalar- and dipolar-coupled peaks in multiple datasets can be challenging. We describe here an inverted approach that focuses on networks of coupled peaks that are predicted from the molecular structure and type of NMR experiment. Instead of picking peak positions and then attempting to assign them, we generate a linked network of assigned peak-boxes at these predicted positions that can then be interactively aligned with the observed spectra. This approach allows the spectroscopist to make simultaneous use of multiple spectral features that can minimize ambiguity in the assignments compared to the process of assigning individual peaks.

Our technique relies on a priori knowledge of the molecular topology and the ability to predict chemical shifts and coupling patterns. Both types of information are available for a range of molecule types (Steinbeck et al. 2003 ; Ulrich et al. 2008 ; Barton et al. 2013 ; Brown et al. 2015 ). We have used the technique to assign RNAs as large as 68 nucleotides (including fragments of much larger RNA projects) (Keane et al. 2015 ; Marchant et al. 2018 ; Zhang et al. 2018 ) and small cyclic peptides, but the principles apply to many molecular types including DNA, modest sized peptides, and arbitrary small organic molecules. The higher the quality of chemical shift predictions and predicted NOE peaks the better the starting point, but the approach allows bootstrapping from better portions of the starting set to regions of lower quality.

The approach described here could be implemented with a variety of tools for chemical shift prediction, peak network generation and interactive assignment. Here we describe using the protocol with NMRFx Analyst, a freely available, open-source software tool that extends the existing NMRFx Processor (Norris et al. 2016 ). NMRFx Analyst integrates NMR processing, chemical shift prediction and peak picking and assignment tools useful for this approach. An earlier implementation of the approach is also available in NMRViewJ (Johnson and Blevins 1994 ). There are several requirements to implement the approach in other tools. The key requirements are a source of predicted shifts and the ability to interactively move multiple peaks in response to the movement of a single peak. The shift prediction is done once at the start of a project, and so can be done with an external tool. Generation of peaks based on the NMR shift prediction can also be done external to the software. So any NMR analysis tool (such as CCPN Analyst (Skinner et al. 2016 ), Sparky (Lee et al. 2015 ) or CARA (Keller)) that can read external peak files has the core technology to get started without any modifications. The interactive adjustment of peak positions in response to moving a single peak would likely require code modifications or a plugin module, but this should be relatively straightforward to implement.

We describe here the approach for a 50-nucleotide RNA hairpin. A 3 min video illustrating the major steps on a 22-nucleotide RNA hairpin is available as supplementary material. The molecular topology for the RNA is readily available from the primary sequence coupled with NMRFx Analyst’s built-in library of nucleotides. A secondary structure (or if available, the tertiary structure) is additionally necessary for chemical shift prediction and NOE cross-peak prediction. Predicting chemical shifts of the target molecule is an essential component of the protocol. For RNA molecules we use our previously described attribute-based shift prediction technique (Barton et al. 2013 ; Brown et al. 2015 ), but with 3D coordinates a structure based method could be used (Frank et al. 2013 , 2014 ; Brown et al. 2015 ). The attribute technique predicts hydrogen, carbon, and nitrogen chemical shifts based on a set of attributes describing the central nucleotide in a five-nucleotide window. The only input necessary is the primary sequence and a dot-bracket style representation of the secondary structure (Lorenz et al. 2011 ).
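To make the dot-bracket input concrete, the following is a minimal sketch of how such a string can be turned into a table of base-paired positions. The function name `parse_dot_bracket` and the 12-nucleotide hairpin are illustrative assumptions, not part of NMRFx Analyst, and pseudoknot symbols are not handled.

```python
def parse_dot_bracket(structure: str) -> dict[int, int]:
    """Map each paired position (1-based) to its partner.

    '(' opens a pair, ')' closes the most recent open pair,
    '.' marks an unpaired nucleotide.
    """
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            partner = stack.pop()
            pairs[partner] = pos
            pairs[pos] = partner
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


# Hypothetical hairpin: a 4 base-pair stem closing a 4-nucleotide loop.
sequence = "GGCAGAAAUGCC"
structure = "((((....))))"
pairs = parse_dot_bracket(structure)

for pos, base in enumerate(sequence, start=1):
    partner = pairs.get(pos)
    label = f"paired with {partner}" if partner else "unpaired"
    print(f"{base}{pos}: {label}")
```

A shift predictor or a NOESY rule set could then look up, for each nucleotide, whether it sits in a helix or a loop.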

For RNA assignments we have used a set of three different experiments. These are homonuclear 2D TOCSY, 2D 1H-13C HMQC and 2D NOESY. The technique is not dependent on having the TOCSY and HMQC experiments, but a greater number of complementary experiments will reduce ambiguities in the assignment process. Each experiment type necessitates a different protocol for peak-box generation. The TOCSY protocol simply generates peak-boxes for pairs of protons that are connected by fewer than a specified number of homonuclear J-coupling steps. In particular, the H5–H6 coupling of uracil and cytosine and couplings between ribose protons are generated. The HMQC involves all carbons with directly bonded protons. While the expected peaks and correlations for the HMQC and TOCSY are relatively insensitive to tertiary structure, peak-box generation for the NOESY involves various assumptions.
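The TOCSY rule can be sketched as a breadth-first search over a graph of J-coupled protons. The coupling table below is a simplified, hypothetical description of one pyrimidine nucleotide, and `tocsy_pairs` is an illustrative name rather than an NMRFx Analyst function. The sketch uses an inclusive limit (`max_steps` transfers at most).

```python
from collections import deque

# Hypothetical J-coupling graph: ribose protons plus the base H5-H6 pair.
COUPLINGS = {
    "H1'": ["H2'"],
    "H2'": ["H1'", "H3'"],
    "H3'": ["H2'", "H4'"],
    "H4'": ["H3'", "H5'", "H5''"],
    "H5'": ["H4'", "H5''"],
    "H5''": ["H4'", "H5'"],
    "H5": ["H6"],
    "H6": ["H5"],
}


def tocsy_pairs(couplings, max_steps):
    """Return ordered proton pairs linked by at most max_steps J-coupling steps."""
    pairs = set()
    for start in couplings:
        steps_to = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            if steps_to[atom] == max_steps:
                continue  # do not relay magnetization any further
            for neighbour in couplings[atom]:
                if neighbour not in steps_to:
                    steps_to[neighbour] = steps_to[atom] + 1
                    queue.append(neighbour)
        for other in steps_to:
            if other != start:
                pairs.add((start, other))  # one peak-box per (x, y) ordering
    return pairs


cosy_like = tocsy_pairs(COUPLINGS, max_steps=1)   # direct couplings only
tocsy_like = tocsy_pairs(COUPLINGS, max_steps=3)  # relayed transfers included
print(len(cosy_like), len(tocsy_like))
```

Setting `max_steps=1` gives a COSY-like peak list, while larger values add relayed cross-peaks within each spin system.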

For an RNA (or other molecule) where the 3D structure is known, peak-boxes are generated for all hydrogen pairs whose distance is less than a specified limit (often 5 or 6 Å). Where the 3D structure is not available, NMRFx Analyst uses the secondary structure specified with dot-bracket notation and a built-in set of rules to generate peak-boxes for helical and tetraloop regions. Intra-residue peak-boxes are also generated and are less dependent on the structural information. While this NOESY protocol is unable to generate inter-residue peak-boxes in larger loops, the combination of peak-boxes in helical and tetraloop regions and intra-residue peak-boxes in all regions gives a substantial number of predicted peaks that can be used as a basis for extending the search to other regions. The intra-residue assignments can be used to get the correct shift assignments, which are then used to assign peaks that haven’t been predicted ( vide infra ).
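When coordinates are available, the distance rule reduces to a pairwise filter. The sketch below assumes hydrogen coordinates in Å keyed by hypothetical "residue.atom" labels; the coordinates are invented for illustration and `noesy_pairs` is not an NMRFx Analyst function.

```python
import math
from itertools import combinations


def noesy_pairs(coords, cutoff=5.0):
    """Return (atom_a, atom_b, distance) for hydrogen pairs closer than cutoff (in Å)."""
    pairs = []
    for (atom_a, xyz_a), (atom_b, xyz_b) in combinations(coords.items(), 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance < cutoff:
            pairs.append((atom_a, atom_b, round(distance, 2)))
    return pairs


# Invented coordinates (Å) for four protons; not taken from any real structure.
coords = {
    "6.H8": (0.0, 0.0, 0.0),
    "6.H1'": (3.0, 2.0, 1.0),
    "7.H8": (4.0, 3.5, 0.0),
    "46.H2": (10.0, 0.0, 0.0),
}

for atom_a, atom_b, distance in noesy_pairs(coords, cutoff=5.0):
    print(f"{atom_a} - {atom_b}: {distance} Å")
```

Raising the cutoff from 5 to 6 Å adds the 6.H8 to 7.H8 pair in this toy example, which illustrates how the limit trades completeness against the number of peak-boxes that will have no observable peak.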

Overlapping peaks are a serious impediment to the assignment of larger RNAs, but this can be alleviated by the use of isotopically labeled RNA molecules to minimize the number of spectral peaks (Lu et al. 2010 ; Longhini et al. 2016 ). Nucleotide and atom-specific 2 H labeling, or 13 C labeling combined with pulse sequences that filter and edit the spectra based on the presence of 13 C labelled nuclei can be used to generate a complementary set of experiments in which the number of peaks in each individual experiment is reduced, but all expected peaks can be observed in the complete set of experiments (LeBlanc et al. 2017 ). NMRFx Analyst allows specifying the labeling pattern by both nucleotide type and specific residues. The peak-box generator uses this in combination with each experiment’s edit-filter scheme to generate the expected cross-peaks for the labelled RNA.

Once the set of peak-boxes is generated for each experiment the user can begin to interactively assign the spectra. Each available spectrum is displayed with its corresponding peak-boxes superimposed. Any given spectrum might be displayed in multiple windows so that expansions of relevant portions of the spectra can be displayed. The user can then interactively drag, with motions of mouse or track pad, a peak-box from its predicted position to alignment with an observed spectral peak (Fig.  1 ).


Screenshot of the NMRFx Analyst GUI with a network assignment procedure in progress. The rectangular peak-boxes illustrate predicted peaks, label numbers indicate the residues involved, and arrows are used to show whether peak-boxes can be moved in each dimension (no X) or are frozen in that dimension (with X). Peak-boxes in black (with residue numbers 6–46, and 6–7) are initially in the predicted positions and can be freely adjusted, as shown by black arrows for peak-box 6–46. Peak-box 7–6 (red) has been selected (yellow background) and then frozen and can no longer be adjusted in either dimension. As a consequence of freezing this peak-box, peak-box 7 (orange) is now frozen in the horizontal position yet adjustable in the vertical so it could be slid down to align with the peak below. The opposite is true for peak-box 6 (magenta) which could be slid left to align with a peak. Other red peak-boxes have already been positioned and frozen. Controls at bottom allow for freezing and thawing peaks. The Tweak + Freeze button will automatically center a peak-box on an overlapped peak before freezing

In the traditional approach, peak-boxes are initially not assigned so there is no unambiguous relationship between different peak-boxes within the spectrum or between spectra. In this new approach, while peak-boxes are not necessarily correctly positioned, they each have an assigned atom for each dimension. The assignment means that sets of peaks will share atoms on one or both dimensions. This is illustrated visually when one selects a peak as shown in Fig.  1 . Connecting lines are drawn between peak-boxes with common atom assignments. As a user drags a peak-box, the entire set of peak-boxes that share an atom with the moved peak will move synchronously with the directly shifted peak. The essence of the method is that whereas observing an individual peak in relation to a spectral signal might be ambiguous, a whole set of coupled peaks is not.
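One way to obtain this synchronous movement is to store a single chemical shift per atom and let every peak-box look its position up from that shared table. The sketch below follows that assumption; the class names, atom labels and shift values are hypothetical and do not describe NMRFx Analyst's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakBox:
    spectrum: str
    x_atom: str  # atom assigned to the horizontal dimension
    y_atom: str  # atom assigned to the vertical dimension


class PeakNetwork:
    def __init__(self, predicted_shifts, boxes):
        self.shifts = dict(predicted_shifts)  # one shift (ppm) per atom
        self.boxes = list(boxes)

    def position(self, box):
        """Current (x, y) position of a peak-box, read from the shared shifts."""
        return (self.shifts[box.x_atom], self.shifts[box.y_atom])

    def linked(self, box):
        """Other peak-boxes that share at least one atom with this one."""
        atoms = {box.x_atom, box.y_atom}
        return [b for b in self.boxes if b != box and atoms & {b.x_atom, b.y_atom}]

    def drag(self, box, new_x, new_y):
        """Moving one box updates its atoms, so every linked box moves too."""
        self.shifts[box.x_atom] = new_x
        self.shifts[box.y_atom] = new_y


# Hypothetical predicted shifts (ppm) and three linked peak-boxes.
predicted = {"6.H8": 7.90, "6.H1'": 5.60, "7.H8": 7.60, "6.C8": 139.0}
noesy_intra = PeakBox("NOESY", "6.H8", "6.H1'")
noesy_seq = PeakBox("NOESY", "7.H8", "6.H1'")
hmqc_c8 = PeakBox("HMQC", "6.H8", "6.C8")
network = PeakNetwork(predicted, [noesy_intra, noesy_seq, hmqc_c8])

network.drag(noesy_intra, 7.95, 5.72)
for box in network.linked(noesy_intra):
    print(box.spectrum, box.x_atom, box.y_atom, network.position(box))
```

Because positions are derived rather than stored per box, the sequential NOESY box follows in its vertical dimension and the HMQC box follows in its proton dimension without any explicit propagation step.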

Individual peak-boxes may initially be predicted to be close to multiple spectral signals, precluding unambiguous placement in isolation. In this new approach, however, the entire set of linked peak-boxes across multiple experiments informs the user’s decision. An example of this is shown in Fig.  2 , step 3, where two possible alignments of a group of peak-boxes within the NOESY spectrum are possible, but can be resolved with analysis of the HMQC spectrum. Positioning peak-boxes in crowded regions is still difficult, but is often unnecessary due to the presence of linked peak-boxes that are in uncrowded regions. An additional practical advantage of the approach is that typographical errors are minimized. Rather than the user typing in, with possible errors, an atomic assignment to a peak-box label field in the GUI, all peaks start with a computer-generated assignment.


Demonstration of the assignment procedure for a portion of a 50 nt RNA. In each panel the upper spectrum is a 1H-1H NOESY and the lower a 1H-13C HMQC. (1) Peak-boxes are initially positioned according to predicted chemical shifts. Upon selecting a peak-box for positioning, the linked peak-boxes are indicated by connecting lines. Visual inspection identifies a candidate peak to which the peak-box labeled 4–5 is manually repositioned, as indicated by the solid arrow. Linked peaks are repositioned automatically, as indicated by the dashed arrows. (2) The peak-box position is frozen, indicated in red. The remaining three peak-boxes in the spin system are automatically frozen, and prevented from moving in their shared dimension, indicated in orange for the x-axis. Their associated peaks are readily identified due to this restriction. (3) Examination of the NOESY spectrum reveals two well-matched possibilities for assignment of the peak-box labeled 7. The correct assignment is found by reference to the HMQC spectrum, in which there is only one reasonable candidate. (4) Repositioning the remaining peak-boxes for the spin-system associated with this atom automatically repositions associated peak-boxes from the remaining spin-system under consideration. (5) The remaining spin-system contains peak-boxes restricted from moving along the y-axis due to previously frozen peaks, indicated in magenta, such that their associated peaks are readily identified. (6) Final positions of the peak-boxes under consideration.

The protocol is greatly facilitated by a means to specify whether any given peak-box has been positioned into a final location. In NMRFx Analyst, this is done by clicking a “Freeze” button or using a corresponding keyboard shortcut. Once frozen, a peak-box will be displayed with a different color so that the user has a visual indication of which peak-boxes have been confidently placed (Fig.  1 ). Freezing an individual peak-box will lock both of its dimensions to their current position so that it can’t subsequently be moved. The linked (sharing the same atom) dimensions of other peak-boxes, in the same and different experiments, will also be frozen. Thus, linked peak-boxes might only be frozen in a single dimension. Such peak-boxes may only be slid along the free dimension which facilitates their assignment by minimizing the choice of locations to a single dimension. A color scheme is used to indicate whether a peak is frozen on the x-axis, y-axis or both axes. Peak-boxes can also be unlocked via a “Thaw” button. Freezing peak-boxes also updates the atom assignment table with the chemical shift of the peak-box dimensions. Thus the final assignment list is generated from only peak-boxes that have been frozen.
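The freeze and thaw behaviour can be sketched by tracking a set of frozen atoms: a peak-box is fully frozen when both of its atoms are in the set, and locked in one dimension when only one is. Everything below (class name, atom labels, shift values) is a hypothetical illustration, and thawing here simply releases both atoms of the chosen box, which is a simplification.

```python
class FreezableNetwork:
    def __init__(self, predicted_shifts):
        self.shifts = dict(predicted_shifts)  # one shift (ppm) per atom
        self.frozen_atoms = set()

    def state(self, x_atom, y_atom):
        """Describe which dimensions of a peak-box are locked."""
        x_locked = x_atom in self.frozen_atoms
        y_locked = y_atom in self.frozen_atoms
        if x_locked and y_locked:
            return "frozen"
        if x_locked:
            return "x-locked"
        if y_locked:
            return "y-locked"
        return "free"

    def drag(self, x_atom, y_atom, new_x, new_y):
        """Move a peak-box, ignoring the request in any locked dimension."""
        if x_atom not in self.frozen_atoms:
            self.shifts[x_atom] = new_x
        if y_atom not in self.frozen_atoms:
            self.shifts[y_atom] = new_y

    def freeze(self, x_atom, y_atom):
        self.frozen_atoms.update((x_atom, y_atom))

    def thaw(self, x_atom, y_atom):
        self.frozen_atoms.difference_update((x_atom, y_atom))

    def assignment_table(self):
        """Only atoms belonging to frozen peak-boxes enter the final list."""
        return {atom: self.shifts[atom] for atom in sorted(self.frozen_atoms)}


# Hypothetical atoms for three NOESY peak-boxes, given as (x_atom, y_atom).
net = FreezableNetwork({"7.H8": 7.60, "6.H1'": 5.60, "7.H1'": 5.45, "6.H8": 7.90})
box_7_6 = ("7.H8", "6.H1'")
box_7 = ("7.H8", "7.H1'")
box_6 = ("6.H8", "6.H1'")

net.drag(*box_7_6, 7.66, 5.71)  # align with an observed peak
net.freeze(*box_7_6)
print(net.state(*box_7), net.state(*box_6))
net.drag(*box_7, 7.00, 5.50)    # only the free (vertical) dimension moves
print(net.assignment_table())
```

After freezing the 7–6 box, the box for residue 7 can only slide vertically and the box for residue 6 only horizontally, mirroring the situation shown in Fig. 1.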

As described above, the set of peak-boxes generated for NOESY spectra requires assumptions about the molecular structure and it is unlikely that they will perfectly match the spectra. Extraneous peak-boxes are easily deleted. Where peaks cannot be associated with a generated peak-box, the user can manually add a peak-box at the peak’s location. The software still provides significant value in this process as the observed signal might align with peak-boxes that have already been frozen. In this case assignment possibilities for the manually added peak-box are displayed and a link can be made to the already frozen peak-boxes.

The above description has focused on applications to RNA. The approach, however, was initially developed as a means to assign cyclic peptides. The basic protocol for peptides is essentially the same as described above. The differences involve methods for chemical shift prediction and rules for peak-box generation. Predicted chemical shifts can be obtained simply from average chemical shifts for standard amino-acids available from the BMRB (Ulrich et al. 2008 ). Alternatively, NMRFx Analyst includes a built-in (as yet, unpublished) tool for generating predictions based on sequence and dihedral angles, and optionally ring-current shifts. Projects involving cyclic peptides often include non-canonical amino-acids (Hosseinzadeh et al. 2017 ). Shift prediction for non-canonical amino-acids is supported using a built-in predictor based on HOSE codes that can form predictions for any arbitrary organic molecule. Peptides, and all other molecules supported, can also use predictions generated in 3rd party software and imported from a text file. As for RNA, 2D TOCSY, 1 H- 13 C HMQC and 2D NOESY experiments have been implemented, but various experiment combinations are possible. COSY experiments can be included, for example, by using the TOCSY peak-box generation protocol but limiting the number of transfer steps in the peak generator to one. The TOCSY and HMQC experiments are particularly robust because they don’t depend on having 3D structural information, though constraints involved in cyclizing the peptide can be used to generate a reasonable family of structures for NOESY predictions.

The described protocol is also completely applicable to arbitrary small organic molecules and provides a means to rapidly assign, without typographical errors, these molecules using one or more 2D spectra. Predictions can be made using the internal HOSE code based predictor or external tools (Schütz et al. 1997 ; Smurnyy et al. 2008 ). Prediction of NOESY peaks to complement those from scalar-coupled experiments can be made with an approximate 3D structure. Missing and additional peaks can be dealt with as described above.

While the chemical shift predictions that are used always have some level of error, a key benefit of this approach is that individual errors of large magnitude are easily identified and tolerated due to redundancy in the network of moving peaks. More widespread errors in the predicted chemical shifts, particularly if accompanied by errors in the predicted network of NOEs, would potentially prove more challenging; however, in our experience with close to 100 distinct RNA molecules this problem has not arisen. This tolerance to error should also allow the method to be used in situations such as RNA–protein complexes where the RNA chemical shifts near the interface are perturbed from their expected values.

The above protocol, as implemented in NMRFx Analyst, provides a rapid way to facilitate the assignment of a variety of RNA, DNA, peptides and small molecules. It has been used for the assignment of a variety of published RNA projects (Keane et al. 2015 ; Marchant et al. 2018 ; Zhang et al. 2018 ) and for rapid assignment of a variety of cyclic peptides (unpublished studies). Its use requires access to chemical shift predictions, which are available within NMRFx Analyst or through a wide variety of external software packages. Prediction of peaks expected in scalar-coupled experiments (e.g. TOCSY, COSY, and HMQC) requires only an understanding of the covalent structure of the molecule, and prediction of a significant number of NOESY peaks can be made with reasonable assumptions about structure. In particular, intra-residue peaks can be predicted and used to aid in assigning inter-residue peaks. The protocol fits between the traditional manual assignment methods that rely on assigning picked peaks and fully automated methods. We anticipate that it will form a basis for adding more automated capabilities in the future. For example, one can already drag a peak-box near to a signal and have it automatically positioned on the nearby peak. By basing the automated capabilities on this visual tool, the user will be able to observe the results of the automation and manually intervene. As chemical shift and structural prediction methods are developed across all molecule types, we expect the approaches for chemical shift assignment illustrated here to be adopted into widespread use.


Acknowledgements

This work was supported in part by grants from the National Institute of General Medical Sciences of the National Institutes of Health (U54 GM 103297 to BAJ and JM, R01 GM 123012 to BAJ, and GM 42561 to MFS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Barton S, Heng X, Johnson BA, Summers MF. Database proton NMR chemical shifts for RNA signal assignment and validation. J Biomol NMR. 2013;55:33–46. doi: 10.1007/s10858-012-9683-9.
  • Brown JD, Summers MF, Johnson BA. Prediction of hydrogen and carbon chemical shifts from RNA using database mining and support vector regression. J Biomol NMR. 2015;63:39–52. doi: 10.1007/s10858-015-9961-4.
  • Frank AT, Bae SH, Stelzer AC. Prediction of RNA 1H and 13C chemical shifts: a structure based approach. J Phys Chem B. 2013;117:13497–13506. doi: 10.1021/jp407254m.
  • Frank AT, Law SM, Brooks CL. A simple and fast approach for predicting 1H and 13C chemical shifts: toward chemical shift-guided simulations of RNA. J Phys Chem. 2014;118(42):12168–12175. doi: 10.1021/jp508342x.
  • Hosseinzadeh P, Bhardwaj G, Mulligan VK, et al. Comprehensive computational design of ordered peptide macrocycles. Science. 2017;358:1461–1466. doi: 10.1126/science.aap7577.
  • Johnson BA, Blevins RA. NMRView: a computer program for the visualization and analysis of NMR data. J Biomol NMR. 1994;4:603–614. doi: 10.1007/BF00404272.
  • Keane SC, Heng X, Lu K, et al. RNA structure. Structure of the HIV-1 RNA packaging signal. Science. 2015;348:917–921. doi: 10.1126/science.aaa9266.
  • Keller R. CARA: http://cara.nmr.ch
  • LeBlanc RM, Longhini AP, Le Grice SFJ, et al. Combining asymmetric 13C-labeling and isotopic filter/edit NOESY: a novel strategy for rapid and logical RNA resonance assignment. Nucleic Acids Res. 2017;45:e146. doi: 10.1093/nar/gkx591.
  • Lee W, Tonelli M, Markley JL. NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics. 2015;31:1325–1327. doi: 10.1093/bioinformatics/btu830.
  • Longhini AP, LeBlanc RM, Becette O, et al. Chemo-enzymatic synthesis of site-specific isotopically labeled nucleotides for use in NMR resonance assignment, dynamics and structural characterizations. Nucleic Acids Res. 2016;44:e52. doi: 10.1093/nar/gkv1333.
  • Lorenz R, Bernhart SH, Zu Siederdissen CH, et al. ViennaRNA package 2.0. Algorithms Mol Biol. 2011;6(1):26. doi: 10.1186/1748-7188-6-26.
  • Lu K, Miyazaki Y, Summers MF. Isotope labeling strategies for NMR studies of RNA. J Biomol NMR. 2010;46:113–125. doi: 10.1007/s10858-009-9375-2.
  • Marchant J, Bax A, Summers MF. Accurate measurement of residual dipolar couplings in large RNAs by variable flip angle NMR. J Am Chem Soc. 2018;140:6978–6983. doi: 10.1021/jacs.8b03298.
  • Norris M, Fetler B, Marchant J, Johnson BA. NMRFx Processor: a cross-platform NMR data processing program. J Biomol NMR. 2016;65:205–216. doi: 10.1007/s10858-016-0049-6.
  • Schütz V, Purtuc V, Felsinger S, Robien W. CSEARCH-STEREO: a new generation of NMR database systems allowing three-dimensional spectrum prediction. Fresenius J Anal Chem. 1997;359:33–41. doi: 10.1007/s002160050531.
  • Skinner SP, Fogh RH, Boucher W, et al. CcpNmr AnalysisAssign: a flexible platform for integrated NMR analysis. J Biomol NMR. 2016;66:111–124. doi: 10.1007/s10858-016-0060-y.
  • Smurnyy YD, Blinov KA, Churanova TS, et al. Toward more reliable 13C and 1H chemical shift prediction: a systematic comparison of neural-network and least-squares regression based approaches. J Chem Inf Model. 2008;48:128–134. doi: 10.1021/ci700256n.
  • Steinbeck C, Krause S, Kuhn S. NMRShiftDB-constructing a free chemical information system with open-source components. J Chem Inf Comput Sci. 2003;43:1733–1739. doi: 10.1021/ci0341363.
  • Ulrich EL, Akutsu H, Doreleijers JF, et al. BioMagResBank. Nucleic Acids Res. 2008;36:D402–D408. doi: 10.1093/nar/gkm957.
  • Zhang K, Keane SC, Su Z, et al. Structure of the 30 kDa HIV-1 RNA dimerization signal by a hybrid Cryo-EM, NMR, and molecular dynamics approach. Structure. 2018;26:490–498. doi: 10.1016/j.str.2018.01.001.

Computational Protocol for the Spectral Assignment of NMR Resonances in Covalent Organic Frameworks

Affiliations.

  • 1 Center for Molecular Modeling, Ghent University, Technologiepark 46, 9052 Ghent, Belgium.
  • 2 NMRCoRe, NMR/X-Ray Platform for Convergence Research, Celestijnenlaan 200F, Box 2461, B-3001 Leuven, Belgium.
  • 3 Department of Chemistry, Ghent University, Krijgslaan 281 (S3), 9000 Ghent, Belgium.
  • PMID: 38650071
  • DOI: 10.1021/acs.jctc.3c01414

Solid-state nuclear magnetic resonance spectroscopy is routinely used in the field of covalent organic frameworks to elucidate or confirm the structure of the synthesized samples and to understand dynamic phenomena. Typically this involves the interpretation and simulation of the spectra through the assumption of symmetry elements of the building units, hinging on the correct assignment of each line shape. To avoid misinterpretation resulting from library-based assignment without a theoretical basis incorporating the impact of the framework, this work proposes a first-principles computational protocol for the assignment of experimental spectra, which exploits the symmetry of the underlying building blocks for computational feasibility. In this way, this protocol accommodates the validation of previous experimental assignments and can serve to complement new NMR measurements.


Automatic Assignment of 1H NMR Spectra


Introduction

The assignment of 1H NMR spectra of small molecules is an everyday task within organic chemistry, which is usually tackled in a manual way. The chemist typically identifies the most relevant regions (aka multiplets) in the spectrum and assigns them to atoms in the putative molecular structure. A partial assignment is usually attempted and the process generally lacks rigor. This process is generally considered repetitious, time-consuming, very tedious and error-prone.


In response to this need, we have developed an expert system for the automatic assignment of 1H NMR spectra of small molecules. It uses the principles of fuzzy logic and probabilistic methods to first classify all the resonances (peaks) in the spectrum, then enumerates the most likely assignments of experimental multiplets to a presumed molecular structure, and finally applies a score to them. As inputs it uses the experimental spectrum (or possibly several kinds of spectra), the suggested molecular structure, and the predicted NMR parameters (shifts and coupling constants); as output it generates the most likely assignment.

How does Auto Assignments work?

The Auto Assignment algorithm combines several software techniques we have developed in recent years as tools for expert tasks such as automatic detection and characterization of spectral peaks, automatic solvent detection, and automatic structure verification (for which the auto-assignment feature is, in turn, a building block).

Real-life spectra always contain a number of artifacts such as noise, baseline distortions, relaxation and radiation-damping induced distortions of peak intensities, lineshape distortions due to magnetic field inhomogeneity, lineshape distortions due to unresolved weak long-range couplings, second-order interactions, peak crowding that causes peaks and multiplets to overlap, etc.

For these reasons it is impossible to construct any NMR data evaluation wizard, like the automatic assignment module, without extensive use of statistical methods that allow for a degree of logical “fuzziness”. In our case this is done by applying a proprietary scoring system at every step, to the full depth of the algorithm. A description of this scoring system is beyond the scope of this document, but it will be covered in a future article.

The Auto Assignment algorithm consists of the following constituent blocks (See Fig. 1)


Fig. 1: Basic flowchart diagram of the new 1H-NMR Automatic Assignments algorithm. See the text for a description of its constituent blocks

(1) Basic processing.

An NMR FID is loaded, apodized, transformed, phased and baseline corrected, typically in a transparent, fully unattended way (the process, however, can be customized by the user). In addition, a presumed correct molecular structure is loaded, using any of the popular structure-encoding formats (mol, ChemDraw files, etc.).

(2) Global Spectrum Deconvolution (GSD).

The resulting frequency-domain 1H spectrum is automatically deconvolved using the Global Spectrum Deconvolution algorithm in order to generate a reliable list of peaks and their parameters (position, height, width, kurtosis, area, etc.), even in situations characterized by strong peak overlap (Fig. 2).


Fig. 2: Example of the information about spectral peaks extracted by GSD in the presence of strong overlap with a large, broad water signal.

(3) AutoClassify.

Using another fuzzy-logic algorithm, each peak in the GSD list is classified according to whether it belongs to the compound, the solvent, an impurity, an artifact, a 13C satellite, etc. (Fig. 3). The algorithm even attempts to pinpoint possible labile peaks.

Fig. 3: Illustration of the AutoClassify algorithm. Peaks are color coded according to their type.

An important part of this process is also the recognition of multiplets due to J-couplings and a detailed characterization of their many properties (this results in a multiplets list). Inter-multiplet coupling patterns are also detected and stored as another tool for the subsequent auto-assignment step.

(4) NMR Prediction.

NMR spectral parameters (chemical shifts and scalar coupling constants) of the suggested molecule are predicted using three complementary approaches: 3D conformer, substituent chemical shift and a HOSE code database. These are then combined using the NMRPredict Best algorithm, which is integrated within Mnova software. Users can also add their own assignments to the HOSE code database to further refine the accuracy of the predictions.

(5) AutoAssign.

The final step of the algorithm consists in combining all the information collected to this point. Basically, the wizard tries to find the best possible match between the experimental multiplets and the predicted multiplets, subject also to constraints dictated by NMR know-how. Mathematically, the number of possible assignments is staggering, but we apply a prior enumeration filter passing only a limited number (about 100) of the most likely ones. In this way it becomes feasible to score each assignment against all available information and select the best one.
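The matching idea can be illustrated with a toy scorer. This is not the proprietary Mnova scoring system or its enumeration filter: it brute-forces every assignment, which is only viable for a handful of multiplets, and the score terms (a linear shift penalty and an integral match), the tolerance, and all numbers are assumptions made for this sketch.

```python
from itertools import permutations


def score(assignment, observed, predicted, shift_tol=0.3):
    """Sum a shift-agreement term and an integral-agreement term per multiplet."""
    total = 0.0
    for obs_index, atom in assignment:
        obs = observed[obs_index]
        pred = predicted[atom]
        shift_error = abs(obs["shift"] - pred["shift"])
        total += max(0.0, 1.0 - shift_error / shift_tol)
        if round(obs["integral"]) == pred["n_h"]:
            total += 1.0
    return total


def best_assignments(observed, predicted, keep=100):
    """Score every one-to-one assignment and return the top `keep` candidates."""
    ranked = []
    for atoms in permutations(predicted, len(observed)):
        assignment = list(zip(range(len(observed)), atoms))
        ranked.append((score(assignment, observed, predicted), assignment))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked[:keep]


# Illustrative multiplet list (ppm, relative integral) and predicted groups.
observed = [
    {"shift": 1.18, "integral": 3.02},
    {"shift": 3.65, "integral": 1.97},
    {"shift": 2.60, "integral": 1.00},
]
predicted = {
    "CH3": {"shift": 1.20, "n_h": 3},
    "CH2": {"shift": 3.70, "n_h": 2},
    "OH": {"shift": 2.40, "n_h": 1},
}

ranked = best_assignments(observed, predicted)
best_score, best = ranked[0]
print(round(best_score, 2), best)
```

Keeping a ranked list rather than a single winner makes it easy to inspect near-ties, which is where swapped assignments tend to occur.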

A more in-depth description of the inner details of the auto assignment algorithm will be presented elsewhere. However, we hope that the present description provides a sufficiently clear picture of its underlying concepts and its most important features.

A simple example

In the figure below, we show the result of applying the new assignment algorithm to the spectrum of L-Proline. The result was obtained in fully automatic mode: simply drag and drop the molecule and the spectrum, run the command, and that is it. Several points are worth noting:

Fig. 4: Result of running AutoAssignment with L-Proline, showing how GSD helps to resolve overlap issues as well as to classify peaks according to their type. Blue peaks correspond to the detected compound resonances, whereas red lines are signals identified as solvent (DMSO and water).

  • Solvent peaks (DMSO and water) have been automatically detected and displayed in the spectrum as red peaks superimposed on top of the experimental peaks.
  • Compound peaks are displayed as blue curves. Notice that, thanks to GSD, it is possible to quantify the peaks corresponding to H-5’’ despite the significant overlap with the large solvent (water) peak. Traditional peak-picking routines would fail in cases like this.

Of course, even though this example illustrates some interesting challenges that were successfully overcome, such as yielding accurate multiplet integrals even when extra signals (e.g. solvent) overlap with the peaks or multiplet of interest, this is by no means a system that will yield all assignments with a 100% success ratio. There will always be cases of partial misassignment. To assess the number of assignments that a user would typically have to amend manually, we conducted a test using a fully assigned in-house 1H-NMR library consisting of 39 molecules with a total of 355 proton assignments. In this test, 295 assignments were correctly identified whilst 60 were wrong, corresponding to a success ratio of about 83% (295/355). Very often, the errors were due to two assignments that have to be swapped, an operation that can be done in the software with just two mouse clicks.

In this document we have presented the basic concepts behind the new automatic assignment module included in Mnova NMR 8.0. It is intended mostly as a tool for organic chemists who routinely face the tedious task of assigning their 1H spectra. The results that we have obtained are, in our opinion, very promising, and we believe that the module should already constitute a real time saver.

In addition to the automatic facilities provided, the software includes a number of graphical features that greatly facilitate the manual correction of any errors made.

In this work, only 1D 1H NMR spectra were used, but the system is already equipped to accept HSQC spectra. The results obtained with a combined 1H & HSQC approach will be covered in a separate publication.

[1] Gronwald, W.; Kalbitzer, H. R. Automated structure determination of proteins by NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc. 2004, 44, 33-96. DOI: 10.1016/j.pnmrs.2003.12.002

[2] Griffiths, L.; Beeley, H. H.; Horton, R. Towards the automatic analysis of NMR spectra: Part 7. Assignment of 1H by employing both 1H and 1H/13C correlation spectra. Magn. Reson. Chem. 2008, 46, 818-82. DOI: 10.1002/mrc.2257

[3] Cobas, C.; Sykora, S. The Bumpy Road towards Automatic Global Spectral Deconvolution (GSD), 50th ENC Conference, Asilomar, CA (USA), March 29-April 4, 2009. DOI: 10.3247/SL3Nmr09.003

[4] Kurtosis is one of the pure shape functions that we use to describe a multiplet mathematically.

[5] Cobas, C.; Seoane, F.; Domínguez, S.; Sykora, S. A new approach to improving automated analysis of proton NMR spectra through Global Spectral Deconvolution (GSD). Spectroscopy Europe 2010, 23 (1). [Online]

Chemistry LibreTexts

6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

  • Serge L. Smirnov and James McCarty
  • Western Washington University

In the previous Chapter we described 2D NMR spectroscopy, which offers significantly greater spectral resolution than basic 1D spectra. In this Chapter we will show how the well-resolved 2D 15 N-HSQC resonances can be assigned to specific residues and chemical groups within protein samples. As an example, we will consider a couple of complementary types of 3D NMR data: HNCACB and CBCA(CO)NH and their joint application for making heteronuclear NMR resonance assignment in proteins. Such an assignment opens a number of ways to probe structure and function (e.g. ligand binding) for the target protein samples.

Learning Objectives

  • Grasp why the resonance assignment of 2D 15 N-HSQC can be beneficial: the case of ligand (drug) binding by a protein (therapeutic target)
  • Become familiar with 3D heteronuclear through-bond (J-coupling) NMR: introduction and the case of the HNCACB and CBCA(CO)NH pair of 3D experiments
  • Follow an example of assignment of heteronuclear NMR resonances ( 1 H N , 15 N H , 13 Cα, 13 Cβ) from a combination of 2D 15 N-HSQC and 3D HNCACB/CBCA(CO)NH

15 N-HSQC as an assay for probing protein – ligand interactions: the need for the NMR resonance assignment

During the process of rational drug design, it is often necessary to characterize the interactions between the therapeutic target (protein) and candidate drug (ligand) beyond determination of the binding affinity ( K d ). The heteronuclear solution NMR experiment 15 N-HSQC can provide significant insight into such interactions. Recall that most of the signals in this 2D NMR spectrum originate from backbone H-N amide groups and a minority from the side-chain NH and NH 2 groups. The positions of 15 N-HSQC resonances are defined by the 1 H N and 15 N H chemical shift values, which in turn depend on the local electronic environment. Ligand binding changes this environment for the residues forming the binding site even if the tertiary structure of the rest of the protein is not perturbed. In such a case, the 15 N-HSQC resonance pattern undergoes local changes: only the resonances representing NH groups involved in the binding site change their position significantly (>0.05 ppm in the 1 H and/or >0.2 ppm in the 15 N dimension) or their signal intensity (including peak disappearance). Figure VI.2.A illustrates such a change.
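The thresholds quoted above translate directly into a small check. The sketch below is only an illustration: the peak labels and all chemical shift values are invented, and the function name `major_change` is hypothetical.

```python
def major_change(free, bound, h_cutoff=0.05, n_cutoff=0.2):
    """Return True if a peak changes significantly on ligand binding.

    free, bound: (1H ppm, 15N ppm) tuples; bound is None if the peak disappears.
    Cutoffs follow the text: >0.05 ppm in 1H and/or >0.2 ppm in 15N.
    """
    if bound is None:
        return True  # peak disappearance counts as a major change
    dh = abs(bound[0] - free[0])
    dn = abs(bound[1] - free[1])
    return dh > h_cutoff or dn > n_cutoff


# Invented peak positions for the free and ligand-bound protein
free_peaks = {"a": (8.20, 120.0), "f": (7.90, 115.5), "s": (8.60, 118.0)}
bound_peaks = {"a": (8.21, 120.1), "f": None, "s": (8.68, 118.1)}

perturbed = sorted(
    label for label in free_peaks
    if major_change(free_peaks[label], bound_peaks[label])
)
print(perturbed)
```

With real data the two peak lists would come from peak picking of the free and bound 15 N-HSQC spectra, and peaks would first have to be paired between the two spectra.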

[Figure VI.2.A]

Importantly, every 15 N-HSQC resonance in Figure VI.2.A is labeled with a single letter to help identify specific peaks which undergo spectral changes upon ligand binding. This data could have much greater impact if the peaks which underwent the most pronounced changes in position and/or intensity were assigned to specific amino acid residues within the polypeptide and chemical groups within those residues (backbone vs. side chain). The rest of this Chapter demonstrates some of the fundamentals of the heteronuclear NMR resonance assignment methodology.

Heteronuclear 3D NMR introduction: CBCA(CO)NH spectrum as an example

Just as every 2D 15N-HSQC resonance reports a J-coupling via a covalent bond between a 15N and a 1H spin-½ nucleus, there are 3D NMR experiments which report resonances originating from J-coupling (through-bond) of three types of spin-½ nuclei ( 1 H, 13 C, 15 N). In this section we will introduce two such types of 3D NMR data: HNCACB and CBCA(CO)NH. To produce a protein sample with nearly complete uniform labeling with 13 C and 15 N isotopes, bacterial recombinant protein expression can be performed in a minimal medium supplemented with 13 C-labeled glucose and 15 N-labeled ammonium chloride as the sole sources of carbon and nitrogen, respectively. Figure VI.2.B introduces the general concept of 3D NMR data and shows an element of a 3D CBCA(CO)NH spectrum.

[Figure VI.2.B]

Each resonance (“cross-peak”) of a 3D CBCA(CO)NH spectrum indicates a through-bond (scalar J-coupling) interaction between two atoms of the backbone amide group ( 1 H N and 15 N H ) of residue j and the Cα and Cβ nuclei ( 13 C) of the preceding residue j -1. The name of the experiment, CBCA(CO)NH, refers to the specific spin-½ nuclei involved (and not involved) in the relevant J-coupling interactions: Cβ and Cα are J-coupled to NH, while the connecting carbonyl carbon does not report any NMR signal (although its magnetization state is affected during the experiment). Two types of residues generate special CBCA(CO)NH peak patterns. Prolines have no amide proton, so they do not have CBCA(CO)NH peaks linked with their amide groups. Glycine residues have no Cβ, so for any residue following a glycine only a single CBCA(CO)NH resonance will be observed (from that residue’s NH to the preceding glycine Cα).

The NMR resonance assignment: combined use of two complementary datasets HNCACB and CBCA(CO)NH

By itself, CBCA(CO)NH does not convey much sequential information. Another heteronuclear 3D NMR dataset, HNCACB, affords a powerful complement here. Just like CBCA(CO)NH, HNCACB reports resonances originating from J-coupling between the backbone amide group and Cα / Cβ nuclei. The difference is that HNCACB reports two additional peaks, both intra-residue: between HN and the Cα and Cβ spins of the same residue ( Figure VI.2.C ).

[Figure VI.2.C]
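The peak-counting rules for the two experiments can be written down as a short function. This is a minimal sketch for the ideal case only (no overlap, no missing peaks, no artifacts); the function name and the use of one-letter residue codes are choices made for this illustration.

```python
def expected_peaks(residue, previous, experiment):
    """Ideal number of cross-peaks on the amide of `residue` preceded by `previous`.

    residue, previous: one-letter amino acid codes.
    experiment: "CBCA(CO)NH" or "HNCACB".
    """
    if residue == "P":
        return 0  # proline has no amide proton, so no peaks on its amide

    def aliphatic_carbons(aa):
        return 1 if aa == "G" else 2  # glycine has Cα only, others Cα and Cβ

    sequential = aliphatic_carbons(previous)  # peaks to residue j-1
    if experiment == "CBCA(CO)NH":
        return sequential
    if experiment == "HNCACB":
        return sequential + aliphatic_carbons(residue)  # plus intra-residue peaks
    raise ValueError(f"unknown experiment: {experiment}")


print(expected_peaks("K", "M", "HNCACB"))      # Lys preceded by Met
print(expected_peaks("A", "G", "CBCA(CO)NH"))  # Ala preceded by Gly
```

In practice HNCACB peaks to the preceding residue are often weaker than the intra-residue ones and may be missing, so observed counts can fall below these ideal values.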

Typically, HNCACB and CBCA(CO)NH are acquired with identical parameters, including the spectral width in all three dimensions and the same number of data points in the 15 N dimension (or 15 N planes, as in panel B of Figure VI.2.B ). Now, let’s imagine that we go through every 15 N plane and build the pairs of “residue j / residue j -1” HNCACB/CBCA(CO)NH peaks. This does not give us the sequence-specific NMR resonance assignments yet, but it already creates pairs of 3D cross-peaks linked to di-peptides within the sequence. Now, let’s take into account that for some types of residues the 13Cα and 13Cβ chemical shift values differ remarkably from those of other residue types. For details, take a look at the BMRB chemical shift statistics for amino acid residues, with emphasis on Gly, Ala, Ser and Thr. Knowing where such residues are positioned within the polypeptide sequence, we can start “connecting the dots” by mapping HNCACB/CBCA(CO)NH planes and di-peptides onto the actual amino acid sequence.

[Figure VI.2.D]

Figure VI.2.D provides a general idea of how the two 3D NMR experiments HNCACB and CBCA(CO)NH can be used together to map the signals onto the amino acid sequence of a protein sample. The Cβ of Ala residues typically has chemical shift values below 20.0 ppm, which is unique. This allows identification of Ala residues in HNCACB/CBCA(CO)NH spectral patterns. From these starting points (as well as from other distinct values, e.g. Cα for Gly and Cβ for Ser/Thr), one can continue the “connecting the dots” process outlined in Figure VI.2.D to cover the entire sequence. If these two 3D NMR datasets contain resonance overlaps that are impossible to resolve, more 3D NMR dataset pairs are used in a similar way, e.g. HNCO/HN(CA)CO and others. This process allows assignment of nearly all backbone and some side-chain resonances ( 1 H N , 15 N H , 13 Cα, 13 Cβ) to specific residues and chemical groups. Methods for assigning side-chain chemical shift values are not discussed in this chapter, but conceptually they are similar to the ones described here.
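The “connecting the dots” step can be sketched in a few lines. Everything here is an assumption made for illustration: the `SpinSystem` structure, the 0.2 ppm matching tolerance, and the invented shift values. Only the Ala rule (Cβ below 20.0 ppm) and the absence of a Cβ in Gly are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpinSystem:
    name: str
    ca: float                 # own Cα shift (ppm), from HNCACB
    cb: Optional[float]       # own Cβ shift; None when absent (Gly)
    prev_ca: float            # Cα of residue j-1, from CBCA(CO)NH
    prev_cb: Optional[float]  # Cβ of residue j-1; None when absent


def matches(a, b, tol):
    """Two shifts agree if both are absent or they differ by at most tol."""
    if a is None or b is None:
        return a is None and b is None
    return abs(a - b) <= tol


def successors(current, systems, tol=0.2):
    """Spin systems whose (j-1) shifts agree with the own shifts of `current`."""
    return [
        s for s in systems
        if s is not current
        and matches(s.prev_ca, current.ca, tol)
        and matches(s.prev_cb, current.cb, tol)
    ]


def chain_from(start, systems, tol=0.2):
    """Follow unique successors; stop at a dead end or an ambiguity."""
    chain = [start]
    while True:
        nxt = [s for s in successors(chain[-1], systems, tol) if s not in chain]
        if len(nxt) != 1:
            return chain
        chain.append(nxt[0])


def residue_hint(s):
    if s.cb is None:
        return "Gly"
    if s.cb < 20.0:
        return "Ala"
    return "?"


# Invented spin systems, one per 15N-HSQC peak
x = SpinSystem("x", ca=52.5, cb=19.1, prev_ca=56.0, prev_cb=32.8)
y = SpinSystem("y", ca=45.3, cb=None, prev_ca=52.5, prev_cb=19.0)
z = SpinSystem("z", ca=58.4, cb=63.9, prev_ca=45.2, prev_cb=None)

chain = chain_from(x, [x, y, z])
print([s.name for s in chain], [residue_hint(s) for s in chain])
```

The resulting fragment (Ala, Gly, then an unidentified residue) would next be placed on the sequence wherever an Ala-Gly pair occurs; real data need a strategy for ambiguous or missing links, which this sketch simply treats as the end of the chain.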

With the general process of the protein NMR resonance assignment described, let’s assume that this method was successfully applied to the protein target (T) sample presented in Figure VI.2.A. The resonance assignment completion allows one to replace letter labels with residue-number labels (similar to the ones used in Figure VI.2.D). This in turn allows one to determine the specific residues affected directly or allosterically by binding of the ligand (L) to the target. In many cases, such information together with other data leads to the determination of the ligand binding residues within the target. If the ligand is a candidate therapeutic agent, identification of the ligand binding residues greatly advances ensuing efforts to optimize the drug.

Example 1

Analyze Figure VI.2.A and list at least two resonances which undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15 N-labeled target protein (T). Major spectral changes for this model spectrum include resonances moving by >0.05 ppm in 1 H or >0.2 ppm in 15 N dimensions as well as peak disappearance (peak intensity going down to zero).

Upon ligand L binding target protein (T), resonance f disappears and resonance s moves by >0.05 ppm in 1 H dimension.

Example 2

Inspect BMRB entry 50205 and list all the heteronuclear NMR datasets utilized for the NMR resonance assignment.

BMRB entry 50205 contains the chemical shift assignment data for the target sample and offers several ways to look at its underlying NMR data including the list of experiments used to perform the NMR resonance assignment and the chemical shift values. E.g., the NMR-STAR v3 text file has a section titled _Experiment_list, which sums up the heteronuclear NMR data types used for making the assignments: 2D 1 H- 15 N HSQC and 3D HNCACB, CBCA(CO)NH, HNCO and HN(CA)CO.

Example 3

How many 3D HNCACB resonances would you expect to originate from a Lys residue which is preceded by a Met?

Four, as both Lys and Met have backbone amide (HN) groups and both have Cα and Cβ atoms.

Practice Problems

Problem 1 . Analyze Figure VI.2.A and list all the resonances which undergo major spectral changes upon binding of the unlabeled ligand (L) to the 15 N-labeled target protein (T). Example 1 above will help you start the analysis.

Problem 2 . From BMRB entry linked to PDB 5VNT, list all the heteronuclear NMR datasets utilized for the NMR resonance assignment for the target sample.

Problem 3 . Let’s consider panel B of Figure VI.2.B . Imagine that the 13 C dimension is taken out of the spectrum (all 13 C planes are collapsed together). What type of 2D spectrum will remain after such a dimension reduction?

Problem 4 . How many 3D HNCACB resonances would you expect to originate from a Gly residue which is preceded by a Pro?

Problem 5 . How many 3D HNCACB resonances would you expect to originate from a Pro residue which is preceded by a Gly?

Problem 6* . Look up the amino acid NMR chemical shift values statistics table presented with BMRB repository and list the average values for the following resonances: 15 N, 13 Cα and 13 Cβ for Gly, Ala, Tyr, Glu, Arg, Ser, Thr, Pro. From this analysis, suggest what types of residues tend to report unusually low or high chemical shift values in comparison with the rest of the amino acids?

  • Published: 21 March 2023

Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing

  • Anthony C. Bishop   ORCID: orcid.org/0000-0003-4853-1073 1 ,
  • Glorisé Torres-Montalvo   ORCID: orcid.org/0000-0001-5240-6000 1 ,
  • Sravya Kotaru   ORCID: orcid.org/0000-0002-5632-8190 2 ,
  • Kyle Mimun   ORCID: orcid.org/0000-0003-0060-4149 1 &
  • A. Joshua Wand   ORCID: orcid.org/0000-0001-8341-0782 1 , 2 , 3 , 4  

Nature Communications volume 14, Article number: 1556 (2023)

  • Molecular conformation
  • Solution-state NMR

Assignment of resonances of nuclear magnetic resonance (NMR) spectra to specific atoms within a protein remains a labor-intensive and challenging task. Automation of the assignment process often remains a bottleneck in the exploitation of solution NMR spectroscopy for the study of protein structure-dynamics-function relationships. We present an approach to the assignment of backbone triple resonance spectra of proteins. A Bayesian statistical analysis of predicted and observed chemical shifts is used in conjunction with inter-spin connectivities provided by triple resonance spectroscopy to calculate a pseudo-energy potential that drives a simulated annealing search for the optimal set of resonance assignments. Termed Bayesian Assisted Assignments by Simulated Annealing (BARASA), a C++ program implementation is tested against systems ranging in size to over 450 amino acids, including examples of intrinsically disordered proteins. BARASA is robust, accommodates incomplete and incorrect information, outperforms current algorithms, especially in cases of sparse data, and is sufficiently fast to allow for real-time evaluation during data acquisition.

Introduction

Nuclear magnetic resonance (NMR) spectroscopy is unique in its ability to provide simultaneous and comprehensive structural and dynamical atomic-scale information about macromolecules such as proteins in solution 1 , 2 , 3 , 4 . Unfortunately, however, an observed resonance frequency in an NMR spectrum cannot yet be directly assigned to the individual atom(s) within the protein from which they arise without the time-intensive collection and analysis of additional spectra. Comprehensive mapping of individual resonances comprising nuclear magnetic resonance (NMR) spectra to specific atoms within a protein molecule is a general prerequisite for the successful analysis of the structure and dynamics of proteins by NMR spectroscopy. Early applications of multi-dimensional homonuclear 1 H NMR data to the so-called resonance assignment problem relied heavily on human intervention. The first comprehensive approach was the sequential assignment method, which centered on identification of J-coupled spin systems 5 that are then assembled through connections provided by short distances revealed by the nuclear Overhauser effect (NOE) interactions between sequential residues using the identity of side chains to error-check against the primary structure 6 , 7 . The subsequent main chain directed (MCD) assignment strategy 8 , 9 formalized self-correcting cyclic patterns of backbone 1 H- 1 H NOE interactions and provided a more robust algorithmic framework that relieved somewhat the complexity of identifying side chain resonances 10 , 11 . While the MCD approach did lead to the first fully automated assignment of 1 H resonances to backbone hydrogens 11 , automation of 1 H-based resonance assignments was generally frustrated by the overwhelming spectral degeneracy of multidimensional 1 H spectra of proteins and the interference of technical attributes such as a prominent diagonal. 
The introduction of heteronuclear triple resonance spectroscopy 12 , 13 , 14 , 15 , 16 , 17 completely changed the landscape of the resonance assignment task by providing much greater resolution, generally higher quality data, and, most importantly, definitive rules with very precise meanings for making connectivities (correlations) between backbone resonances. Triple resonance assignments of the protein backbone permit access, either directly or by tethering to side chain resonance assignments, to a wide range of dynamic phenomena 17 , 18 and structural information 19 , 20 , 21 .

Automated triple resonance algorithms have led to effectively complete backbone resonance assignments of smaller proteins with little human intervention and greatly aided the assignment of larger systems 22 , 23 , 24 . Yet, even with the advent of transverse relaxation optimized spectroscopy (TROSY) 25 , the comprehensive assignment of systems larger than 30 kDa remains remarkably rare. The limitations are quite analogous to those summarized for earlier assignment strategies based exclusively on 1 H- 1 H scalar and NOE interactions: increasing ambiguity in connectivities due to degeneracy, loss of resonances due to relaxation or artifact, and other confounding spectral attributes are simply not sufficiently accommodated by current automated assignment strategies.

Here, we strive to overcome the issue of data sparseness and ambiguity by appealing to the statistics of Bayes to utilize available information more effectively via the calculation of explicit probabilities. Importantly, this formalism also allows for a flexible and adaptable incorporation of chemical shift prediction and structural knowledge into the assignment process. By implementing the Bayesian analysis within a simulated annealing engine, we develop a robust and efficient search for optimal solutions. Protein assignment algorithms utilizing simulated annealing have been developed in the past 26 . However, the stochastic algorithm described here takes advantage of readily available pre-existing structural models, both experimentally-determined and predicted, and in doing so more effectively exploits the rich information contained within structure-based predicted chemical shifts. We demonstrate how these invaluable restraints greatly aid the resonance assignment process, especially in cases where data may be otherwise sparse or even incorrect. We also compare the overall performance of BARASA against three highly cited assignment algorithms on a variety of experimental datasets.

Results and discussion

Bayesian assisted resonance assignments by simulated annealing (BARASA)

We designed an algorithm, termed BARASA, which utilizes a simulated annealing approach 27 to efficiently search the immense solution space for the optimal set of resonance assignments starting with a set of raw crosspeaks derived from triple resonance type spectra. The objective is to find the correct mapping of individual resonances to specific atoms within the protein molecule. The algorithm first assembles an initial set of spin systems based on an analysis of crosspeak lists and the connectivity rules of the particular triple resonance experiments employed. This process may not yield an unambiguous nor complete set of spin systems due to inherent degeneracy and missing or artifactual peaks (See Methods). As a result, a given crosspeak could be associated with multiple, spectrally-overlapping spin systems; in which case, the crosspeak is randomly placed in one of the overlapping spin systems. The simulated annealing search engine then randomly distributes the starting set of spin systems to specific residue positions. If there are more spin systems than residue positions, then the excess spin systems are placed in a cache for later use as described below. The energy of this initial state is calculated as the sum of the energies of the individual spin systems currently placed in residue positions. Each spin system energy is composed of two terms: the adjacency energy and the chemical shift energy. The adjacency energy describes the interaction between two spin systems mapped to adjacent locations on the amino acid sequence. This energy is minimized if the Cα(i), Cβ(i), and C’(i) shifts of the spin system match the Cα(i-1), Cβ(i-1), and C’(i-1) of the spin system at the following residue in the sequence. In contrast, the chemical shift energy describes the interaction between a spin system and its current residue position i.e., it is defined by the local sequence and structure. 
This energy is minimized when the resonances of the spin system closely match the predicted values of the current residue position, while also failing to match the predicted values at all other residue positions. Application of Bayes’ theorem then provides a posterior probability of assigning each spin system at each location in the sequence that is based on the predicted and experimental shifts. Using this probability, the chemical shift energy is calculated (see Methods for a more detailed description). After the initial calculation of energy, a spin system or individual crosspeak is randomly chosen. A spin system is either moved to an unoccupied residue position, swapped with another spin system, or added to the cache. Spin systems or cross peaks deposited to their respective caches have no priority and are randomly selected from the cache. Similarly, if a chosen crosspeak can be productively added to the crosspeak cache, swapped with another crosspeak in an overlapping spin system, or moved to an overlapping spin system, the move is made. With every crosspeak/spin system swap, the decision to accept the proposed move is made based on the energy of the system before and after the proposed swap. Using an effective temperature T, the Metropolis criterion 28 is applied (Eq.  1 ).

\({P}_{{\rm{accept}}}={e}^{-\varDelta E/T}\)  (1)

\({P}_{{\rm{accept}}}\) is the probability of accepting the swap and \(\varDelta E\) is the change in energy due to the proposed swap. If \(\varDelta E\le 0\) then \({P}_{{\rm{accept}}}\) is set to 1. If \(\varDelta E > 0\), then \(0 < {P}_{{\rm{accept}}} < 1\) and a uniformly distributed random number \(r\) such that \(0\le r\le 1\) is generated. If \(r\le {P}_{{\rm{accept}}}\) then the swap is accepted. Otherwise, the swap is rejected and the system state is left unchanged. Random swap attempts are continued until the average energy of the system does not vary significantly. \(T\) is then decreased by following a highly optimized schedule based on a quantity analogous to the specific heat of the system (see Methods). The system is further cooled and equilibrated in this manner until a set of termination criteria are achieved and the annealing protocol is ended. Finally, to ensure that the system has reached a minimum in energy, a proposed swap of each spin system with every other spin system as well as every crosspeak with every other possible crosspeak is then attempted with only decreasing energy changes being accepted. This post-annealing minimization routine is repeated 100 times. The entire procedure, starting from initialization and ending with minimization, is repeated 20 times. The algorithm then chooses the spin system that was assigned to each residue location in a majority of the annealing runs (if any) and builds a consensus assignment set. The consensus assignment set is further curated using criteria defined below to produce the final assignment set. The overall BARASA algorithm is outlined in Figs. 1 and 2 .
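The acceptance rule and the anneal-then-minimize structure can be illustrated with a toy problem. This sketch is not the BARASA code: it assumes the standard Metropolis form \({P}_{{\rm{accept}}}={e}^{-\varDelta E/T}\), replaces the specific-heat-based schedule with simple geometric cooling, and uses a made-up energy in which spin system k simply belongs at position k. All names and parameter values are hypothetical.

```python
import math
import random


def accept_swap(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves, uphill with exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() <= math.exp(-delta_e / temperature)


def energy(state):
    """Toy energy: zero only when spin system k sits at position k."""
    return sum(abs(k - pos) for pos, k in enumerate(state))


def anneal(n=8, t_start=5.0, t_end=0.05, cooling=0.9, sweeps=200, seed=0):
    rng = random.Random(seed)
    state = list(range(n))
    rng.shuffle(state)  # random initial placement of spin systems

    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            i, j = rng.sample(range(n), 2)
            before = energy(state)
            state[i], state[j] = state[j], state[i]
            if not accept_swap(energy(state) - before, t, rng):
                state[i], state[j] = state[j], state[i]  # rejected: undo the swap
        t *= cooling  # geometric cooling (an assumption, not the paper's schedule)

    # Post-annealing minimization: try every pair, keep only energy-lowering swaps
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                before = energy(state)
                state[i], state[j] = state[j], state[i]
                if energy(state) < before:
                    improved = True
                else:
                    state[i], state[j] = state[j], state[i]
    return state


final_state = anneal()
print(final_state, energy(final_state))
```

In the real problem the energy landscape has many local minima, which is why the text describes repeating the whole procedure several times and keeping only the assignments on which a majority of runs agree.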

Figure 1

(a) The search engine rests on a Bayesian-based simulated annealing protocol that uses a specific-heat mechanism to guide cooling. Crosspeak lists drawn from triple resonance spectra are assembled into putative spin systems, which are then randomly assigned to positions within the primary sequence of the protein. Sequential adjacency in the primary sequence is provided by apparent connectivities derived from triple resonance NMR spectra. Predicted chemical shifts, based on a high-resolution structural model or gleaned from empirical amino acid-specific distributions, are incorporated into the system energy using Bayesian statistics. Throughout annealing, crosspeaks may move among spin systems with overlapping resonances, changing the energies of the affected spin systems. Annealing involves Monte Carlo swapping of both crosspeak assignments to spin systems and spin system assignments to locations in the sequence. The concept of dynamic swapping of individual crosspeaks or entire spin systems is outlined in Fig. 2 . Annealing continues until energy equilibration is achieved. The temperature is then lowered and the system re-equilibrated. Annealing is stopped when the termination criteria are met and a local minimization routine is performed. (b) The final resonance assignments are developed from results of multiple independent simulated annealing runs. (c) Shown is a ribbon representation of maltose binding protein (PDB code: 1DMB [ https://doi.org/10.2210/pdb1DMB/pdb ]) color-coded according to assignment status following analysis by BARASA: correctly assigned residues (blue); unassigned residues (white), prolines (red). See main text for further details.

Figure 2

(a) Spin systems (orange puzzle pieces) begin in the cache (black box) and are initialized by random assignment to the sequence (purple pieces). Spin systems can then be swapped with others or moved to different locations of the sequence or to the cache. Spin systems or cross peaks in their respective caches have no priority and are randomly selected. Swaps are accepted or rejected with a probability based on the change in energy of the proposed swap. (b) The energy of each spin system depends on how it fits with the adjacent spin system (adjacency energy) and with the predicted shifts for that residue location (chemical shift energy). Exchange of crosspeaks between spin systems can be thought of as changing the puzzle piece shape. See main text and supplementary material for details.
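To make the chemical shift energy more concrete, the sketch below computes a posterior probability for placing one spin system at each residue position and converts it to an energy. The exact functional form used by BARASA is given in the paper's Methods and is not reproduced here; the Gaussian likelihood with a single width `sigma`, the uniform prior, the choice of energy as the negative log posterior, and all shift values are assumptions for illustration.

```python
import math


def log_posteriors(observed, predicted, sigma=1.0):
    """Log P(position | observed shifts) under Gaussian errors and a uniform prior.

    observed:  {nucleus: ppm} for one spin system
    predicted: {position: {nucleus: ppm}} for every residue position
    """
    log_like = {
        pos: sum(
            -0.5 * ((observed[n] - shifts[n]) / sigma) ** 2 for n in observed
        )
        for pos, shifts in predicted.items()
    }
    # Normalize in log space (log-sum-exp) to avoid underflow
    top = max(log_like.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_like.values()))
    return {pos: v - log_norm for pos, v in log_like.items()}


def chemical_shift_energies(observed, predicted, sigma=1.0):
    """Assumed energy: negative log posterior, lowest at the best-matching position."""
    return {
        pos: -lp for pos, lp in log_posteriors(observed, predicted, sigma).items()
    }


# Invented predicted shifts for three residue positions, and one observed spin system
predicted = {
    10: {"CA": 52.4, "CB": 19.0},
    11: {"CA": 58.3, "CB": 63.8},
    12: {"CA": 56.1, "CB": 32.9},
}
observed = {"CA": 52.6, "CB": 19.3}

energies = chemical_shift_energies(observed, predicted)
best = min(energies, key=energies.get)
print(best, energies)
```

Because the posterior is normalized over all positions, a spin system scores well at a position only if it matches there and matches poorly elsewhere, which is the behavior the main text describes for the chemical shift energy.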

BARASA is accurate, robust, and fast

We tested BARASA against a test set of six different folded protein systems ranging in size and topology: human interleukin-1 receptor antagonist C66A, C122A (IL-1Ra, 152 residues, 17.1 kDa), human interleukin-1 β (IL-1 β , 154 residues, 17.5 kDa), S. solfataricus indole-3-glycerol phosphate synthase R43S (IGPS, 248 residues, 28.4 kDa), E. coli maltose binding protein (MBP, 371 residues, 40.8 kDa), the first cyclization domain from the Y. pestis yersiniabactin non-ribosomal peptide synthetase (Cy1, 453 residues, 51.9 kDa), and E. coli thymidylate synthase (ecTS, 264 residues, 61.0 kDa homodimer). In addition, we challenged the algorithm with two so-called intrinsically disordered proteins (IDPs). These include the V5 domain (residues 606-672) of human protein kinase C (V5dm, 68 residues, 7.7 kDa) and the intrinsically disordered region of human ANP32A (hIDD, 110 residues, 12.8 kDa). All crosspeak lists were derived from triple resonance data (Table 1 ). Crosspeak positions used were pulled from the canonical triple resonance spectra used for protein assignment (i.e., HSQC, HNCO 29 , HN(CA)CO 30 , HNCA 31 , HN(CO)CA 31 , HNCACB 32 , HN(CO)CACB/CBCA(CO)NH 33 ) (see Supplementary Table S1 ) with the exception of hIDD, in which crosspeaks were derived from provided spin systems. To generate crosspeaks from the spin systems of hIDD, Gaussian error was added to the resonance values to create the chemical shifts of simulated crosspeaks (see Methods). Four of the data sets (IL-1Ra, IL-1 β , IGPS, and MBP) were obtained in our laboratory. Crosspeak lists for Cy1, ecTS, V5dm, and spin systems for hIDD were kindly provided by Drs. Dominique Frueh (Johns Hopkins University), Andrew Lee (University of North Carolina at Chapel Hill), Tatyana Igumenova (Texas A&M University) and Martin Blackledge (Institut de Biologie Structurale), respectively.

The results from BARASA were compared to reference assignments to assess program performance. Reference assignments were obtained from either the BMRB, directly from another lab, or manually determined by us (Table  1 ). Deposited assignments were manually mapped to the acquired spectra for comparison. A small movement in crosspeak positions between the deposited assignments and the acquired spectra was permitted to account for differences in experimental conditions. In addition, a small number of resonances assigned in the deposited data sets were not present in the acquired spectra of IL-1 β . These were removed from the reference assignments and considered unassigned when assessing algorithm performance (Supplementary Table  3 ). For the most part, reference assignments were considered complete though in a few cases BARASA identified a small number of additional assignments that were confirmed manually and included in the reference assignments (Supplementary Tables  6 – 9 ). For each residue position, BARASA either outputs the spin system and its associated resonances that were assigned to that residue position or marks it as unassigned. The assignment given to each residue in the protein sequence by BARASA was determined to either be matching, missing, or mismatching its counterpart in the reference assignments. A residue was considered to have a matching assignment if the amide group assigned to it by the algorithm was the same as the reference. A residue was also considered to match the reference if it was unassigned both by BARASA and in the reference assignments. A residue was designated missing if an amide group was assigned to that location in the reference assignments, but BARASA did not assign that residue position. Lastly, a residue was labeled as mismatching if BARASA assigned an amide group and it did not match that in the reference assignments or if the residue was unassigned in the reference assignments.
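The three-way scoring described above can be written compactly. The following is a minimal sketch of that classification, not the authors' code; the function names and the toy spin system labels are hypothetical, and `None` stands for an unassigned residue position.

```python
from collections import Counter
from typing import Optional, Sequence


def classify_residue(predicted: Optional[str], reference: Optional[str]) -> str:
    """Score one residue position; None means 'unassigned'."""
    if predicted == reference:
        # Same amide group in both, or unassigned in both.
        return "matching"
    if predicted is None:
        # The reference has an assignment but the algorithm left the position empty.
        return "missing"
    # The algorithm assigned a different amide group, or assigned a position
    # that is unassigned in the reference.
    return "mismatching"


def summarize(predicted: Sequence[Optional[str]], reference: Sequence[Optional[str]]) -> dict:
    """Return the fraction of residues in each category."""
    counts = Counter(classify_residue(p, r) for p, r in zip(predicted, reference))
    total = len(reference)
    return {k: counts.get(k, 0) / total for k in ("matching", "missing", "mismatching")}


# Hypothetical five-residue example.
pred = ["ss1", None, "ss7", None, "ss4"]
ref = ["ss1", "ss2", "ss3", None, None]
fractions = summarize(pred, ref)
```

Note that under this scheme a correct de novo assignment that is absent from the reference is still counted as mismatching, which is the situation described for hIDD.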

In general, BARASA’s performance when utilizing structure-based chemical shifts and crosspeak lists derived from a comprehensive set of triple resonance experiments is marked by (nearly) complete assignments when compared to the manually curated reference assignments and, most importantly, by very few errors (Fig.  3 & Supplementary Table  2 ). Individual statistics for each assignment are listed in Supplementary Tables  3 – 10 . BARASA had relatively more difficulty with the Cy1 and IGPS examples. This is likely due to a higher degree of variance in resonance chemical shifts of the backbone spins among the different spectra relative to the other test cases because of the employment of multiple independently prepared samples, but the performance overall remained very good (Fig.  3 ). In the case of hIDD, a relatively high apparent mismatch rate is observed. Upon closer examination, the mismatching assignments made by BARASA were all assignments not previously reported as assigned. Many of these previously unreported assignments fall within regions of the sequence with low complexity (Supplementary Table  10 ), which is likely why they were difficult to assign manually. While there are no independent data supporting their veracity, these assignments proposed by BARASA and, as we discuss more below, by the next best performing automated assignment algorithm FLYA 34 are highly similar and are likely to be largely correct.

figure 3

Comparison of automated assignment algorithms. Results of automated resonance assignments by BARASA utilizing raw crosspeak lists drawn from a relatively comprehensive set of triple resonance experiments, compared to manually curated resonance assignments obtained for eight test proteins: interleukin-1 β (IL-1 β ), interleukin-1 receptor antagonist (C66A, C122A) (IL-1Ra), indole-3-glycerol phosphate synthase (R43S) (IGPS), maltose binding protein (MBP), non-ribosomal peptide synthetase (Cy1), thymidylate synthase (ecTS), V5 domain of protein kinase C (V5dm), and intrinsically disordered region of human ANP32A (hIDD). Shown are the fractions of residues that are accurately matched (green), mismatched (magenta), or missing (i.e., unassigned) (blue) relative to the reference assignments. *In the case of hIDD, a number of de novo assignments were indicated by BARASA and are included as mismatching with the reference assignments. See main text and Table  1 . Source data are provided in the Source Data file.

BARASA utilized SHIFTX+ 35 predicted chemical shifts for the globular test proteins, whereas the algorithm utilized random coil chemical shifts 36 , 37 for the so-called IDP examples as predicted shift restraints during annealing (see Methods). SHIFTX+ was chosen as it appears to be among the best-reported chemical shift prediction algorithms based solely on three-dimensional structural information and other physical parameters (i.e., temperature, pH). The related algorithm SHIFTX2, though it gives more accurate predictions, relies on the analysis of shifts from homologous proteins as well as the three-dimensional structural inputs specific to the protein being analyzed. It was our concern that the accuracy of SHIFTX2 would vary with the number of homologs available and, under circumstances of sparse homologs, result in significantly larger errors than are reported for the average case. As accurate estimation of prediction error is crucial to the Bayesian analysis (Methods), inaccurate and/or unaccounted for variance in prediction errors could compromise performance. Furthermore, as SHIFTX2 performs searches for the known chemical shifts of homologous sequences as part of its prediction, it would utilize the previously assigned shifts of our test proteins present in the BMRB in the generation of the predicted shifts. Such shifts would not be generally available for the de novo assignment of a protein, and their use would thus constitute an invalid test of BARASA. We also note that using predicted chemical shifts generated by SPARTA+ 38 gave similar results (Supplementary Table  11 ) to those obtained with shifts predicted by SHIFTX+.

In this regard, it is important to appreciate that it is statistically anticipated from the distributions of chemical shifts, either predicted or documented in the BMRB, that values outside the error range will be encountered. For example, if the distribution were taken as Gaussian and the standard deviation employed as the prediction error (see Methods), approximately 32% of all predictions would be expected to be outside of the considered error range. This is what is observed. Supplementary Tables  3 – 10 contain the likelihoods of the spin systems for the various test proteins. These likelihoods represent the probability of observing the experimental shifts given that the assignment is correct and range from 0 to 1. Likelihoods lower than 0.32 correspond to spin systems with predicted resonance chemical shifts that are, on average, beyond the specified error range but are nevertheless well accommodated by BARASA.
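The 32% figure follows directly from the Gaussian assumption: it is the two-sided tail probability beyond one standard deviation. A short check, using only the standard normal distribution (the helper name is illustrative):

```python
import math


def fraction_outside(k: float = 1.0) -> float:
    """Two-sided Gaussian tail: probability that |x - mean| exceeds k standard deviations."""
    return 1.0 - math.erf(k / math.sqrt(2.0))


# Fraction of predictions expected to fall outside a +/- 1 sigma error range.
p_one_sigma = fraction_outside(1.0)
print(f"{p_one_sigma:.3f}")
```

The same function gives the expected fraction for any other choice of error range, for example two standard deviations.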

Finally, BARASA also produces a curated set of assignments from 20 annealing runs within 1 hour for each system tested (see Supplementary Table  12 ). With high accuracy and runtimes under an hour, the advantages of BARASA become even more apparent when considering large proteins with suboptimal data sets.

The performance of BARASA with suboptimal data sets

The rather complete crosspeak lists from an extensive set of triple resonance experiments for each test protein provide valuable benchmarks for the validation of BARASA, but are arguably not fully illustrative of the difficult protein systems often challenging current applications of protein NMR spectroscopy. To examine the performance of BARASA in cases of missing data and to illuminate the most impactful triple resonance information, individual crosspeaks or all crosspeaks of entire spin systems from the MBP and ecTS data sets were randomly discarded to generate compromised data sets, emulating data collection on challenging protein systems. Individual crosspeaks were randomly retained in the data set with a probability based on the crosspeak type (i.e., Cα, Cβ, or CO resonance). This process was done over a wide range of retention probabilities to produce a multitude of distinct data sets that represent a wide range of data completeness. These depleted peak lists were then used as input to BARASA, the results of which are provided in Supplementary Tables  13 and 14 . In this way, the relative importance and completeness of different types of spectral data as well as the effects of entirely missing spin systems could be probed. In addition, a key question was the extent to which structure-based chemical shifts, as opposed to general BMRB residue-specific statistics, can rescue the assignment and aid the assignment process.
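The per-type random retention described above can be illustrated as follows. This is a sketch under assumptions: the peak representation (a dictionary with a `type` key) and the function name are hypothetical, and the probabilities shown are only example values.

```python
import random


def deplete_crosspeaks(peaks, retention, seed=None):
    """Keep each crosspeak independently with a probability set by its type."""
    rng = random.Random(seed)
    return [p for p in peaks if rng.random() < retention[p["type"]]]


# Hypothetical peak list: 100 peaks of each carbon type.
peaks = [{"id": i, "type": t} for i, t in enumerate(["CA", "CB", "CO"] * 100)]

# Example retention probabilities per crosspeak type.
retention = {"CA": 0.88, "CB": 0.25, "CO": 0.0}
kept = deplete_crosspeaks(peaks, retention, seed=0)
```

Repeating the call with different seeds yields independent depleted data sets for the same retention condition.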

Figure  4 illustrates the robustness of BARASA when analyzing conditions of missing spectral data. This specific example was generated using retention probabilities of 88% and 25% for the Cα- and Cβ-based information, respectively, and with retention probabilities of either 0% or 75% for the CO-based information. Reliance on the BMRB database for predicted shifts, as opposed to structure-based shifts, yielded poor performance. In brief, the use of structure-based SHIFTX+ 35 predictions entirely rescues the resonance assignment. These data indicate that the availability of the structure-based chemical shift predictions serves as a powerful restraint in protein assignment - large enough to potentially surpass the information provided by the CO experimental pair under many circumstances. This is likely due to the fact that spin system adjacency is established adequately with the Cα and Cβ spectral information and the remaining assignment ambiguity is due to residue type matching; CO resonances provide little residue type information and offer little help in this respect. We do not believe this observation to be an artifact of the parameterization of the energy function since carbonyl-derived connectivities are weighted roughly the same as the chemical shift probability (Methods). As such, the energy provided by CO connectivity information would be of a similar magnitude to the total chemical shift energy of the spin system.

figure 4

Shown are the fractions of residues that are accurately matched (green), mismatched (magenta) or missing (i.e., unassigned) (blue) relative to the reference assignments. Panels a – d correspond to results from input data sets where entire spin systems were discarded from the crosspeak lists. The ordinate axis is the fraction of retained spin systems and the dashed lines indicate the maximum fraction of possible matching assignments. The effects of spin system depletion on the analysis of MBP, with spin systems discarded either individually at random ( a ) or as stretches of five consecutive residues ( b ). A similar analysis of ecTS with either individual ( c ) or groups of five consecutive spin systems being discarded ( d ). For the conditions 0.8 and 0.6 fractions retained, ten random data sets retaining the indicated fraction of spin systems were generated. The performance of BARASA on each data set is shown as a single orange solid circle, with the bar height representing the arithmetic mean. The full data set (“1.0” condition) results were taken from Fig.  3 . Only one result with the full data set was measured to avoid the comparison of run-to-run variation with variation due to differences in the input data set. The effects of restricting connectivity information by utilizing only a single pair of triple resonance experiments with either residue-type statistics (BMRB) ( e ) or structure-based (SHIFTX+) ( f ) chemical shift predictions for MBP. Similarly, for ecTS using only residue-type statistics (BMRB) ( g ) or structure-based (SHIFTX+) ( h ) chemical shift predictions. The effect of random depletion of crosspeaks from the comprehensive set of triple resonance experiments, with the indicated percentage of each type of crosspeak retained, is illustrated for the MBP ( i ) and ecTS ( j ) data sets used with residue-type statistics (BMRB) or structure-based (SHIFTX+) predicted chemical shifts.
Results of ten individual runs ( n  = 10) are plotted as solid orange circles and bar heights represent the arithmetic mean. Source data are provided as a Source Data file.

Randomly retained spin system data sets were generated in two ways: by allowing all crosspeaks of any spin system assigned in the reference assignments to be randomly discarded from the input data set until only the indicated fraction of the assigned spin systems remained or by discarding the crosspeaks of random spin systems in the same manner, with the added condition that only those from sets of five random, but contiguous in sequence, spin systems are discarded. The latter condition was used to simulate the performance of BARASA under the common situation where exchange broadening arising from physical motion of contiguous stretches of sequence (e.g., loops) results in loss of amide resonances. In both cases, BARASA is still able to produce the overwhelming majority of the possible assignments without errors even when up to 40% of the spin systems are missing (Fig.  4 ). There is little difference in performance whether the missing data are localized or distributed across the sequence. The performance of BARASA when challenged with artifact peaks, which often arise from low-concentration or unstable samples or instrumentation, was also examined. In this case, a depleted data set from above was augmented with randomly generated artifact peaks. Only a modest decrease in performance is observed even when the crosspeak list is contaminated with 20% artifactual entries (Supplementary Fig.  1 ).
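The two depletion modes can be sketched with a single routine in which the block length is 1 for individually discarded spin systems and 5 for contiguous stretches. This is an illustrative reconstruction, not the procedure used to produce the published data sets; the function name, the `max_tries` guard and the 200-residue example are assumptions.

```python
import random


def discard_positions(assigned, fraction_retained, block=1, seed=None, max_tries=100_000):
    """Pick sequence positions whose spin systems are removed from the input.

    block=1 discards spin systems individually; block=5 discards stretches of
    five positions that are contiguous in sequence and all assigned.
    """
    rng = random.Random(seed)
    positions = sorted(assigned)
    available = set(positions)
    n_discard = len(positions) - round(fraction_retained * len(positions))
    discarded = set()
    tries = 0
    while len(discarded) + block <= n_discard and tries < max_tries:
        tries += 1
        start = rng.choice(positions)
        stretch = range(start, start + block)
        # Accept only stretches that are fully assigned and not yet discarded.
        if all(pos in available for pos in stretch):
            discarded.update(stretch)
            available.difference_update(stretch)
    return discarded


# Hypothetical protein with 200 assigned residues, retaining 60% of spin systems.
scattered = discard_positions(range(1, 201), 0.6, block=1, seed=3)
stretches = discard_positions(range(1, 201), 0.6, block=5, seed=3)
```

All crosspeaks belonging to the spin systems at the returned positions would then be dropped from the crosspeak lists before the run.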

Even with the considerable time-savings introduced by non-uniform sampling 39 , collection of NMR data on proteins is still time intensive. The superior performance of BARASA on missing data within a comprehensive set of triple resonance experiments raised the possibility that BARASA could tolerate a reduced set of triple resonance experiments. We tested this hypothesis using ecTS and MBP, where information from a single triple resonance experimental pair (e.g., HNCA and HN(CO)CA) combined with BMRB or SHIFTX+ predicted shifts was analyzed. The Cα- and Cβ-type triple resonance pairs are equally useful in the BARASA assignment process when provided SHIFTX+ shifts, but the Cβ information becomes relatively more effective when relying on BMRB amino acid distributions (Fig.  4 ). This is most likely due to the higher residue type information intrinsic to the Cβ resonance. Overall, BARASA performs extremely well with either the Cα- or Cβ-type triple resonance experimental pairs only. In contrast, the CO-type triple resonance experimental pair when used alone is much less effective, likely due to the reduced sensitivity of carbonyl carbon shifts to amino acid type and local structure.

Comparison to alternate automated resonance assignment algorithms

Computer-assisted resonance assignment strategies for analysis of triple resonance spectra have been employed for over two decades. For the sake of comparison, three highly-cited algorithms were compared to BARASA: FLYA, AutoAssign 22 , and I-PINE 40 . The same crosspeak lists derived from the comprehensive set of triple resonance experiments were used for all four algorithms (Fig.  5 ). BARASA achieved the highest percent matching among all the algorithms against the reference assignments in all test cases. BARASA outperformed AutoAssign and I-PINE by considerable margins, most notably with the two IDPs examined, while offering only marginal improvement over FLYA (Supplementary Table  2 ). Importantly, BARASA made few mismatching assignments (<3%) while I-PINE had up to 20% mismatches meaning that about 1 in 5 assignments made were incorrect. For these reasons, AutoAssign and I-PINE were not examined further.

figure 5

Performance of BARASA, FLYA, AutoAssign (AA), and I-PINE against reference triple resonance assignments of eight protein systems: a IL-1 β ; b IL-1Ra; c IGPS; d MBP; e Cy1; f ecTS; g V5dm; h hIDD. Shown are the fractions of residues that are accurately matched to the reference assignments (green), incorrectly matched (magenta) or missing (i.e., unassigned) (blue). *BARASA and, to a lesser extent, FLYA extended the reference assignments for hIDD considerably (Supplementary Table  10 ). The extended assignments are therefore denoted here as mismatching. Source data are provided as a Source Data file.

The marginal advantage of BARASA over FLYA when utilizing a comprehensive triple resonance data set prompted us to examine their behavior in the more challenging situations commonly encountered. BARASA’s performance in settings where there is a significant amount of missing data was compared against FLYA. MBP and ecTS crosspeak lists with varying retention probabilities were generated and used as input for BARASA and FLYA (Fig.  6 ). BARASA was able to generate a higher assignment match rate in all scenarios with the difference in performance between the algorithms growing as the data became increasingly sparse. In addition, the mismatch rate between the algorithms remained similar. These results demonstrate that BARASA has excellent outcomes in circumstances where there is a large quantity of missing data – greatly outperforming existing algorithms.

figure 6

The effects of random crosspeak depletion on the analysis of MBP ( a ) and ecTS ( b ) comprehensive triple resonance data sets with partial retention of the indicated crosspeak types (see text and Fig.  4 ). Shown are the fractions of residues that are accurately matched (green), mismatched (magenta) or missing (i.e., unassigned) (blue) to the reference assignments. Ten independent data sets ( n  = 10) were randomly generated for each depletion condition. The results of analysis by BARASA for each data set are shown as solid orange circles and the bar heights correspond to the mean. Source data are provided as a Source Data file.

Use of predicted versus experimentally determined structural models

It is clear from Fig.  4 that use of structure-based chemical shift predictions provides significant advantages over simple residue-type predictions derived from empirical distributions. This is particularly true in the case of Cy1, which is perhaps an exemplar of the challenges facing modern protein NMR and required a battery of experimental spectra and labeling schemes 41 . The sheer number of samples and experiments required resulted in a relatively high variation in resonance positions among the spectra. The resonance assignment was carried out in the absence of an experimentally determined structural model with the closest homolog having only 38% identity. Accordingly, the resonance assignment of Cy1 must be considered a significant achievement.

The absence of an experimentally-determined atomic-resolution structure of the protein of interest is a common occurrence and can severely limit the resonance assignment process. However, powerful structure-prediction algorithms have recently been introduced 42 and we sought to learn how the availability of structures predicted by the AlphaFold2 algorithm influences the performance of BARASA. Chemical shifts predicted by SHIFTX+ using the structure of Cy1 predicted by AlphaFold2 were used for analysis by BARASA. Using only residue-type information based on the BMRB resulted in poor performance. However, when utilizing the predicted chemical shifts from the predicted structure of Cy1, BARASA recapitulated its performance based on the NMR-derived structure and a comprehensive set of triple resonance experiments. In addition, BARASA performed very well using subsets of triple resonance experiment pairs and significantly outperformed FLYA (Fig.  7 ). This level of success of BARASA using SHIFTX+ in concert with structures predicted by AlphaFold2 was observed across the test data sets (Supplementary Table  15 ). Taken together, these data suggest that the lack of an experimental structure is unlikely to hinder the full capability of the BARASA algorithm.

figure 7

The resonance assignment by BARASA using the indicated crosspeak types from the triple resonance spectra and residue-type (BMRB) chemical shift statistics ( a ) or chemical shifts predicted by SHIFTX+ based on a structural model provided by AlphaFold2 ( b ). Triple resonance data sets include the peaks from the following spectra: HNCA/HN(CO)CA (Cα), HN(CA)CB/HN(COCA)CB (Cβ) and HNCO/HN(CA)CO (CO). Bar heights indicate the fractions of residues that are accurately matched (green), mismatched (magenta) or missing (i.e., unassigned) (blue) relative to the reference assignments. Equivalent runs with FLYA ( c ) using the data set of ( b ) reinforce the conclusion that BARASA is more robust to non-ideal data. Source data are provided as a Source Data file.

In summary, we have demonstrated that Bayesian-based simulated annealing combining sequential relationships derived from triple resonance spectra and chemical shift information predicted from a high-resolution structural model can greatly facilitate the triple-resonance backbone assignment of proteins. The implementation of this strategy in BARASA is robust to incompleteness of spin system definition (sparseness) and overall complexity of the resonance assignment challenge (protein size). Importantly, BARASA is relatively conservative and makes few errors. An optimized annealing strategy utilizing a specific heat approach to guide temperature cooling results in a very rapid analysis. The speed of analysis combined with its aforementioned robustness positions BARASA to inform the real-time data acquisition side of the resonance assignment process. This becomes increasingly feasible with the utilization of automated crosspeak picking. Iterative examination by BARASA of sequentially acquired triple resonance spectra could, in principle, allow the user to determine if a satisfactory level of assignment can be achieved without further data acquisition and thereby save valuable spectrometer time. Overall, the BARASA algorithm provides the ability to easily and robustly assign unusually difficult protein systems and simplify this otherwise challenging task. The combination of fast and robust backbone resonance assignments with structure-based methyl resonance assignments 43 , 44 , 45 , 46 , 47 , 48 will reduce the resonance assignment barrier considerably and allow the power of NMR spectroscopy to be applied in a facile manner to otherwise challenging proteins.

NMR sample production

A vector encoding the gene for interleukin-1 β (IL-1β) was transformed into E. coli BL21(DE3) cells and expressed in 1 L of 95% D 2 O M9 media containing 15 NH 4 Cl and 2 H, 13 C glucose as the sole nitrogen and carbon sources, respectively. Cells were grown at 37 °C to an OD 600 of 0.9 and induced with 1 mM IPTG. Induction continued for 4 hrs at 37 °C, after which cells were harvested via centrifugation at 3500 × g and frozen overnight. The cell pellet was then thawed and resuspended in 10 mM potassium phosphate pH 8.0, 0.2 mM EDTA, 5 mM DTT and 1 mM PMSF. The cells were then lysed by sonication and centrifuged at 32,000 × g for 30 min at 4 °C. Lysate was then brought to 80% saturation with (NH 4 ) 2 SO 4 and allowed to stir for 1 hr at 4 °C. The suspension was then centrifuged for 30 min at 32,000 × g at 4 °C and the pellet was resuspended in 25 mM ammonium acetate pH 4.5, 1 mM BME and dialyzed overnight (8 kDa MWCO) in the same buffer at 4 °C. The dialyzed protein was then loaded onto a HiTrap Capto S column (Cytiva Life Sciences) equilibrated in 25 mM ammonium acetate pH 4.5, 1 mM BME and eluted with a linear gradient up to 500 mM ammonium acetate pH 4.5, 1 mM BME. Protein was then frozen and lyophilized. The lyophilized protein was dissolved in 20 mM Tris pH 8.0, 7 M urea, 20 mM DTT and added dropwise to 20x volume of 20 mM Tris, 100 mM NaCl, 5 mM DTT pH 8.0 under constant stirring. The refolded protein was then dialyzed against 50 mM sodium acetate pH 5.0, 5 mM DTT and concentrated to 0.67 mM. To this sample 0.02% NaN 3 , 100 μM DSS and 5% D 2 O were added. Triple resonance spectra were acquired at 23 °C on an 800 MHz ( 1 H) Bruker NEO spectrometer running TopSpin and equipped with a CryoProbe.

A vector encoding the gene for human interleukin-1 receptor antagonist (IL-1Ra) containing C66A/C122A amino acid substitutions was expressed using E. coli BL21(DE3) cells in M9 minimal media. The M9 minimal media contained 15 NH 4 Cl and 13 C-glucose as the sole nitrogen and carbon sources, respectively. The culture was centrifuged at 5000 rpm, and the cell pellet was resuspended in 20 mM Tris, 500 mM NaCl, 20 mM imidazole, pH 7.9 for sonication. Sonicated cells were centrifuged at 15000 rpm, and supernatant was loaded onto a His60 column (Takara Bio USA). The column was washed with 20 mM Tris, 500 mM NaCl, 40 mM imidazole, pH 7.9; and protein was eluted with 20 mM Tris, 500 mM NaCl, 500 mM imidazole, pH 7.9. The collected protein fraction was buffer exchanged to 12.5 mM HEPES, 50 mM NaCl, 5 mM CaCl2, pH 6.5 for His-tag removal by FXa protease (New England Biolabs) and further purified both by affinity (His60 resin) and size exclusion chromatography (S-75 Sephadex, Cytiva Life Sciences). The NMR sample was prepared by buffer exchanging the protein into 100 mM NaCl, 25 mM MES, pH 6.0 and concentrating to 1 mM, with the addition of 100 μM DSS, 5% D2O, and 0.02% NaN3 (Supplementary Table  1 ). Triple resonance assignment experiments were acquired at 35 °C on either a 500 MHz Bruker Avance spectrometer or an 800 MHz ( 1 H) Bruker NEO spectrometer, both equipped with a CryoProbe.

An R43S variant of the gene for indole-3-glycerol phosphate synthase from S. solfataricus (IGPS) was cloned in a modified pGS-21a vector downstream of an N-terminal His-tag and TEV protease site. This expression plasmid was a gift from the lab of Professor Robert Matthews, University of Massachusetts Medical School, Worcester. IGPS R43S protein was expressed in BL21(DE3) competent cells with ampicillin antibiotic selection. Cells were grown at 37 °C until they reached an OD600nm of 0.6 and 1 mM IPTG was added to induce expression for 16-20 h at 25 °C. To isotopically label the protein for NMR spectroscopy, cells were grown in M9 minimal medium with 15 NH 4 Cl and 13 C-glucose as the nitrogen and carbon sources, respectively. Cells were lysed in 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 5 mM imidazole by sonication. The lysate was loaded onto a Ni 2+ -NTA column pre-equilibrated with the lysis buffer. Impurities weakly bound to the column were washed away with 100 mM potassium phosphate, pH 7.5, 150 mM KCl, 75 mM imidazole, followed by equilibration into the low salt buffer 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 75 mM imidazole. Protein was eluted with 100 mM potassium phosphate, pH 7.5, 50 mM KCl, 500 mM imidazole and dialyzed into lysis buffer. Purified His-tagged protein was concentrated to 5 mL, and the tag was cleaved with TEV protease added at a 1:30 mass ratio with mixing at RT overnight. Untagged protein was separated from TEV protease and uncleaved protein by Ni 2+ -affinity chromatography. Protein aliquots were flash frozen and stored at −80 °C. NMR samples of 15 N 13 C-labeled IGPS were prepared at 250 µM concentration in 60 mM potassium phosphate, pH 7.2, 50 mM KCl, 5% D 2 O, 100 μM DSS. All data were collected on a 750 MHz ( 1 H) Bruker AVANCE III NMR spectrometer equipped with a CryoProbe at 50 °C.

A vector encoding maltose binding protein (MBP) was transformed into BL21(DE3) cells and expressed in 1 L of 95% D 2 O M9 media containing 15 NH 4 Cl and 2 H, 13 C glucose as the sole nitrogen and carbon sources, respectively. Cells were grown at 37 °C to an OD 600 of 0.9 and induced with 1 mM IPTG. Induction continued for 4 hrs at 37 °C, after which cells were harvested via centrifugation at 3500 × g. The cell pellet was frozen overnight and resuspended in 20 mM Tris-HCl, 20 mM NaCl pH 8.0, 1 mM DTT. 6 mg of lysozyme was added and the suspension was incubated under stirring for 30 min at room temperature. Cells were further lysed by sonication and centrifuged at 32000 × g for 30 min at 4 °C. Clarified lysate was filtered (0.45 µm pore size) and loaded onto a 25 ml DEAE column equilibrated in 20 mM Tris, 20 mM NaCl, pH 8.0, 1 mM DTT. The protein was eluted using a gradient to 20 mM Tris, 500 mM NaCl. Protein was concentrated to 1-2 ml and run on a 112 ml Superdex 75 column equilibrated in 20 mM Tris, 20 mM NaCl, 2 mM DTT pH 8.0. The protein was pooled and unfolded by dialysis in 4 M GuHCl, 20 mM Tris-HCl, 1 mM DTT pH 7.5. Protein was refolded by repeated 10x dilution with 20 mM sodium phosphate pH 7.1, 1 mM EDTA, 2 mM β-cyclodextrin, 0.02% NaN3, 100 μM DSS, 5% D 2 O followed by concentration (4 times). From this a 0.5 mM sample of MBP was created. Spectra were acquired at 37 °C on an 800 MHz ( 1 H) Bruker NEO NMR spectrometer. NMR data acquisition and processing parameters recorded by us for IL-1β, IL-1Ra, IGPS and MBP are summarized in Supplementary Table  1 . Poisson gap NUS spectra were reconstructed using hmsIST 39 and all spectra were processed with NMRPipe 49 on NMRbox 50 . Spin systems were built by manual peak picking using NMRFAM-SPARKY 51 and referenced using DSS.

Origin of protein test data sets

Triple resonance data acquired in our laboratory were processed using NMRPipe 49 installed on NMRbox 50 . The crosspeak lists were constructed from data acquired in our laboratory by manual crosspeak picking using NMRFAM-SPARKY 51 (i.e., not reconstructed from deposited assignments) (see Table  1 ). Crosspeak lists for ecTS, Cy1 and V5dm were provided by Professors Andrew Lee (University of North Carolina, Chapel Hill), Dominique Frueh (Johns Hopkins University), and Tatyana Igumenova (Texas A&M University), respectively, and were used without further adjustment. Crosspeaks for hIDD were generated from spin systems provided by Professor Martin Blackledge (Institut de Biologie Structurale) in the following manner. Each provided spin system consisted of an amide proton (H) and amide nitrogen (N) chemical shift as well as chemical shifts for Cα, Cα(i-1), CO and CO(i-1) resonances (though a complete set of carbon resonances was not present for each spin system). HNCA, HN(CO)CA, HNCO, and HN(CA)CO crosspeak lists were generated from the spin system data by adding the following crosspeaks to the indicated crosspeak list from each spin system: H-N-Cα(i-1), H-N-Cα for the HNCA; H-N-CO and H-N-CO(i-1) for the HN(CA)CO; H-N-CO for the HNCO and H-N-Cα(i-1) for the HN(CO)CA. The resonance values for the crosspeak positions were drawn from a normal distribution with a mean given by the value of the resonance in the spin system and a standard deviation of 0.003, 0.04, and 0.04 ppm for hydrogen, nitrogen and carbon resonances, respectively.
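The simulation of hIDD crosspeaks from spin systems can be sketched as below. The standard deviations and the mapping of resonances to crosspeak lists follow the description above; the data layout, the function name and the example shift values are hypothetical.

```python
import random

# Standard deviations (ppm) of the Gaussian error stated in the text.
SIGMA_PPM = {"H": 0.003, "N": 0.04, "C": 0.04}

# Carbon resonances contributing a crosspeak to each list, as listed in the text.
PEAKS_PER_EXPERIMENT = {
    "HNCA": ["CA", "CA(i-1)"],
    "HN(CO)CA": ["CA(i-1)"],
    "HNCO": ["CO"],
    "HN(CA)CO": ["CO", "CO(i-1)"],
}


def simulate_crosspeaks(spin_system, rng):
    """Generate noisy (H, N, C) crosspeaks for one spin system."""
    peaks = {name: [] for name in PEAKS_PER_EXPERIMENT}
    for experiment, carbons in PEAKS_PER_EXPERIMENT.items():
        for carbon in carbons:
            if carbon not in spin_system:
                continue  # incomplete spin systems simply yield fewer peaks
            # Each dimension is drawn independently around the spin system value.
            peaks[experiment].append((
                rng.gauss(spin_system["H"], SIGMA_PPM["H"]),
                rng.gauss(spin_system["N"], SIGMA_PPM["N"]),
                rng.gauss(spin_system[carbon], SIGMA_PPM["C"]),
            ))
    return peaks


# Hypothetical spin system lacking a CO(i-1) resonance.
spin_system = {"H": 8.25, "N": 120.1, "CA": 56.3, "CA(i-1)": 54.8, "CO": 176.2}
peaks = simulate_crosspeaks(spin_system, random.Random(0))
```

Because the amide shifts are redrawn for every crosspeak, peaks from the same spin system differ slightly between lists, which is what the spin system construction step must then tolerate.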

BARASA algorithm description

The algorithm begins by reading in the crosspeak lists to assemble spin systems. Within the crosspeak lists, the user provides the possible crosspeak types that are produced by the experiment. For example, the HNCA would produce possible crosspeak types of H-N-CA(i) and H-N-CA(i-1). The user also specifies cutoff values for each spectral dimension that dictate the range over which chemical shifts will be matched during spin system construction. The provided crosspeak types dictate which dimensions have resonances of ambiguous type. In the example of the HNCA, the first two dimensions are of unambiguous type (H and N resonances respectively). However, the third dimension is ambiguous (CA(i) or CA(i-1)).

BARASA builds spin systems by first arbitrarily choosing a crosspeak to seed the construction of a spin system. All other crosspeaks are searched to find those that have at least two resonances of unambiguous type that match the resonances of the seed crosspeak, both in terms of their chemical shift (i.e., fall within a tolerance cutoff specified by the user) and resonance type. After each subsequent addition, BARASA attempts to resolve ambiguous resonance types based on known chemical shifts already present in the spin system. For example, if a spin system has a Cα(i-1) value of 56.0 ppm (with a tolerance of 0.3 ppm) and an HNCA crosspeak (which could have a resonance type of Cα or Cα(i-1)) with a value of 58.0 ppm is added, then the algorithm will resolve the type of the new crosspeak as Cα, as it is not within the 0.3 ppm tolerance of the 56.0 ppm Cα(i-1). After adding the crosspeak and resolving its type, the algorithm then iterates through the entire list of remaining crosspeaks and repeats the above addition procedure. Once no more peaks can be added to the spin system, a new crosspeak is arbitrarily chosen from the list of remaining peaks to seed the construction of additional spin systems. This continues until all peaks have been added to a spin system.
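The type-resolution step in the example above can be expressed as a small filter. This is only a sketch of the idea, with a hypothetical function name and data layout; in particular, returning every type that remains possible (rather than forcing a single choice) is an assumption made here.

```python
def resolve_type(new_shift, candidate_types, known_shifts, tolerance):
    """Return the candidate resonance types still compatible with a new shift.

    A candidate type is excluded when the spin system already holds a shift of
    that type and the new shift lies outside the tolerance of it.
    """
    remaining = []
    for rtype in candidate_types:
        known = known_shifts.get(rtype)
        if known is None or abs(new_shift - known) <= tolerance:
            remaining.append(rtype)
    return remaining


# Worked example from the text: Ca(i-1) is known at 56.0 ppm, tolerance 0.3 ppm.
known = {"CA(i-1)": 56.0}
resolved = resolve_type(58.0, ["CA", "CA(i-1)"], known, tolerance=0.3)

# A shift inside the tolerance cannot be resolved from this information alone.
unresolved = resolve_type(56.1, ["CA", "CA(i-1)"], known, tolerance=0.3)
```

In the first call only Cα survives, matching the outcome described in the text; in the second both types remain possible, so the ambiguity would carry over into the resonance type sets.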

If BARASA finds a crosspeak that has two unambiguous resonances that match those already present in a spin system, but that also contains resonances whose shifts conflict with those already present in the spin system, then an additional spin system is created in which to place the incongruent crosspeak. Such a situation arises due to spectral degeneracy (e.g., two spin systems with the same or similar amide shifts). The algorithm will then attempt to add the remaining peaks to both spin systems. Any further clashes are resolved by the generation of a new spin system. This continues until no more crosspeaks can be added to any spin system. The crosspeaks within this group of spin systems are then marked by the algorithm so that they are allowed to exchange to any other spin system within the group during the annealing process. In addition, the user has the option to allow the algorithm to use a crosspeak cache, to which low intensity peaks (lowest 5%) can be moved over the course of the annealing run, providing a mechanism to eliminate potential artifactual crosspeaks.

Once all crosspeaks have been added to a spin system, all possible resonance type sets are generated for that spin system. A resonance type set is a complete designation of the atom type of each resonance of each crosspeak in a spin system. If a spin system only contains peaks with no ambiguous resonance types, then the spin system has only one possible resonance type set. This is the case for the majority of data sets, as experiments with ambiguous resonance types are often paired with experiments that resolve this ambiguity (e.g., the HNCA, HN(CO)CA experimental pair). However, if ambiguous resonance types are present in a spin system, then the spin system will contain all possible resonance type sets. A distinct set of average resonance values for the spin system is calculated for each resonance type set, and all of these sets are considered over the course of the annealing run.
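
As a rough illustration of the enumeration described above, the sketch below treats each crosspeak as one carbon shift plus a list of candidate types and forms every combination. The data layout and function names are assumptions for this example; the actual program may prune combinations that this sketch keeps.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Tuple

Peak = Tuple[float, List[str]]  # (carbon shift in ppm, candidate resonance types)


def resonance_type_sets(peaks: List[Peak]) -> List[Tuple[str, ...]]:
    """Every way of choosing one candidate type for each peak."""
    return list(product(*(types for _, types in peaks)))


def average_shifts(peaks: List[Peak], type_set: Tuple[str, ...]) -> Dict[str, float]:
    """Arithmetic mean of the shifts that share a type under one type set."""
    grouped: Dict[str, List[float]] = {}
    for (shift, _), rtype in zip(peaks, type_set):
        grouped.setdefault(rtype, []).append(shift)
    return {rtype: mean(values) for rtype, values in grouped.items()}


# Hypothetical spin system: two ambiguous HNCA peaks and one HN(CO)CA peak.
peaks = [
    (58.0, ["CA(i)", "CA(i-1)"]),
    (56.0, ["CA(i)", "CA(i-1)"]),
    (56.1, ["CA(i-1)"]),
]
for type_set in resonance_type_sets(peaks):
    print(type_set, average_shifts(peaks, type_set))
```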

The resonance assignment analysis is then initialized by randomly assigning the spin systems to the protein sequence. Often there are more spin systems than there are residue positions (e.g., spin systems correlated to a side-chain amide group rather than the backbone are also present in the data set). Any spin systems that were not randomly placed on the sequence are placed in a spin system cache and may be assigned to the sequence over the course of the run. The simulation temperature is initialized at 1000 arbitrary units and a spin system or crosspeak is chosen at random to swap. The probability that a swap will be a crosspeak swap is set at 0.01 (which was found to be a good compromise between sampling and algorithm speed), with the remaining swaps being spin system swaps. A chosen spin system can be added to the spin system cache, swap positions with another spin system, or move to an empty position in the sequence, making its former position available. Whenever a spin system is moved, a random resonance type set is chosen from among those possible. In addition, the algorithm may attempt to change the current resonance type set and keep the current spin system in place. If a crosspeak is chosen to swap, it can be added to the crosspeak cache (if it is of low intensity), added to another spin system within its spin system group, or swapped with any crosspeak within its spin system group. Upon moving or swapping crosspeaks, the affected spin systems are evaluated for clashes. If there are none, the crosspeak swap is allowed to continue; otherwise the swap is rejected. In addition, a crosspeak move or swap triggers the affected spin systems to regenerate all resonance type sets and choose one at random from the possibilities. This forces a recalculation of the average chemical shifts for each resonance type set of each affected spin system, as well as of the Bayesian probabilities described below for sequence position determination.
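
The choice between the two families of moves can be sketched as below. Only the 0.01 probability comes from the text; the move names and the uniform choice among the moves within each family are assumptions made for illustration.

```python
import random
from typing import Tuple

CROSSPEAK_SWAP_PROBABILITY = 0.01  # value quoted in the text

# Hypothetical labels for the moves listed in the paragraph above.
SPIN_SYSTEM_MOVES = (
    "to_spin_system_cache",
    "swap_with_spin_system",
    "move_to_empty_position",
    "change_type_set_in_place",
)
CROSSPEAK_MOVES = (
    "to_crosspeak_cache",
    "move_within_group",
    "swap_within_group",
)


def choose_move(rng: random.Random) -> Tuple[str, str]:
    """Pick the move family first, then one move from that family."""
    if rng.random() < CROSSPEAK_SWAP_PROBABILITY:
        return "crosspeak", rng.choice(CROSSPEAK_MOVES)
    return "spin_system", rng.choice(SPIN_SYSTEM_MOVES)


rng = random.Random(0)
print([choose_move(rng) for _ in range(3)])
```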

If the swap is not immediately rejected due to a crosspeak clash, the change in energy of the system due to the swap is calculated using the energy function described below. The swap is then accepted or rejected with a probability given by the Metropolis criterion (Eq. 1). After each successful swap, the energy of the state is recorded and stored as part of a sample of energy values. Once the sample reaches a particular size, the sample mean and standard error are calculated and an additional sample is generated by continued swapping. A two-tailed Student's t-test is performed to compare the means of the two samples. The system is considered to have equilibrated at the current temperature if the p-value of the t-test is greater than a user-supplied value (default p > 0.5). If equilibration has not been reached, more swaps are performed to generate an additional sample and the t-test is repeated with the two most recent samples. If equilibration has been reached, then the energy values are used to estimate the specific heat at the current temperature:
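
Assuming the standard Metropolis acceptance rule and the standard fluctuation estimate of specific heat, with the Boltzmann constant absorbed into the arbitrary temperature units, the two quantities used in this step would take the following form. The symbols \(P_{accept}\) and \(C(T)\) are introduced here for illustration, and \(\Delta E\) denotes the energy change caused by the proposed swap.

\[ P_{accept} = \min\left(1,\; \exp\left(-\frac{\Delta E}{T}\right)\right) \]

\[ C(T) = \frac{\left\langle E^{2}\right\rangle - \left\langle E\right\rangle^{2}}{T^{2}} \]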

Where \(T\) is the current temperature in arbitrary units, \(E\) is the energy of the system and the angled brackets indicate the sample mean. Large drops in average ensemble energy due to oversized temperature steps can lead to the system becoming trapped in a local minimum. By deciding on a target energy drop that is unlikely to lead to a frustrated state, we can utilize the specific heat calculated at each temperature to estimate the temperature drop needed to achieve the target change in energy. This is done in the following manner:
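
To first order, a temperature change \(\Delta T\) shifts the mean energy by the specific heat times \(\Delta T\). Under that assumption, and writing \(C(T)\) for the specific heat estimated at the current temperature (a symbol chosen here for illustration), the next temperature would follow as:

\[ T_{next} = T + \frac{\Delta \left\langle E\right\rangle_{target}}{C(T)} \]

Because the target energy change is negative, \(T_{next}\) lies below \(T\), and a larger specific heat gives a smaller step.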

Where \(\Delta {\left\langle E\right\rangle }_{{target}}\) is a user-controlled parameter, kept at −2000 for this study. Decreasing the magnitude of the target energy drop in situations where the system is becoming trapped in a frustrated state can lead to better results at the expense of longer simulation time. If \(T-{T}_{{next}}\) is greater than 10, then the temperature decrease is limited to 10 units to prevent overcooling the system. The use of the specific heat in this manner results in smaller temperature steps at temperatures where the system is rapidly decreasing in energy, while allowing for larger steps when drops in temperature have a modest effect on the ensemble. The resulting schedule avoids system quenching while simultaneously minimizing unproductive swaps at temperatures that are either too high or too low for effective annealing. After decreasing the temperature, the annealing run will terminate if any of the following criteria are met: the temperature is less than 1, the product of the temperature and the last specific heat calculated is less than 200, or the ratio of unsuccessful swaps to successful swaps while collecting the last sample is greater than 10,000. The rationale for the criteria is as follows: given the standard energy parameterization, productive annealing is unlikely to happen at temperatures below 1; the product of specific heat and current temperature (at low temperatures) provides a crude estimate of the amount of energy between the current ensemble and the global minimum (i.e., the thermodynamic ensemble at T = 0, which should correspond to a single state), and approximately 200 energy units is negligible; and at this ratio of unsuccessful to successful swaps, the system is near a minimum and further sampling is inefficient. If termination is not achieved, a new sample size is defined using the following equation:

Where N is the number of residues in the sequence. This equation permits increases in sample size when sampling at temperatures with high specific heats, which is where the most productive swaps occur. This approach also permits scaling of sampling for larger proteins. The parameters of this equation were found empirically to be a good compromise between sufficient sampling and speed. Samples are then drawn at the new temperature to determine equilibration and the cycle is continued. Upon termination of the annealing protocol, a steepest-descent type search is performed to locally minimize the system energy and refine the assignments, discarding potentially bad assignments that were left over from the run. This is done by attempting to place (or swap) every spin system/peak at every possible location in the sequence/spin system group (including the cache, if allowed). Only spin system/peak swaps/placements that decrease the system energy are accepted. This is repeated 100 times.
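
The cooling logic described in the preceding paragraphs can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the standard Metropolis rule and the standard fluctuation estimate of specific heat, the function names are hypothetical, and the sample-size rule and the t-test for equilibration are deliberately left out.

```python
import math
import random
from statistics import mean
from typing import Sequence


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Standard Metropolis rule: always accept downhill, sometimes accept uphill."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def specific_heat(energies: Sequence[float], temperature: float) -> float:
    """Fluctuation estimate (<E^2> - <E>^2) / T^2 from one equilibrated sample."""
    mean_e = mean(energies)
    mean_e2 = mean([e * e for e in energies])
    return (mean_e2 - mean_e ** 2) / temperature ** 2


def next_temperature(temperature: float, heat: float,
                     target_drop: float = -2000.0, max_step: float = 10.0) -> float:
    """Step sized to hit the target energy drop, capped at max_step units."""
    # A non-positive specific heat gives no usable estimate; fall back to the cap.
    step = max_step if heat <= 0.0 else min(-target_drop / heat, max_step)
    return temperature - step


def should_terminate(temperature: float, heat: float,
                     rejected: int, accepted: int) -> bool:
    """The three stopping criteria quoted in the text."""
    return (
        temperature < 1.0
        or temperature * heat < 200.0
        or rejected > 10_000 * accepted
    )


print(next_temperature(1000.0, 100.0))  # step of 20 requested, capped at 10
print(next_temperature(100.0, 400.0))   # step of 5, below the cap
```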

This entire process of simulated annealing is independently repeated with a number of different random starting conditions. Here we have used 20. A consensus set of assignments is generated by calculating the frequency with which each spin system is placed at each amino acid location. The spin system assigned to each residue location in a majority of the runs (if any) is kept as the consensus spin system. A curated set of assignments is then generated from this consensus analysis. The curation procedure is as follows: the consensus spin system at each residue is chosen as the tentative assignment for that residue. Residues without a consensus spin system (i.e., residues to which no single spin system was assigned in more than 50% of the runs) are marked as unassigned. Tentatively assigned spin systems are then evaluated by their posterior probabilities as well as by the number of connectivities, where a connectivity is defined as a matching resonance between adjacent spin systems. Assignments are accepted if they meet any of the following criteria: 1) the assigned spin system has at least two connectivities with adjacent spin systems, 2) the assigned spin system has at least one connectivity with adjacent spin systems and a posterior probability at least three times higher than the quantity 1/N, or 3) the assigned spin system has a posterior probability > 50%. Residues with tentative assignments that do not satisfy any of these criteria are marked as unassigned.
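
The consensus and curation rules above can be sketched as two small functions. The data layout (one dictionary per run mapping residue position to a spin system identifier) and the function names are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def consensus_spin_system(runs: List[Dict[int, Hashable]],
                          position: int) -> Optional[Hashable]:
    """Spin system placed at `position` in more than half of the runs, if any."""
    counts = Counter(run.get(position) for run in runs)
    counts.pop(None, None)  # runs that left the position empty do not vote
    if not counts:
        return None
    system, hits = counts.most_common(1)[0]
    return system if hits > len(runs) / 2 else None


def accept_assignment(n_connectivities: int, posterior: float,
                      n_residues: int) -> bool:
    """The three acceptance criteria listed in the text."""
    return (
        n_connectivities >= 2
        or (n_connectivities >= 1 and posterior >= 3.0 / n_residues)
        or posterior > 0.5
    )


# Hypothetical outcome of 20 runs for residue position 5.
runs = [{5: "ss7"}] * 11 + [{5: "ss3"}] * 9
print(consensus_spin_system(runs, 5))
print(accept_assignment(n_connectivities=1, posterior=0.05, n_residues=100))
```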

The energy function used in the annealing routine is calculated as the sum of the energies of all the constituent spin systems (\({E}_{{tot}}\)) (Eq. 5). At any given step during the annealing protocol, spin systems are either tentatively assigned to a position in the sequence or placed in the cache. Cached spin systems are defined as having zero energy (i.e., \({E}_{m}=0\)).

The energy of each spin system tentatively assigned to a specific place in the amino acid sequence is comprised of the adjacency energy ( \({E}_{m}^{{adj}}\) ) and the chemical shift energy ( \({E}_{m}^{{cs}}\) ):
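
Written out, the two relations described above (the total energy as a sum over spin systems, and the energy of each assigned spin system as a sum of two terms) amount to:

\[ E_{tot} = \sum_{m} E_{m} \]

\[ E_{m} = E_{m}^{adj} + E_{m}^{cs} \]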

The adjacency energy is related to the degree of correspondence between the averages of the Cα, Cβ and CO resonances of the current spin system and the averages of the Cα (i-1), Cβ (i-1) and CO (i-1) resonances of the spin system tentatively assigned to the subsequent position in the sequence. Each average resonance value in a spin system is calculated as the arithmetic mean of all resonance chemical shifts of the indicated type from all of the crosspeaks that contain that resonance currently in the spin system. \({E}_{m}^{{adj}}\) therefore, captures the process of evaluating spin system adjacency and is based on the number of potential connectivities between adjacent spin systems tentatively assigned to the sequence. For example, if spin system m is assigned to a residue position immediately prior to that of spin system l , then the adjacency energy is given by:
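
A form consistent with the symbols and limits defined in the next paragraph is sketched here. The factor of 2 in the exponent is an assumption about the Gaussian convention; the limiting values (\(c_{0}+c_{1}\) per connectivity for a perfect match, \(c_{1}\) for a poor match) do not depend on it.

\[ E_{m}^{adj} = \sum_{k} \left[ c_{0} \exp\left( -\frac{\left( \delta_{k(i)}^{m} - \delta_{k(i-1)}^{l} \right)^{2}}{2\sigma_{k}^{2}} \right) + c_{1} \right] \]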

Where \({\delta }_{k\left(i\right)}^{m}\) is the chemical shift of resonance k(i) (either Cα(i), Cβ(i) or CO(i)) of spin system m and \({\delta }_{k\left(i-1\right)}^{l}\) is the chemical shift of resonance k(i−1) (either Cα(i−1), Cβ(i−1) or CO(i−1)) of spin system l. \({\sigma }_{k}\) is related to the estimated precision of the measured chemical shifts. \({E}^{{adj}}\) is a sum of inverted Gaussians when \({c}_{0}<0\). Previous assignment algorithms have used functions of this form to good effect for estimating adjacency 26. In the limit of well-matched connectivities, the sum of inverted Gaussian functions will have a minimum value of \(K({c}_{0}+{c}_{1})\), where K is the number of connectivities, whereas for poorly matched putative connectivities the adjacency energy will tend to a limit of \(K{c}_{1}\). Importantly, when an expected element of spin system m or l is missing, that contribution to the adjacency energy is set to zero. Similarly, if the subsequent position in the sequence is not currently assigned a spin system, then \({E}^{{adj}}=0\). Here, \({c}_{0}\) and \({c}_{1}\) were set to −100 and +50, respectively. This results in an energy of −50 per connectivity if the difference in chemical shifts is 0, approaching +50 as the magnitude of the difference in chemical shifts approaches infinity. The value of \({\sigma }_{k}\) is influenced by the properties of the NMR spectra from which the spin systems are built. For all runs described, \({\sigma }_{k}\) was chosen so that the function has an abscissa-intercept at a chemical shift difference of 0.2 ppm for all nuclei k.
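
A minimal sketch of this energy term is given below, assuming the Gaussian is written as \(\exp(-\Delta^{2}/2\sigma_{k}^{2})\); a different convention would change the value of \(\sigma_{k}\) but not the zero crossing or the limits. The function names and dictionary layout are hypothetical.

```python
import math
from typing import Dict, Optional

C0, C1 = -100.0, 50.0  # values quoted in the text


def sigma_for_zero_crossing(delta0: float = 0.2,
                            c0: float = C0, c1: float = C1) -> float:
    """Solve c0 * exp(-delta0^2 / (2 sigma^2)) + c1 = 0 for sigma."""
    return delta0 / math.sqrt(2.0 * math.log(-c0 / c1))


def adjacency_energy(own: Dict[str, Optional[float]],
                     preceding_in_next: Dict[str, Optional[float]],
                     sigma: float, c0: float = C0, c1: float = C1) -> float:
    """Sum of inverted Gaussians over the resonances present in both spin systems.

    own: CA/CB/CO shifts of spin system m.
    preceding_in_next: CA(i-1)/CB(i-1)/CO(i-1) shifts of spin system l,
    keyed by the same names. Missing values contribute zero.
    """
    energy = 0.0
    for name, delta_m in own.items():
        delta_l = preceding_in_next.get(name)
        if delta_m is None or delta_l is None:
            continue
        diff = delta_m - delta_l
        energy += c0 * math.exp(-diff ** 2 / (2.0 * sigma ** 2)) + c1
    return energy


sigma = sigma_for_zero_crossing()
print(adjacency_energy({"CA": 56.0, "CB": 30.1, "CO": None},
                       {"CA": 56.0, "CB": 30.3}, sigma))
```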

The second term of the spin system energy, \({E}_{m}^{cs}\), evaluates the degree of correspondence of the observed chemical shifts to those predicted. It is this term that makes use of the ability of Bayesian statistics to incorporate diverse degrees of knowledge of the local structure of the protein. These range from relatively structureless information encoded in the simple empirical distributions of chemical shifts of the amino acids observed in proteins to specific chemical shift predictions based on the high-resolution structure of the protein being examined. For the former, we utilize the BMRB 52 database. For the latter, we use SHIFTX+ predictions derived from either crystallographic structures available in the PDB 53 or structures predicted by AlphaFold2 42. In the case of the IDPs V5dm and hIDD, we use calculated, sequence-specific random coil chemical shifts 36, 37 as predictions. \({E}_{m}^{{cs}}\) is ultimately calculated from the Bayesian posterior probability of a proposed assignment given the observed chemical shifts:
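
In terms of the conditions \(A_{n,m}\), \(B_{m}\) and \(Q_{m_{i}}\) defined in the next paragraph, Bayes' theorem for this posterior would read:

\[ P\left(A_{n,m}\cap Q_{m_{i}} \mid B_{m}\right) = \frac{P\left(B_{m} \mid A_{n,m}\cap Q_{m_{i}}\right)\, P\left(A_{n,m}\cap Q_{m_{i}}\right)}{P\left(B_{m}\right)} \]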

The subscripts n and m index over all residue positions and the provided spin systems, respectively. The condition \({A}_{n,m}\) refers to the case where spin system m is correctly assigned to sequence position n. The condition \({B}_{m}\) refers to the observed chemical shifts of spin system m. Condition \({Q}_{{m}_{i}}\) refers to the case where resonance type set i of spin system m is the correct resonance type set. Because it is possible for the spin system to have crosspeaks of ambiguous resonance type, the probability calculation explicitly considers each resonance type set of a spin system within the context of each residue location. Thus, an assignment entails both the placement of a spin system at a residue location and the choice of a resonance type set.

The prior probability \(P\left({A}_{n,m}\cap {Q}_{{m}_{i}}\right)\) refers to the initial probability that the assignment of spin system m to residue n is correct and that resonance type set i is correct for spin system m. If \({I}_{m}\) represents the number of possible resonance type sets of spin system m, then the total number of combinations of resonance type sets and residue locations for spin system m is the product \({I}_{m}N\). However, given the constraints provided by the amino acid sequence of the protein, not all combinations of sequence location and resonance type set are possible. For example, a resonance type set with a defined amide proton would be impossible to place at a proline. To encode the impossibility of certain resonance type set/residue location combinations, these assignments are given a prior probability of 0. The remaining prior probability is then evenly distributed among the remaining combinations:
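
With \(C\) denoting the number of allowed combinations, as defined in the sentence that follows, the even distribution described above corresponds to:

\[ P\left(A_{n,m}\cap Q_{m_{i}}\right) = \begin{cases} 1/C & \text{if the combination of } n \text{ and } m_{i} \text{ is possible} \\ 0 & \text{otherwise} \end{cases} \]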

Where C is the number of possible combinations of n and \({m}_{i}\) in the sequence.

The likelihood of assignment \(P\left({B}_{m}|{A}_{n,m}\cap {Q}_{{m}_{i}}\right)\) (i.e., the probability of observing the chemical shifts of spin system m given the assignment \({A}_{n,m}\cap {Q}_{{m}_{i}}\)) is given by Eqs. 10 and 11:
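
Following the description in the next paragraph (a chi-square statistic over the \(R\) resonances with data, evaluated with the complementary cumulative distribution function), the two relations would take this form, where \(F_{\chi^{2}_{R}}\) is notation introduced here for the chi-square cumulative distribution function with \(R\) degrees of freedom:

\[ X_{n,m_{i}}^{2} = \sum_{r=1}^{R} \left( \frac{\delta_{pred,r}^{n} - \delta_{obs,r}^{m_{i}}}{\sigma_{r}^{n}} \right)^{2} \]

\[ P\left(B_{m} \mid A_{n,m}\cap Q_{m_{i}}\right) = 1 - F_{\chi^{2}_{R}}\left(X_{n,m_{i}}^{2}\right) \]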

Where \({\delta }_{{pred},r}^{n}\) is the predicted chemical shift of resonance r at sequence position n; \({\delta }_{{obs},r}^{{m}_{i}}\) is the observed chemical shift of resonance r of resonance type set i of spin system m; and \({\sigma }_{r}^{n}\) is the standard error for the chemical shift prediction of resonance r at sequence position n. The resonances, represented by the variable r, are the following: H, N, Cα, Cβ, CO, Cα(i-1), Cβ(i-1), CO(i-1). In Eqs. 10 and 11 it is assumed that the random variable \({\delta }_{{pred},r}^{n}\) is normally distributed about \({\delta }_{{obs},r}^{{m}_{i}}\) with a standard deviation \({\sigma }_{r}^{n}\) and that the error in the chemical shift measurement is much less than the error in the prediction. With these assumptions, the random variable \({X}_{n,{m}_{i}}^{2}\) follows a chi-square distribution with R degrees of freedom, where R is equal to the number of resonances for which data are provided. The likelihood is then calculated as the value of the complementary cumulative distribution function (CCDF) of a chi-square variable with R degrees of freedom at \({X}_{n,{m}_{i}}^{2}\).
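
The likelihood calculation can be sketched with the standard library alone. The chi-square CCDF below uses the closed forms for integer degrees of freedom rather than a statistics package; the function names and dictionary layout are assumptions, and the example shifts are invented. The prediction errors in the example are the SHIFTX+ RMSD values quoted later in the text.

```python
import math
from typing import Dict


def chi2_sf(x: float, dof: int) -> float:
    """CCDF of a chi-square variable with an integer dof >= 1, for x >= 0."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    y = x / 2.0
    # Closed-form starting points: Q(1/2, y) = erfc(sqrt(y)), Q(1, y) = exp(-y).
    if dof % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood(observed: Dict[str, float], predicted: Dict[str, float],
               sigma: Dict[str, float]) -> float:
    """CCDF of the chi-square statistic over resonances present in all inputs."""
    shared = [r for r in observed if r in predicted and r in sigma]
    if not shared:
        return 1.0  # assumption: no comparable resonances, no evidence either way
    x2 = sum(((predicted[r] - observed[r]) / sigma[r]) ** 2 for r in shared)
    return chi2_sf(x2, len(shared))


observed = {"H": 8.30, "N": 120.5, "CA": 56.2}   # invented example shifts
predicted = {"H": 8.10, "N": 121.7, "CA": 55.8}  # invented example predictions
sigma = {"H": 0.45, "N": 2.4, "CA": 0.8}
print(likelihood(observed, predicted, sigma))
```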

The likelihoods of all other residue position/resonance type sets being a valid assignment of spin system m are considered via the calculation of the marginalization, \(P\left({B}_{m}\right)\):
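
By the law of total probability over every residue position \(n\) and resonance type set \(i\), this marginal would be:

\[ P\left(B_{m}\right) = \sum_{n}\sum_{i} P\left(B_{m} \mid A_{n,m}\cap Q_{m_{i}}\right)\, P\left(A_{n,m}\cap Q_{m_{i}}\right) \]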

Where the summation terms are over all possible i and n combinations. Using Bayes’ theorem as expressed above, the posterior probability (i.e., the probability of a particular assignment being correct given the observed data) can be calculated and then \({E}_{m}^{cs}\) determined via:

To avoid numerical instability in the evaluation of logarithms of numbers near zero and to prevent a dominating influence of inaccurate chemical shift predictions on the energy function, instances where \({E}_{m}^{{cs}} > {E}_{\max }^{{cs}}\) are fixed at \({E}_{\max }^{{cs}}\). \({E}_{\max }^{{cs}}\) and \({E}_{\min }^{{cs}}\) are set to 100 and −50, respectively, for this study.
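
The posterior calculation and the upper cap can be sketched as follows. The function names and the keying of combinations by `(position, type_set_index)` are assumptions for this example, and the mapping from posterior probability to \({E}_{m}^{cs}\) itself is deliberately not reproduced here; only the Bayes step and the cap are shown.

```python
from typing import Dict, Set, Tuple

Key = Tuple[int, int]  # (residue position n, resonance type set index i)


def posteriors(likelihoods: Dict[Key, float], allowed: Set[Key]) -> Dict[Key, float]:
    """Posterior for every combination, with a uniform prior over allowed ones."""
    prior = 1.0 / len(allowed) if allowed else 0.0
    evidence = sum(likelihoods[key] * prior for key in allowed)
    if evidence == 0.0:
        return {key: 0.0 for key in likelihoods}
    return {
        key: (likelihoods[key] * prior / evidence if key in allowed else 0.0)
        for key in likelihoods
    }


def cap_energy(energy: float, e_max: float = 100.0) -> float:
    """Fix chemical shift energies above e_max at e_max, as described in the text."""
    return min(energy, e_max)


# Invented likelihoods for one spin system; position 3 is disallowed (e.g. a proline).
example = {(1, 0): 0.3, (2, 0): 0.1, (3, 0): 0.9}
print(posteriors(example, allowed={(1, 0), (2, 0)}))
print(cap_energy(250.0))
```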

The values of the parameters for the energy function were chosen to safeguard against inaccurate chemical shift predictions based on the following reasoning: a perfectly matching connectivity between two resonances will contribute −50 to the final energy function. Given that a spin system with a posterior probability of 0 will contribute 100 to the final energy function, it would require two perfect connectivities or three or more reasonable connectivities for that spin system to be favorably assigned to that position versus being left in the cache. This was done to permit the algorithm to assign a spin system to a particular location in the event of highly inaccurate chemical shift prediction of its resonances, so long as there are sufficient resonance connectivities to justify the assignment. Likewise, the \({E}_{\min }^{{cs}}\) parameter was chosen such that a posterior probability of 1.0 would result in an energy contribution of −50 and would be equal to the contribution of a single perfect connectivity. This would require two bad connectivities to overrule a high posterior probability and disfavor its assignment. The user has control over \({E}_{\max }^{{cs}}\) and \({E}_{\min }^{{cs}}\) to adjust the relative influence of chemical shift energy on the course of the annealing run.
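
This balance can be checked with the parameter values quoted in the text (\(c_{0}=-100\), \(c_{1}=+50\), \(E_{\max}^{cs}=100\), \(E_{\min}^{cs}=-50\), cached energy 0), assuming each perfect connectivity contributes \(c_{0}+c_{1}\) and each badly mismatched connectivity tends to \(c_{1}\):

\[ E_{\max}^{cs} + 2\left(c_{0}+c_{1}\right) = 100 + 2\left(-50\right) = 0 \]

\[ E_{\min}^{cs} + 2c_{1} = -50 + 2\left(50\right) = +50 \]

The first total brings a spin system with a posterior probability of 0 level with the cache energy of 0, so fewer than two perfect connectivities leave it penalized. The second total is positive, so two bad connectivities outweigh a posterior probability of 1.0.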

Predicted shifts for each resonance can come from any source, so long as the precision of the prediction algorithm is accurately estimated. For IDPs, sequence-specific random coil chemical shifts can be substituted (see below). In the absence of an acceptable structural model, the average and standard deviation of the BMRB distribution of chemical shifts for a given atom of a given residue type are used as the predicted shift and prediction error, respectively. The same approach is used in regions where the sequence of interest contains a tag that is absent from the structural model used to predict chemical shifts, as well as in regions that are not resolved.

Generation of predicted chemical shifts

Predicted H, N, Cα, Cβ, and CO chemical shifts were generated via SHIFTX+ using PDB entries and/or AlphaFold2 predicted structures (Table 1). Chemical shift prediction errors for H, N, Cα, Cβ, and CO were taken from the reported root mean squared deviations (RMSD) of SHIFTX+ predictions: 0.45, 2.4, 0.8, 0.95, and 0.9 ppm, respectively. Sequence regions present in the NMR sample but not resolved or not present in the provided structure (e.g., loops or expression tags) were given predicted values from their corresponding average values in the BMRB. For the runs that were performed with SPARTA+ 38 predicted shifts, the reported errors for each individual prediction were used. For the IDPs V5dm and hIDD, predicted shifts were provided as sequence-specific random coil chemical shifts calculated according to the method in 36, 37. Prediction errors were taken from the reported RMSD of the prediction method and were 0.16, 1.0, 0.42, 0.37, and 0.43 ppm for H, N, Cα, Cβ, and CO resonances, respectively. Prediction errors associated with BMRB-derived values were taken as the standard deviation of the corresponding resonance distribution for the particular amino acid type in the BMRB.

Comparison of resonance assignment algorithms

BARASA was compared to three triple resonance assignment algorithms that are highly utilized by the NMR community. All algorithms were provided the same crosspeak lists as BARASA, albeit in different file formats. As FLYA can utilize predicted chemical shift data, the algorithm was provided with the same predicted shifts and associated errors as BARASA. Assignment results were taken from the strong assignments generated from 20 runs. The assignment algorithm I-PINE was run using the I-PINE server. AutoAssign was run on NMRbox using the default parameters. For each algorithm, proposed assignments were compared to reference assignments. At each residue position, the proposed assignment was determined to have either matched, mismatched, or been missing when compared to the reference assignments (see Results and Discussion). The same reference assignments were used for the evaluation of all algorithms.

Generation of simulated data sets

To assess the performance of BARASA on datasets of lower quality, the MBP crosspeak lists were processed to randomly retain spin systems and/or individual crosspeaks at specific probabilities depending on crosspeak type. For each data quality condition, 10 independent data sets were randomly generated and BARASA was run on each of them. The results from each of these executions of BARASA were generated from the curation of 20 independent annealing runs. The performance of BARASA on data with artifactual peaks was evaluated using a depleted data set to which randomly generated crosspeaks were added such that 20% of all Cα, Cβ, and CO crosspeaks were artifacts. Each artifact peak was generated in the following manner. A random residue from the protein sequence, containing an amide group and the desired peak type (Cα, Cβ, CO, Cα(i-1), Cβ(i-1), or CO(i-1)), was chosen. Chemical shifts for each dimension of the crosspeak were randomly generated from a Gaussian distribution with a mean and standard deviation equal to the mean and standard deviation of that atom of that residue type in the BMRB. All artifact peaks were given the maximum peak intensity of their peak lists to ensure they would not be cached during the run.
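
The artifact-peak recipe can be sketched as below. This is an illustration only: the function name and data layout are hypothetical, only intraresidue peak types are handled, and the statistics in the example are placeholders, not actual BMRB values.

```python
import random
from typing import Dict, Sequence, Tuple

# (residue type, atom) -> (mean in ppm, standard deviation in ppm)
Stats = Dict[Tuple[str, str], Tuple[float, float]]


def make_artifact_peak(sequence: Sequence[str], stats: Stats,
                       atoms: Sequence[str], rng: random.Random) -> Tuple[float, ...]:
    """Draw one shift per dimension from a Gaussian for a randomly chosen residue."""
    # A residue is eligible only if statistics exist for every dimension,
    # which excludes, for example, proline from peaks with an amide proton.
    eligible = [res for res in sequence
                if all((res, atom) in stats for atom in atoms)]
    if not eligible:
        raise ValueError("no residue in the sequence can produce this peak type")
    residue = rng.choice(eligible)
    return tuple(rng.gauss(*stats[(residue, atom)]) for atom in atoms)


# Placeholder statistics for illustration only (not actual BMRB values).
placeholder_stats: Stats = {
    ("ALA", "H"): (8.2, 0.6),
    ("ALA", "N"): (123.0, 3.5),
    ("ALA", "CA"): (53.1, 2.0),
    ("PRO", "CA"): (63.3, 1.5),
}
rng = random.Random(42)
print(make_artifact_peak(["ALA", "PRO", "ALA"], placeholder_stats,
                         ("H", "N", "CA"), rng))
```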

BARASA was implemented in C++ and can be built on all major computing platforms (MacOS, Linux, and Windows). BARASA has a command line interface as well as a GUI implemented using the wxWidgets library, and it uses the Boost libraries. For this study, the simulations were run on a 2019 6-core MacBook Pro (Intel processor) with up to 12 annealing runs running in parallel.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

Resonance assignments for IGPS and IL-1Ra have been deposited to the BMRB under accession numbers 51347 and 51352, respectively. Cross peak lists and protein sequences for IL-1β, IL-1Ra, IGPS, MBP, Cy1, ecTS, v5domain, and huIDR, which form the foundation of the analysis here, are included in the Source Data file. BMRB statistics used to test BARASA are also included in the Source Data file. Assignments referenced in this study from the BMRB can be accessed via the following accession codes: 434, 4354, 19082, 18927, and 28135. The experimental structures referenced in this study from the PDB can be accessed via the following accession codes: 9ILB, 2IRT, 1IGS, 1DMB, 7RY6, and 1AOB. Supplementary Information is available and consists of fifteen tables and one figure listing resonance assignments made by BARASA and summary statistics of BARASA's performance using SPARTA+ predicted chemical shifts, AlphaFold2 structural models, or in the presence of artifact peaks. Source data are provided with this paper.

Code availability

BARASA will be made generally available for non-commercial use through, preferably, NMRbox 50 [ https://nmrbox.nmrhub.org/ ] or, less preferred, by contacting [email protected] for Linux or OSX compatible executables.

Ikeya, T. et al. Solution NMR views of dynamical ordering of biomacromolecules. Biochim. Biophys. Acta 1862 , 287–306 (2018).

Shimada, I., Ueda, T., Kofuku, Y., Eddy, M. T. & Wuthrich, K. GPCR drug discovery: integrating solution NMR data with crystal and cryo-EM structures. Nat. Rev. Drug Disc. 18 , 59–82 (2019).

Alderson, T. R. & Kay, L. E. NMR spectroscopy captures the essential role of dynamics in regulating biomolecular function. Cell 184 , 577–595 (2021).

Camacho-Zarco, A. R. et al. NMR provides unique insight into the functional dynamics and interactions of intrinsically disordered proteins. Chem. Rev. 122 , 9331–9356 (2022).

Wüthrich, K. Sequential individual resonance assignments in the 1H-NMR spectra of polypeptides and proteins. Biopolymers 22 , 131–138 (1983).

Wüthrich, K., Wider, G., Wagner, G. & Braun, W. Sequential resonance assignments as a basis for determination of spatial protein structures by high resolution proton nuclear magnetic resonance. J. Mol. Biol. 155 , 311–319 (1982).

Billeter, M., Braun, W. & Wüthrich, K. Sequential resonance assignments in protein 1H nuclear magnetic resonance spectra. Computation of sterically allowed proton-proton distances and statistical analysis of proton-proton distances in single crystal protein conformations. J. Mol. Biol. 155 , 321–346 (1982).

Englander, S. W. & Wand, A. J. Main chain directed strategy for the assignment of 1H NMR spectra of proteins. Biochemistry 26 , 5953–5958 (1987).

Di Stefano, D. L. & Wand, A. J. Two-dimensional 1H NMR study of human ubiquitin: a main chain directed assignment and structure analysis. Biochemistry 26 , 7272–7281 (1987).

Wand, A. J. & Nelson, S. J. Refinement of the main chain directed assignment strategy for the analysis of 1H NMR spectra of proteins. Biophys. J. 59 , 1101–1112 (1991).

Nelson, S. J., Schneider, D. M. & Wand, A. J. Implementation of the main chain directed assignment strategy. Computer assisted approach. Biophys. J. 59 , 1113–1122 (1991).

Ikura, M., Kay, L. E. & Bax, A. A novel approach for sequential assignment of 1H, 13C, and 15N spectra of larger proteins: Heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochemistry 29 , 4659–4667 (1990).

Montelione, G. T. & Wagner, G. Conformation-independent sequential NMR connections in polypeptides by H1-C13-N15 triple-resonance experiments. J. Magn. Reson. 87 , 183–188 (1990).

Driscoll, P. C., Marius Clore, G., Marion, D., Wingfield, P. T. & Gronenborn, A. M. Complete resonance assignment for the polypeptide backbone of interleukin 1ß using three-dimensional heteronuclear NMR spectroscopy. Biochemistry 29 , 3542–3556 (1990).

Sattler, M., Schleucher, J. & Griesinger, C. Heteronuclear multidimensional NMR experiments for the structure determination of proteins in solution employing pulsed field gradients. Prog. NMR Spectr. 34 , 93–158 (1999).

Frueh, D. P. Practical aspects of NMR signal assignment in larger and challenging proteins. Prog. NMR Spectr. 78 , 47–75 (2014).

Gardner, K. H. & Kay, L. E. The use of 2H, 13C, 15N multidimensional NMR to study the structure and dynamics of proteins. Annu. Rev. Biophys. Biomol. Struct. 27 , 357–406 (1998).

Palmer, A. G. Chemical exchange in biomacromolecules: past, present, and future. J. Magn. Reson. 241 , 3–17 (2014).

Tjandra, N. & Bax, A. Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 278 , 1111–1114 (1997).

Salmon, L. & Blackledge, M. Investigating protein conformational energy landscapes and atomic resolution dynamics from NMR dipolar couplings: A review. Rep. Prog. Phys. 78 , 126601–126630 (2015).

Clore, G. M. & Gronenborn, A. M. Applications of three- and four-dimensional heteronuclear NMR spectroscopy to protein structure determination. Prog. NMR Spectr. 23 , 43–92 (1991).

Zimmerman, D. E. et al. Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269 , 592–610 (1997).

Moseley, H. N. B., Monleon, D. & Montelione, G. T. Automatic determination of protein backbone resonance assignments from triple resonance nuclear magnetic resonance data. Methods Enzymol. 339 , 91–108 (2001).

Baran, M. C., Huang, Y. J., Moseley, H. N. B. & Montelione, G. T. Automated analysis of protein NMR assignments and structures. Chem. Rev. 104 , 3541–3555 (2004).

Pervushin, K., Riek, R., Wider, G. & Wüthrich, K. Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc. Nat. Acad. Sci. USA 94 , 12366–12371 (1997).

Hitchens, T. K., Lukin, J. A., Zhan, Y., McCallum, S. A. & Rule, G. S. MONTE: An automated Monte Carlo based approach to nuclear magnetic resonance assignment of proteins. J. Biomol. NMR 25 , 1–9 (2003).

Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220 , 671–680 (1983).

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 , 1087–1092 (1953).

Clubb, R. T., Thanabal, V. & Wagner, G. A constant-time three-dimensional triple-resonance pulse scheme to correlate intraresidue 1HN, 15N, and 13C′ chemical shifts in 15N13C-labelled proteins. J. Magn. Reson. 97 , 213–217 (1992).

Grzesiek, S. & Bax, A. Improved 3D triple-resonance NMR techniques applied to a 31 kDa protein. J. Magn. Reson. 96 , 432–440 (1992).

Bax, A. & Ikura, M. An efficient 3D NMR technique for correlating the proton and 15N backbone amide resonances with the α-carbon of the preceding residue in uniformly 15N/13C enriched proteins. J. Biomol. NMR 1 , 99–104 (1991).

Wittekind, M. & Mueller, L. HNCACB, a high-sensitivity 3D NMR experiment to correlate amide-proton and nitrogen resonances with the alpha- and beta-carbon resonances in proteins. J. Magn. Reson. Ser. B 101 , 201–205 (1993).

Grzesiek, S. & Bax, A. Correlating backbone amide and side chain resonances in larger proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc. 114 , 6291–6293 (1992).

Schmidt, E. & Güntert, P. A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134 , 12817–12829 (2012).

Han, B., Liu, Y., Ginzinger, S. W. & Wishart, D. S. SHIFTX2: significantly improved protein chemical shift prediction. J. Biomol. Nmr. 50 , 43–57 (2011).

Kjaergaard, M. & Poulsen, F. M. Sequence correction of random coil chemical shifts: Correlation between neighbor correction factors and changes in the Ramachandran distribution. J. Biomol. Nmr. 50 , 157–165 (2011).

Kjaergaard, M., Brander, S. & Poulsen, F. M. Random coil chemical shift for intrinsically disordered proteins: Effects of temperature and pH. J. Biomol. Nmr. 49 , 139–149 (2011).

Shen, Y. & Bax, A. SPARTA plus: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J. Biomol. Nmr. 48 , 13–22 (2010).

Hyberts, S. G., Milbradt, A. G., Wagner, A. B., Arthanari, H. & Wagner, G. Application of iterative soft thresholding for fast reconstruction of NMR data non-uniformly sampled with multidimensional Poisson Gap scheduling. J. Biomol. NMR 52, 315–327 (2012).

Lee, W. et al. I-PINE web server: an integrative probabilistic NMR assignment system for proteins. J. Biomol. NMR 73, 213–222 (2019).

Mishra, S. H. et al. Global protein dynamics as communication sensors in peptide synthetase domains. Sci. Adv. 8, eabn6549 (2022).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

Nerli, S., De Paula, V. S., McShan, A. C. & Sgourakis, N. G. Backbone-independent NMR resonance assignments of methyl probes in large proteins. Nat. Commun. 12, 691 (2021).

Xu, Y. & Matthews, S. MAP-XSII: an improved program for the automatic assignment of methyl resonances in large proteins. J. Biomol. NMR 55, 179–187 (2013).

Chao, F. A. et al. FLAMEnGO 2.0: an enhanced fuzzy logic algorithm for structure-based assignment of methyl group resonances. J. Magn. Reson. 245, 17–23 (2014).

Monneau, Y. R. et al. Automatic methyl assignment in large proteins by the MAGIC algorithm. J. Biomol. NMR 69, 215–227 (2017).

Pritisanac, I., Wurz, J. M., Alderson, T. R. & Güntert, P. Automatic structure-based NMR methyl resonance assignment in large proteins. Nat. Commun. 10, 4922 (2019).

Pritisanac, I. et al. Automatic assignment of methyl-NMR spectra of supramolecular machines using graph theory. J. Am. Chem. Soc. 139, 9523–9533 (2017).

Delaglio, F. et al. NMRPipe: A multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).

Maciejewski, M. W. et al. NMRbox: A resource for biomolecular NMR computation. Biophys. J. 112, 1529–1534 (2017).

Lee, W., Tonelli, M. & Markley, J. L. NMRFAM-SPARKY: Enhanced software for biomolecular NMR spectroscopy. Bioinformatics 31, 1325–1327 (2015).

Ulrich, E. L. et al. BioMagResBank. Nucleic Acids Res. 36, D402–D408 (2008).

Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).

Gardner, K. H. Solution NMR studies of a 42 kDa Escherichia coli maltose binding protein/β-cyclodextrin complex: Chemical shift assignments and analysis. J. Am. Chem. Soc. 120, 11738–11748 (1998).

Sapienza, P. J. & Lee, A. L. Backbone and ILV methyl resonance assignments of E. coli thymidylate synthase bound to cofactor and a nucleotide analogue. Biomol. NMR Assign. 8, 195–199 (2014).

Yang, Y. & Igumenova, T. I. The C-terminal V5 domain of protein kinase Cα is intrinsically disordered, with propensity to associate with a membrane mimetic. PLoS ONE 8, e65699 (2013).

Camacho-Zarco, A. R. et al. Molecular basis of host-adaptation interactions between influenza virus polymerase PB2 subunit and ANP32A. Nat. Commun. 11, 3656 (2020).

Acknowledgements

We are grateful to Dominique Frueh and colleagues for providing crosspeak lists for Cy1 and for fruitful discussions and to Andrew Lee, Martin Blackledge and Tatyana Igumenova for providing crosspeak and/or spin system lists for ecTS, hIDD and V5dm, respectively. We also thank the Texas A&M High Performance Research Computing Center for access to computational resources for the prediction of the Cy1 structure and NMRbox for access to NMRPipe and other data processing packages. This work was supported by grants from the Mathers Foundation (MF-1809-00155), the National Institutes of Health (GM129076) and Texas A&M University to A.J.W. and by a postdoctoral fellowship from the Gulf Coast Consortium provided by the Cancer Prevention and Research Institute of Texas (RP210043) to A.C.B.

Author information

Authors and affiliations

Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX, 77843, USA

Anthony C. Bishop, Glorisé Torres-Montalvo, Kyle Mimun & A. Joshua Wand

Graduate Group in Biochemistry & Molecular Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19014, USA

Sravya Kotaru & A. Joshua Wand

Department of Chemistry, Texas A&M University, College Station, TX, 77843, USA

A. Joshua Wand

Department of Molecular & Cellular Medicine, Texas A&M University, College Station, TX, 77843, USA

Contributions

A.C.B. and A.J.W. conceived the algorithm. A.C.B. wrote the computer code to implement the algorithm. A.C.B., S.K., G.T.-M., and A.J.W. tested BARASA. A.C.B., S.K., and G.T.-M. prepared isotopically enriched protein and collected, processed and analyzed NMR data for IL-1β, IGPS, and IL-1Ra, respectively. S.K. and G.T.-M. manually assigned IGPS and IL-1Ra, respectively. K.M. prepared isotopically enriched MBP and analyzed MBP NMR data. A.C.B. collected and processed MBP NMR data. A.C.B. and G.T.-M. ran the test cases through FLYA, AutoAssign and I-PINE. A.C.B. and A.J.W. wrote the manuscript with input from all authors.

Corresponding author

Correspondence to A. Joshua Wand.

Ethics declarations

Competing interests

The authors declare the following competing interests. Texas A&M AgriLife has secured federal copyright of BARASA and will market the program. As inventors, A.C.B. and A.J.W. will receive a share of royalties generated by commercial use. There are no other competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bishop, A.C., Torres-Montalvo, G., Kotaru, S. et al. Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing. Nat Commun 14, 1556 (2023). https://doi.org/10.1038/s41467-023-37219-z

Received: 13 April 2022

Accepted: 06 March 2023

Published: 21 March 2023

DOI: https://doi.org/10.1038/s41467-023-37219-z

This article is cited by

Backbone and methyl side-chain resonance assignments of the single chain fab fragment of trastuzumab.

  • Donald Gagné
  • James M. Aramini

Biomolecular NMR Assignments (2024)

Breaking boundaries: TINTO in POKY for computer vision-based NMR walking strategies

  • Andrea Estefania Lopez Giraldo
  • Zowie Werner
  • Woonghee Lee

Journal of Biomolecular NMR (2023)

nmr spectra assignment

COMMENTS

  1. Assignment of 1H-NMR spectra

    On this page we will deal with how to interpret an NMR spectrum. The meaning of assignment in the title is to assign each peak to a proton in the molecule under investigation. The examples here are of 1D proton assignments. For more complex examples, see the 2D assignments of 12,14-di-t-butylbenzo[g]chrysene and cholesteryl acetate.

  2. 6.6: ¹H NMR Spectra and Interpretation (Part I)

    Fig. 6.6d: The 1H NMR spectrum of methyl acetate with signal assignments. Chemical Shift of Protons Near π Electrons: Anisotropy Effect. Aromatic protons and vinylic protons (those directly bonded to an alkene carbon) resonate much further downfield (higher frequency, higher chemical shift) than can be accounted ...

  3. NMR: Structural Assignment

    Assignment of structures is a central problem which NMR is well suited to address. Explains how both 13C NMR spectra and low and high resolution proton NMR spectra can be used to help to work out the …

  4. 5.10: Interpreting Proton NMR Spectra

    Example 2: Using the chemical formula and 1H NMR spectrum, determine the structure of your unknown molecule. Chemical formula: C5H9ClO. Unknown molecule 1H NMR spectrum: the ratio of protons is 2:2:2:3. Solution. Unknown molecule: First, if the molecular formula is known, then start by calculating the degree of unsaturation (DU), which will tell how many double bond ...

  5. NMR Spectroscopy Principles, Interpreting an NMR Spectrum and Common

    The assignment of the NMR spectra is usually performed with the help of NMR charts or diagrams that facilitate the identification of the NMR signals. Hydrogens or carbons that are highly shielded, such as the ones of methyl groups, have low chemical shift values. However, hydrogens attached to very electronegative groups (e.g., carboxylic acids ...

  6. NMRium

    NMR spectra processing for everybody. Unrestrained access to first-class online software for NMR spectra processing. It is free and you can get started right away from your browser. "NMRium stands as the most efficient tool for data inspection across every stage of our analytical pipeline." Professor Julien Wist.

  7. NMR-Challenge.com: An Interactive Website with Exercises in Solving

    Each assignment contains NMR spectra of an unknown compound, with the task for the students being to determine its structure. Basic assignments contain only one-dimensional 1H and 13C NMR spectra; advanced assignments also offer two-dimensional correlation spectra. The web application is enriched with an interactive chemical structure drawing ...

  8. Rapid protein assignments and structures from raw NMR spectra with the

    Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without human ...

  9. Protein NMR Resonance Assignment

    This allowed NMR to become independent of X-ray crystallography, and the solution structures of proteins were determined by NMR using the assignment of proton signals and proton-proton distance information. The limited resolution of two-dimensional (2D) 1H NMR spectra, however, restricted the molecular weights of target proteins to less than 8 ...

  10. Solid-state NMR spectroscopy

    Specifically, this consists of sample preparation, experimental set-up, acquisition of NMR spectra, spectral assignment, and the choice of experiments to measure structural and dynamic parameters.

  11. Assignment of Protein NMR Spectra Using Heteronuclear NMR ...

    1 Introduction. The assignment of resonances in the complex nuclear magnetic resonance (NMR) spectrum of a protein is the first step in any NMR study of protein structure, function or dynamics. Before 1980, the assignment was achieved using one-dimensional (1D) NMR and was based, to a large extent, on the assumption that the structure of the ...

  12. Assignment NMR exercises

    Simulate and predict NMR spectra directly from your web browser using standard HTML5. You can also simulate 13C and 1H spectra as well as 2D spectra such as COSY, HSQC and HMBC. Second-order effects such as AB, ABX and AA'XX' can be simulated as well.

  13. Automated NMR resonance assignments and structure determination using a

    A novel approach for sequential assignment of proton, carbon-13, and nitrogen-15 spectra of larger proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to ...

  14. NMR

    NMR - Interpretation is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by LibreTexts. NMR interpretation plays a pivotal role in molecular identification. By interpreting NMR spectra, the structure of an unknown compound, as well as known structures, can be assigned by several …

  15. NMRtist

    NMRtist is a cloud computing service for the fully automated analysis of protein NMR spectra (e.g. peak picking, chemical shift assignment, structure determination) using deep learning-based approaches. Each project created in NMRtist receives 30 GB of private storage, which can be filled by experimental data and analyzed using the available applications.

  16. Assigning NMR spectra of RNA, peptides and small organic molecules

    NMR assignment typically involves analysis of peaks across multiple NMR spectra. Chemical shifts of peaks are measured before being assigned to atoms using a variety of methods. These approaches quickly become complicated by overlap, ambiguity, and the complexity of correlating assignments among multiple spectra.

  17. Time-optimized protein NMR assignment with an integrative ...

    Assigning NMR spectra of large proteins with more than 200 residues presents a fundamental challenge in the field of NMR spectroscopy, due to several factors affecting both measurement process (line broadening, lower sensitivity, and larger conformational heterogeneity) and data interpretation (signal overlap and shift assignment ambiguity).

  18. Computational Protocol for the Spectral Assignment of NMR ...

    In this way, this protocol accommodates the validation of previous experimental assignments and can serve to complement new NMR measurements. Solid-state nuclear magnetic resonance spectroscopy is routinely used in the field of covalent organic frameworks to elucidate or confirm the structure of the synthesized samples and to understand dynamic ...

  19. Automatic Assignment of 1H NMR Spectra

    The assignment of 1H NMR spectra of small molecules is an everyday task within organic chemistry, which is usually tackled in a manual way. The chemist typically identifies the most relevant regions (aka multiplets) in the spectrum and assigns them to atoms in the putative molecular structure. A partial assignment is usually attempted and the ...

  20. A guide to small-molecule structure assignment through computation of

    For a typical structure assignment of a small organic molecule (e.g., fewer than ∼10 non-H atoms or up to ∼180 a.m.u. and ∼20 conformers), this protocol can be completed in ∼2 h of active ...

  21. Practical aspects of NMR signal assignment in larger and challenging

    Many NMR assignment software packages feature automated routines for performing this so-called sequence alignment and provide a ranking of polypeptide fragments that match the spin systems. For larger proteins, many factors complicate this procedure. ... Overall, non-uniform sampling of NMR spectra has now gained in popularity, and features for ...

  22. 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

    This page titled 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Serge L. Smirnov and James McCarty. In the previous Chapter we described 2D NMR spectroscopy, which offers significantly greater spectral resolution than basic 1D spectra.

  23. Complete 1H and 13C NMR spectral assignment of d-glucofuranose

    Here, complete 1H and 13C NMR spectral analysis of α- and β-d-glucofuranose was performed, including signal assignment, chemical shifts, and coupling constants. Selective and non-selective 1D and 2D NMR experiments were used for the analysis, complemented by spin simulations and iterative spectral analysis.

  24. Robust automated backbone triple resonance NMR assignments of ...

    A novel approach for sequential assignment of 1H, 13C, and 15N spectra of larger proteins: Heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin ...

  25. MS and NMR Analysis of Isotopically Labeled Chloramination Disinfection

    FT-ICR MS and NMR analysis of an isotopically labeled complex mixture of water disinfection byproducts formed by chloramine disinfection of model phenolic acids is described. A new molecular formula assignment procedure using the CoreMS Python library able to assign isotopically enriched formulas is proposed. Statistical analysis of the assigned formulas showed that the number of compounds ...

  26. Stereochemical assignment of four monoterpene ...

    @article{Huong2024StereochemicalAO, title={Stereochemical assignment of four monoterpene glucoside derivatives from Turpinia montana Kurz by NMR study combined with CD spectroscopy.}, author={Le Thanh Huong and Tran Tuan Anh and Nguyen Thi Minh Hang and Vu Mai Thao and Nguyen Xuan Nhiem and Nguyen Hai Dang}, journal={Magnetic resonance in ...