The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies

  • Open access
  • Published: 27 March 2017
  • Volume 136, pages 665–677 (2017)


  • Peter D. Stenson 1 ,
  • Matthew Mort 1 ,
  • Edward V. Ball 1 ,
  • Katy Evans 1 ,
  • Matthew Hayden 1 ,
  • Sally Heywood 1 ,
  • Michelle Hussain 1 ,
  • Andrew D. Phillips 1 &
  • David N. Cooper 1  


The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that underlie, or are closely associated with, human inherited disease. At the time of writing (March 2017), the database contained in excess of 203,000 different gene lesions identified in over 8000 genes manually curated from over 2600 journals. With new mutation entries currently accumulating at a rate exceeding 17,000 per annum, HGMD represents de facto the central unified gene/disease-oriented repository of heritable mutations causing human genetic disease used worldwide by researchers, clinicians, diagnostic laboratories and genetic counsellors, and is an essential tool for the annotation of next-generation sequencing data. The public version of HGMD ( http://www.hgmd.org ) is freely available to registered users from academic institutions and non-profit organisations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via QIAGEN Inc.


Introduction

The Human Gene Mutation Database (HGMD ® ) represents an attempt to collate all known gene lesions underlying human inherited disease together with disease-associated/functional polymorphisms published in the peer-reviewed literature. The mutation data catalogued by HGMD (summarised by mutation type) are shown in Table  1 .

HGMD has never sought to include either somatic or mitochondrial mutations, which are well covered by COSMIC (Forbes et al. 2015 ) and MitoMap (Lott et al. 2013 ), respectively. Nor does HGMD attempt to provide comprehensive coverage of pharmacological variants (except for those variants where evidence supporting a functional impairment has been provided); PharmGKB ( https://www.pharmgkb.org/ ; Thorn et al. 2013 ) is a more comprehensive resource for these data. Finally, HGMD is not intended to be a general genetic variation database; users interested in such variants should visit dbSNP ( http://www.ncbi.nlm.nih.gov/SNP/ ; Sherry et al. 2001 ), the NHLBI Exome Variant Server ( http://evs.gs.washington.edu/EVS/ ) or the Exome Aggregation Consortium (ExAC; http://exac.broadinstitute.org/ ; Lek et al. 2016 ).

HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes believed to cause inherited disease (Cooper et al. 2010 ; Stenson et al. 2014 ). However, over the last 20 years it has acquired a much broader utility as the central unified repository for disease-related functional genetic variation in the germline. It is now routinely accessed and utilised by next-generation sequencing (NGS) project researchers, human molecular geneticists, molecular biologists, clinicians and genetic counsellors as well as by those specialising in biopharmaceuticals, bioinformatics and personalised genomics.

The public version of HGMD ( http://www.hgmd.org ) is freely available to registered users from academic institutions/non-profit organisations. This version is, however, maintained in a basic form that is only updated twice annually, is permanently a minimum of 3.5 years out of date, and does not contain any of the additional annotations or extra features present in HGMD Professional (see below). The Professional version is available to both commercial and academic/non-profit users via subscription from QIAGEN ( https://www.qiagenbioinformatics.com/ ) as either an online or a locally installed/downloadable version that is updated quarterly and includes a variety of additional annotations and extra features, such as GRCh38/hg38 and GRCh37/hg19 genomic chromosomal coordinates, HGVS nomenclature, Variant Call Format (VCF), additional literature reports, advanced search features, conservation data and functional predictions.

Source of mutation data

All HGMD mutation data have been obtained from the scientific literature and are manually curated on an ongoing basis. Identification of relevant literature reports is carried out via a combination of manual journal screening and automated text mining. The database currently contains >203,000 mutation entries obtained from over 57,000 primary literature reports (supported by 29,000 additional literature reports), which were published in more than 2600 different journals. The number of articles screened (both for novel mutations and additional annotations) appears to have reached a plateau (Fig.  1 ); however, the number of mutations reported (per reference) continues to increase steadily. It is likely that the continuing development of high-throughput NGS methods will lead to an increased rate of deposition of disease-associated genetic variants in the published literature.

Fig. 1 Annual numbers of cited literature references added to HGMD. *2017 figures not yet complete

Classes of variant listed in HGMD

There are six different classes of variant listed in HGMD (Fig.  2 ). Disease-causing mutations (DM) are entered into HGMD where the authors of the corresponding report(s) have established that the reported mutation(s) are involved (or very likely to be involved) in conferring the associated clinical phenotype upon the individuals concerned. The DM classification may, however, also appear with a question mark (DM?), denoting a probable/possible pathological mutation, reported as likely to be disease causing in the corresponding report, but where (i) the author has indicated that there may be some degree of doubt or uncertainty; (ii) the HGMD curators believe greater interpretational caution is warranted, or (iii) subsequent evidence has appeared in the literature which has called the initial putatively deleterious nature of the variant into question (e.g. a negative functional, case–control or population-scale sequencing study). The DM and DM? variant classes may include mutations that are believed to contribute to disease susceptibility in a multi-factorial manner (e.g. autism or schizophrenia), exhibit complex polygenic inheritance or possess an environmental trigger component to their pathogenicity. It can be seen from Fig.  2 that the proportion of reported mutations belonging to the DM? category has steadily increased over the last decade; we speculate that this is because authors, journal editors and referees (also database curators!) alike have become much more cautious than they used to be in ascribing pathogenicity to the putatively disease-associated variants that have been identified. This increase in caution appears to closely coincide with the advent of NGS and the consequent deluge of genetic variants that must be filtered and prioritised.

Fig. 2 Annual mutation totals subdivided by variant class. *2017 figures not yet complete

Three categories of polymorphism are included in the database (combined into ‘polymorphisms’ in Fig.  2 ). Disease-associated polymorphisms (DP) are entered into HGMD where there is evidence for a significant association with a disease/clinical phenotype along with additional evidence that the polymorphism is itself likely to be of functional relevance (e.g. as a consequence of genic location, evolutionary conservation, transcription factor binding potential, etc.), although there may be no direct evidence (e.g. from an expression study) for a functional effect. The functional polymorphisms (FP) class includes those sequence changes for which a direct functional effect has been demonstrated (e.g. by means of an in vitro reporter gene assay or alternatively by protein structure, function or expression studies), but with no disease association reported as yet. Disease-associated polymorphisms with supporting functional evidence (DFP) must meet both of the above criteria in that the polymorphism should not only have been reported to be significantly associated with disease, but should also display direct evidence of being of functional relevance. The polymorphism data present in HGMD should be viewed with a degree of caution owing to (i) the possibility that an observed disease association may be simply due to a linkage disequilibrium effect and (ii) the fact that in vitro studies are not invariably accurate indicators of in vivo functionality (Cirulli and Goldstein 2007 ; Dimas et al. 2009 ). Retired records (R) are variants that have been removed from HGMD if found to have been erroneously included ab initio, or if the variant has been subject to retraction/correction in the literature resulting in the record becoming obsolete, merged or otherwise invalid.
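The criteria separating the three polymorphism classes can be summarised as a small decision rule. The following is a minimal sketch only: the function name, its boolean parameters and the `None` return for variants meeting no criterion are assumptions made for illustration, and actual curation relies on expert judgement of the literature rather than on a mechanical rule.

```python
from enum import Enum
from typing import Optional


class PolymorphismClass(Enum):
    DP = "disease-associated polymorphism"
    FP = "functional polymorphism"
    DFP = "disease-associated polymorphism with supporting functional evidence"


def classify_polymorphism(disease_association: bool,
                          direct_functional_evidence: bool,
                          likely_functional: bool = False) -> Optional[PolymorphismClass]:
    """Map evidence flags (hypothetical inputs) onto a polymorphism class.

    disease_association: a significant disease/phenotype association was reported.
    direct_functional_evidence: a direct functional effect was demonstrated.
    likely_functional: indirect support only (e.g. location, conservation).
    """
    if disease_association and direct_functional_evidence:
        return PolymorphismClass.DFP   # both criteria met
    if direct_functional_evidence:
        return PolymorphismClass.FP    # functional effect, no association yet
    if disease_association and likely_functional:
        return PolymorphismClass.DP    # association plus indirect support
    return None                        # meets none of the three definitions
```

An association with no functional support of any kind returns `None` here, reflecting the requirement that a DP entry should also be likely to be of functional relevance.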

The various HGMD variant classes described above should not be cross-correlated with the ‘benign to pathogenic’ 5-point classification system adopted by the ACMG consortium (Green et al. 2013 ). Although, by their very nature, there will be some overlap, these two classification systems are not directly interchangeable. The primary purpose of the ACMG guidelines appears to be to minimise false positives in a clinical setting, whereas HGMD aims to include mutation data based on the cogency and credibility of the associated literature, with a curation policy that opts to minimise false negatives by being broadly inclusive, whilst attempting to highlight potential false positives to users (e.g. via an allele frequency flag). Attempting to cross-correlate the two classification systems (e.g. by automatically considering HGMD DM to be equivalent to ACMG class 5) is likely to be misleading at best, and may well lead to users drawing incorrect or inappropriate conclusions (Pinard et al. 2016 ).

Polymorphic copy number variations (CNVs) represent an important subset of potentially functional disease-associated variation (Mikhail 2014 ; Usher and McCarroll, 2015 ). While HGMD does not wish to replicate the excellent curatorial work of other resources (e.g. the Database of Genomic Variants http://dgv.tcag.ca/dgv/app/home , DECIPHER http://decipher.sanger.ac.uk/ and Copy Number Variation in Disease http://202.97.205.78/CNVD/ ), we do include such variants where they fulfil certain criteria. HGMD will include such variants if they have been shown to be of functional significance, associated with disease, and involve a single characterised gene or small group of genes that have been directly implicated in the disease association. Such variants would then be entered into the database under one of the above-mentioned polymorphism categories, depending upon the supporting evidence provided by the authors of the article in question.

The HGMD curators have adopted a policy of continual reassessment of the curated content within the database. If and when newly published information relevant to a specific mutation entry becomes available (e.g. additional case reports or alternate clinical or laboratory phenotypes, population frequency data or functional studies), the mutation entry may be revised or re-classified. Where new information becomes available which suggests that a given disease-causing mutation (DM) is likely to be of questionable pathological relevance or even a neutral polymorphism (on the basis of additional case reports, genome/population screening studies, negative case–control studies, etc.), it may be flagged with a question mark (DM?), re-categorised under one of the categories of polymorphism, or retired from the database altogether (R) if it turns out to have been erroneously included ab initio. The HGMD curators re-categorised or retired over 800 variants in 2015 with almost 26,000 existing records having at least one relevant additional reference added in the same year. Users of HGMD may utilise a feedback/comments function in order to inform the HGMD curators of relevant new or missing information, or to request the correction, recategorisation or removal of a listed variant.

Zygosity information (i.e. heterozygous, homozygous or compound heterozygous) for individual mutations in HGMD has not been recorded. Reasons for this include (i) the fact that this information is not always unequivocally provided in the corresponding literature reference; (ii) the possibility that a given mutation may be pathogenic irrespective of the zygosity in which it is found; (iii) the clinical consequences of zygosity may often be modified by other genetic variants either in cis or in trans ; (iv) digenic or polygenic inheritance of other pathogenic variants or disease modifiers and (v) variable or reduced penetrance which ensures that the genotype is not invariably predictive of the clinical phenotype (Cooper et al. 2013 ). Sometimes the same mutation may be present in the heterozygous, compound heterozygous or homozygous states in different patients; in such cases, information on zygosity may not be easy to provide and may be even more difficult to interpret. Thus, information pertaining to zygosity would not always be helpful or informative with regard to ascertaining or predicting the clinical phenotype, and indeed might even prove inaccurate or misleading.

HGMD users should not assume that just because a sequence variant is labelled “DM”, it automatically follows that it is known or believed to be pathogenic in all individuals harbouring it (i.e. that the variant exhibits 100% penetrance). Indeed, many “disease-causing mutations” display reduced or variable penetrance for a variety of different reasons (reviewed by Cooper et al. 2013 ). Further, population sequencing programmes (such as the 1000 Genomes Project and ExAC) are now identifying considerable numbers of “DM” mutations in apparently healthy individuals (MacArthur et al. 2012 ; Xue et al. 2012 ; Lek et al. 2016 ). Such lesions should not be regarded automatically as being clinically irrelevant, even when they occur with significant frequency, because it is quite possible that these mutations either represent low-penetrance, mild or late onset, or more complex disease susceptibility alleles, as opposed to neutral variants (Cooper et al. 2013 ), or alternatively reside within transcripts that exhibit a degree of translational plasticity (Jagannathan and Bradley 2016 ).

It is HGMD curation policy to err on the side of inclusion and enter a variant into the database even if its pathological relevance may be questionable (while indicating this fact to our users wherever feasible), rather than run the risk of inadvertently excluding a variant that may be directly (or indirectly) relevant to disease. We have taken several different steps to highlight such equivocation in HGMD, viz. the DM? variant class, a dbSNP 1000 Genomes frequency flag (to highlight those HGMD variants that are also present in dbSNP, with allele frequency information included; see below) and the provision of additional literature citations which either support or cast doubt upon the pathogenicity of a particular variant (Fig.  3 ). This latter point is particularly pertinent in the clinical setting, where a greater burden of proof may be required as a prerequisite for use in diagnostic and predictive medicine, and when considering the return of incidental findings to patients after testing (Green et al. 2012 , 2013 ; Ng et al. 2013 ; Gonsalves et al. 2013 ; Dewey et al. 2014 ; Tabor et al. 2014 ; Gambin et al. 2015 ; Jurgens et al. 2015 ).

Fig. 3 Example of an HGMD Professional entry

Additional literature references are an important source of contextual information, and play a vital role in querying or confirming the pathogenicity of HGMD variants. Types of additional reference include functional studies, additional case reports, additional phenotypes and population case–control studies. The number of additional references in HGMD has grown steadily as a proportion of the total number of references and accounts for approximately 30–40% of the number of literature references screened and entered into HGMD over the last 3–5 years (Fig.  1 ). The number of literature references reporting novel variants appears to have reached a plateau over the last few years; however, the number of variants being reported per reference is still increasing, from 2.5 mutations per reference in the 1990s to over 4.0 in the last two years. We expect this trend to continue as ever larger numbers of patient population-scale sequencing studies are completed and published (Ellingford et al. 2016 ; Susswein et al. 2016 ; Lopes et al. 2015 ).

HGMD Professional

HGMD Professional serves as the subscription version of HGMD, and is available to both commercial and academic customers under license from QIAGEN Inc. HGMD Professional allows access to up-to-date mutation data with a quarterly release cycle; this version is therefore essential for checking the novelty of newly found mutations. HGMD Professional contains many features not available in the public version. More powerful search tools in the form of an expanded search engine with full text Boolean searching are provided. A batch search mode has been developed to allow users to query HGMD using gene (e.g. OMIM IDs, Entrez IDs), variant (e.g. dbSNP IDs, chromosomal coordinates, VCF format) and dataset (e.g. PubMed ID) oriented lists. Users can employ these tools to perform additional searches for gene-specific (e.g. chromosomal locations, gene names/aliases and gene ontology), mutation-specific (e.g. chromosomal coordinates, HGVS nomenclature, dbSNP ID) or citation-specific (e.g. first author, publication year, PubMed ID) information. Chromosomal coordinates (hg19/hg38) and HGVS nomenclature are provided for the vast majority of our nucleotide substitutions (99.8% coverage) and other micro-lesions (97.6% coverage). Provision of consistently accurate mutation descriptions is especially important in the era of NGS (Yen et al. 2017 ) and has helped to make HGMD an invaluable tool for the analysis of population-scale NGS datasets such as the 1000 Genomes Project (1000 Genomes Project Consortium 2015 ) and ExAC (Lek et al. 2016 ). Additional information is also provided on a mutation-specific basis (see Fig.  3 ) including curatorial comments (for example, if the mutation data presented in the original publication required in-house correction or author clarification [5–10% of all entries], or if the clinical phenotype is associated with a more complex, i.e. digenic or in-cis inheritance pattern), additional reports comprising functional characterisation, further phenotypic information, comparative biochemical parameters, evolutionary conservation and SIFT (Sim et al. 2012 ) and MutPred (Li et al. 2009 ) pathogenicity predictions. More recently, the functional predictions and nucleotide conservation data from dbNSFP2.0 (Liu et al. 2013 ), a database of all potential non-synonymous single-nucleotide variants in the human genome, have been included. These additional annotations are updated on a regular basis.

HGMD clinical phenotypes have been annotated against the Unified Medical Language System (UMLS) using a combination of manual curation and natural language processing. The UMLS is a compilation of biomedical ontologies and vocabularies catalogued into a single resource (e.g. OMIM phenotype data, Medical Subject Headings (MeSH) and other disease ontologies), and may be found at http://www.nlm.nih.gov/research/umls/ . HGMD phenotype data have been mapped to approximately 18 different UMLS high-level concepts (Fig.  4 ). These UMLS mappings provide users with a more accurate and expanded phenotype search. Thus, searches using alternative disease names should return the same result-set, e.g. a search for “breast cancer” should yield identical results to a search for “malignant neoplasm of breast”. In addition, utilising the UMLS allows for powerful semantic searching (e.g. searches for all mutations linked to “blood disorders” or “immune disorders”). The UMLS ontology mappings have been utilised in a variety of different NGS studies (see below).
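The effect of mapping alternative disease names onto a shared concept can be illustrated with a short sketch. The concept identifiers, mutation accessions and function name below are all hypothetical placeholders and are not real UMLS or HGMD identifiers; the point is only that two synonymous search terms resolve to one concept and therefore to one result set.

```python
# Hypothetical term-to-concept table; real UMLS concept identifiers are NOT used here.
TERM_TO_CONCEPT = {
    "breast cancer": "CONCEPT_0001",
    "malignant neoplasm of breast": "CONCEPT_0001",
    "cystic fibrosis": "CONCEPT_0002",
}

# Hypothetical concept-to-mutation index with placeholder accessions.
CONCEPT_TO_MUTATIONS = {
    "CONCEPT_0001": {"MUT_A", "MUT_B"},
    "CONCEPT_0002": {"MUT_C"},
}


def search_by_phenotype(term: str) -> set:
    """Resolve a free-text disease name to its concept, then to mutations."""
    concept = TERM_TO_CONCEPT.get(term.strip().lower())
    if concept is None:
        return set()  # unknown term: empty result set
    return set(CONCEPT_TO_MUTATIONS.get(concept, set()))
```

Because both names of the first condition share one concept, `search_by_phenotype("breast cancer")` and `search_by_phenotype("malignant neoplasm of breast")` return the same set.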

Fig. 4 Overview of UMLS high-level disease concept mappings present in HGMD

Another feature involves the highlighting of HGMD entries where the pathogenicity of the variant may have been cast into doubt by virtue of its high allele frequency. HGMD Professional displays a frequency flag when a listed variant is to be found in dbSNP, and population frequency data from the 1000 Genomes Project are provided. In addition, HGMD will soon include allele frequencies derived from the more recent ExAC study (Lek et al. 2016 ). As well as searching and viewing mutation data, users of HGMD Professional may utilise a feedback facility to submit corrections to the database curators or to request additional features (see Fig.  3 to view a sample HGMD Professional variant entry).
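A user wishing to apply a comparable check in a local pipeline might proceed along the following lines. This is a sketch under stated assumptions: the function name and the 1% cut-off are illustrative choices that do not come from HGMD, and an appropriate threshold depends on the disease, its inheritance pattern and its penetrance.

```python
from typing import Optional


def frequency_flag(allele_frequency: Optional[float],
                   threshold: float = 0.01) -> bool:
    """Return True when a variant's population allele frequency is high
    enough to warrant a second look at its reported pathogenicity.

    allele_frequency: value between 0 and 1, or None when no population
        frequency data are available for the variant.
    threshold: illustrative cut-off (an assumption, not an HGMD value).
    """
    if allele_frequency is None:
        return False  # no frequency data, so nothing to flag
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return allele_frequency >= threshold
```

A flagged variant is not thereby shown to be benign; the flag merely marks the entry for closer scrutiny.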

HGMD Professional also includes an Advanced Search facility to enhance mutation searching, viewing and retrieval. Datasets may be combined (for example, micro-deletions, micro-insertions and indels) to enable powerful searching across comparable types of mutation. A variety of search parameters are available, including functional features [e.g. in vitro and/or in silico characterised transcription factor binding sites, post-translational modifications, microRNA binding sites, upstream open reading frames (ORFs), and catalytic residues] to search for the gain or loss of a specific feature as a consequence of mutation; type of amino acid substitution; nucleotide substitution; size and/or sequence composition of micro-deletions, micro-insertions or indels; pre- or user-defined sequence motifs (both those created and those abolished by the mutation in question); dbSNP number; keywords found in the article title or abstract. The Advanced Search also includes a batch mode termed “Mutation Mart” to query HGMD via multiple identifiers including dbSNP, Entrez gene ( http://www.ncbi.nlm.nih.gov/gene ) and PubMed. HGMD Professional is available to subscribers either as an online-only package or in downloadable form enabling users to incorporate HGMD data into their local variant analysis pipelines ( https://www.qiagenbioinformatics.com/products/human-gene-mutation-database/ ).
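The idea of searching for sequence motifs created or abolished by a mutation can be sketched as a comparison of the reference and mutant sequence contexts. The function below is a simplified illustration and not the HGMD implementation: its name and return values are assumptions, it tests only for the presence or absence of a motif on the given strand, and it does not count multiple occurrences.

```python
import re


def motif_change(ref_seq: str, alt_seq: str, motif: str) -> str:
    """Report whether a motif (regular expression) is created or abolished
    when the reference sequence context is replaced by the mutant one."""
    pattern = re.compile(motif)
    in_ref = pattern.search(ref_seq) is not None
    in_alt = pattern.search(alt_seq) is not None
    if in_ref and not in_alt:
        return "abolished"
    if in_alt and not in_ref:
        return "created"
    return "unchanged"


# An A>G substitution that introduces a CG dinucleotide into the context.
print(motif_change("ATCAGT", "ATCGGT", "CG"))
```

The sequence context passed in should extend far enough on either side of the variant to cover the full length of the motif.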

Focus on NGS

HGMD data are available in VCF format allowing easy visualisation, for example by using the Integrative Genomics Viewer (Robinson et al. 2011 ), or incorporation into custom data analysis pipelines (Dorschner et al. 2013 ; Gambin et al. 2015 ; Johnston et al. 2015 ; Lek et al. 2016 ). This facility allows users to maximise their use of HGMD data in both a clinical diagnostic and research setting. The provision of disease UMLS concept mappings (including OMIM, SNOMED, MeSH and HPO) also greatly enhances both the web-based HGMD search facility and the downloadable package, allowing the stratification of variants according to recognised disease concepts.
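For readers building a custom pipeline, a VCF data line can be split into its eight fixed columns with a few lines of code. The sketch below follows the general VCF column layout (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the INFO keys `CLASS` and `GENE` and the example record are assumptions made for illustration and should be checked against the header of the file actually supplied.

```python
def parse_vcf_line(line: str) -> dict:
    """Parse one tab-separated VCF data line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, var_id, ref, alt, _qual, _filter, info = fields[:8]
    info_map = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # empty or missing INFO entry
        key, sep, value = item.partition("=")
        info_map[key] = value if sep else True  # flag-style keys carry no value
    return {"chrom": chrom, "pos": int(pos), "id": var_id,
            "ref": ref, "alt": alt, "info": info_map}


# Hypothetical record; the identifier, gene and INFO keys are placeholders.
record = parse_vcf_line("1\t12345\tHYPOTHETICAL_ID\tA\tG\t.\t.\tCLASS=DM;GENE=GENE1")
print(record["info"].get("CLASS"))
```

In production use, an established VCF library would normally be preferable to hand-written parsing, since it also handles header metadata and multi-allelic records.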

When using HGMD Professional to annotate large NGS datasets, and depending on the context (e.g. an inherited disease screen), it is often useful to annotate the dataset with a subset of HGMD variants (e.g. those which fall into the DM and DM? categories). Any variants found concurrently in this subset and the dataset being tested may then be further prioritised by variant class; hence, DM variants could be ranked higher than DM? variants if so desired. We have plans to introduce a literature-based variant scoring system to allow NGS researchers and clinicians to improve their prioritisation of DM/DM? variants found in their result sets. This system will annotate additional references as being supportive, neutral or not supportive of the inclusion of the variant in HGMD, thereby allowing users to rank those variants that possess additional supporting literature evidence (e.g. those with a published functional study) more highly, in addition to de-prioritising variants that have additional literature evidence questioning their pathological relevance. This new information will be available in both the online and download versions of the next release of HGMD Professional (see Fig.  3 ).
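The annotation and ranking step described above might be sketched as follows. The data structures are assumptions: variants are keyed by a (chromosome, position, reference allele, alternate allele) tuple, and the HGMD subset is represented as a plain dictionary from such keys to variant class. Neither reflects the actual layout of the HGMD download.

```python
# Lower rank means higher priority; only DM and DM? are retained.
CLASS_RANK = {"DM": 0, "DM?": 1}


def prioritise(sample_variants, hgmd_classes):
    """Return sample variants found in the HGMD subset, DM before DM?.

    sample_variants: iterable of (chrom, pos, ref, alt) tuples from the
        dataset under analysis.
    hgmd_classes: dict mapping the same kind of tuple to a variant class
        (hypothetical representation of an HGMD extract).
    """
    hits = [(key, hgmd_classes[key])
            for key in sample_variants
            if hgmd_classes.get(key) in CLASS_RANK]
    # sorted() is stable, so input order is kept within each class
    return sorted(hits, key=lambda hit: CLASS_RANK[hit[1]])
```

Exact matching on coordinates and alleles presupposes that both datasets use the same genome build and the same normalisation of insertions and deletions.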

One of the problems encountered by NGS researchers and clinicians is the mis-annotation of variants as pathogenic or disease-causing. A small number of literature reports have been published where common variants have not been properly filtered out at an early stage, thereby increasing the number of mis-categorised variants appearing in the literature. HGMD has instigated plans to mitigate this problem, including the pre-screening of entries against the population frequency data present in ExAC (in progress) and the introduction of a literature-based scoring system (see above).

Other variant databases

Several other databases are available that attempt to record disease-causing or disease-associated (i.e. pathogenic) variation. These include the Online Mendelian Inheritance in Man, OMIM ( http://www.omim.org/ ; Amberger et al. 2015 ), ClinVar ( http://www.ncbi.nlm.nih.gov/clinvar/ ; Landrum et al. 2016 ), dbSNP ( http://www.ncbi.nlm.nih.gov/SNP/ ; Sherry et al. 2001 ), LOVD ( http://grenada.lumc.nl/LSDB_list/lsdbs ; Fokkema et al. 2011 ) and a variety of locus-specific mutation databases (LSDBs) ( http://www.hgvs.org/dblist/glsdb.html ). OMIM does not provide statistics for allelic variants on its website; however, 25,115 germline OMIM variants appear to have been added to ClinVar, which itself currently contains a total of 53,211 pathogenic or likely pathogenic germline variants, whereas dbSNP contains 49,675 pathogenic or likely pathogenic clinically significant variants (all databases accessed December 30th 2016). In comparison, HGMD currently contains 193,904 DM and DM? variant entries in 6770 genes. Owing to the highly dispersed nature of the LSDBs and the potential for duplication between databases, accurate statistics with regard to like-for-like bona fide germline disease-causing (i.e. not merely neutral) variation are difficult to obtain. Since OMIM only records a limited number of variants deemed newsworthy per gene, and ClinVar still lacks depth (in terms of variant and literature coverage) and obtains a significant proportion (~40% of the above-mentioned total) of its pathogenic variant data via direct submission from clinical testing laboratories, HGMD is the only database of inherited human pathological variants that can claim to approach comprehensive coverage of the peer-reviewed literature (Peterson et al. 2013 ). Since both ClinVar and the LSDBs contain unpublished (i.e. non-peer reviewed) mutation data, the question has arisen as to whether HGMD should also include these data (Patrinos et al. 2012 ). However, both ClinVar and the LSDBs have encountered problems pertaining to data quality, submission, provenance and consent. A recent study found that a higher proportion (1.1% vs. 0.59%) of variants in ClinVar than in HGMD Professional required reclassification (Abouelhoda et al. 2016 , Table 1). The reclassification data presented by the authors of this study have already been incorporated into HGMD Professional. At present, however, it does not appear that any revisions have been made to ClinVar as a result of this study. Therefore, we have opted not to include data from these databases at this time.

How HGMD is utilised

The registered users of the HGMD public website (>101,000 as of March 2017) performed more than 260,000 queries in 2016. HGMD data may not be downloaded in their entirety from the public website; however, data may be made available at the discretion of the curators for non-commercial research purposes. Potential collaborators who wish to access HGMD data in full are required to sign a confidentiality agreement.

HGMD data have been used to perform a series of meta-analyses on different types of gene mutation causing human inherited disease. These studies have helped to improve our understanding of mutational spectra and the molecular mechanisms underlying human inherited disease (Cooper et al. 2011 ). They have served to demonstrate not only that human gene mutation is an inherently non-random process, but also that the nature, location and frequency of different types of mutation are shaped in large part by the local DNA sequence environment (Cooper et al. 2011 ). Indeed, HGMD data have been instrumental in demonstrating that electron transfer reactions (Bacolla et al. 2013 ), base-pair flexibility (Bacolla et al. 2015 ) and non-B DNA forming sequences (Kamat et al. 2016 ) all contribute to sequence context-dependent mutagenesis causing inherited disease. HGMD mutation data were used to demonstrate that many in-frame pathogenic variations perturb protein–protein interactions (Das et al. 2014 ). HGMD mutations have also been used to demonstrate that proteins linked to autosomal dominant diseases exhibit more clustering of rare missense mutations than those linked to autosomal recessive diseases (Turner et al. 2015 ). Finally, HGMD mutations have been mapped to protein 3D structures in order to study the loss and gain of various types of functional attribute, thereby quantifying the impact of disease-causing amino acid substitutions on catalytic activity, metal binding, macromolecular binding, ligand binding, allosteric regulation and post-translational modification (Lugo-Martinez et al. 2016 ).

HGMD data have been used extensively in several international collaborative research projects including the Genotype-Tissue Expression (GTEx) project (Rivas et al. 2015 ), the ExAC project (Lek et al. 2016 ) and the 1000 Genomes project (Marth et al. 2011 ; MacArthur et al. 2012 ; 1000 Genomes Project Consortium 2015 ), where a surprising number of HGMD variants were found in apparently healthy individuals. They have also been used in the comparative analysis of orthologous sequences in model genomes including those of gorilla (Scally et al. 2012 ), mountain gorilla (Xue et al. 2015 ), cynomolgus and Chinese macaques (Yan et al. 2011 ), Rhesus macaque (Rhesus Macaque Genome Sequencing and Analysis Consortium 2007 ) and rat (Rat Genome Sequencing Project Consortium 2004 ), in which many apparently disease-causing mutations in human were found as wild-type (‘compensated mutations’) (Azevedo et al. 2015 , 2016 ).

In a clinical setting, HGMD is widely utilised by many groups in ongoing NGS diagnostic (Bell et al. 2011 ; Johnston et al. 2012 ; Calvo et al. 2012 ; Makrythanasis et al. 2014 ; Karageorgos et al. 2015 ; Wilfert et al. 2016 ; Walsh et al. 2017 ) and human genome sequencing (Tong et al. 2010 ; Kim et al. 2009 ; Telenti et al. 2016 ) programmes. HGMD has also been used by a number of different groups to aid the development of a wide variety of post-NGS variant interpretation and exome prioritisation algorithms including MutPred (Li et al. 2009 ), MutPred Splice (Mort et al. 2014 ), PROVEAN (Choi et al. 2012 ), CAROL (Lopes et al. 2012 ), regSNPs (Teng et al. 2012 ), CRAVAT (Douville et al. 2013 ), NEST (Carter et al. 2013 ), FATHMM (Shihab et al. 2013 ), FATHMM-MKL (Shihab et al. 2015 ), PinPor (Zhang et al. 2014 ), MutationTaster2 (Schwarz et al. 2014 ), Phen-Gen (Javed et al. 2014 ), VEST-indel (Douville et al. 2016 ), Gene Damage Index (Itan et al. 2015 ), DDIG-in (Folkman et al. 2015 ), RSVP (Peterson et al. 2016 ), ExonImpact (Li et al. 2017 ), IntSplice (Shibata et al. 2016 ), snvForest (Wu et al. 2015 ), IMHOTEP (Knecht et al. 2017 ) and M-CAP (Jagadeesh et al. 2016 ). A list of some of the articles which have utilised HGMD data or expertise in their analyses can be found on the HGMD website ( http://www.hgmd.cf.ac.uk/docs/articles.html ).

Data sharing

A limited HGMD data set, containing both chromosomal coordinates and HGMD identifiers, has been made available via academic data exchange programmes to the European Bioinformatics Institute (EBI)/Ensembl (Flicek et al. 2013) and the University of California, Santa Cruz (UCSC) (Meyer et al. 2013) and may be viewed in these projects’ respective genome browsers. Data from HGMD Professional have additionally been made available to subscribers of Ingenuity Variant Analysis™ (QIAGEN) and Alamut (Interactive Biosoftware), and are also accessible as part of the HGMD Professional stand-alone package (QIAGEN). Allowing free access to the bulk of the mutation data present in HGMD, while generating sufficient income to support maintenance and development via commercial distribution, represents a business model that is intended to maximise the availability of HGMD at the same time as ensuring its long-term sustainability. Although we are obliged to be prudent with regard to data sharing with public data repositories, we have always taken the view that making as much data publicly available as possible is generally beneficial to HGMD as well as to its users worldwide.

Future plans

The provision of chromosomal coordinates (both GRCh37 and GRCh38) for the vast majority of coding region micro-lesions in HGMD is now complete. Expanding this provision to include micro-lesions in non-coding regions and the gross (in progress) and complex lesion (where feasible) datasets is a high priority. We plan to add other commonly utilised NGS formats such as General Feature Format (GFF) (http://www.sanger.ac.uk/resources/software/gff/) and Browser Extensible Data (BED) format to complement the Variant Call Format (VCF) (Danecek et al. 2011) data currently available in HGMD Professional. The provision of allele frequency data from large-scale NGS projects such as ExAC (http://exac.broadinstitute.org/), more complete references (i.e. including article titles) and HGVS protein level descriptions for HGMD micro-lesions are also priorities. Provision of genomic reference sequences based on the NCBI RefSeqGene project (Pruitt et al. 2014), links to available protein structures and homology models, and expanding our coverage of secondary references (additional case reports and functional studies) are also regarded as priorities, as well as updating our set of functional predictions using the new dbNSFP v3.0 dataset (Liu et al. 2016).
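As a rough illustration of what complementing VCF with BED output involves, the sketch below converts VCF data lines into BED intervals. It is not HGMD's own export code: the function names and the example identifiers are hypothetical, and it assumes simple tab-delimited VCF records in which the BED interval should span the REF allele. The main subtlety is the coordinate shift, since VCF positions are 1-based whereas BED uses 0-based, half-open intervals.

```python
def vcf_line_to_bed(line):
    """Convert one VCF data line to a 4-column BED line (chrom, start, end, name)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref = fields[0], int(fields[1]), fields[2], fields[3]
    start = pos - 1          # VCF POS is 1-based; BED start is 0-based
    end = start + len(ref)   # BED end is exclusive; the interval covers the REF allele
    return "\t".join([chrom, str(start), str(end), var_id])


def vcf_to_bed(lines):
    """Convert an iterable of VCF lines to BED lines, skipping headers and blank lines."""
    return [
        vcf_line_to_bed(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


# Hypothetical records: the identifiers below are placeholders, not real HGMD accessions.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\tVAR0001\tA\tG\t.\t.\t.",
    "2\t200\tVAR0002\tACG\tA\t.\t.\t.",
]
example_bed = vcf_to_bed(example_vcf)
```

Spanning the full REF allele is one possible convention for deletions; a real export would need to decide how to represent insertions, which have no reference span of their own.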

HGMD provides the user with a unique resource that can be utilised not only to obtain evidence to support the pathological authenticity and/or novelty of detected gene lesions and to acquire an overview of the mutational spectra for specific genes, but also as a knowledgebase for use in the bioinformatics and whole genome screening projects that underpin personalised genomics, next-generation sequencing research and diagnostic medicine.

1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR (2015) A global reference for human genetic variation. Nature 526:68–74

Abouelhoda M, Faquih T, El-Kalioby M, Alkuraya FS (2016) Revisiting the morbid genome of Mendelian disorders. Genome Biol 17:235

Amberger JS, Bocchini CA, Schiettecatte F, Scott AF, Hamosh A (2015) OMIM.org: Online Mendelian Inheritance in Man (OMIM ® ), an online catalog of human genes and genetic disorders. Nucleic Acids Res 43:D789–D798

Azevedo L, Serrano C, Amorim A, Cooper DN (2015) Trans-species polymorphism in humans and the great apes is generally maintained by balancing selection that modulates the host immune response. Hum Genom 9:21

Azevedo L, Mort M, Costa AC, Silva RM, Quelhas D, Amorim A, Cooper DN (2016) Improving the in silico assessment of pathogenicity for compensated variants. Eur J Hum Genet 25:2–7

Bacolla A, Temiz NA, Yi M, Ivanic J, Cer RZ, Donohue DE, Ball EV, Mudunuri US, Wang G, Jain A, Volfovsky N, Luke BT, Stephens RM, Cooper DN, Collins JR, Vasquez KM (2013) Guanine holes are prominent targets for mutation in cancer and inherited disease. PLoS Genet 9:e1003816

Bacolla A, Zhu X, Chen H, Howells K, Cooper DN, Vasquez KM (2015) Local DNA dynamics shape mutational patterns of mononucleotide repeats in human genomes. Nucl Acids Res 43:5065–5080

Bell CJ, Dinwiddie DL, Miller NA, Hateley SL, Ganusova EE, Mudge J, Langley RJ, Zhang L, Lee CC, Schilkey FD, Sheth V, Woodward JE, Peckham HE, Schroth GP, Kim RW, Kingsmore SF (2011) Carrier testing for severe childhood recessive diseases by next-generation sequencing. Sci Transl Med 3:65ra4

Calvo SE, Compton AG, Hershman SG, Lim SC, Lieber DS, Tucker EJ, Laskowski A, Garone C, Liu S, Jaffe DB, Christodoulou J, Fletcher JM, Bruno DL, Goldblatt J, Dimauro S, Thorburn DR, Mootha VK (2012) Molecular diagnosis of infantile mitochondrial disease with targeted next-generation sequencing. Sci Transl Med 4:118ra10

Carter H, Douville C, Stenson PD, Cooper DN, Karchin R (2013) Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genom 14(Suppl 3):S3

Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS ONE 7:e46688

Cirulli ET, Goldstein DB (2007) In vitro assays fail to predict in vivo effects of regulatory polymorphisms. Hum Mol Genet 16:1931–1939

Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD (2010) Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum Mutat 31:631–655

Cooper DN, Bacolla A, Férec C, Vasquez KM, Kehrer-Sawatzki H, Chen JM (2011) On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Hum Mutat 32:1075–1099

Cooper DN, Krawczak M, Polychronakos C, Tyler-Smith C, Kehrer-Sawatzki H (2013) Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Hum Genet 132:1077–1130

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158

Das J, Lee HR, Sagar A, Fragoza R, Liang J, Wei X, Wang X, Mort M, Stenson PD, Cooper DN, Yu H (2014) Elucidating common structural features of human pathogenic variations using large-scale atomic-resolution protein networks. Hum Mutat 35:585–593

Dewey FE, Grove ME, Pan C, Goldstein BA, Bernstein JA, Chaib H, Merker JD, Goldfeder RL, Enns GM, David SP, Pakdaman N, Ormond KE, Caleshu C, Kingham K, Klein TE, Whirl-Carrillo M, Sakamoto K, Wheeler MT, Butte AJ, Ford JM, Boxer L, Ioannidis JP, Yeung AC, Altman RB, Assimes TL, Snyder M, Ashley EA, Quertermous T (2014) Clinical interpretation and implications of whole-genome sequencing. JAMA 311:1035–1045

Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H, Ingle C, Beazley C, Gutierrez Arcelus M, Sekowska M, Gagnebin M, Nisbett J, Deloukas P, Dermitzakis ET, Antonarakis SE (2009) Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325:1246–1250

Dorschner MO, Amendola LM, Turner EH, Robertson PD, Shirts BH, Gallego CJ, Bennett RL, Jones KL, Tokita MJ, Bennett JT, Kim JH, Rosenthal EA, Kim DS, National Heart, Lung, and Blood Institute Grand Opportunity Exome Sequencing Project, Tabor HK, Bamshad MJ, Motulsky AG, Scott CR, Pritchard CC, Walsh T, Burke W, Raskind WH, Byers P, Hisama FM, Nickerson DA, Jarvik GP (2013) Actionable, pathogenic incidental findings in 1000 participants’ exomes. Am J Hum Genet 93:631–640

Douville C, Carter H, Kim R, Niknafs N, Diekhans M, Stenson PD, Cooper DN, Ryan M, Karchin R (2013) CRAVAT: cancer-related analysis of variants toolkit. Bioinformatics 29:647–648

Douville C, Masica DL, Stenson PD, Cooper DN, Gygax DM, Kim R, Ryan M, Karchin R (2016) Assessing the pathogenicity of insertion and deletion variants with the variant effect scoring tool (VEST-Indel). Hum Mutat 37:28–35

Ellingford JM, Barton S, Bhaskar S, O’Sullivan J, Williams SG, Lamb JA, Panda B, Sergouniotis PI, Gillespie RL, Daiger SP, Hall G, Gale T, Lloyd IC, Bishop PN, Ramsden SC, Black GC (2016) Molecular findings from 537 individuals with inherited retinal disease. J Med Genet 53(11):761–767

Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, García-Girón C, Gordon L, Hourlier T, Hunt S, Juettemann T, Kähäri AK, Keenan S, Komorowska M, Kulesha E, Longden I, Maurel T, McLaren WM, Muffato M, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ritchie GR, Ruffier M, Schuster M, Sheppard D, Sobral D, Taylor K, Thormann A, Trevanion S, White S, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Harrow J, Herrero J, Hubbard TJ, Johnson N, Kinsella R, Parker A, Spudich G, Yates A, Zadissa A, Searle SM (2013) Ensembl 2013. Nucleic Acids Res 41:D48–D55

Fokkema IF, Taschner PE, Schaafsma GC, Celli J, Laros JF, den Dunnen JT (2011) LOVD v. 2.0: the next generation in gene variant databases. Hum Mutat 32:557–563

Folkman L, Yang Y, Li Z, Stantic B, Sattar A, Mort M, Cooper DN, Liu Y, Zhou Y (2015) DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels. Bioinformatics 31:1599–1606

Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, Ding M, Bamford S, Cole C, Ward S, Kok CY, Jia M, De T, Teague JW, Stratton MR, McDermott U, Campbell PJ (2015) COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res 43(Database issue):D805–D811

Gambin T, Jhangiani SN, Below JE, Campbell IM, Wiszniewski W, Muzny DM, Staples J, Morrison AC, Bainbridge MN, Penney S, McGuire AL, Gibbs RA, Lupski JR, Boerwinkle E (2015) Secondary findings and carrier test frequencies in a large multiethnic sample. Genome Med 7:54

Gonsalves SG, Ng D, Johnston JJ, Teer JK, NISC Comparative Sequencing Program, Stenson PD, Cooper DN, Mullikin JC, Biesecker LG (2013) Using exome data to identify malignant hyperthermia susceptibility mutations. Anesthesiology 119:1043–1053

Green RC, Berg JS, Berry GT, Biesecker LG, Dimmock DP, Evans JP, Grody WW, Hegde MR, Kalia S, Korf BR, Krantz I, McGuire AL, Miller DT, Murray MF, Nussbaum RL, Plon SE, Rehm HL, Jacob HJ (2012) Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet Med 14:405–410

Green RC, Berg JS, Grody WW, Kalia SS, Korf BR, Martin CL, McGuire AL, Nussbaum RL, O’Daniel JM, Ormond KE, Rehm HL, Watson MS, Williams MS, Biesecker LG (2013) ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med 15:565–574

Itan Y, Shang L, Boisson B, Patin E, Bolze A, Moncada-Vélez M, Scott E, Ciancanelli MJ, Lafaille FG, Markle JG, Martinez-Barricarte R, de Jong SJ, Kong XF, Nitschke P, Belkadi A, Bustamante J, Puel A, Boisson-Dupuis S, Stenson PD, Gleeson JG, Cooper DN, Quintana-Murci L, Claverie JM, Zhang SY, Abel L, Casanova JL (2015) The human gene damage index as a gene-level approach to prioritizing exome variants. Proc Natl Acad Sci USA 112:13615–13620

Jagadeesh KA, Wenger AM, Berger MJ, Guturu H, Stenson PD, Cooper DN, Bernstein JA, Bejerano G (2016) M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat Genet 48:1581–1586

Jagannathan S, Bradley RK (2016) Translational plasticity facilitates the accumulation of nonsense genetic variants in the human population. Genome Res 26:1639–1650

Javed A, Agrawal S, Ng PC (2014) Phen-Gen: combining phenotype and genotype to analyze rare disorders. Nat Methods 11:935–937

Johnston JJ, Rubinstein WS, Facio FM, Ng D, Singh LN, Teer JK, Mullikin JC, Biesecker LG (2012) Secondary variants in individuals undergoing exome sequencing: screening of 572 individuals identifies high-penetrance mutations in cancer-susceptibility genes. Am J Hum Genet 91:97–108

Johnston JJ, Lewis KL, Ng D, Singh LN, Wynter J, Brewer C, Brooks BP, Brownell I, Candotti F, Gonsalves SG, Hart SP, Kong HH, Rother KI, Sokolic R, Solomon BD, Zein WM, Cooper DN, Stenson PD, Mullikin JC, Biesecker LG (2015) Individualized iterative phenotyping for genome-wide analysis of loss-of-function mutations. Am J Hum Genet 96:913–925

Jurgens J, Ling H, Hetrick K, Pugh E, Schiettecatte F, Doheny K, Hamosh A, Avramopoulos D, Valle D, Sobreira N (2015) Assessment of incidental findings in 232 whole-exome sequences from the Baylor-Hopkins Center for Mendelian Genomics. Genet Med 17:782–788

Kamat MA, Bacolla A, Cooper DN, Chuzhanova N (2016) A role for non-B DNA forming sequences in mediating microlesions causing human inherited disease. Hum Mutat 37:65–73

Karageorgos I, Mizzi C, Giannopoulou E, Pavlidis C, Peters BA, Zagoriti Z, Stenson PD, Mitropoulos K, Borg J, Kalofonos HP, Drmanac R, Stubbs A, van der Spek P, Cooper DN, Katsila T, Patrinos GP (2015) Identification of cancer predisposition variants in apparently healthy individuals using a next-generation sequencing-based family genomics approach. Hum Genomics 9:12

Kim JI, Ju YS, Park H, Kim S, Lee S, Yi JH, Mudge J, Miller NA, Hong D, Bell CJ, Kim HS, Chung IS, Lee WC, Lee JS, Seo SH, Yun JY, Woo HN, Lee H, Suh D, Lee S, Kim HJ, Yavartanoo M, Kwak M, Zheng Y, Lee MK, Park H, Kim JY, Gokcumen O, Mills RE, Zaranek AW, Thakuria J, Wu X, Kim RW, Huntley JJ, Luo S, Schroth GP, Wu TD, Kim H, Yang KS, Park WY, Kim H, Church GM, Lee C, Kingsmore SF, Seo JS (2009) A highly annotated whole-genome sequence of a Korean individual. Nature 460:1011–1015

Knecht C, Mort M, Junge O, Cooper DN, Krawczak M, Caliebe A (2017) IMHOTEP-a composite score integrating popular tools for predicting the functional consequences of non-synonymous sequence variants. Nucleic Acids Res 45:e13

Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J, Jang W, Katz K, Ovetsky M, Riley G, Sethi A, Tully R, Villamarin-Salomon R, Rubinstein W, Maglott DR (2016) ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res 44:D862–D868

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG, Exome Aggregation Consortium (2016) Analysis of protein-coding genetic variation in 60,706 humans. Nature 536:285–291

Li B, Krishnan VG, Mort ME, Xin F, Kamati KK, Cooper DN, Mooney SD, Radivojac P (2009) Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 25:2744–2750

Li M, Feng W, Zhang X, Yang Y, Wang K, Mort M, Cooper DN, Wang Y, Zhou Y, Liu Y (2017) ExonImpact: prioritizing pathogenic alternative splicing events. Hum Mutat 38:16–24

Liu X, Jian X, Boerwinkle E (2013) dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat 34(9):E2393–E2402

Liu X, Wu C, Li C, Boerwinkle E (2016) dbNSFP v3.0: a one-stop database of functional predictions and annotations for human nonsynonymous and splice-site SNVs. Hum Mutat 37:235–241

Lopes MC, Joyce C, Ritchie GR, John SL, Cunningham F, Asimit J, Zeggini E (2012) A combined functional annotation score for non-synonymous variants. Hum Hered 73:47–51

Lopes LR, Syrris P, Guttmann OP, O’Mahony C, Tang HC, Dalageorgou C, Jenkins S, Hubank M, Monserrat L, McKenna WJ, Plagnol V, Elliott PM (2015) Novel genotype-phenotype associations demonstrated by high-throughput sequencing in patients with hypertrophic cardiomyopathy. Heart 101:294–301

Lott MT, Leipzig JN, Derbeneva O, Xie HM, Chalkia D, Sarmady M, Procaccio V, Wallace DC (2013) mtDNA variation and analysis using Mitomap and Mitomaster. Curr Protoc Bioinform 44:1.23.1-26

Lugo-Martinez J, Pejaver V, Pagel KA, Jain S, Mort M, Cooper DN, Mooney SD, Radivojac P (2016) The loss and gain of functional amino acid residues is a common mechanism causing human inherited disease. PLoS Comput Biol 12:e1005091

MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, Albers CA, Zhang ZD, Conrad DF, Lunter G, Zheng H, Ayub Q, DePristo MA, Banks E, Hu M, Handsaker RE, Rosenfeld JA, Fromer M, Jin M, Mu XJ, Khurana E, Ye K, Kay M, Saunders GI, Suner MM, Hunt T, Barnes IH, Amid C, Carvalho-Silva DR, Bignell AH, Snow C, Yngvadottir B, Bumpstead S, Cooper DN, Xue Y, Romero IG, 1000 Genomes Project Consortium, Wang J, Li Y, Gibbs RA, McCarroll SA, Dermitzakis ET, Pritchard JK, Barrett JC, Harrow J, Hurles ME, Gerstein MB, Tyler-Smith C (2012) A systematic survey of loss-of-function variants in human protein-coding genes. Science 335:823–828

Makrythanasis P, Nelis M, Santoni FA, Guipponi M, Vannier A, Béna F, Gimelli S, Stathaki E, Temtamy S, Mégarbané A, Masri A, Aglan MS, Zaki MS, Bottani A, Fokstuen S, Gwanmesia L, Aliferis K, Bustamante Eduardo M, Stamoulis G, Psoni S, Kitsiou-Tzeli S, Fryssira H, Kanavakis E, Al-Allawi N, Sefiani A, Al Hait S, Elalaoui SC, Jalkh N, Al-Gazali L, Al-Jasmi F, Bouhamed HC, Abdalla E, Cooper DN, Hamamy H, Antonarakis SE (2014) Diagnostic exome sequencing to elucidate the genetic basis of likely recessive disorders in consanguineous families. Hum Mutat 35:1203–1210

Marth GT, Yu F, Indap AR, Garimella K, Gravel S, Leong WF, Tyler-Smith C, Bainbridge M, Blackwell T, Zheng-Bradley X, Chen Y, Challis D, Clarke L, Ball EV, Cibulskis K, Cooper DN, Fulton B, Hartl C, Koboldt D, Muzny D, Smith R, Sougnez C, Stewart C, Ward A, Yu J, Xue Y, Altshuler D, Bustamante CD, Clark AG, Daly M, DePristo M, Flicek P, Gabriel S, Mardis E, Palotie A, Gibbs R, 1000 Genomes Project (2011) The functional spectrum of low-frequency coding variation. Genome Biol 12:R84

Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Raney BJ, Pohl A, Malladi VS, Li CH, Lee BT, Learned K, Kirkup V, Hsu F, Heitner S, Harte RA, Haeussler M, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Dreszer TR, Diekhans M, Cline MS, Clawson H, Barber GP, Haussler D, Kent WJ (2013) The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res 41(Database issue):D64–D69

Mikhail FM (2014) Copy number variations and human genetic disease. Curr Opin Pediatr 26:646–652

Mort M, Sterne-Weiler T, Li B, Ball EV, Cooper DN, Radivojac P, Sanford JR, Mooney SD (2014) MutPred Splice: machine learning-based prediction of exonic variants that disrupt splicing. Genome Biol 15:R19

Ng D, Johnston JJ, Teer JK, Singh LN, Peller LC, Wynter JS, Lewis KL, Cooper DN, Stenson PD, Mullikin JC, Biesecker LG (2013) Interpreting secondary cardiac disease variants in an exome cohort. Circ Cardiovasc Genet 6:337–346

Patrinos GP, Cooper DN, van Mulligen E, Gkantouna V, Tzimas G, Tatum Z, Schultes E, Roos M, Mons B (2012) Microattribution and nanopublication as means to incentivize the placement of human genome variation data into the public domain. Hum Mutat 33:1503–1512

Peterson TA, Doughty E, Kann MG (2013) Towards precision medicine: advances in computational approaches for analysis of human variants. J Mol Biol 425:4047–4063

Peterson TA, Mort M, Cooper DN, Radivojac P, Kann MG, Mooney SD (2016) Regulatory single-nucleotide variant predictor increases predictive performance of functional regulatory variants. Hum Mutat 37:1137–1143

Pinard A, Miltgen M, Blanchard A, Mathieu H, Desvignes JP, Salgado D, Fabre A, Arnaud P, Barré L, Krahn M, Grandval P, Olschwang S, Zaffran S, Boileau C, Béroud C, Collod-Béroud G (2016) Actionable genes, core databases, and locus-specific databases. Hum Mutat 37:1299–1307

Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O’Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, DiCuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM (2014) RefSeq: an update on mammalian reference sequences. Nucleic Acids Res 42(Database issue):D756–D763

Rat Genome Sequencing Project Consortium (2004) Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428:493–521

Rhesus Macaque Genome Sequencing and Analysis Consortium (2007) Evolutionary and biomedical insights from the rhesus macaque genome. Science 316:222–234

Rivas MA, Pirinen M, Conrad DF, Lek M, Tsang EK, Karczewski KJ, Maller JB, Kukurba KR, DeLuca DS, Fromer M, Ferreira PG, Smith KS, Zhang R, Zhao F, Banks E, Poplin R, Ruderfer DM, Purcell SM, Tukiainen T, Minikel EV, Stenson PD, Cooper DN, Huang KH, Sullivan TJ, Nedzel J, GTEx Consortium, Geuvadis Consortium, Bustamante CD, Li JB, Daly MJ, Guigo R, Donnelly P, Ardlie K, Sammeth M, Dermitzakis ET, McCarthy MI, Montgomery SB, Lappalainen T, MacArthur DG (2015) Effect of predicted protein-truncating genetic variants on the human transcriptome. Science 348:666–669

Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP (2011) Integrative genomics viewer. Nat Biotechnol 29:24–26

Scally A, Dutheil JY, Hillier LW, Jordan GE, Goodhead I, Herrero J, Hobolth A, Lappalainen T, Mailund T, Marques-Bonet T, McCarthy S, Montgomery SH, Schwalie PC, Tang YA, Ward MC, Xue Y, Yngvadottir B, Alkan C, Andersen LN, Ayub Q, Ball EV, Beal K, Bradley BJ, Chen Y, Clee CM, Fitzgerald S, Graves TA, Gu Y, Heath P, Heger A, Karakoc E, Kolb-Kokocinski A, Laird GK, Lunter G, Meader S, Mort M, Mullikin JC, Munch K, O’Connor TD, Phillips AD, Prado-Martinez J, Rogers AS, Sajjadian S, Schmidt D, Shaw K, Simpson JT, Stenson PD, Turner DJ, Vigilant L, Vilella AJ, Whitener W, Zhu B, Cooper DN, de Jong P, Dermitzakis ET, Eichler EE, Flicek P, Goldman N, Mundy NI, Ning Z, Odom DT, Ponting CP, Quail MA, Ryder OA, Searle SM, Warren WC, Wilson RK, Schierup MH, Rogers J, Tyler-Smith C, Durbin R (2012) Insights into hominid evolution from the gorilla genome sequence. Nature 483:169–175

Schwarz JM, Cooper DN, Schuelke M, Seelow D (2014) MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods 11:361–362

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311

Shibata A, Okuno T, Rahman MA, Azuma Y, Takeda J, Masuda A, Selcen D, Engel AG, Ohno K (2016) IntSplice: prediction of the splicing consequences of intronic single-nucleotide variations in the human genome. J Hum Genet 61:633–640

Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GL, Edwards KJ, Day IN, Gaunt TR (2013) Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat 34:57–65

Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day IN, Gaunt TR, Campbell C (2015) An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics 31:1536–1543

Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40:W452–W457

Stenson PD, Mort M, Ball EV, Shaw K, Phillips A, Cooper DN (2014) The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet 133:1–9

Susswein LR, Marshall ML, Nusbaum R, Vogel Postula KJ, Weissman SM, Yackowski L, Vaccari EM, Bissonnette J, Booker JK, Cremona ML, Gibellini F, Murphy PD, Pineda-Alvarez DE, Pollevick GD, Xu Z, Richard G, Bale S, Klein RT, Hruska KS, Chung WK (2016) Pathogenic and likely pathogenic variant prevalence among the first 10,000 patients referred for next-generation cancer panel testing. Genet Med 18:823–832

Tabor HK, Auer PL, Jamal SM, Chong JX, Yu JH, Gordon AS, Graubert TA, O’Donnell CJ, Rich SS, Nickerson DA, NHLBI Exome Sequencing Project, Bamshad MJ (2014) Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: implications for the return of incidental results. Am J Hum Genet 95:183–193

Telenti A, Pierce LC, Biggs WH, di Iulio J, Wong EH, Fabani MM, Kirkness EF, Moustafa A, Shah N, Xie C, Brewerton SC, Bulsara N, Garner C, Metzker G, Sandoval E, Perkins BA, Och FJ, Turpaz Y, Venter JC (2016) Deep sequencing of 10,000 human genomes. Proc Natl Acad Sci USA 113:11901–11906

Teng M, Ichikawa S, Padgett LR, Wang Y, Mort M, Cooper DN, Koller DL, Foroud T, Edenberg HJ, Econs MJ, Liu Y (2012) regSNPs: a strategy for prioritizing regulatory single nucleotide substitutions. Bioinformatics 28:1879–1886

Thorn CF, Klein TE, Altman RB (2013) PharmGKB: the pharmacogenomics knowledge base. Methods Mol Biol 1015:311–320

Tong P, Prendergast JG, Lohan AJ, Farrington SM, Cronin S, Friel N, Bradley DG, Hardiman O, Evans A, Wilson JF, Loftus B (2010) Sequencing and analysis of an Irish human genome. Genome Biol 11:R91

Turner TN, Douville C, Kim D, Stenson PD, Cooper DN, Chakravarti A, Karchin R (2015) Proteins linked to autosomal dominant and autosomal recessive disorders harbor characteristic rare missense mutation distribution patterns. Hum Mol Genet 24:5995–6002

Usher CL, McCarroll SA (2015) Complex and multi-allelic copy number variation in human disease. Brief Funct Genom 14:329–338

Walsh R, Thomson KL, Ware JS, Funke BH, Woodley J, McGuire KJ, Mazzarotto F, Blair E, Seller A, Taylor JC, Minikel EV, Exome Aggregation Consortium, MacArthur DG, Farrall M, Cook SA, Watkins H (2017) Reassessment of Mendelian gene pathogenicity using 7,855 cardiomyopathy cases and 60,706 reference samples. Genet Med 19:192–203

Wilfert AB, Chao KR, Kaushal M, Jain S, Zöllner S, Adams DR, Conrad DF (2016) Genome-wide significance testing of variation from single case exomes. Nat Genet 48:1455–1461

Wu M, Wu J, Chen T, Jiang R (2015) Prioritization of nonsynonymous single nucleotide variants for exome sequencing studies via integrative learning on multiple genomic data. Sci Rep 5:14955

Xue Y, Chen Y, Ayub Q, Huang N, Ball EV, Mort M, Phillips AD, Shaw K, Stenson PD, Cooper DN, Tyler-Smith C, The 1000 Genomes Project Consortium (2012) Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am J Hum Genet 91:1022–1032

Xue Y, Prado-Martinez J, Sudmant PH, Narasimhan V, Ayub Q, Szpak M, Frandsen P, Chen Y, Yngvadottir B, Cooper DN, de Manuel M, Hernandez-Rodriguez J, Lobon I, Siegismund HR, Pagani L, Quail MA, Hvilsom C, Mudakikwa A, Eichler EE, Cranfield MR, Marques-Bonet T, Tyler-Smith C, Scally A (2015) Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding. Science 348:242–245

Yan G, Zhang G, Fang X, Zhang Y, Li C, Ling F, Cooper DN, Li Q, Li Y, van Gool AJ, Du H, Chen J, Chen R, Zhang P, Huang Z, Thompson JR, Meng Y, Bai Y, Wang J, Zhuo M, Wang T, Huang Y, Wei L, Li J, Wang Z, Hu H, Yang P, Le L, Stenson PD, Li B, Liu X, Ball EV, An N, Huang Q, Zhang Y, Fan W, Zhang X, Li Y, Wang W, Katze MG, Su B, Nielsen R, Yang H, Wang J, Wang X, Wang J (2011) Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nat Biotechnol 29:1019–1023

Yen JL, Garcia S, Montana A, Harris J, Chervitz S, Morra M, West J, Chen R, Church DM (2017) A variant by any name: quantifying annotation discordance across tools and clinical databases. Genome Med 9:7

Zhang X, Lin H, Zhao H, Hao Y, Mort M, Cooper DN, Zhou Y, Liu Y (2014) Impact of human pathogenic micro-insertions and micro-deletions on post-transcriptional regulation. Hum Mol Genet 23:3024–3034

Author information

Authors and affiliations.

School of Medicine, Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK

Peter D. Stenson, Matthew Mort, Edward V. Ball, Katy Evans, Matthew Hayden, Sally Heywood, Michelle Hussain, Andrew D. Phillips & David N. Cooper

Corresponding authors

Correspondence to Peter D. Stenson or David N. Cooper.

Ethics declarations

Conflict of interest.

The authors wish to declare an interest in so far as HGMD is financially supported by QIAGEN Inc. through a License agreement with Cardiff University.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Stenson, P.D., Mort, M., Ball, E.V. et al. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet 136, 665–677 (2017). https://doi.org/10.1007/s00439-017-1779-6

Received: 24 February 2017

Accepted: 14 March 2017

Published: 27 March 2017

Issue Date: June 2017

DOI: https://doi.org/10.1007/s00439-017-1779-6

  • Unify Medical Language System
  • Mutation Data
  • Human Gene Mutation Database
  • Variant Call Format
  • Unify Medical Language System Concept

Open Access

Peer-reviewed

Research Article

DNA mutation motifs in the genes associated with inherited diseases

Roles Investigation, Software, Visualization, Writing – original draft, Writing – review & editing

Affiliations CEITEC—Central European Institute of Technology, Masaryk University, Kamenice 5, Brno, Czech Republic, Department of Condensed Matter Physics, Faculty of Science, Masaryk University, Kotlářská 2, Brno, Czech Republic

Roles Methodology, Software, Supervision, Writing – review & editing

Affiliations CEITEC—Central European Institute of Technology, Masaryk University, Kamenice 5, Brno, Czech Republic, National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, Brno, Czech Republic

Roles Formal analysis, Methodology, Software

Affiliation CEITEC—Central European Institute of Technology, Masaryk University, Kamenice 5, Brno, Czech Republic

Roles Methodology, Software

Roles Formal analysis, Writing – review & editing

Affiliation Department of Condensed Matter Physics, Faculty of Science, Masaryk University, Kotlářská 2, Brno, Czech Republic

Roles Data curation, Validation

Affiliation Centre of Molecular Biology and Gene Therapy, University Hospital Brno and Masaryk University, Jihlavská 20, Brno, Czech Republic

Roles Conceptualization, Investigation, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing


  • Michal Růžička, 
  • Petr Kulhánek, 
  • Lenka Radová, 
  • Andrea Čechová, 
  • Naďa Špačková, 
  • Lenka Fajkusová, 
  • Kamila Réblová

PLOS

  • Published: August 2, 2017
  • https://doi.org/10.1371/journal.pone.0182377

Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected by chance are termed coldspots. Mutation hotspots are usually derived from a mutation spectrum, which reflects a particular population where the effect of a common ancestor plays a role. To detect coldspots/hotspots unaffected by population bias, we analysed the presence of germline mutations obtained from the HGMD database in the 5-nucleotide segments repeatedly occurring in genes associated with common inherited disorders, in particular, the PAH , LDLR , CFTR , F8 , and F9 genes. Statistically significant sequences (mutational motifs) rarely associated with mutations (coldspots) and frequently associated with mutations (hotspots) exhibited characteristic sequence patterns, e.g. coldspots contained purine tracts while hotspots showed alternating purine-pyrimidine bases, often with the presence of a CpG dinucleotide. Using molecular dynamics simulations and free energy calculations, we analysed the global bending properties of two selected coldspots and two hotspots with a G/T mismatch. We observed that the coldspots were inherently more flexible than the hotspots. We assume that this property might be critical for effective mismatch repair, as DNA with a mutation recognized by the MutSα protein is noticeably bent.

Citation: Růžička M, Kulhánek P, Radová L, Čechová A, Špačková N, Fajkusová L, et al. (2017) DNA mutation motifs in the genes associated with inherited diseases. PLoS ONE 12(8): e0182377. https://doi.org/10.1371/journal.pone.0182377

Editor: Tamar Schlick, New York University, UNITED STATES

Received: March 27, 2017; Accepted: July 17, 2017; Published: August 2, 2017

Copyright: © 2017 Růžička et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This research was financially supported by Grant Agency of the Czech Republic (GA16-11619S/2016) and by the Ministry of Education, Youth and Sports of the Czech Republic under the project CEITEC 2020 (LQ1601). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures". This work was supported by The Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project „IT4Innovations National Supercomputing Center – LM2015070“.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Genomic integrity and stability are important for any living organism. Cells have evolved various repair pathways to maintain the correct transmission of genetic information to the next generation, e.g. nucleotide excision repair (NER), base excision repair (BER), mismatch repair (MMR), homologous recombination repair and post-replication repair [ 1 ]. Mutations arising in a parent’s germ cells are termed de novo mutations and can cause various inherited disorders. Errors in DNA can arise due to environmental factors or spontaneously, e.g. during DNA replication or due to deamination of 5-methylcytosine. With a fully functional repair system, the frequency of spontaneous errors is ca. 1 per 10⁹–10¹⁰ base pairs per replication [ 2 ]. In human genes, a mutational strand asymmetry was observed as a consequence of both transcription [ 3 – 5 ] and replication processes [ 6 – 8 ]. Detection of DNA damage on a transcribed DNA strand by RNA polymerase initiates transcription-coupled repair (TCR), a sub-pathway of NER. This leads to strand asymmetry that can be expressed using GC and TA skew profiles. Strand asymmetry arising during replication is related to the different synthesis and proofreading mechanisms of the leading strand (replicated continuously from the origin) and the lagging strand (replicated in discrete steps towards the origin). Previous studies have shown that certain DNA sequences are more prone to mutate [ 9 , 10 ]. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. DNA sequences with lower frequencies than expected by chance are termed coldspots. Perhaps the best-known hotspot is the CpG dinucleotide associated with the C>T mutation, resulting in TpG or CpA (on the other strand) transitions [ 11 ]. In this process, cytosine is methylated and subsequently deaminated [ 12 ]. In spite of the effective repair pathways, BER and MMR, this mutation is very frequent [ 13 – 15 ]. 
It was proposed that due to high CpG site mutability, these dinucleotides occur less frequently in the genome than would be expected by chance [ 16 ]. Cytosine methylation is also one of the most frequent DNA modifications used to control gene expression in the cell, hence it represents a naturally occurring base modification [ 17 , 18 ]. Apart from the CpG sites, there are other sequences with higher mutation rates, e.g. the CpHpG trinucleotide (where H stands for A , C or T ) [ 19 ] or the GTAAGT motif [ 20 ]. It was observed that a sequence of ±2 nucleotides (nt) around a mismatch has an influence on the relative rates of single nucleotide variations (SNVs) causing human inherited disorders [ 15 , 21 ]. In addition, not all mismatch types are repaired with the same efficiency [ 22 ]. In the cell, mismatches and small insertion/deletion loops (IDLs) are primarily targeted by the MutSα protein within the MMR pathway [ 2 ]. DNA with a mismatch is bent by ca. 60° in the MutSα/DNA complex (so the angle between the arms is 120°) [ 23 – 26 ]. A model where MutSα slides along DNA and scans for flexible regions corresponding to mismatches has been proposed [ 27 ]. It seems that DNA flexibility, which plays a role in various cell processes [ 28 – 30 ], might also be a critical factor in detecting mismatches in DNA. In the MutSα/DNA complex, a mismatch is further recognized by phenylalanine 432 and glutamate 434 from the conserved Phe-X-Glu motif [ 26 , 31 ]. The interaction of MutSα with DNA is also coupled with its ATPase activity. There are two non-equivalent ATP hydrolytic sites, one in each of the MSH2 and MSH6 subunits of MutSα [ 26 ]. The ATP sites are part of the ATPase domains that belong to the ABC-transporter superfamily [ 24 , 32 ]. Although the ATP and DNA binding sites are separated by a distance of about 70 Å, allosteric signalling between them exists [ 33 ]. The MutSα protein with ADP bound can diffuse freely on DNA and search for mismatches [ 34 ]. 
During this stage, MutSα can bind ATP, which is hydrolysed faster in one subunit than in the other [ 35 ]. ATP binding initiates the formation of a clamp composed of the MSH2 and MSH6 DNA binding domains around the DNA. This clamp is relaxed after ATP hydrolysis, so MutSα switches between open and closed states. After MutSα recognizes a mismatch, ATP hydrolysis is suppressed and the MSH2 and MSH6 DNA binding domains form a closed clamp. The Phe-X-Glu motif contacts the mismatch and the DNA is bent [ 35 ]. This event initiates interaction with the MutL protein and the subsequent repair process [ 36 ]. These conformational processes are probably associated with the release of energy from ATP hydrolysis, but the precise mechanism is not yet known.
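
The two sequence-composition measures mentioned in this paragraph, strand skew and the depletion of CpG dinucleotides, can be illustrated with a minimal Python sketch. It assumes the commonly used definitions, skew = (X − Y)/(X + Y) and an observed/expected CpG ratio of N_CpG · L / (N_C · N_G). The function names and the example sequence are hypothetical and are not taken from the study.

```python
from collections import Counter


def skew(seq: str, x: str, y: str) -> float:
    """Return (x - y) / (x + y) for the counts of bases x and y in seq."""
    counts = Counter(seq.upper())
    total = counts[x] + counts[y]
    return (counts[x] - counts[y]) / total if total else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: N_CpG * L / (N_C * N_G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count gives the exact number.
    return seq.count("CG") * len(seq) / (n_c * n_g)


example = "ATGCGCGTTAACCGGTA"      # made-up sequence, for illustration only
gc_skew = skew(example, "G", "C")  # (G - C) / (G + C)
ta_skew = skew(example, "T", "A")  # (T - A) / (T + A)
cpg_ratio = cpg_obs_exp(example)
```

A ratio well below 1 over a long genomic region would be consistent with the CpG depletion described above; the short example sequence is only there to exercise the functions.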

To better understand the emergence of mutations in genes, we performed a two-step study. Firstly, we analysed DNA sequences of five genes associated with common inherited disorders where large numbers of different SNVs were reported in the Human Gene Mutation Database (HGMD, http://www.hgmd.cf.ac.uk/ac/index.php ). This database contains known gene mutations responsible for human inherited diseases, plus disease-associated functional polymorphisms. Mutation spectra for various populations are not provided there, i.e. each mutation appears just once in the database with its first reference in the literature. We focused on mutations in the PAH gene (associated with hyperphenylalaninemia), LDLR gene (associated with hypercholesterolemia), CFTR gene (associated with cystic fibrosis), and F8 and F9 genes (associated with hemophilia A and B, respectively). In these genes, we identified repeatedly occurring 5-nucleotide (5-nt) sequences that are: i) rarely associated with mutations (coldspots) and ii) frequently associated with mutations (hotspots). Secondly, we investigated the bending properties of two hotspots and two coldspots using advanced computational techniques. Although the parameters characterizing trinucleotide bending with respect to nucleosome and DNase I have been derived [ 37 , 38 ], their utilization for our purpose is limited as we focus on a specific DNA deformation with a mismatch base pair induced by the MutSα protein. As shown in a previous study focused on DNA A-tracts, one sequence can behave differently with respect to different deformations [ 29 ]. In particular, we employed Molecular Dynamics (MD) simulations implemented in the AMBER program package [ 39 ] and free energy calculations using the adaptive biasing force (ABF) method [ 40 ] enhanced by the multiple walker approach (MWA) [ 41 ]. 
Based on our calculations, we were able to derive the free energy change needed to bend a straight DNA duplex with a coldspot or hotspot towards the bent geometry of DNA found in the MutSα/DNA complex.

Materials and methods

Analysis of mutations in DNA sequences

We analysed germline mutations in five genes: PAH , LDLR , CFTR , F8 and F9 based on the HGMD. We focused primarily on mutations in exons. To increase the number of mutations for our analysis, we also considered the intron sequences flanking the exons where mutations occurred. Because the introns of different genes are sequenced to different extents, the number of mutations in these parts differs. We included intron segments 7 nt long for PAH , 4 nt long for LDLR , 9 nt long for CFTR , 6 nt long for F8 and 5 nt long for F9 on both sides of the exons. In addition, 2 nt before the first codon (5’-untranslated region (5’-UTR)) and 2 nt after the last codon (3’-untranslated region (3’-UTR)) were also included for each gene, so that the first and last coding nucleotides occur in the middle of the segments included in the analysis (see below and Fig 1 ). The total lengths of the analysed DNA sequences including the introns and 5’-UTR/3’-UTR are indicated in Table 1 .


Middle position of each segment is highlighted by a red letter. Coding sequence is in uppercase letters, first two nucleotides (5’-UTR) are in lowercase letters.

https://doi.org/10.1371/journal.pone.0182377.g001


https://doi.org/10.1371/journal.pone.0182377.t001

The DNA sequence of each gene was divided into 5-nt segments ( Fig 1 ). Subsequently, we analysed whether the middle positions of the obtained segments (or their complement sequences) contained mutations, e.g. in the case of the AAAAT segment (written from 5’ to 3’ end throughout the text), we also looked for mutations in the complement sequence ATTTT . We wrote a Java program for this purpose. Four nucleobases combined in five positions give 1024 unique 5-nt segments. Since we considered the segments together with their complements, the total number of unique 5-nt segments (motifs) is only 512. Table 1 shows the number of different 5-nt motifs detected in the analysed DNA sequences. Missense/nonsense and splicing mutations were taken from the HGMD. We started this analysis in 2014 and used the HGMD mutation data from that year. To verify our approach, we repeated the analysis with new HGMD mutation data from 2016, which contain about 8% more mutations in the 5 analysed genes than the 2014 data (see Table 1 ). For both the 2014 and 2016 HGMD datasets, we carried out the following analysis. Various substitutions in the middle position (e.g. C>T , A , G ) were recorded as one mutation. Our aim was to find out whether a particular 5-nt segment is associated with mutations or not. These data were statistically evaluated to select representative 5-nt segments frequently containing mutations in the middle position (hotspots), and segments where the middle position rarely contained mutations (coldspots). Statistical evaluation was performed in R ( www.r-project.org ). The proportional test was applied with an expected 75% probability for a hotspot or 25% for a coldspot, respectively. Individual p-values for a 5-nt segment and its complement in each gene were further combined into a single p-value by the Fisher method. We considered segments with a Fisher combined p-value < 0.1 to be statistically significant. 
Obtained coldspots and hotspots from the 2014 and 2016 dataset were compared. Further, for the 2016 mutation dataset we performed additional statistical analysis where we considered all detected substitutions in the middle position. Each position in this approach can possess three substitutions and one original nucleotide. We again used a proportional test with an expected 75% probability for a hotspot or 25% for a coldspot, respectively. Individual p-values for a 5-nt segment and its complement were combined into a single p-value by the Fisher method.
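
The segmentation step described above can be sketched in a few lines of Python. This is not the authors' Java program. It is a minimal illustration that assumes the "complement" of a segment is its reverse complement (as in the AAAAT / ATTTT example) and that mutated positions are supplied as 0-based indices. The names `canonical` and `count_motifs` are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(segment: str) -> str:
    """Represent a segment and its complement strand by a single key."""
    return min(segment, reverse_complement(segment))


def count_motifs(sequence: str, mutated_positions: set) -> dict:
    """Map motif -> [occurrences, occurrences with a mutated middle base].

    mutated_positions holds 0-based indices into `sequence`. A position is
    counted once however many different substitutions were reported there.
    """
    sequence = sequence.upper()
    counts = {}
    for centre in range(2, len(sequence) - 2):
        key = canonical(sequence[centre - 2:centre + 3])
        entry = counts.setdefault(key, [0, 0])
        entry[0] += 1
        entry[1] += centre in mutated_positions
    return counts


# All 1024 possible 5-nt segments collapse into 512 motifs.
all_motifs = {canonical("".join(p)) for p in product("ACGT", repeat=5)}
```

Because a 5-nt segment has an odd length, it can never equal its own reverse complement, so the 1024 segments pair up exactly and `len(all_motifs)` is 512, in line with the count given in the text.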

Nucleosome positioning was analyzed using a novel algorithm that combines a statistical mechanics model and knowledge of periodically occurring dinucleotides in histone octamers [ 42 ].

MD simulations and free energy calculations

To determine the bending properties of DNA with a mismatch base pair, we performed a conformational transition between the straight DNA duplex, built by the nab module of AMBER 14 [ 39 ] as a right-handed B-DNA, and the bent DNA conformation found in the complex with MutSα [ 26 ] (PDB ID: 2O8B). Due to the complexity of the problem, the bending was performed in the absence of MutSα. DNA with a mismatch G/T pair built by the nab module was solvated with TIP3P water molecules [ 43 ] in a truncated octahedral periodic box (with a minimal distance of 10 Å to the walls) and neutralized with sodium counterions [ 44 ] using the xleap module of AMBER. A new version of the DNA force field, parmbsc1 [ 45 ], was used. Firstly, the system was equilibrated in three steps. Minimization of the system was performed in 3000 steps, then the system was heated to 300 K during 100 ps at a constant volume and finally, a 500 ps long simulation was run at 300 K and 100 kPa under NpT conditions. After that, we ran production dynamics and the system was simulated for 150 ns under NpT conditions. During the equilibration and production phases, dynamics of terminal base pairs was limited by the wall distance restraints imposed on hydrogen bonds between terminal bases to avoid the formation of flanking bases that would influence the dynamics of the remaining DNA structure. Five restart files from the production MD simulation (each taken after 30 ns) were used as the starting coordinates for the subsequent parallel ABF method accelerated by the MWA [ 40 , 41 , 46 ]. This ensured independence of the starting configurations for ABF simulations. All ABF simulations were performed in the modified PMEMD program from AMBER, connected with PMFLib [ 47 ] implementing both ABF and MWA methods. The total sampling time of one ABF simulation was 200 ns, which provides converged free energy profiles (we also tested 400 ns long simulations that provided close to identical results, see S1 Fig for comparison). 
The collective variable used in the free energy calculations was the mass-weighted root-mean-square distance (RMSD) to a target structure, which was derived from the bent X-ray DNA structure. We tested two coldspots and two hotspots that differ in the sequence of the central 5-nt segment ( Fig 2 ). Therefore, bases in the central 5-nt long segments of the bent X-ray DNA structure were mutated in silico . Possible clashes introduced by changes in the sequence were removed by careful optimization while the overall shape was maintained by positional restraints towards the bent X-ray DNA structure. To avoid possible bias which could be introduced by this in silico procedure, three types of RMSD collective variable were tested. They differ in the definition of the atoms employed in the structure superimposition and calculation of RMSD. Set A included all heavy atoms of the DNA structure, corresponding to residues 1 to 30. Set B included all of the heavy atoms from terminal 5-nt long segments and the heavy atoms of the backbone from the central 5-nt long segments corresponding to the AMBER mask notation ((:1-5,11-20,26-30) | (:6-10,21-25@P,OP1,OP2,O3',O5',C3',C4',C5')) and, finally, set C included the DNA backbone heavy atoms corresponding to the AMBER mask notation (:1-30@P,OP1,OP2,O3',O5',C3',C4',C5'). In the bent X-ray DNA structure, the central G/T pair is modestly opened towards the minor groove, showing a shear parameter of about 5 Å (this parameter is one of the six helical parameters describing the orientation of bases in a base pair) [ 48 ], while in the relaxed DNA conformations, where it is positioned inside the duplex, this parameter has a value of about -2.2 Å. Therefore, we imposed weak wall restraints to keep the shear parameter in the range from -10 Å to 0 Å so that the G/T pair stayed stable inside the DNA duplex without changing the hydrogen pairing during DNA bending. 
As observed in test simulations, opening of the G/T pair is an irreversible event that influences the calculated free energy profiles. The free energy profiles were calculated for RMSD in the interval from 1.45 to 5.45 Å. The relaxed DNA is shown as a minimum on the free energy profile, representing a stable thermodynamic state. In contrast, the bent structure is not thermodynamically stable because of the absence of MutSα in our model. Therefore, we had to select an RMSD value that would represent the bent state and serve to compare coldspots and hotspots. To obtain a fair estimate, we analysed the behaviour of the relaxed DNA state observed in the unrestrained production dynamics. RMSD towards an average DNA structure was calculated over the entire production dynamics. Depending on the system and the atom set employed in calculating RMSD, the RMSD value fluctuated from 1 to 4 Å, with maximum occurrence at about 1.55 Å for sets A and B ( S2 Fig ). Set C exhibits a slightly higher value of about 1.80 Å. We assumed a similar effect of thermal fluctuations on the RMSD from the target bent DNA structures and used a value of 1.55 Å as an RMSD threshold representing the bent DNA.


The sequence corresponds to DNA complexed with MutSα except the central 5-nt segment (highlighted by the colored box) which is either a coldspot (yellow box) or hotspot (pink box).

https://doi.org/10.1371/journal.pone.0182377.g002
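
The three atom sets and the mass-weighted RMSD described above can be sketched in Python as follows. This is an illustration only, not the PMFLib implementation. It assumes that hydrogen atoms can be recognised by names starting with "H" and that the two structures have already been superimposed, so the superposition step used in the actual collective variable is omitted. The function names are hypothetical.

```python
import math

# Backbone heavy atoms listed in the AMBER masks for sets B and C.
BACKBONE = {"P", "OP1", "OP2", "O3'", "O5'", "C3'", "C4'", "C5'"}
# Central 5-nt segments of both strands: residues 6-10 and 21-25.
CENTRAL = set(range(6, 11)) | set(range(21, 26))


def in_set(resid: int, atom: str, which: str) -> bool:
    """Decide whether an atom belongs to RMSD atom set 'A', 'B' or 'C'."""
    if not 1 <= resid <= 30 or atom.startswith("H"):
        return False  # outside the duplex, or (by assumption) a hydrogen
    if which == "A":
        return True
    if which == "B":
        return resid not in CENTRAL or atom in BACKBONE
    if which == "C":
        return atom in BACKBONE
    raise ValueError(f"unknown atom set: {which}")


def mass_weighted_rmsd(coords, reference, masses) -> float:
    """sqrt(sum_i m_i |r_i - ref_i|^2 / sum_i m_i) for pre-aligned structures."""
    weighted = sum(m * math.dist(r, ref) ** 2
                   for r, ref, m in zip(coords, reference, masses))
    return math.sqrt(weighted / sum(masses))
```

For example, a base atom such as N1 of residue 7 belongs to set A but not to sets B or C, which is why sets B and C are insensitive to the in silico change of the central bases.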

We also analyzed the bending of the DNA duplex with G/T using another collective variable, taken from the work of Sharma et al. [ 49 ]. This collective variable is the angle between the helical arms of the DNA duplex, defined by three centers of mass of the nucleotide residues (2–5, 26–29), (6–10, 21–25), and (11–14, 17–20). This collective variable was tested in a range from 90° to 170°. The bent structures, however, did not resemble the X-ray conformation of DNA in complex with MutSα, but rather showed smooth deformations along the helical axis of DNA (see S3 Fig for comparison with the bent X-ray DNA and a selected MD structure from the ABF calculation, where the collective variable was RMSD). Therefore, the bending angle was not used as a collective variable in our study. The MD trajectories were processed using the ptraj module of AMBER and visualized using the VMD program [ 50 ].
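
The angle-based collective variable can be illustrated with a short Python sketch: three centres of mass are computed from the residue groups given above, and the angle is measured at the central one. The helper names are hypothetical and the coordinates in the usage note are invented for illustration.

```python
import math


def centre_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms."""
    total = sum(masses)
    return tuple(sum(m * c[i] for c, m in zip(coords, masses)) / total
                 for i in range(3))


def arm_angle(com_a, com_centre, com_b) -> float:
    """Angle in degrees at com_centre between the vectors to com_a and com_b."""
    u = [a - c for a, c in zip(com_a, com_centre)]
    v = [b - c for b, c in zip(com_b, com_centre)]
    cos_angle = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against rounding before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
```

A perfectly straight duplex gives 180° in this definition. Arms placed at (1, 0, 0) and (−0.5, 0.866, 0) around a centre at the origin give about 120°, which corresponds to the 60° bend reported for the MutSα/DNA complex.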

Results

Selected mutation motifs (coldspots and hotspots) and their signatures

DNA 5-nt segments with their complements were extracted from the five studied genes. The number of identified unique segments in each gene is shown in Table 1 . It can be seen that not all combinations (512) were found in these genes. Even the F8 and CFTR genes, which have the longest DNA sequences, do not contain all possible combinations. There are several motifs which occur especially rarely, e.g. CCGCG/CGCGG, CGACG/CGTCG, ATACG/CGTAT, CGCGA/TCGCG, ACGCG/CGCGT, CGCGC/GCGCG . These motifs were detected in just 1 out of 5 genes and are rich in CpG dinucleotides. Their occurrence is most likely suppressed in DNA sequences due to the high mutability of CpG sites discussed in the Introduction. S1 and S2 Tables show all segments detected in the PAH , LDLR , F8 , F9 and CFTR genes and the number of mutations found in the middle position in the 2014 and 2016 mutation datasets, respectively. With the use of proportional tests and the Fisher method, we calculated the combined p-value for each segment, which indicates its significance as a coldspot or hotspot. Table 2 shows the top 20 coldspots with the lowest Fisher combined p-values in the 2014 and 2016 datasets and Table 3 shows the top 20 hotspots with the lowest Fisher combined p-values in the 2014 and 2016 datasets.


https://doi.org/10.1371/journal.pone.0182377.t002


https://doi.org/10.1371/journal.pone.0182377.t003
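
The combination of individual p-values by the Fisher method, used to rank the motifs in Tables 2 and 3, can be written compactly. The sketch below is a plain-Python illustration rather than the R code used in the study. It relies on the standard result that X = −2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square distributed with 2k degrees of freedom
    under the null hypothesis. For even degrees of freedom the upper tail is
    exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    p_values = list(p_values)
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)
    tail = sum(half_x ** j / math.factorial(j) for j in range(len(p_values)))
    return math.exp(-half_x) * tail
```

A single p-value is returned unchanged, and two p-values of 0.05 combine to roughly 0.017, so moderately small p-values from several genes can together fall below the 0.1 threshold used here.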

Coldspots derived from the 2014 dataset are almost identical to coldspots derived from the 2016 dataset. In particular, 18 out of 20 coldspots can be found in both datasets. Similarly, 19 hotspots from the 2014 dataset can be found in the 2016 dataset. Thus, although the number of detected mutations increased in the genes during the two years, it did not significantly affect the final selection of coldspots and hotspots.

Herein, we discuss features of the coldspots/hotspots found with the 2016 dataset. The most apparent feature of the detected coldspots is the presence of consecutive purines (a purine tract) ( Table 2 ). In particular, 18 out of 20 coldspots contain a purine tract of four or five nucleotides. Such a pattern is not seen in the hotspot sequences, where just one motif out of 20 contains a four-purine tract ( TGGAA ). The identified hotspots frequently show a CpG dinucleotide in the middle position (detected in 8 out of 20 motifs). In the top 20 coldspot sequences, this dinucleotide was not detected. Further, we used a sequence logo tool [ 51 ] to visualize the sequence pattern for the top 20 coldspots and hotspots. Since each motif consists of a forward and a reverse part, only sequences (either forward or reverse) containing a purine base in the middle position were analysed. This revealed that adenine is the prevailing middle base in coldspots and that it is surrounded by either adenines or guanines ( Fig 3 ). In the case of hotspots, the prevailing central purine base is guanine and the other positions are highly variable ( Fig 3 ).


https://doi.org/10.1371/journal.pone.0182377.g003
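
The two sequence signatures discussed above, purine tracts and a CpG dinucleotide involving the middle base, are easy to check programmatically. The following Python sketch is an illustration under one assumption that goes beyond the text: since each motif stands for both strands, the purine tract is searched on the motif and on its reverse complement. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Complementary strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_purine_tract(motif: str) -> int:
    """Longest run of consecutive A/G bases on either strand of a motif."""
    best = 0
    for strand in (motif, reverse_complement(motif)):
        run = 0
        for base in strand:
            run = run + 1 if base in "AG" else 0
            best = max(best, run)
    return best


def has_central_cpg(motif: str) -> bool:
    """True if the middle base of a 5-nt motif is part of a CpG dinucleotide."""
    return motif[1:3] == "CG" or motif[2:4] == "CG"
```

With these definitions the coldspot AAGAA has a purine tract of length 5 and the hotspot TGGAA one of length 4, while AGGTA reaches only 3. None of these three motifs has a CpG at the middle position.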

We also used another strategy to detect coldspots and hotspots. In particular, we considered all substitutions in the middle position of the 5-nt motifs. The top 20 detected coldspots basically agree with our first analysis, where multiple substitutions in the middle position were counted as one mutation. In particular, 17 out of 20 coldspots with the lowest p-value were also found in our first analysis ( Table 2 and S3 Table ). Considering hotspots, the top 20 motifs show rather high combined p-values ( S3 Table ). Based on this second statistical approach, significant hotspots should contain many different substitutions at all of their occurrences in the genes. We observed that this is a rather rare event. As in our first approach, we considered motifs with a combined p-value <0.1 to be statistically significant. Only five hotspots satisfy this criterion; the first three of these five were also found with our previous approach ( Table 3 and S3 Table ). Thus, this second statistical approach might be more useful when more data (substitutions) are available in the HGMD.

In Fig 4 , we show the number of mutations (different substitutions counted as one), the number of all substitutions, and the occurrences of the top 20 coldspots/hotspots detected using our first statistical approach (results in Tables 2 and 3 ) with the HGMD 2016 dataset. It can be seen that coldspots occur more frequently than hotspots and contain significantly fewer mutations: coldspots occur approximately twice as frequently as hotspots and contain about half as many mutations. The figure also shows that hotspots more often possess different substitutions than coldspots (see the red and blue columns in Fig 4 ). In addition, for the top 20 coldspots/hotspots we extracted the type of base substitutions. In the case of coldspots, where adenine occurs predominantly in the middle position, the most frequent substitution is A→G , while in the hotspots, where guanine occurs predominantly in the middle position, we observed that G→A is the most frequent substitution ( S4 Fig ). This corresponds to the general assumption that transitions are more frequent than transversions [ 52 ].


https://doi.org/10.1371/journal.pone.0182377.g004

Coldspots and hotspots in the TP53 gene

We analyzed the behaviour of the top twenty coldspots and hotspots selected based on the 5 genes in the TP53 gene, which also contains many different germline mutations along the nucleotide sequence. Germline TP53 mutations were taken from the HGMD database (germline TP53 mutations are also presented in the IARC database http://p53.iarc.fr/ together with TP53 somatic mutations). We analyzed mutations in the exons and included 4-nt long intron segments plus 2 nt before the first codon and 2 nt after the last codon as for analysis of the five genes. The total TP53 length was 1255 nt and there were 302 mutations, see S4 Table . We observed that only 1% of coldspot sequences are associated with mutations while in the case of hotspots it is 21% ( S5 Table ). Regarding sequences which are neither hotspots nor coldspots (they have a combined p-value > 0.1 in both coldspot and hotspot datasets), 12% of them are associated with mutations. This is higher than for coldspots but lower than for hotspots (if we consider all motifs with a combined p-value > 0.1 in both datasets, we also get that 12% of them are associated with mutations). These findings indicate our motif selection can also be applied to other genes.
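
The validation on TP53 amounts to classifying every 5-nt window of a new gene by motif class and reporting the share of windows with a mutated middle base. A minimal Python sketch of that bookkeeping is given below. It is not the authors' code. The function name is hypothetical, the motif sets are assumed to be stored in the same canonical form (the alphabetically smaller of a segment and its reverse complement), and mutated positions are assumed to be 0-based indices.

```python
def mutated_fraction_by_class(sequence, mutated_positions, coldspots, hotspots):
    """Return {class: fraction of 5-nt windows whose middle base is mutated}.

    coldspots and hotspots are sets of canonical motifs. A class with no
    windows in the sequence maps to None.
    """
    comp = str.maketrans("ACGT", "TGCA")
    tallies = {"coldspot": [0, 0], "hotspot": [0, 0], "other": [0, 0]}
    sequence = sequence.upper()
    for centre in range(2, len(sequence) - 2):
        window = sequence[centre - 2:centre + 3]
        key = min(window, window.translate(comp)[::-1])  # canonical form
        label = ("coldspot" if key in coldspots
                 else "hotspot" if key in hotspots else "other")
        tallies[label][0] += 1
        tallies[label][1] += centre in mutated_positions
    return {label: (hits / n if n else None)
            for label, (n, hits) in tallies.items()}
```

Called with the TP53 sequence, its mutated positions and the two top-20 motif lists, a function of this kind would return the three percentages compared in the text (coldspots, hotspots and remaining motifs).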

Bending analysis of coldspots and hotspots

We investigated the bending properties of two coldspots, AAGAA and CAGTG , and two hotspots, AGGTA and TGGAA . The selected coldspots were detected in the statistical analysis of the HGMD 2014 dataset ( Table 2 ). The first coldspot was also identified among the top 20 coldspots in the 2016 dataset; the second one occurs at the 23rd position in the HGMD 2016 dataset, but its combined Fisher p-value is still very low (9.22E-05). The selected hotspots were both detected within the top 20 hotspots in the 2014 and 2016 datasets ( Table 3 ). We intentionally selected hotspots that do not contain a CpG dinucleotide in the middle position. The presence of CpG-containing hotspots among our top 20 hotspots could be due to high CpG site mutability rather than to the intrinsic flexibility tested in our calculations.

The bending was quantified by the free energy change ( Fig 5 ) that is necessary to bring a relaxed straight DNA to the conformation observed in the bent X-ray DNA structure found in the MutSα/DNA complex. As a collective variable describing the necessary geometrical change, we employed RMSD towards the target bent DNA structure. The relaxed DNA, which is a thermodynamically stable state, appears as a minimum on the free energy profiles around 4.5 Å, while the bent DNA lies in the region around 1.55 Å, which is a typical deviation of RMSD from the target structure due to thermal atom fluctuations. The free energy profiles show that the two hotspots with G/T pairs are stiffer than the two coldspots ( Fig 5 ). For instance, the free energy necessary to bend a straight DNA is 16.3 kcal mol⁻¹ for AGGTA and 16.1 kcal mol⁻¹ for TGGAA , while for AAGAA and CAGTG it is only 14.2 and 14.3 kcal mol⁻¹, respectively.


https://doi.org/10.1371/journal.pone.0182377.g005
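
The size of the difference between the two groups follows directly from the four bending free energies quoted above. The short Python calculation below reproduces that arithmetic. The final step, converting the difference into a Boltzmann factor at the simulation temperature of 300 K, is an added illustration under the assumption that the free energy difference can be read as a relative equilibrium population. It is not a result reported in the study.

```python
import math

# Bending free energies quoted in the text (kcal/mol).
bend_free_energy = {"AGGTA": 16.3, "TGGAA": 16.1,   # hotspots
                    "AAGAA": 14.2, "CAGTG": 14.3}   # coldspots

hot_mean = (bend_free_energy["AGGTA"] + bend_free_energy["TGGAA"]) / 2
cold_mean = (bend_free_energy["AAGAA"] + bend_free_energy["CAGTG"]) / 2
delta = hot_mean - cold_mean          # extra cost of bending a hotspot

R_KCAL = 1.987204e-3                  # gas constant in kcal/(mol K)
TEMPERATURE = 300.0                   # simulation temperature in K
# Illustrative only: ratio of Boltzmann weights exp(delta / RT).
boltzmann_ratio = math.exp(delta / (R_KCAL * TEMPERATURE))
```

The mean difference is 1.95 kcal mol⁻¹, i.e. the "about 2 kcal mol⁻¹" referred to in the text. Under the stated assumption this corresponds to a Boltzmann factor of a few tens.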

Due to the different composition of the central 5-nt long segments, the target bent DNA structures were derived from the experimental bent X-ray structure by in silico mutagenesis. To examine whether this procedure could negatively influence the observed DNA bending properties, we employed three different sets of atoms in the RMSD calculation. The first set (set A) contains all heavy atoms of the DNA ( Fig 5 ), while the atoms of the modified nucleobases in the central regions were excluded in the second set (set B). The free energy profiles calculated for set B show nearly the same results ( S5 Fig ) as observed for set A. This indicates that the preparation of the target structure does not alter the bending properties or the observed flexibility of hotspots and coldspots, hence the flexibility is indeed an intrinsic property of the nucleotide sequence. We also tested the third set (set C), where only the backbone atoms of the DNA were considered ( S6 Fig ). Even here, where nucleobase sequences cannot alter the calculation of RMSD and thus the free energy profiles, the difference between coldspots and hotspots is visible. However, the discrimination between coldspots and hotspots is significantly smaller, mainly because the bending of DNA was not as sharp as with the two other atom sets.

Nucleosome positioning

Repair of DNA sequences in linker regions is more efficient than in the nucleosomes [ 53 ]. We therefore analyzed the putative positioning of nucleosomes on two coldspots ( AAGAA and CAGTG ) and two hotspots ( AGGTA and TGGAA ) in exons of the PAH gene (the same motifs employed in the bending analysis). S7 Fig shows the localization of these motifs in the PAH exons mapped onto the predicted nucleosome positions. We observed a roughly equal distribution of both coldspots and hotspots in the nucleosomes and in the linker regions. In particular, the predicted localization of the selected coldspots and hotspots inside/outside the nucleosome is the following: AAGAA (4/7), CAGTG (6/5), AGGTA (1/3), TGGAA (4/6). These findings indicate no association between the repair process of these motifs and their localization in chromatin.

Discussion

In the five genes ( PAH , LDLR , CFTR , F8 , and F9 ) leading to common inherited disorders we detected sequences (mutational motifs) rarely associated with SNVs (coldspots) and frequently associated with SNVs (hotspots). Our approach is based on the analysis of mutations obtained from the HGMD database in the 5-nt segments which repeatedly occur in DNA sequences. It contrasts with the common strategy, where hotspots are derived from a mutation spectrum, which is a distribution of frequencies of every type of mutation along nucleotide sequences of a target gene [ 10 , 54 ]. The occurrence of mutations in the mutation spectra, however, often reflects the particular population where an effect of a common ancestor plays a role, i.e. frequent mutations originate from a single mutation event (founder effect) and are spread throughout the population as observed in our [ 55 ] and other studies [ 56 , 57 ]. Utilizing mutation spectra to detect coldspots is not convenient [ 10 ].

Based on our approach, we observed that the majority of the top 20 coldspots (18 out of 20) contain a purine tract, i.e. a purine sequence four to five nt long. These sites were associated with a minimum of SNVs. In contrast, purine tracts were not seen in the top 20 hotspots. These sequences often contain alternating purine-pyrimidine bases. Nine out of the top 20 hotspots showed a CpG dinucleotide in the middle position ( Table 3 ), which can explain their higher mutation rate.
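The two sequence features discussed above can be written as simple predicates. The sketch assumes that a purine tract means at least four consecutive A/G bases on the given strand and that "CpG in the middle position" means the central base is part of a CG dinucleotide; the function names and the CpG example motif are hypothetical.

```python
import re

def has_purine_tract(motif, min_len=4):
    """True if the motif contains at least min_len consecutive purines (A or G)."""
    return re.search(r"[AG]{%d,}" % min_len, motif) is not None

def has_central_cpg(motif):
    """True if the middle base is the C or the G of a CG dinucleotide."""
    mid = len(motif) // 2
    return motif[mid - 1 : mid + 1] == "CG" or motif[mid : mid + 2] == "CG"

# Coldspot motifs named in the text, then two motifs without a purine tract.
for motif in ("AAGAA", "AAAGA", "AAAAA", "AGAAA", "CAGTG", "AGGTA"):
    print(motif, has_purine_tract(motif), has_central_cpg(motif))
```

Applied to a full motif table, these predicates would allow the reported proportions of purine-tract and CpG-containing motifs to be recounted.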

We performed MD simulations and free energy calculations to better understand the bending properties of DNA with mutations. We analyzed the AAGAA and CAGTG coldspots and the AGGTA and TGGAA hotspots, which do not contain CpG ( Fig 2 ). We hypothesized that the higher mutability of hotspots could be due to their higher stiffness. Indeed, we observed that the selected coldspots with a G/T mismatch are more flexible than the selected hotspots with G/T, by about 2 kcal mol⁻¹ ( Fig 5 ). This supports the idea that flexible sequences could be more effectively repaired by MMR. We did not analyse the bending properties of the DNA duplex with a central canonical pair, as there is no experimentally bent structure with this pair that could serve as a target conformation for our calculations.
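The size of the hotspot/coldspot gap can be recomputed from the free energy values quoted in the S5 Fig (set B) and S6 Fig (set C) captions. The sketch below only averages those published numbers; the set A values behind the 2 kcal mol⁻¹ figure are shown in Fig 5 and are not reproduced here.

```python
from statistics import mean

# Free energy (kcal/mol) at RMSD = 1.55 Å, copied from the S5 Fig and S6 Fig captions.
FREE_ENERGY = {
    "set B": {"AGGTA": 14.1, "TGGAA": 14.2, "AAGAA": 12.8, "CAGTG": 11.8},
    "set C": {"AGGTA": 19.5, "TGGAA": 18.7, "AAGAA": 17.9, "CAGTG": 17.5},
}
HOTSPOTS = ("AGGTA", "TGGAA")
COLDSPOTS = ("AAGAA", "CAGTG")

def hot_minus_cold(values):
    """Mean hotspot free energy minus mean coldspot free energy."""
    return mean(values[m] for m in HOTSPOTS) - mean(values[m] for m in COLDSPOTS)

for name, values in FREE_ENERGY.items():
    print(f"{name}: {hot_minus_cold(values):.2f} kcal/mol")
```

The mean gap is about 1.85 kcal mol⁻¹ for set B and about 1.4 kcal mol⁻¹ for set C, in line with the weaker discrimination reported for the backbone-only set.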

As mentioned above, the selected coldspots consist of purine tracts where mononucleotide stretches frequently occur, e.g. AAAGA , AAAAA , AGAAA , etc. ( Table 2 ). It is known that such sequences tend to cause DNA polymerase slippage during DNA replication, which results in IDLs [ 58 ]. Using HGMD, we can find a number of IDLs in our coldspots. It is known that the mutation rate of IDLs is lower than that of substitutions; however, homonucleotide tracts represent an exception [ 59 ]. We assume that the association of coldspots with IDLs is due to the increased mutation rate of these sites for this error type, even though mismatches in these sequences might be effectively repaired. The rate of mutations in IDLs can be further modulated by the length of a repetitive tract and base composition [ 60 – 62 ].

In summary, our study detected DNA mutation motifs rarely associated with germline SNVs (coldspots) and motifs which are very frequently associated with SNVs (hotspots). For two selected coldspots, it was shown that they are inherently more flexible than hotspots. To conclude that coldspots are generally more flexible than hotspots, and could therefore be more effectively repaired by MMR, the bending properties of more systems with various mismatches have to be investigated. We are going to analyse other sequences in the next study. We would also like to focus on hotspots with a CpG dinucleotide. It is possible that certain sequences with CpG will be flexible while others can be very stiff, which would make them super-hotspots in combination with the high mutability of CpG . We assume that knowledge of DNA motifs that are repaired extremely ineffectively can help to identify potentially causal mutations in introns. For these mutations, analysis of transcripts or minigene assays could be done to reveal their impact [ 63 , 64 ]. This is especially important in cases where no mutations are detected in patients with a clear clinical phenotype.

Supporting information

S1 Table. DNA motifs (5-nt segments and their complements) detected in 5 genes and number of mutations associated with middle positions in HGMD 2014 dataset.

https://doi.org/10.1371/journal.pone.0182377.s001

S2 Table. DNA motifs (5-nt segments and their complements) detected in 5 genes and number of mutations associated with middle positions in HGMD 2016 dataset.

https://doi.org/10.1371/journal.pone.0182377.s002

S3 Table. Top twenty coldspots and hotspots in 2016 dataset where all substitutions were taken into account.

https://doi.org/10.1371/journal.pone.0182377.s003

S4 Table. Updated Table 1 about TP53 gene.

https://doi.org/10.1371/journal.pone.0182377.s004

S5 Table. Number of occurrences and mutations of top 20 coldspots and top 20 hotspots in the TP53 gene.

https://doi.org/10.1371/journal.pone.0182377.s005

S1 Fig. Comparison of free energy profiles of 200 ns and 400 ns long ABF calculations run for two coldspots and two hotspots with G/T mismatch where we used set A for calculation of RMSD.

https://doi.org/10.1371/journal.pone.0182377.s006

S2 Fig. Distribution of RMSD values observed for a relaxed DNA in the unrestrained production dynamics of tested coldspots and hotspots.

The distribution is calculated with respect to an average DNA structure. Set A (red) and B (blue) show a maximum at 1.55 Å while set C (green) exhibits a maximum shifted to 1.8 Å.

https://doi.org/10.1371/journal.pone.0182377.s007

A) X-ray DNA structure from DNA/MutSα complex. B) Snapshot MD structure from ABF simulation where we used set B for calculation of RMSD, actual RMSD value of the snapshot is 1.55 Å. C) Snapshot MD structure from ABF simulation where collective variable was angle among three mass centres, actual angle value of the snapshot is 98°.

https://doi.org/10.1371/journal.pone.0182377.s008

S4 Fig. Nucleotide substitutions detected in the middle position in top 20 coldspots (left) and hotspots (right).

Each column shows individual substitutions in the motif analyzed in the 5 genes. Nucleotides with percentage indicate total sum of particular base substitution.

https://doi.org/10.1371/journal.pone.0182377.s009

S5 Fig. Free energy profiles for two coldspots and two hotspots with G/T pair where we used set B for calculation of RMSD.

At 1.55 Å the free energy change for AGGTA , TGGAA , AAGAA and CAGTG is 14.1, 14.2, 12.8, and 11.8 kcal mol⁻¹, respectively.

https://doi.org/10.1371/journal.pone.0182377.s010

S6 Fig. Free energy profiles for two coldspots and two hotspots with G/T pair where we used set C for calculation of RMSD.

At 1.55 Å the free energy for AGGTA , TGGAA , AAGAA and CAGTG is 19.5, 18.7, 17.9, and 17.5 kcal mol⁻¹, respectively.

https://doi.org/10.1371/journal.pone.0182377.s011

S7 Fig. Predicted positioning of nucleosomes on two coldspots (C), AAGAA and CAGTG , and two hotspots (H), AGGTA and TGGAA , in the exons of the PAH gene.

Nucleosome positions are visualized by yellow peaks, exon positions are colored by grey bars. Each plot shows 1000-nt long fragment. Motifs were considered on nucleosome with occupancy > 0.005.

https://doi.org/10.1371/journal.pone.0182377.s012

  • 13. Cooper DN, Krawczak M (1993) Human Gene Mutation. Oxford: BIOS Scientific Publishers.
  • 28. Drsata T, Lankas F (2013) Theoretical models of DNA flexibility. Wiley Interdisciplinary Reviews-Computational Molecular Science. pp. 355–363.
  • 39. Case DA, Babin V, Berryman JT, Betz RM, Cai Q, Cerutti DS, et al., AMBER 14. University of California: San Francisco. 2014.
  • 47. Kulhanek P, Stepan J, Fuxreiter M, Mones L, Strelcova Z, Petrek M, PMFLib—A Toolkit for Free Energy Calculations; https://lcc.ncbr.muni.cz/whitezone/development/pmflib/index.html . 2013.
  • 52. Graur D, Li WH (2000) Fundamentals of molecular evolution. Sunderland, MA: Sinauer Associates.

Research breakthrough on birth defect affecting brain size

UC Riverside-led study identifies molecular cellular mechanism linked to microcephaly

Nonsense-mediated RNA decay, or NMD, is an evolutionarily conserved molecular mechanism in which potentially defective messenger RNAs, or mRNAs (genetic material that instructs the body on how to make proteins), are degraded. Disruption of the NMD pathway can lead to neurological disorders, immune diseases, cancers, and other pathologies. Mutations in human NMD regulators are seen in neurodevelopmental disorders, including autism and intellectual disability.

Why NMD mutations are enriched in neurodevelopmental disorders remains a mystery. Sika Zheng, a professor of biomedical sciences in the School of Medicine and the founding director of the Center for RNA Biology and Medicine at the University of California, Riverside, has now led a study, published in the journal Neuron, that reveals the molecular cellular mechanism underlying NMD regulation of brain size and its dysregulation in causing microcephaly, a condition in which a baby’s head is much smaller than expected.

The team’s finding suggests that maintaining the neuronal NMD function is essential for early brain development to prevent microcephaly. According to Zheng, modulating NMD targets can be a potential treatment for microcephaly and other related neurodevelopmental diseases.

The study explains the functional roles of NMD in brain development and the underlying mechanistic action. It also demonstrates for the first time the link between mRNA decay regulation and brain size control. Additionally, it reveals the intricate connection between NMD and the most famous tumor suppressor gene, p53, suggesting possible new connections between NMD and cancer.

The research was supported by grants from the National Institutes of Health and the California Institute of Regenerative Medicine. The title of the research paper is “Epistatic Interactions between NMD and TRP53 Control Progenitor Cell Maintenance and Brain Size.” Zheng was joined in the study by Liang Chen of the University of Southern California, Chun-Wei Chen of the City of Hope, Gene Yeo of UC San Diego, and members of their labs. 

Below, Zheng answers questions about the research:

Q: Why has it been a challenge to understand why NMD mutations are enriched in neurodevelopmental disorders?

A: As a surveillance mechanism, NMD targets defective mRNA arising from random mutations or RNA processing errors. This randomness is not expected to create idiosyncratic features of neurodevelopmental disorders associated with NMD factors. Furthermore, NMD occurs in all cell types and tissues, but the brain seems particularly vulnerable to NMD defects. Animal models of neural-specific NMD defects, followed by in-depth mechanistic investigation, are needed to understand why NMD mutations are enriched in neurodevelopmental disorders. Such work had not been conducted until now, in part because of the complexity of brain development and the technical challenges of dissecting the functional NMD substrates.

Q: How did you identify the molecular cellular mechanism underlying NMD regulation of brain size and its dysregulation causing microcephaly?

A: We generated various NMD deficiency animal models by knocking out a key NMD factor, Upf2 , in different cell types and determined their phenotypic differences. We found NMD deficiency has a greater impact on the proliferative neural progenitors, which is consistent with the notion that NMD function is essential for early brain development and its mutations are enriched in neurodevelopmental disorders. Importantly, we showed NMD deficiency in progenitor cells causes microcephaly, a novel finding that links the NMD pathway to brain size control. To dissect the underlying mechanisms and determine functional NMD substrates causing these phenotypes, we applied cutting-edge technologies, including CRISPR screening, RNA-Seq, CLIP-seq, and single cell RNA-seq. Only through this integrative approach did we succeed in revealing the molecular underpinning.

Once we found the microcephaly phenotype, we used primary cell culture models to define the growth defects of Upf2 knockout, or Upf2 KO, in neural progenitor cells (NPCs). Next, we applied CRISPRi screening to identify genes whose perturbation can interfere with Upf2 KO NPCs’ growth defect. Among hundreds of genes we screened, we identified a few candidates and followed up on the top candidate, p53, which is well known to be a tumor suppressor gene that controls cell growth. Subsequently, we used various single cell approaches and transcriptomics technologies to determine precisely how NMD and p53 intersect in controlling cell growth.

The identification of p53 from the CRISPRi screen was a surprise to us because p53 is one of the most studied genes, many assume its genetic interactors are already identified, and NMD had not made that list. Secondly, we showed deficiency in Trp53 , a gene that encodes p53 (or TRP53), did not globally reverse the NMD inhibition to rescue the growth defects. Instead, our finding suggests that the TRP53 activity intersects with the NMD pathway to regulate the neuronal progenitors’ cell cycle progression. This intersection represents a very small proportion of the NMD substrates, indicating that most NMD substrates have marginal functional impacts on cell growth.

Q: The study was done using mouse cells. Can we say the results would be replicated in human cells?

A: Yes. The NMD pathway is evolutionarily conserved and the key factor we focused on in the study, UPF2, is conserved between human and mouse. Other molecular players we identified in this study are also conserved. Their regulatory relationship has been replicated in human cells.

Q: What’s next for the research team?

A: We plan to build human stem cells carrying NMD mutations, so that we can modify NMD activity with drugs or regulate NMD targets to rescue phenotypes in human neurons. Additionally, we will explore whether NMD’s novel role contributes to the expansion of brain size during evolution.


Martin Walschburger Hurtado Studying Stone Man Disease through Research Collaboration between Kent State Stark and Walsh University


For the past seven years, Dr. Dinah Qutob of Kent State University at Stark and Dr. Adam Underwood from Walsh University have partnered to guide Stark County undergraduate students pursuing careers in biology, instilling in them a comprehensive understanding of how biological networks transition between health and disease states. Currently, they are collaborating through the Undergraduate Research Program to study the molecular mechanisms underlying rare diseases. Their recent work involves student research on the impact of specific proteins in cancer, as well as a genetic mutation that causes stone man disease.

Qutob and Underwood teach and mentor groups of up to 10 Kent State and Walsh students each year, providing them with essential skills needed for post-graduate programs in biomedical specializations such as neurobiology, immunology, molecular pharmacology and therapeutics. The students contribute as co-authors on peer-reviewed scientific papers and give presentations at regional and national conferences to showcase the impact of their work. The faculty pair recently applied for the National Institutes of Health Research Enhancement Award (R15) which, if granted, will provide continued funding of the program. The NIH is the largest funder of medical research in the world.

The duo mentors a diverse student population and also prioritizes inclusivity with research to make a positive impact on the global community.

“It’s truly inspiring to witness the mutual support among our students. Our more senior, experienced students step up to mentor and train incoming students,” Qutob noted. “We have a diverse representation of undergraduates from various backgrounds, all united in their dedication to hard work and teamwork.”

One standout student in the program is Martin Walschburger Hurtado, who graduated with a Bachelor of Science degree in organismal biology last December and aspires to earn a doctoral degree in mycology, the study of fungi.

Martin focused his research on the ACVR1 gene that is implicated in the pathogenesis of Fibrodysplasia Ossificans Progressiva (FOP). FOP, more commonly known as stone man disease, is the result of a rare genetic mutation within the ACVR1 gene that causes soft tissues such as muscle, tendons, cartilage, ligaments and connective tissue to undergo abnormal ossification and form more bone tissue. This disease has serious and potentially life-threatening complications, such as the progressive fusion of joints, and seems to occur most often in people from the United States. While there are currently no known treatments, fully understanding FOP is crucial to eventually developing effective treatments.

The research on this genetic mutation is still in its early stages, yet the team has already cloned the mutant and native forms of the ACVR1 gene into an expression vector and successfully transfected eukaryotic cell lines. This helps researchers better understand the mutation’s biological implications in the development of life-altering complications and, hopefully, will lead them to develop possible preventative measures.

Martin’s experience in the research program provided numerous lessons for him. “It shows that, while things are hard and even complicated, working, enduring and persevering through them allows you to accomplish things beyond what you thought you were capable of,” he reflected. “It’s fun and if you enjoy a challenge, I encourage people to go into it and experience it. It’s nice to have faculty that supports you and that you can work with, because that’s the greatest way to get undergraduate research under your belt.”

This undergraduate research project is also attracting attention from others in the scientific community, including two scientists from the Federal University of Minas Gerais in Belo Horizonte, Brazil: Dr. Helen Lima del Puerto, an expert in SOX research, and Dr. Almir Martins, specializing in ACVR1 research. They are currently collaborating with the program and guiding participants by sharing their experiences with scientific inquiry and research.


ScienceDaily

Discovering cancers of epigenetic origin without DNA mutation

A research team including scientists from the CNRS 1 has discovered that cancer, one of the leading causes of death worldwide, can be caused entirely by epigenetic changes 2 , in other words, changes that contribute to how gene expression is regulated, and partly explain why, despite an identical genome, an individual develops very different cells (neurons, skin cells, etc.).

While studies have already described the influence of these processes in the development of cancer, this is the first time that scientists have demonstrated that genetic mutations are not essential for the onset of the disease. This discovery forces us to reconsider the theory that, for more than 30 years, has assumed that cancers are predominantly genetic diseases caused necessarily by DNA mutations that accumulate at the genome level 3 .

To show this, the research team focused on epigenetic factors that can alter gene activity. By causing epigenetic dysregulation 4 in Drosophila, and then restoring the cells to their normal state, scientists have found that part of the genome remains dysfunctional. This phenomenon induces a tumour state that is maintained autonomously and continues to progress, keeping in memory the cancerous status of these cells even though the signal that caused it has been restored.

These conclusions, to be published on April 24, 2024, in the journal Nature, open up new therapeutic avenues in oncology.

1 -- Working at the Institut de Génétique Humaine (CNRS/Université de Montpellier).

2 -- Epigenetics is the study of the mechanisms that allow the inheritance of different gene expression profiles in the presence of the same DNA sequence.

3 -- The genome is defined as the set of genetic material -- and therefore the entire DNA sequence -- contained in a cell or organism.

4 -- Scientists focused on epigenetic factors called Polycomb proteins, which regulate the expression of key genes, and are dysregulated in many human cancers. When these proteins are experimentally removed, the activity of the targeted genes is disrupted: some can activate their own transcription and self-maintain. When Polycomb proteins are integrated back into the cell, a subset of the genes are resistant to the proteins and remain dysregulated through cell division, allowing the cancer to continue its progression.

Story Source:

Materials provided by CNRS . Note: Content may be edited for style and length.

Journal Reference :

  • V. Parreno, V. Loubiere, B. Schuettengruber, L. Fritsch, C. C. Rawal, M. Erokhin, B. Győrffy, D. Normanno, M. Di Stefano, J. Moreaux, N. L. Butova, I. Chiolo, D. Chetverina, A.-M. Martinez, G. Cavalli. Transient loss of Polycomb components induces an epigenetic cancer fate . Nature , 2024; DOI: 10.1038/s41586-024-07328-w


Disclaimer: Early release articles are not considered as final versions. Any changes will be reflected in the online version in the month the article is officially released.

Volume 30, Number 7—July 2024

Highly Pathogenic Avian Influenza A(H5N1) Clade 2.3.4.4b Virus Infection in Domestic Dairy Cattle and Cats, United States, 2024

We report highly pathogenic avian influenza A(H5N1) virus in dairy cattle and cats in Kansas and Texas, United States, which reflects the continued spread of clade 2.3.4.4b viruses that entered the country in late 2021. Infected cattle experienced nonspecific illness, reduced feed intake and rumination, and an abrupt drop in milk production, but fatal systemic influenza infection developed in domestic cats fed raw (unpasteurized) colostrum and milk from affected cows. Cow-to-cow transmission appears to have occurred because infections were observed in cattle on Michigan, Idaho, and Ohio farms where avian influenza virus–infected cows were transported. Although the US Food and Drug Administration has indicated the commercial milk supply remains safe, the detection of influenza virus in unpasteurized bovine milk is a concern because of potential cross-species transmission. Continued surveillance of highly pathogenic avian influenza viruses in domestic production animals is needed to prevent cross-species and mammal-to-mammal transmission.

Highly pathogenic avian influenza (HPAI) viruses pose a threat to wild birds and poultry globally, and HPAI H5N1 viruses are of even greater concern because of their frequent spillover into mammals. In late 2021, the Eurasian strain of H5N1 (clade 2.3.4.4b) was detected in North America ( 1 , 2 ) and initiated an outbreak that continued into 2024. Spillover detections and deaths from this clade have been reported in both terrestrial and marine mammals in the United States ( 3 , 4 ). The detection of HPAI H5N1 clade 2.3.4.4b virus in severe cases of human disease in Ecuador ( 5 ) and Chile ( 6 ) raises further concerns regarding the pandemic potential of specific HPAI viruses.

In February 2024, veterinarians were alerted to a syndrome occurring in lactating dairy cattle in the panhandle region of northern Texas. Nonspecific illness accompanied by reduced feed intake and rumination and an abrupt drop in milk production developed in affected animals. The milk from most affected cows had a thickened, creamy yellow appearance similar to colostrum. On affected farms, incidence appeared to peak 4–6 days after the first animals were affected and then tapered off within 10–14 days; afterward, most animals were slowly returned to regular milking. Clinical signs were commonly reported in multiparous cows during middle to late lactation; ≈10%–15% illness and minimal death of cattle were observed on affected farms. Initial submissions of blood, urine, feces, milk, and nasal swab samples and postmortem tissues to regional diagnostic laboratories did not reveal a consistent, specific cause for reduced milk production. Milk cultures were often negative, and serum chemistry testing showed mildly increased aspartate aminotransferase, gamma-glutamyl transferase, creatine kinase, and bilirubin values, whereas complete blood counts showed variable anemia and leukocytopenia.

In early March 2024, similar clinical cases were reported in dairy cattle in southwestern Kansas and northeastern New Mexico; deaths of wild birds and domestic cats were also observed within affected sites in the Texas panhandle. In > 1 dairy farms in Texas, deaths occurred in domestic cats fed raw colostrum and milk from sick cows that were in the hospital parlor. Antemortem clinical signs in affected cats were depressed mental state, stiff body movements, ataxia, blindness, circling, and copious oculonasal discharge. Neurologic exams of affected cats revealed the absence of menace reflexes and pupillary light responses with a weak blink response.

On March 21, 2024, milk, serum, and fresh and fixed tissue samples from cattle located in affected dairies in Texas and 2 deceased cats from an affected Texas dairy farm were received at the Iowa State University Veterinary Diagnostic Laboratory (ISUVDL; Ames, IA, USA). The next day, similar sets of samples were received from cattle located in affected dairies in Kansas. Milk and tissue samples from cattle and tissue samples from the cats tested positive for influenza A virus (IAV) by screening PCR, which was confirmed and characterized as HPAI H5N1 virus by the US Department of Agriculture National Veterinary Services Laboratory. Detection led to an initial press release by the US Department of Agriculture Animal and Plant Health Inspection Service on March 25, 2024, confirming HPAI virus in dairy cattle ( 7 ). We report the characterizations performed at the ISUVDL for HPAI H5N1 viruses infecting cattle and cats in Kansas and Texas.

Materials and Methods

Milk samples (cases 2–5) and fresh and formalin-fixed tissues (cases 1, 3–5) from dairy cattle were received at the ISUVDL from Texas on March 21 and from Kansas on March 22, 2024. The cattle exhibited nonspecific illness and reduced lactation, as described previously. The tissue samples for diagnostic testing came from 3 cows that were euthanized and 3 that died naturally; all postmortem examinations were performed on the premises of affected farms.

The bodies of 2 adult domestic shorthaired cats from a north Texas dairy farm were received at the ISUVDL for a complete postmortem examination on March 21, 2024. The cats were found dead with no apparent signs of injury and were from a resident population of ≈24 domestic cats that had been fed milk from sick cows. Clinical disease in cows on that farm was first noted on March 16; the cats became sick on March 17, and several cats died in a cluster during March 19–20. In total, >50% of the cats at that dairy became ill and died. We collected cerebrum, cerebellum, eye, lung, heart, spleen, liver, lymph node, and kidney tissue samples from the cats and placed them in 10% neutral-buffered formalin for histopathology.

At ISUVDL, we trimmed, embedded in paraffin, and processed formalin-fixed tissues from affected cattle and cats for hematoxylin/eosin staining and histologic evaluation. For immunohistochemistry (IHC), we prepared 4-µm–thick sections from paraffin-embedded tissues, placed them on Superfrost Plus slides (VWR, https://www.vwr.com ), and dried them for 20 minutes at 60°C. We used a Ventana Discovery Ultra IHC/ISH research platform (Roche, https://www.roche.com ) for all steps from deparaffinization through counterstaining. We obtained all products except the primary antibody from Roche. Automated deparaffinization was followed by enzymatic digestion with protease 1 for 8 minutes at 37°C and endogenous peroxidase blocking. We obtained the primary influenza A virus antibody from the hybridoma cell line H16-L10–4R5 (ATCC, https://www.atcc.org ) and diluted it 1:100 in Discovery PSS diluent; we incubated sections with antibody for 32 minutes at room temperature. Next, we incubated the sections with a hapten-labeled conjugate, Discovery anti-mouse HQ, for 16 minutes at 37°C followed by a 16-minute incubation with the horseradish peroxidase conjugate, Discovery anti-HQ HRP. We used a ChromoMap DAB kit for antigen visualization, followed by counterstaining with hematoxylin and then bluing. Positive controls were sections of IAV-positive swine lung. Negative controls were sections of brain, lung, and eyes from cats not infected with IAV.

We diluted milk samples 1:3 vol/vol in phosphate buffered saline, pH 7.4 (Gibco/Thermo Fisher Scientific, https://www.thermofisher.com ) by mixing 1 unit volume of milk and 3 unit volumes of phosphate buffered saline. We prepared 10% homogenates of mammary glands, brains, lungs, spleens, and lymph nodes in Earle’s balanced salt solution (Sigma-Aldrich, https://www.sigmaaldrich.com ). Processing was not necessary for ocular fluid, rumen content, or serum samples. After processing, we extracted samples according to a National Animal Health Laboratory Network (NAHLN) protocol that had 2 NAHLN-approved deviations for ISUVDL consisting of the MagMax Viral RNA Isolation Kit for 100 µL sample volumes and a Kingfisher Flex instrument (both Thermo Fisher Scientific).
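The 1:3 vol/vol dilution described above (1 part milk plus 3 parts phosphate buffered saline) can be expressed as a small helper. This is only an arithmetic sketch; the function name and the 400 µL working volume are hypothetical and are not taken from the protocol.

```python
def dilution_volumes(total_volume_ul, sample_parts=1, diluent_parts=3):
    """Split a total volume into (sample, diluent) volumes for a parts-based dilution."""
    parts = sample_parts + diluent_parts
    sample = total_volume_ul * sample_parts / parts
    return sample, total_volume_ul - sample

# Example: preparing 400 µL of diluted milk (hypothetical working volume).
milk_ul, pbs_ul = dilution_volumes(400)
print(f"{milk_ul:.0f} µL milk + {pbs_ul:.0f} µL PBS")
```

Under this convention the milk makes up one quarter of the final volume, so 400 µL of diluted sample is 100 µL of milk and 300 µL of buffer.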

We performed real-time reverse transcription PCR (rRT-PCR) by using an NAHLN-approved assay with 1 deviation, which was the VetMAX-Gold SIV Detection kit (Thermo Fisher Scientific), to screen for the presence of IAV RNA. We tested samples along with the VetMAX XENO Internal Positive Control to monitor the possible presence of PCR inhibitors. Each rRT-PCR 96-well plate had 2 positive amplification controls, 2 negative amplification controls, 1 positive extraction control, and 1 negative extraction control. We ran the rRT-PCR on an ABI 7500 Fast thermocycler and analyzed data with Design and Analysis Software 2.7.0 (both Thermo Fisher Scientific). We considered samples with cycle threshold (Ct) values <40.0 to be positive for virus.

After the screening rRT-PCR, we analyzed IAV RNA–positive samples for the H5 subtype and H5 clade 2.3.4.4b by using the same RNA extraction and NAHLN-approved rRT-PCR protocols as described previously, according to standard operating procedures. We performed PCR on the ABI 7500 Fast thermocycler by using appropriate controls to detect H5-specific IAV. We considered samples with Ct values <40.0 to be positive for the IAV H5 subtype.
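The positivity rule used for both the screening and H5 subtype assays (Ct <40.0 is positive) can be sketched in a few lines of Python. This is only an illustration of the stated cutoff, not the laboratory's software; the function name, the handling of wells with no amplification, and the example Ct values are assumptions and are not taken from Table 3.

```python
from typing import Optional

# Cutoff stated in the text: samples with Ct < 40.0 are considered positive.
CT_POSITIVE_CUTOFF = 40.0


def call_rrt_pcr(ct: Optional[float], cutoff: float = CT_POSITIVE_CUTOFF) -> str:
    """Return 'positive' or 'negative' for a single rRT-PCR result.

    ct is None when no amplification was detected (an assumed convention).
    """
    if ct is None:
        return "negative"
    return "positive" if ct < cutoff else "negative"


# Hypothetical Ct values for illustration only.
example_ct = {"milk": 14.2, "spleen": 36.8, "serum": None, "rumen": 40.0}
example_calls = {sample: call_rrt_pcr(ct) for sample, ct in example_ct.items()}
```

A lower Ct value corresponds to more target RNA in the sample, so the cutoff only separates detected from not detected; it does not rank samples by virus load.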

We conducted genomic sequencing of 2 milk samples from infected dairy cattle from Texas and 2 tissue samples (lung and brain) from cats that died at a different Texas dairy. We subjected the whole-genome sequencing data to bioinformatics analysis to assemble the 8 different IAV segment sequences according to previously described methods ( 8 ). We used the hemagglutinin (HA) and neuraminidase (NA) sequences for phylogenetic analysis. We obtained reference sequences for the HA and NA segments of IAV H5 clade 2.3.4.4 from publicly available databases, including GISAID ( https://www.gisaid.org ) and GenBank. We aligned the sequences by using MAFFT version 7.520 software ( https://mafft.cbrc.jp/alignment/server/index.html ) to create multiple sequence alignments for subsequent phylogenetic analysis. We used IQTree2 ( https://github.com/iqtree/iqtree2 ) to construct the phylogenetic tree from the aligned sequences. The software was configured to automatically identify the optimal substitution model by using the ModelFinder Plus option, ensuring the selection of the most suitable model for the dataset and, thereby, improving the accuracy of the reconstructed tree. We visualized the resulting phylogenetic tree by using iTOL ( https://itol.embl.de ), a web-based platform for interactive tree exploration and annotation.
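The alignment and tree-building steps described above can be sketched as a small Python wrapper around the two command-line tools. This is a minimal sketch that assumes locally installed `mafft` and `iqtree2` executables; the function names and file names are hypothetical, and the exact options the authors used beyond ModelFinder Plus are not stated in the text.

```python
import subprocess
from typing import List


def mafft_command(input_fasta: str) -> List[str]:
    """Build a MAFFT call; --auto lets MAFFT choose an alignment strategy.

    MAFFT writes the alignment to standard output.
    """
    return ["mafft", "--auto", input_fasta]


def iqtree_command(alignment_fasta: str) -> List[str]:
    """Build an IQ-TREE 2 call; -m MFP selects ModelFinder Plus."""
    return ["iqtree2", "-s", alignment_fasta, "-m", "MFP"]


def run_pipeline(input_fasta: str, alignment_fasta: str) -> None:
    """Align the sequences, then infer a tree from the alignment."""
    with open(alignment_fasta, "w") as out:
        subprocess.run(mafft_command(input_fasta), stdout=out, check=True)
    subprocess.run(iqtree_command(alignment_fasta), check=True)
```

Calling `run_pipeline("ha_sequences.fasta", "ha_aligned.fasta")` (hypothetical file names) would leave IQ-TREE output files next to the alignment; the resulting tree file could then be uploaded to iTOL for visualization.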

Gross Lesions in Cows and Cats

All cows were in good body condition with adequate rumen fill and no external indications of disease. Postmortem examinations of the affected dairy cows revealed firm mammary glands typical of mastitis; however, mammary gland lesions were not consistent. Two cows that were acutely ill before postmortem examination had grossly normal milk and no gross mammary gland lesions. The gastrointestinal tract of some cows had small abomasal ulcers and shallow linear erosions of the intestines, but those observations were also not consistent among animals. The colon contents were brown and sticky, suggesting moderate dehydration. The feces contained feed particles that appeared to have undergone minimal ruminal fermentation. The rumen contents had normal color and appearance but also appeared to have undergone minimal fermentation.

The 2 adult cats (1 intact male, 1 intact female) received at the ISUVDL were in adequate body and postmortem condition. External examination was unremarkable. Mild hemorrhages were observed in the subcutaneous tissues over the dorsal skull, and multifocal meningeal hemorrhages were observed in the cerebrums of both cats. The gastrointestinal tracts were empty, and no other gross lesions were observed.

Microscopic Lesions in Cows and Cats

Figure 1. Mammary gland lesions in cattle in study of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection in domestic dairy cattle and cats, United States, 2024. A, B) Mammary gland tissue sections stained with hematoxylin and eosin. A) Arrowheads indicate segmental loss within open secretory mammary alveoli. Original magnification ×40. B) Arrowheads indicate epithelial degeneration and necrosis lining alveoli with intraluminal sloughing. Asterisk indicates intraluminal neutrophilic inflammation. Original magnification ×400. C, D) Mammary gland tissue sections stained by using avian influenza A immunohistochemistry. C) Brown staining indicates lobular distribution of avian influenza A virus. Original magnification ×40. D) Brown staining indicates strong nuclear and intracytoplasmic immunoreactivity of intact and sloughed epithelial cells within mammary alveoli. Original magnification ×400.

The chief microscopic lesion observed in affected cows was moderate acute multifocal neutrophilic mastitis ( Figure 1 ); however, mammary glands were not received from every cow. Three cows had mild neutrophilic or lymphocytic hepatitis. Because they were adult cattle, other observed microscopic lesions (e.g., mild lymphoplasmacytic interstitial nephritis and mild to moderate lymphocytic abomasitis) were presumed to be nonspecific, age-related changes. We did not observe major lesions in the other evaluated tissues. We performed IHC for IAV antigen on all evaluated tissues; the only tissues with positive immunoreactivity were mastitic mammary glands from 2 cows that showed nuclear and cytoplasmic labeling of alveolar epithelial cells and cells within lumina ( Figure 1 ) and multifocal germinal centers within a lymph node from 1 cow ( Table 1 ).

Figure 2. Lesions in cat tissues in study of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection in domestic dairy cattle and cats, United States, 2024. Tissue sections were stained with hematoxylin and eosin; insets show brown staining of avian influenza A viruses via immunohistochemistry by using the chromogen 3,3′-diaminobenzidine tetrahydrochloride. Original magnification ×200 for all images and insets. A) Section from cerebral tissue. Arrowheads show perivascular lymphocytic encephalitis, gliosis, and neuronal necrosis. Inset shows neurons. B) Section of lung tissue showing lymphocytic and fibrinous interstitial pneumonia with septal necrosis and alveolar edema; arrowheads indicate lymphocytes. Inset shows bronchiolar epithelium, necrotic cells, and intraseptal mononuclear cells. C) Section of heart tissue. Arrowhead shows interstitial lymphocytic myocarditis and focal peracute myocardial coagulative necrosis. Inset shows cardiomyocytes. D) Section of retinal tissue. Arrowheads show perivascular lymphocytic retinitis with segmental neuronal loss and rarefaction in the ganglion cell layer. Asterisks indicate attenuation of the inner plexiform and nuclear layers with artifactual retinal detachment. Inset shows that all layers of the retina segmentally within affected areas have strong cytoplasmic and nuclear immunoreactivity to influenza A virus.

Both cats had microscopic lesions consistent with severe systemic virus infection, including severe subacute multifocal necrotizing and lymphocytic meningoencephalitis with vasculitis and neuronal necrosis, moderate subacute multifocal necrotizing and lymphocytic interstitial pneumonia, moderate to severe subacute multifocal necrotizing and lymphohistiocytic myocarditis, and moderate subacute multifocal lymphoplasmacytic chorioretinitis with ganglion cell necrosis and attenuation of the internal plexiform and nuclear layers ( Table 2 ; Figure 2 ). We performed IHC for IAV antigen on multiple tissues (brain, eye, lung, heart, spleen, liver, and kidney). We detected positive IAV immunoreactivity in brain (intracytoplasmic, intranuclear, and axonal immunolabeling of neurons), lung, and heart, and multifocal and segmental immunoreactivity within all layers of the retina ( Figure 2 ).

PCR Data from Cows and Cats

We tested various samples from 8 clinically affected mature dairy cows by IAV screening and H5 subtype-specific PCR ( Table 3 ). Milk and mammary gland homogenates consistently showed low Ct values: 12.3–16.9 by IAV screening PCR, 17.6–23.1 by H5 subtype PCR, and 14.7–20.0 by H5 2.3.4.4 clade PCR (case 1, cow 1; case 2, cows 1 and 2; case 3, cow 1; and case 4, cow 1). We forwarded the samples to the National Veterinary Services Laboratory, which confirmed the virus was an HPAI H5N1 virus strain.

When available, we also tested tissue homogenates (e.g., lung, spleen, and lymph nodes), ocular fluid, and rumen contents from 6 cows by IAV and H5 subtype-specific PCR ( Table 3 ). However, the PCR findings were not consistent. For example, the tissue homogenates and ocular fluid tested positive in some but not all cows. In case 5, cow 1, the milk sample tested negative by IAV screening PCR, but the spleen homogenate tested positive by IAV screening, H5 subtype, and H5 2.3.4.4 PCR. For 2 cows (case 3, cow 1; and case 4, cow 1) that had both milk and rumen contents available, both samples tested positive for IAV. Nevertheless, all IAV-positive nonmammary gland tissue homogenates, ocular fluid, and rumen contents had markedly elevated Ct values in contrast to the low Ct values for milk and mammary gland homogenate samples.

We tested brain and lung samples from the 2 cats (case 6, cats 1 and 2) by IAV screening and H5 subtype-specific PCR ( Table 3 ). Both sample types were positive by IAV screening PCR; Ct values were 9.9–13.5 for brain and 17.4–24.4 for lung samples, indicating high amounts of virus nucleic acid in those samples. The H5 subtype and H5 2.3.4.4 PCR results were also positive for the brain and lung samples; Ct values were consistent with the IAV screening PCR ( Table 3 ).

Phylogenetic Analyses

We assembled the sequences of all 8 segments of the HPAI viruses from both cow milk and cat tissue samples. We used the hemagglutinin (HA) and neuraminidase (NA) sequences specifically for phylogenetic analysis to delineate the clade of the HA gene and subtype of the NA gene.

Figure 3. Phylogenetic analysis of hemagglutinin gene sequences in study of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection in domestic dairy cattle and cats, United States, 2024. Colors indicate different clades. Red text indicates the virus gene sequences from bovine milk and cats described in this report, confirming those viruses are highly similar and belong to H5 clade 2.3.4.4b. The hemagglutinin sequences from this report are most closely related to A/avian/Guanajuato/CENAPA-18539/2023|EPI_ISL_18755544|A_/_H5 (GISAID, https://www.gisaid.org ) and have 99.66%–99.72% nucleotide identities.

For HA gene analysis, both HA sequences derived from cow milk samples exhibited a high degree of similarity, sharing 99.88% nucleotide identity, whereas the 2 HA sequences from cat tissue samples showed complete identity at 100%. The HA sequences from the milk samples had 99.94% nucleotide identities with HA sequences from the cat tissues, resulting in a distinct subcluster comprising all 4 HA sequences, which clustered together with other H5N1 viruses belonging to clade 2.3.4.4b ( Figure 3 ). The HA sequences were deposited in GenBank (accession nos. PP599465 [case 2, cow 1], PP599473 [case 2, cow 2], PP692142 [case 6, cat 1], and PP692195 [case 6, cat 2]).

Figure 4. Phylogenetic analysis of neuraminidase gene sequences in study of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection in domestic dairy cattle and cats, United States, 2024. Colors indicate different subtypes. Red text indicates the virus gene sequences from bovine milk and cats described in this report, confirming those viruses belong to the N1 subtype. The neuraminidase sequences from this report had 99.52%–99.59% nucleotide identities to sequences from viruses isolated from a chicken and wild birds in 2023.

For NA gene analysis, the 2 NA sequences obtained from cow milk samples showed 99.93% nucleotide identity. Moreover, the NA sequences derived from the milk samples exhibited complete nucleotide identities (100%) with those from the cat tissues. The 4 NA sequences were grouped within the N1 subtype of HPAI viruses ( Figure 4 ). The NA sequences were deposited in GenBank (accession nos. PP599467 [case 2, cow 1], PP599475 [case 2, cow 2], PP692144 [case 6, cat 1], and PP692197 [case 6, cat 2]).
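The nucleotide identities reported for the HA and NA sequences are pairwise percentages computed from aligned sequences. The sketch below shows one common way to compute such a value in Python. The text does not state which tool or gap convention was used, so ignoring columns that contain a gap is an assumption, and the function name and the short example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent nucleotide identity between two rows of an alignment.

    Columns in which either sequence has a gap ('-') are skipped
    (an assumed convention; other tools count gaps differently).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy 10-nucleotide example with a single mismatch at the last position.
toy_identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
```

For full-length gene segments, a single nucleotide difference changes the identity by only a few hundredths of a percent, which is why closely related sequences differ only in the second decimal place.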

This case series differs from most previous reports of IAV infection in bovids, which indicated cattle were inapparently infected or resistant to infection ( 9 ). We describe an H5N1 strain of IAV in dairy cattle that resulted in apparent systemic illness, reduced milk production, and abundant virus shedding in milk. The magnitude of this finding is further emphasized by the high death rate (≈50%) of cats on farm premises that were fed raw colostrum and milk from affected cows; those cats developed clinical disease and lesions consistent with previous reports of H5N1 infection in cats presumably derived from consuming infected wild birds ( 10 – 12 ). Although exposure to and consumption of dead wild birds cannot be completely ruled out for the cats described in this report, the known consumption of unpasteurized milk and colostrum from infected cows and the high amount of virus nucleic acid within the milk make milk and colostrum consumption a likely route of exposure. Therefore, our findings suggest cross-species mammal-to-mammal transmission of HPAI H5N1 virus and raise new concerns regarding the potential for virus spread within mammal populations. Horizontal transmission of HPAI H5N1 virus has been previously demonstrated in experimentally infected cats ( 13 ) and ferrets ( 14 ) and is suspected to account for large die-offs observed during natural outbreaks in mink ( 15 ) and sea lions ( 16 ). Future experimental studies of HPAI H5N1 virus in dairy cattle should seek to confirm cross-species transmission to cats and potentially other mammals.

Clinical IAV infection in cattle has been infrequently reported in the published literature. The first report occurred in Japan in 1949, where a short course of disease with pyrexia, anorexia, nasal discharge, pneumonia, and decreased lactation developed in cattle ( 17 ). In 1997, a similar condition occurred in dairy cows in southwest England leading to a sporadic drop in milk production ( 18 ), and IAV seroconversion was later associated with reduced milk yield and respiratory disease ( 19 – 21 ). Rising antibody titers against human-origin influenza A viruses (H1N1 and H3N2) were later again reported in dairy cattle in England, which led to an acute fall in milk production during October 2005–March 2006 ( 22 ). Limited reports of IAV isolation from cattle exist; most reports occurred during the 1960s and 1970s in Hungary and in the former Soviet Union, where H3N2 was recovered from cattle experiencing respiratory disease ( 9 , 23 ). Direct detection of IAV in milk and the potential transmission from cattle to cats through feeding of unpasteurized milk has not been previously reported.

An IAV-associated drop in milk production in dairy cattle appears to have occurred during ≥4 distinct periods and within 3 widely separated geographic areas: 1949 in Japan ( 17 ), 1997–1998 and 2005–2006 in Europe ( 19 , 21 ), and 2024 in the United States (this report). The sporadic occurrence of clinical disease in dairy cattle worldwide might be the result of changes in subclinical infection rates and the presence or absence of sufficient baseline IAV antibodies in cattle to prevent infection. Milk IgG, lactoferrin, and conglutinin have also been suggested as host factors that might reduce susceptibility of bovids to IAV infection ( 9 ). Contemporary estimates of the seroprevalence of IAV antibodies in US cattle are not well described in the published literature. One retrospective serologic survey in the United States in the late 1990s showed 27% of serum samples had positive antibody titers and 31% had low-positive titers for IAV H1 subtype-specific antigen in cattle with no evidence of clinical infections ( 24 ). Antibody titers for H5 subtype-specific antigen have not been reported in US cattle.

The susceptibility of domestic cats to HPAI H5N1 is well-documented globally ( 10 – 12 , 25 – 28 ), and infection often results in neurologic signs in affected felids and other terrestrial mammals ( 4 ). Most cases in cats result from consuming infected wild birds or contaminated poultry products ( 12 , 27 ). The incubation period in cats is short; clinical disease is often observed 2–3 days after infection ( 28 ). Brain tissue has been suggested as the best diagnostic sample to confirm HPAI virus infection in cats ( 10 ), and our results support that finding. One unique finding in the cats from this report is the presence of blindness and microscopic lesions of chorioretinitis. Those results suggest that further investigation into potential ocular manifestations of HPAI H5N1 virus infection in cats might be warranted.

The genomic sequencing and subsequent analysis of clinical samples from both bovine and feline sources provided considerable insights. The HA and NA sequences derived from both bovine milk and cat tissue samples from different Texas farms had a notable degree of similarity. Those findings strongly suggest a shared origin for the viruses detected in the dairy cattle and cat tissues. Further research, case series investigations, and surveillance data are needed to better understand and inform measures to curtail the clinical effects, shedding, and spread of HPAI viruses among mammals. Although pasteurization of commercial milk mitigates risks for transmission to humans, a 2019 US consumer study showed that 4.4% of adults consumed raw milk ≥1 time during the previous year ( 29 ), indicating a need for public awareness of the potential presence of HPAI H5N1 viruses in raw milk.

Ingestion of feed contaminated with feces from wild birds infected with HPAI virus is presumed to be the most likely initial source of infection in the dairy farms. Although the exact source of the virus is unknown, migratory birds (Anseriformes and Charadriiformes) are likely sources because the Texas panhandle region lies in the Central Flyway, and those birds are the main natural reservoir for avian influenza viruses ( 30 ). HPAI H5N1 viruses are well adapted to domestic ducks and geese, and ducks appear to be a major reservoir ( 31 ); however, terns have also emerged as an important source of virus spread ( 32 ). The mode of transmission among infected cattle is also unknown; however, horizontal transmission has been suggested because disease developed in resident cattle herds in Michigan, Idaho, and Ohio farms that received infected cattle from the affected regions, and those cattle tested positive for HPAI H5N1 ( 33 ). Experimental studies are needed to decipher the transmission routes and pathogenesis (e.g., replication sites and movement) of the virus within infected cattle.

In conclusion, we showed that dairy cattle are susceptible to infection with HPAI H5N1 virus and can shed virus in milk and, therefore, might potentially transmit infection to other mammals via unpasteurized milk. A reduction in milk production and vague systemic illness were the most commonly reported clinical signs in affected cows, but neurologic signs and death rapidly developed in affected domestic cats. HPAI virus infection should be considered in dairy cattle when an unexpected and unexplained abrupt drop in feed intake and milk production occurs and for cats when rapid onset of neurologic signs and blindness develop. The recurring nature of global HPAI H5N1 virus outbreaks and detection of spillover events in a broad host range is concerning and suggests increasing virus adaptation in mammals. Surveillance of HPAI viruses in domestic production animals, including cattle, is needed to elucidate influenza virus evolution and ecology and prevent cross-species transmission.

Dr. Burrough is a professor and diagnostic pathologist at the Iowa State University College of Veterinary Medicine and Veterinary Diagnostic Laboratory. His research focuses on infectious diseases of livestock with an emphasis on swine.

Acknowledgment

We thank the faculty and staff at the ISUVDL who contributed to the processing and analysis of clinical samples in this investigation, the veterinarians involved with clinical assessments at affected dairies and various conference calls in the days before diagnostic submissions that ultimately led to the detection of HPAI virus in the cattle, and the US Department of Agriculture National Veterinary Services Laboratory and NAHLN for their roles and assistance in providing their expertise, confirmatory diagnostic support, and communications surrounding the HPAI virus cases impacting lactating dairy cattle.

  1. Caliendo V, Lewis NS, Pohlmann A, Baillie SR, Banyard AC, Beer M, et al. Transatlantic spread of highly pathogenic avian influenza H5N1 by wild birds from Europe to North America in 2021. Sci Rep. 2022;12:11729.
  2. Bevins SN, Shriner SA, Cumbee JC Jr, Dilione KE, Douglass KE, Ellis JW, et al. Intercontinental movement of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4 virus to the United States, 2021. Emerg Infect Dis. 2022;28:1006–11.
  3. Puryear W, Sawatzki K, Hill N, Foss A, Stone JJ, Doughty L, et al. Highly pathogenic avian influenza A(H5N1) virus outbreak in New England seals, United States. Emerg Infect Dis. 2023;29:786–91.
  4. Elsmo EJ, Wünschmann A, Beckmen KB, Broughton-Neiswanger LE, Buckles EL, Ellis J, et al. Highly pathogenic avian influenza A(H5N1) virus clade 2.3.4.4b infections in wild terrestrial mammals, United States, 2022. Emerg Infect Dis. 2023;29:2451–60.
  5. Bruno A, Alfaro-Núñez A, de Mora D, Armas R, Olmedo M, Garcés J, et al. First case of human infection with highly pathogenic H5 avian influenza A virus in South America: a new zoonotic pandemic threat for 2023? J Travel Med. 2023;30:taad032.
  6. Pulit-Penaloza JA, Brock N, Belser JA, Sun X, Pappas C, Kieran TJ, et al. Highly pathogenic avian influenza A(H5N1) virus of clade 2.3.4.4b isolated from a human case in Chile causes fatal disease and transmits between co-housed ferrets. Emerg Microbes Infect. 2024;2332667.
  7. United States Department of Agriculture Animal and Plant Health Inspection Service. Federal and state veterinary, public health agencies share update on HPAI detection in Kansas, Texas dairy herds. 2024 [cited 2024 Mar 29]. https://www.aphis.usda.gov/news/agency-announcements/federal-state-veterinary-public-health-agencies-share-update-hpai
  8. Sharma A, Zeller MA, Souza CK, Anderson TK, Vincent AL, Harmon K, et al. Characterization of a 2016–2017 human seasonal H3 influenza A virus spillover now endemic to U.S. swine. MSphere. 2022;7:e0080921.
  9. Sreenivasan CC, Thomas M, Kaushik RS, Wang D, Li F. Influenza A in bovine species: a narrative literature review. Viruses. 2019;11:561.
  10. Sillman SJ, Drozd M, Loy D, Harris SP. Naturally occurring highly pathogenic avian influenza virus H5N1 clade 2.3.4.4b infection in three domestic cats in North America during 2023. J Comp Pathol. 2023;205:17–23.
  11. Klopfleisch R, Wolf PU, Uhl W, Gerst S, Harder T, Starick E, et al. Distribution of lesions and antigen of highly pathogenic avian influenza virus A/Swan/Germany/R65/06 (H5N1) in domestic cats after presumptive infection by wild birds. Vet Pathol. 2007;44:261–8.
  12. Keawcharoen J, Oraveerakul K, Kuiken T, Fouchier RAM, Amonsin A, Payungporn S, et al. Avian influenza H5N1 in tigers and leopards. Emerg Infect Dis. 2004;10:2189–91.
  13. Kuiken T, Rimmelzwaan G, van Riel D, van Amerongen G, Baars M, Fouchier R, et al. Avian H5N1 influenza in cats. Science. 2004;306:241.
  14. Herfst S, Schrauwen EJA, Linster M, Chutinimitkul S, de Wit E, Munster VJ, et al. Airborne transmission of influenza A/H5N1 virus between ferrets. Science. 2012;336:1534–41.
  15. Agüero M, Monne I, Sánchez A, Zecchin B, Fusaro A, Ruano MJ, et al. Highly pathogenic avian influenza A(H5N1) virus infection in farmed minks, Spain, October 2022. Euro Surveill. 2023;28:2300001.
  16. Leguia M, Garcia-Glaessner A, Muñoz-Saavedra B, Juarez D, Barrera P, Calvo-Mac C, et al. Highly pathogenic avian influenza A (H5N1) in marine mammals and seabirds in Peru. Nat Commun. 2023;14:5489.
  17. Saito K. An outbreak of cattle influenza in Japan in the fall of 1949. J Am Vet Med Assoc. 1951;118:316–9.
  18. Gunning RF, Pritchard GC. Unexplained sporadic milk drop in dairy cows. Vet Rec. 1997;140:488.
  19. Brown IH, Crawshaw TR, Harris PA, Alexander DJ. Detection of antibodies to influenza A virus in cattle in association with respiratory disease and reduced milk yield. Vet Rec. 1998;143:637–8.
  20. Crawshaw TR, Brown I. Bovine influenza. Vet Rec. 1998;143:372.
  21. Gunning RF, Brown IH, Crawshaw TR. Evidence of influenza A virus infection in dairy cows with sporadic milk drop syndrome. Vet Rec. 1999;145:556–7.
  22. Crawshaw TR, Brown IH, Essen SC, Young SCL. Significant rising antibody titres to influenza A are associated with an acute reduction in milk yield in cattle. Vet J. 2008;178:98–102.
  23. Lopez JW, Woods GT. Influenza virus in ruminants: a review. Res Commun Chem Pathol Pharmacol. 1984;45:445–62.
  24. Jones-Lang K, Ernst-Larson M, Lee B, Goyal SM, Bey R. Prevalence of influenza A virus (H1N1) antibodies in bovine sera. New Microbiol. 1998;21:153–60.
  25. Briand FX, Souchaud F, Pierre I, Beven V, Hirchaud E, Hérault F, et al. Highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus in domestic cat, France, 2022. Emerg Infect Dis. 2023;29:1696–8.
  26. Frymus T, Belák S, Egberink H, Hofmann-Lehmann R, Marsilio F, Addie DD, et al. Influenza virus infections in cats. Viruses. 2021;13:1435.
  27. Songserm T, Amonsin A, Jam-on R, Sae-Heng N, Meemak N, Pariyothorn N, et al. Avian influenza H5N1 in naturally infected domestic cat. Emerg Infect Dis. 2006;12:681–3.
  28. Thiry E, Zicola A, Addie D, Egberink H, Hartmann K, Lutz H, et al. Highly pathogenic avian influenza H5N1 virus in cats and other carnivores. Vet Microbiol. 2007;122:25–31.
  29. Lando AM, Bazaco MC, Parker CC, Ferguson M. Characteristics of U.S. consumers reporting past year intake of raw (unpasteurized) milk: results from the 2016 Food Safety Survey and 2019 Food Safety and Nutrition Survey. J Food Prot. 2022;85:1036–43.
  30. Fourment M, Darling AE, Holmes EC. The impact of migratory flyways on the spread of avian influenza virus in North America. BMC Evol Biol. 2017;17:118.
  31. Guan Y, Smith GJD. The emergence and diversification of panzootic H5N1 influenza viruses. Virus Res. 2013;178:35–43.
  32. de Araújo AC, Silva LMN, Cho AY, Repenning M, Amgarten D, de Moraes AP, et al. Incursion of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus, Brazil, 2023. Emerg Infect Dis. 2024;30:619–21.
  33. American Veterinary Medical Association. States with HPAI-infected dairy cows grows to six. USDA provides guidance for veterinarians, producers on protecting cattle from the virus. 2024 [cited 2024 Apr 10]. https://www.avma.org/news/states-hpai-infected-dairy-cows-grows-six
  • Table 1 . Microscopic lesions observed in cattle in study of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection in domestic dairy cattle and cats, United States, 2024
  • Table 2 . Microscopic lesions observed in cats in study of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection in domestic dairy cattle and cats, United States, 2024
  • Table 3 . PCR results from various specimens in study of highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection in domestic dairy cattle and cats, United States, 2024

Suggested citation for this article : Burrough ER, Magstadt DR, Petersen B, Timmermans SJ, Gauger PC, Zhang J, et al. Highly pathogenic avian influenza A(H5N1) clade 2.3.4.4b virus infection in domestic dairy cattle and cats, United States, 2024. Emerg Infect Dis. 2024 Jul [ date cited ]. https://doi.org/10.3201/eid3007.240508

DOI: 10.3201/eid3007.240508

Original Publication Date: April 29, 2024

Table of Contents – Volume 30, Number 7—July 2024

Address for correspondence: Eric R. Burrough, Iowa State University Veterinary Diagnostic Laboratory, 1937 Christensen Dr, Ames, IA 50011, USA

Genetic Mutation

What is a mutation?

Mutations are changes in the genetic sequence, and they are a main cause of diversity among organisms. These changes occur at many different levels, and they can have widely differing consequences. In biological systems that are capable of reproduction, we must first ask whether a mutation is heritable: some mutations affect only the individual that carries them, while others can be passed on to the carrier organism's offspring and further descendants. For mutations to affect an organism's descendants, they must 1) occur in cells that produce the next generation and 2) affect the hereditary material. Ultimately, the interplay between inherited mutations and environmental pressures generates diversity among species.

Although various types of molecular changes exist, the word "mutation" typically refers to a change that affects the nucleic acids. In cellular organisms, the hereditary nucleic acid is DNA, and in viruses it is either DNA or RNA. One way to think of DNA and RNA is that they are substances that carry the long-term memory of the information required for an organism's reproduction. This article focuses on mutations in DNA, although we should keep in mind that RNA is subject to essentially the same mutation forces.

If mutations occur in non-germline cells, then these changes can be categorized as somatic mutations. The word somatic comes from the Greek word soma, which means "body", and somatic mutations affect only the present organism's body. From an evolutionary perspective, somatic mutations are uninteresting unless they occur systematically and change some fundamental property of an individual, such as the capacity for survival. For example, cancer arises from somatic mutations and can strongly affect a single organism's survival. Evolutionary theory, by contrast, is mostly interested in DNA changes in the cells that produce the next generation.

Are Mutations Random?

The statement that mutations are random is both profoundly true and profoundly untrue at the same time. The true aspect of this statement stems from the fact that, to the best of our knowledge, the consequences of a mutation have no influence whatsoever on the probability that this mutation will or will not occur. In other words, mutations occur randomly with respect to whether their effects are useful. Thus, beneficial DNA changes do not happen more often simply because an organism could benefit from them. Moreover, even if an organism has acquired a beneficial mutation during its lifetime, the corresponding information will not flow back into the DNA in the organism's germline. This is a fundamental insight that Jean-Baptiste Lamarck got wrong and Charles Darwin got right.

However, the idea that mutations are random can be regarded as untrue if one considers the fact that not all types of mutations occur with equal probability. Rather, some occur more frequently than others because they are favored by low-level biochemical reactions. These reactions are also the main reason why mutations are an inescapable property of any system that is capable of reproduction in the real world. Mutation rates are usually very low, and biological systems go to extraordinary lengths to keep them as low as possible, mostly because many mutational effects are harmful. Nonetheless, mutation rates never reach zero, despite both low-level protective mechanisms, like DNA repair or proofreading during DNA replication, and high-level mechanisms, like melanin deposition in skin cells to reduce radiation damage. Beyond a certain point, avoiding mutation simply becomes too costly to cells. Thus, mutation will always be present as a powerful force in evolution.

Types of Mutations

So, how do mutations occur? The answer to this question is closely linked to the molecular details of how both DNA and the entire genome are organized. The smallest mutations are point mutations, in which only a single base pair is changed into another base pair. Within protein-coding sequences, a mutation that changes the encoded amino acid sequence is called a nonsynonymous mutation. Such mutations lead to either the production of a different protein or the premature termination of a protein.

As opposed to nonsynonymous mutations, synonymous mutations do not change an amino acid sequence, although they occur, by definition, only in sequences that code for amino acids. Synonymous mutations exist because many amino acids are encoded by multiple codons. Base pairs can also have diverse regulating properties if they are located in introns, intergenic regions, or even within the coding sequence of genes. For historical reasons, all of these groups are often subsumed with synonymous mutations under the label "silent" mutations. Depending on their function, such silent mutations can be anything from truly silent to extraordinarily important, the latter implying that working sequences are kept constant by purifying selection. This is the most likely explanation for the existence of ultraconserved noncoding elements that have survived for more than 100 million years without substantial change, as found by comparing the genomes of several vertebrates (Sandelin et al., 2004).
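The distinction between synonymous and nonsynonymous point mutations can be illustrated with a short Python sketch. It assumes the standard genetic code and a coding sequence that starts in frame; the function name `classify_point_mutation` and the example sequence are hypothetical and chosen only for illustration. As a simplification, a change that removes a stop codon is not given its own label.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(cds, position, new_base):
    """Classify a single-base substitution at 0-based `position` of an in-frame coding sequence."""
    offset = position % 3                      # position of the changed base within its codon
    codon_start = position - offset
    old_codon = cds[codon_start:codon_start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "synonymous"                    # same amino acid, different codon
    if new_aa == "*":
        return "nonsynonymous (nonsense)"      # premature termination of the protein
    return "nonsynonymous (missense)"          # a different amino acid is produced


# Hypothetical coding sequence: ATG (Met), CTT (Leu), TGG (Trp)
cds = "ATGCTTTGG"
print(classify_point_mutation(cds, 5, "C"))    # CTT -> CTC, both encode Leu
print(classify_point_mutation(cds, 3, "A"))    # CTT -> ATT, Leu -> Ile
print(classify_point_mutation(cds, 8, "A"))    # TGG -> TGA, Trp -> stop
```

The lookup table makes the redundancy of the code explicit: several codons map to the same amino acid, which is why a base change can leave the protein unchanged.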

Mutations may also take the form of insertions or deletions, which are together known as indels. Indels can have a wide variety of lengths. At the short end of the spectrum, indels of one or two base pairs within coding sequences have the greatest effect, because they will inevitably cause a frameshift (only the addition or removal of one or more complete three-base-pair codons will keep a protein approximately intact). At the intermediate level, indels can affect parts of a gene or whole groups of genes. At the largest level, whole chromosomes or even whole copies of the genome can be affected by insertions or deletions, although such mutations are usually no longer subsumed under the label indel. At this high level, it is also possible to invert or translocate entire sections of a chromosome, and chromosomes can even fuse or break apart. If a large number of genes are lost as a result of one of these processes, then the consequences are usually very harmful. Of course, different genetic systems react differently to such events.
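The frameshift rule described above reduces to a check on whether the indel length is a multiple of three. The following Python sketch illustrates this under the assumption that the indel lies entirely within a coding sequence; the function names `indel_effect` and `reading_frame` and the example sequence are hypothetical.

```python
def indel_effect(length_bp):
    """Return the reading-frame consequence of an indel of `length_bp` base pairs in a coding sequence."""
    if length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return "in-frame" if length_bp % 3 == 0 else "frameshift"


def reading_frame(seq):
    """Split a sequence into consecutive codons, dropping any incomplete trailing codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCAAA"                       # hypothetical sequence: ATG GCC AAA
deleted_one = original[:3] + original[4:]    # delete a single base after the first codon

print(indel_effect(1), indel_effect(2), indel_effect(3))
print(reading_frame(original))               # codons before the deletion
print(reading_frame(deleted_one))            # every codon downstream of the deletion is altered
```

Deleting a single base shifts every downstream codon boundary, whereas removing a whole codon would leave the remaining codons unchanged.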

Finally, still other sources of mutations are the many different types of transposable elements, which are small entities of DNA that possess a mechanism that permits them to move around within the genome. Some of these elements copy and paste themselves into new locations, while others use a cut-and-paste method. Such movements can disrupt existing gene functions (by insertion in the middle of another gene), activate dormant gene functions (by perfect excision from a gene that was switched off by an earlier insertion), or occasionally lead to the production of new genes (by pasting material from different genes together).
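The difference between the two transposition modes can be sketched with plain string operations in Python. This is a toy model under strong assumptions: the genome is a single string, the element is written in lowercase so it is easy to spot, and the function names and coordinates are hypothetical. It does not model any real transposition machinery.

```python
def cut_and_paste(genome, start, end, target):
    """Excise genome[start:end] and reinsert it at index `target` of the remaining sequence."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]


def copy_and_paste(genome, start, end, target):
    """Leave genome[start:end] in place and insert a copy at index `target` of the original sequence."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


genome = "GGGacgtTTTCCC"                  # lowercase 'acgt' stands in for a transposable element
moved = cut_and_paste(genome, 3, 7, 6)    # element leaves its old site; genome length is unchanged
copied = copy_and_paste(genome, 3, 7, 10) # element is duplicated; genome grows by its length

print(moved)
print(copied)
```

In this model, cut-and-paste conserves genome length while copy-and-paste increases it, which is one reason copy-and-paste elements can accumulate in a genome.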

Effects of Mutations

A single mutation can have a large effect, but in many cases, evolutionary change is based on the accumulation of many mutations with small effects. Mutational effects can be beneficial, harmful, or neutral, depending on their context or location. Most non-neutral mutations are deleterious. In general, the more base pairs that are affected by a mutation, the larger the effect of the mutation, and the larger the mutation's probability of being deleterious.

To better understand the impact of mutations, researchers have started to estimate distributions of mutational effects (DMEs) that quantify how many mutations occur with what effect on a given property of a biological system. In evolutionary studies, the property of interest is fitness, but in molecular systems biology, other emerging properties might also be of interest. It is extraordinarily difficult to obtain reliable information about DMEs, because the corresponding effects span many orders of magnitude, from lethal to neutral to advantageous; in addition, many confounding factors usually complicate these analyses. To make things even more difficult, many mutations also interact with each other to alter their effects; this phenomenon is referred to as epistasis. However, despite all these uncertainties, recent work has repeatedly indicated that the overwhelming majority of mutations have very small effects (Figure 1; Eyre-Walker & Keightley, 2007). Of course, much more work is needed in order to obtain more detailed information about DMEs, which are a fundamental property that governs the evolution of every biological system.

Estimating Rates of Mutation

Many direct and indirect methods have been developed to help estimate rates of different types of mutations in various organisms. The main difficulty in estimating rates of mutation involves the fact that DNA changes are extremely rare events and can only be detected on a background of identical DNA. Because biological systems are usually influenced by many factors, direct estimates of mutation rates are desirable. Direct estimates typically involve use of a known pedigree in which all descendants inherited a well-defined DNA sequence. To measure mutation rates using this method, one first needs to sequence many base pairs within this region of DNA from many individuals in the pedigree, counting all the observed mutations. These observations are then combined with the number of generations that connect these individuals to compute the overall mutation rate (Haag-Liautard et al., 2007). Such direct estimates should not be confused with substitution rates estimated over phylogenetic time spans.
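The arithmetic behind such a direct estimate can be sketched as follows. This is a simplified illustration, not the procedure used in the cited study: it assumes that every line in the pedigree was sequenced over the same number of sites and propagated for the same number of generations, and the function name and all numbers are invented for the example.

```python
def per_site_mutation_rate(n_mutations, sites_per_line, n_lines, generations):
    """Estimate mutations per site per generation from pedigree counts.

    The denominator is the total number of site-generations in which a
    new mutation could have arisen and been observed.
    """
    site_generations = sites_per_line * n_lines * generations
    if site_generations <= 0:
        raise ValueError("sites, lines and generations must all be positive")
    return n_mutations / site_generations


# Invented numbers, for illustration only:
# 30 mutations found across 10 lines, 1,000,000 sites sequenced per line, 100 generations each.
rate = per_site_mutation_rate(n_mutations=30, sites_per_line=1_000_000, n_lines=10, generations=100)
print(f"{rate:.1e} mutations per site per generation")
```

Because the numerator is a small count of rare events, the estimate is only as good as the number of site-generations surveyed, which is why many base pairs from many individuals must be sequenced.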

Mutation rates can vary within a genome and between genomes. Much more work is required before researchers can obtain more precise estimates of the frequencies of different mutations. The rise of high-throughput genomic sequencing methods nurtures the hope that we will be able to cultivate a more detailed and precise understanding of mutation rates. Because mutation is one of the fundamental forces of evolution, such work will continue to be of paramount importance.

References and Recommended Reading

Drake, J. W., et al. Rates of spontaneous mutation. Genetics 148, 1667–1686 (1998)

Eyre-Walker, A., & Keightley, P. D. The distribution of fitness effects of new mutations. Nature Reviews Genetics 8, 610–618 (2007) doi:10.1038/nrg2146

Haag-Liautard, C., et al. Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila. Nature 445, 82–85 (2007) doi:10.1038/nature05388

Loewe, L., & Charlesworth, B. Inferring the distribution of mutational effects on fitness in Drosophila. Biology Letters 2, 426–430 (2006)

Lynch, M., et al. Perspective: Spontaneous deleterious mutation. Evolution 53, 645–663 (1999)

Orr, H. A. The genetic theory of adaptation: A brief history. Nature Reviews Genetics 6, 119–127 (2005) doi:10.1038/nrg1523

Sandelin, A., et al. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics 5, 99 (2004)

Lead Editor: Bob Moss

© 2014 Nature Education

IMAGES

  1. (PDF) Gene Mutation Classification through Text Evidence Facilitating

  2. An Introduction to Genetic Analysis Chapter 15 Gene Mutation

  3. Gene Mutations: Causes and Effects

  4. How Genetic Changes Lead to Cancer

  5. Mutations in Dna Essay (500 Words)

  6. Genetic Mutations

VIDEO

  1. Scientists researching mutated COVID variant

  2. Mutation Theory

  3. DNA Topoisomerase

  4. Genetic risk, autoimmunity, and the gut microbiome

  5. Human Mutation

  6. Classification of Restriction enzymes

COMMENTS

  1. Human Molecular Genetics and Genomics

    Genomic research has also shown that PCSK9 loss-of-function mutations are more common in people of African ancestry than in other populations; such mutations reduce cholesterol levels and the risk ...

  2. Genetic variation across and within individuals

    DNA sequencing. Genetic variation. Germline variation and somatic mutation are intricately connected and together shape human traits and disease risks. Germline variants are present from ...

  3. Mutation—The Engine of Evolution: Studying Mutation and Its Role in the

    Abstract. Mutation is the engine of evolution in that it generates the genetic variation on which the evolutionary process depends. To understand the evolutionary process we must therefore characterize the rates and patterns of mutation. Starting with the seminal Luria and Delbruck fluctuation experiments in 1943, studies utilizing a variety of ...

  4. The origin of human mutation in light of genomic data

    Abstract. Despite years of active research into the role of DNA repair and replication in mutagenesis, surprisingly little is known about the origin of spontaneous human mutation in the germ line ...

  5. The origins, determinants, and consequences of human mutations

    Advances in DNA sequencing have enabled the identification of human germline and somatic mutations at a genome-wide scale. These studies have confirmed, refined, and extended our understanding on the origins, mechanistic basis, and empirical characteristics of human mutations, including both replicative and nonreplicative errors, heterogeneity in the rates and spectrum of mutations within ...

  6. Mutation

    Mutation is the source of genetic diversity on which natural selection acts, therefore understanding the rates of mutations is crucial for understanding evolutionary trajectories.

  7. Mutation Research

    A section of Mutation Research Mutation Research (MR) provides a platform for publishing all aspects of DNA mutations and epimutations, from basic evolutionary aspects to translational applications in genetic and epigenetic diagnostics and therapy. Mutations are defined as all possible alterations in DNA sequence and sequence organization, from point mutations to genome structural variation ...

  8. Mutation Research

    A section of Mutation Research Mutation Research: Genetic Toxicology and Environmental Mutagenesis (MRGTEM) publishes papers advancing knowledge in the field of genetic toxicology. Papers are welcomed in the following areas: New developments in genotoxicity testing of chemical agents (e.g., in methodology of assay systems and interpretation of ...

  9. Human Mutation

    About This Journal. Human Mutation is a peer-reviewed journal that offers publication of original research, Reviews, Mutation Updates, Methods, Data Articles, and Informatics Articles on broad aspects of mutation research and bioinformatics in humans. Reports of novel DNA variations and their phenotypic consequences, novel disease genes and/or ...

  10. Molecular Effects of Mutations in Human Genetic Diseases

    Moreover, knowledge about the molecular effects of causal mutations, emerging at the interface of human genetics, computational biology, molecular biology, and biophysics, may provide insights into pathogenic mechanisms underlying diseases that can be targeted to develop novel therapeutic strategies. This Special Issue (SI) aimed to attract ...

  11. Rare Genetic Diseases: Nature's Experiments on Human Development

    Rare genetic diseases are the result of a continuous forward genetic screen that nature is conducting on humans. Here, we present epistemological and systems biology arguments highlighting the importance of studying these rare genetic diseases. We contend that the expanding catalog of mutations in ∼4,000 genes, which cause ∼6,500 diseases ...

  12. What is mutation? A chapter in the series: How microbes ...

    Mutations drive evolution and were assumed to occur by chance: constantly, gradually, roughly uniformly in genomes, and without regard to environmental inputs, but this view is being revised by discoveries of molecular mechanisms of mutation in bacteria, now translated across the tree of life. These mechanisms reveal a picture of highly regulated mutagenesis, up-regulated temporally by stress ...

  13. The Human Gene Mutation Database: towards a comprehensive ...

    The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that underlie, or are closely associated with human inherited disease. At the time of writing (March 2017), the database contained in excess of 203,000 different gene lesions identified in over 8000 genes manually curated from over 2600 journals. With new mutation ...

  14. Changing fitness effects of mutations through long-term ...

    The benefits and costs of mutations that undergo natural selection can change depending on genetic interactions with subsequent mutations. In an enduring experiment, 12 lineages of Escherichia coli have been maintained for more than 75,000 generations, with each generation sampled and preserved. Couce et al. made transposon insertion libraries in ancestral and evolved strains taken at the ...

  15. Mutation Research/Mutation Research Genomics

    Jianzhong Wu. June 2001 View PDF. More opportunities to publish your research: Browse open Calls for Papers beta. Read the latest articles of Mutation Research/Mutation Research Genomics at ScienceDirect.com, Elsevier's leading platform of peer-reviewed scholarly literature.

  16. Mutational landscape of cancer-driver genes across human cancers

    The genetic mutations that contribute to the transformation of healthy cells into cancerous cells have been the subject of extensive research. The molecular aberrations that lead to cancer ...

  17. Stem cell divisions, somatic mutations, cancer etiology, and cancer

    Assume that this mutagen substantially increased the somatic mutation rate in normal stem cells, causing a 10-fold increase in cancer risk, i.e., 90% of all cancer cases on this planet were now attributable to E. Therefore, 90% of all cancer cases on Planet B would be preventable by avoiding exposure to E.

  18. DNA mutation motifs in the genes associated with inherited diseases

    Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected ...

  19. The population genetics of mutations: good, bad and indifferent

    Abstract. Population genetics is fundamental to our understanding of evolution, and mutations are essential raw materials for evolution. In this introduction to more detailed papers that follow, we aim to provide an oversight of the field. We review current knowledge on mutation rates and their harmful and beneficial effects on fitness and then ...

  20. Research breakthrough on birth defect affecting brain size

    The title of the research paper is "Epistatic Interactions between NMD and TRP53 Control Progenitor Cell Maintenance and Brain Size." Zheng was joined in the study by Liang Chen of the University of Southern California, Chun-Wei Chen of the City of Hope, Gene Yeo of UC San Diego, and members of their labs. Below, Zheng answers questions ...

  21. Design of highly functional genome editors by modeling the ...

    Gene editing has the potential to solve fundamental challenges in agriculture, biotechnology, and human health. CRISPR-based gene editors derived from microbes, while powerful, often show significant functional tradeoffs when ported into non-native environments, such as human cells. Artificial intelligence (AI) enabled design provides a powerful alternative with potential to bypass ...

  22. Mutation

    A mutation is any detectable and heritable change in nucleotide sequence that causes a change in genotype and is transmitted to daughter cells and succeeding generations. Related Subjects Gene ...

  23. Untangling the Genetic and Environmental Complexities of Autism

    Volk became interested in gene-environment interactions in ASD after receiving an NIEHS grant to pursue postdoctoral research training at the University of Southern California. "When I first started researching autism, the prevalence of ASD in 8-year-olds was 1 in 110, and now it's around 1 in 36," Volk said.

  24. Martin Walschburger Hurtado Studying Stone Man Disease through Research

    Their recent work involves student research on the impact of specific proteins in cancer, as well as a genetic mutation that causes stone man disease. Qutob and Underwood teach and mentor groups of up to 10 Kent State and Walsh students each year, providing them with essential skills needed for post-graduate programs in biomedical ...

  25. Mutagenesis

    Mutagenesis is the process of generating a genetic mutation. This may occur spontaneously or be induced by mutagens. Researchers also use a number of techniques to create mutations, including ...

  26. Discovering cancers of epigenetic origin without DNA mutation

    A research team has discovered that cancer, one of the leading causes of death worldwide, can be caused entirely by epigenetic changes, in other words, changes that contribute to how gene ...

  27. COVID-19: comprehensive review on mutations and current vaccines

    The association of V483H and G476S mutations on human ACE2 receptor binding capacity in MERS and SARS research have been demonstrated (Cherian et al. 2021). Novel mutations of SARS-CoV-2 started showing up leading to the second wave of infections during December 2020 to March 2021. ... Though relatively uncommon, M protein gene mutations are ...

  28. Volume 30, Number 7—July 2024

    Research Highly Pathogenic Avian Influenza A(H5N1) Clade 2.3.4.4b Virus Infection in Domestic Dairy Cattle and Cats, United States, 2024 On This Page ... For HA gene analysis, both HA sequences derived from cow milk samples exhibited a high degree of similarity, sharing 99.88% nucleotide identity, whereas the 2 HA sequences from cat tissue ...

  29. Genetic Mutation

    Mutations are changes in the genetic sequence, and they are a main cause of diversity among organisms. These changes occur at many different levels, and they can have widely differing consequences ...

  30. Sustainability

    Optimizing production processes to conserve resources and reduce waste has become crucial in pursuing sustainable manufacturing practices. The solid wood panel industry, marked by substantial raw materials and energy consumption, stands at the forefront of addressing this challenge. This research delves into production scheduling and equipment utilization inefficiencies, offering innovative ...