Mutations in normal cells differ from those in cancer cells

The underlying causes of DNA mutations (alterations in one or more bases in the genome sequence) are known to include free radicals, irradiation, reactive oxygenated molecules, and “random events” (in part caused by the background ionizing radiation to which every life form on the planet is exposed). DNA repair processes probably correct more than 99% of such lesions, but the mutations and chromosomal rearrangements that escape repair are a necessary and important part of evolution. If such “fixed” alterations occur early enough in embryonic development, the changes are inherited by all of an organism’s cells. However, if these alterations arise later, in adult life, they are present in only a small number of cells in a specific tissue and are difficult to track; thus, the extent of these alterations in normal tissues is poorly understood.

It is believed that cancer is initiated when cells acquire the minimum set of genetic alterations needed to trigger tumor formation. Understanding when such initiating mutations occur in normal cells is crucial for reconstructing the early events that lead to cancer. Authors [see attached article & editorial] analyzed the extent of mutations in human epithelial tissue (i.e. the lining) of the healthy esophagus, and how this relates to processes that drive cancer development.

Authors sequenced 74 cancer-associated genes in 844 tissue samples taken from the upper esophagus of nine healthy donors. For 21 of these 844 samples, authors also carried out whole-genome sequencing (WGS). A previous study assessing mutations in healthy skin cells had found between two and six mutations per million DNA nucleotides. In contrast, the attached article reports that mutations in esophageal cells arose at a roughly 10-fold lower rate than that reported for skin. This difference is not surprising, because skin cells are exposed to more DNA-damaging agents (e.g. ultraviolet light) than are esophageal cells.

Interestingly, however, compared with healthy skin, the healthy esophagus has more mutations in cancer-associated genes. Moreover, at least a subset of these altered genes was under strong positive selection (meaning that these genetic alterations promoted cell proliferation, leading to formation of cell clones). Authors found that driver-mutated clones emerge multifocally from early childhood, and increase in number and size with aging, ultimately replacing almost the entire esophageal epithelium in the extremely elderly. Compared with esophageal cancer, physiologically normal esophageal epithelium shows a marked over-representation of mutations in NOTCH1 and PPM1D (known cancer-associated genes); these mutations often appear before late adolescence (and as early as infancy!) and increase significantly in number with heavy smoking and drinking.

Authors conclude that remodeling of the esophageal epithelium by driver-mutated clones is an inevitable consequence of normal aging, which — depending on lifestyle risks — may affect cancer development. But one remarkable finding of this study is how many “undesirable” mutations are sitting there, in almost everyone, without cancer developing. 🙂

DwN

Nature 17 Jan 2019; 565: 312–317 [Article] & pp 101–103 [News-N-Views]

COMMENT:
This is a good point worth mentioning. I should have noted that.
COMMENT:
This research project represents a massive amount of work.
But the bottom line of this study is that the researchers used an N of 9, which is a very small number.

Posted in Center for Environmental Genetics | Comments Off on Mutations in normal cells differ from those in cancer cells

Assembly of complete genome from 910 Africans: ~10% more DNA than the current human reference genome !!!

From the time of the first publication of the “complete human genome sequence” [Feb 2001; it was hardly “complete”, and I (fortunately or unfortunately) attended the champagne celebration party at the National Library of Medicine in Bethesda], the human genome “consensus sequence” has undergone continual improvements aimed at filling in all the “gaps” and correcting errors. The latest release, GRCh38, spans 3.1 gigabases (Gb; billion bases), with “just” 875 remaining gaps. The ongoing effort to improve the human reference genome, led by the international Genome Reference Consortium (GRC), has, in recent years, added alternative loci for genomic regions where variation cannot be captured by single-nucleotide variants (SNVs) or small insertions and deletions (indels). These alternative loci, which comprise 261 scaffolds in GRCh38, capture a small amount of population variation and improve read-mapping for “some” datasets.

Despite these efforts, the current human reference genome is derived primarily from a single individual, thus limiting its usefulness for genetic studies — especially among admixed populations, such as those representing the African diaspora (human migrations out of Africa). In recent years, a growing number of researchers have emphasized the importance of capturing and representing sequencing data from diverse populations and incorporating these data into the reference genome. The alternative loci in GRCh38 offer one possible way to add such diversity, although it is unclear whether such a solution is sustainable (as more and more distinct ethnic populations are sequenced).

The lack of diversity in the reference genome poses many challenges when analyzing individuals whose genetic background does not match the reference. This problem may be addressed in part by using large databases of known SNVs, but this solution covers only SNV differences and small indels; it is not adequate for larger variants [e.g. copy number variants (CNVs) and large insertions and deletions of hundreds or thousands of bases]. Findings from the 1000 Genomes Project indicate that differences between populations are quite large; examination of 26 populations across five continents revealed that 86% of discovered variants were present in only one continental group. In that study, the five African populations examined had the highest number of variant sites, compared with the remaining 21 populations (consistent with African populations having existed for the longest period of time on this planet).

One way to address limitations of a single reference genome is to sequence and assemble reference genomes for other subpopulations. The 1000 Genomes Project, Genome in a Bottle, and other projects have assembled draft genomes from various populations, including Chinese, Korean, and Ashkenazi individuals. Other groups have used highly homogeneous populations (e.g. Danish, Dutch, or Icelandic), together with assembly-based approaches, to discover SNVs and structural variants, including up to several megabases of non-reference sequence common to these populations. Authors [see attached article] used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome.

Authors aligned 1.19 trillion reads from the 910 individuals with the GRCh38 reference genome, collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). Authors then compared all contigs with one another, to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Their analysis revealed 296,485,284 bp, present in populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome! Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic (i.e. in between protein-coding genes).
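
As a quick sanity check on the “~10%” figure, here is a back-of-envelope sketch (not the authors' calculation). It simply divides the novel sequence length by the 3.1-Gb span of GRCh38 quoted earlier; the function name is made up for illustration, and the rounded reference length is an assumption.

```python
# Rough check of the "~10% more DNA" figure quoted above.
NOVEL_BP = 296_485_284   # novel African pan-genome sequence, from the article
REFERENCE_BP = 3.1e9     # GRCh38 span, rounded to 3.1 Gb as in the text (assumption)

def percent_of_reference(novel_bp, reference_bp):
    """Express a sequence length as a percentage of the reference length."""
    return 100.0 * novel_bp / reference_bp

pct = percent_of_reference(NOVEL_BP, REFERENCE_BP)
print(f"{pct:.1f}% of the reference length")
```

With these inputs the ratio comes out a little under 10%, consistent with the rounded claim in the title.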

DwN

Nat Genet Jan 2019; 51: 30–35


Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes

As with the previous GEITP email, the topic of this study [see attached report; note there are a bazillion coauthors on this publication 🙂] is again the polygenic risk score (PRS), which is the latest advance/improvement on genome-wide association studies (GWAS). PRS is the “new kid on the block.”

As these GEITP pages have stated previously, breast cancer is a multifactorial trait [i.e. reflecting the contribution of hundreds if not thousands of genes, plus epigenetic factors, plus endogenous influences (including age of onset of breast development), perhaps environmental effects (diet, smoking, occupation), and perhaps even one’s microbiome]. Many GWAS by large consortia have been published, resulting in ~170 potential genes identified statistically, but the total heritability (variance explained) is only ~40%. Thus, the numerous common breast-cancer-susceptibility variants discovered via GWAS each confer only a small risk individually; however, their combined effect, when summarized as a PRS, can be substantial.

Such genomic profiles can be used to stratify women, according to their risk of developing breast cancer. This, in turn, holds the promise of improved breast cancer prevention and survival — by targeted screening or other preventive strategies in those women most likely to benefit. A 2015 study had derived a PRS, based on 77 established breast-cancer-susceptibility single-nucleotide variants (SNVs) and reported levels of risk stratification achieved by this PRS. Empirical validation and characterization of the PRS in large-scale epidemiological studies has, however, not been carried out previously. In addition, more informative PRSs would improve the clinical utility of risk prediction.

The aim of the present study [see attached] was to develop individual PRSs, optimized for prediction of estrogen receptor (ER)-specific disease, from the largest available GWAS dataset, and to empirically validate the PRSs in prospective studies. The development dataset was composed of 94,075 cases and 75,017 controls of European ancestry from 69 studies. Samples were genotyped on GWAS arrays, and SNVs were selected by stepwise regression and/or least absolute shrinkage and selection operator (LASSO) [a regression method that performs both variable selection and regularization, in order to enhance the prediction accuracy and interpretability of the statistical model it produces]. The best-performing PRSs were then validated in an independent test set comprising 11,428 cases and 18,323 controls from 10 prospective studies, as well as 190,040 women from the UK Biobank (3,215 incident breast cancers).

For the best PRS (313 SNVs), the odds ratio for overall disease, per one standard deviation of the score, in ten prospective studies was 1.61 (i.e. the odds of disease rise by 61% for each standard-deviation increase in PRS), and the lifetime risk of overall breast cancer in the top centile (top 1%) of the PRS distribution was 32.6%. Compared with women in the middle quintile (‘quintile’ = each of five equal-sized groups into which a ranked population is divided), those in the highest 1% of risk had 4.37-fold and 2.78-fold risks, and those in the lowest 1% of risk had 0.16-fold and 0.27-fold risks, of developing ER-positive and ER-negative disease, respectively. Authors conclude that the PRS is a powerful and reliable predictor of breast cancer risk that may improve breast cancer prevention programs.
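
For readers who have not seen how a PRS is computed, here is a minimal sketch of the usual idea: a weighted sum of risk-allele counts, with each weight being the per-allele log odds ratio from GWAS. The three SNV names and their weights below are invented for illustration (the real score uses 313 SNVs). The `relative_odds` helper assumes that the reported odds ratio of 1.61 applies per one standard deviation of the score, as stated in the published abstract, and that the effect is log-linear across the whole distribution.

```python
import math

# Hypothetical per-allele log odds ratios (weights) for three made-up SNVs.
WEIGHTS = {
    "snv_a": math.log(1.10),
    "snv_b": math.log(0.95),
    "snv_c": math.log(1.05),
}

def polygenic_risk_score(dosages, weights=WEIGHTS):
    """Sum of risk-allele counts (0, 1 or 2) times per-allele log odds ratios."""
    return sum(weights[snv] * dosages.get(snv, 0) for snv in weights)

def relative_odds(z_score, or_per_sd=1.61):
    """Odds relative to a woman at the population mean, given a PRS z-score.

    Assumes a log-linear effect of the standardized score (an assumption).
    """
    return or_per_sd ** z_score

# One hypothetical woman: two risk alleles at snv_a, none at snv_b, one at snv_c.
person = {"snv_a": 2, "snv_b": 0, "snv_c": 1}
score = polygenic_risk_score(person)
print(round(score, 4))
print(round(relative_odds(2.0), 2))  # two standard deviations above the mean
```

In practice the raw score is standardized against a reference population before it is interpreted; that step is omitted here to keep the sketch short.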

DwN

Am J Hum Genet 3 Jan 2019; 104: 21–34


Leveraging Polygenic Functional Enrichment to Improve GWAS Power

Genome-wide association studies (GWAS) represent the prevailing approach for identifying risk loci for common diseases and complex traits, such as schizophrenia, obesity, type-2 diabetes, drug efficacy, or response to environmental toxicants. In the GWAS study design, millions of single-nucleotide variants (SNVs) are assayed in a large cohort of individuals (e.g. N = 50,000 or 500,000) and each is tested marginally (i.e. one at a time) for association with the trait under investigation. To safeguard against false-positive (Type I error) associations, practitioners must impose stringent P-value thresholds, which can limit power. Consequently, only a small fraction of total SNV-associated heritability is explained by the variants that are significant at genome-wide thresholds (e.g. P < 5.0 × 10⁻⁸; also written as 5.0e-08).

For a fixed GWAS sample size, the statistical power to detect significant associations is determined by effect size, minor allele frequency (MAF), and levels of linkage disequilibrium [LD; the non-random association of alleles at different loci in a given population (each gene has two alleles, one on each chromosome of a homologous pair)] at causal and non-causal variants. These three parameters interact in non-trivial ways in the context of complex traits, as well as quantitative traits (e.g. height, body mass index, I.Q.). For example, it has been reported that, after adjusting for MAF, SNVs having lower levels of LD (i.e. decreased non-random association) have larger causal effects. These observations are motivating the development of new strategies that leverage polygenic (i.e. many genes) signals to improve GWAS power. Emerging functional genomics data have revealed that certain categories of variants are enriched for disease heritability. Thus, incorporating functional information into association analyses has the potential to increase GWAS power.

However, previous integrative methods for GWAS hypothesis-testing either assume sparse genetic architectures when estimating functional enrichment [‘genetic architecture’ is the underlying genetic basis of a phenotypic trait and its variational properties; phenotypic variation for quantitative traits results from the segregation of alleles at quantitative trait loci (QTL)], require knowledge or approximation of the true effect-size distribution, or do not produce P-values for each SNV as output. In addition, general-purpose methodologies for association-testing that can integrate prior information have not yet been thoroughly evaluated in the context of GWAS that leverage functional genomics data.

Authors [see attached article] propose an approach that uses polygenic modeling to weight SNVs according to how well they tag functional categories that are enriched for heritability. Their procedure takes as input summary association statistics, along with pre-specified functional annotations (which can be overlapping and/or continuously valued), and outputs well-calibrated P-values. Authors use a broad set of 75 coding, conserved, regulatory, and LD-related annotations that have previously been shown to be enriched for disease heritability. Authors then incorporate the computed weights into a weighted-Bonferroni procedure (the details of which we won't go into). Through extensive simulations and analysis of UK Biobank phenotypes, authors demonstrate that their approach [called functionally informed novel discovery of risk loci (FINDOR)] reproducibly identified an additional 583 GWAS loci (a 13% increase in genome-wide significant loci detected, including a 20% increase for disease traits) while, at the same time, controlling false positives. Authors conclude that "leveraging functional enrichment", using their FINDOR method, was able to robustly increase GWAS power.
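
For the curious, the general idea behind a weighted-Bonferroni test can be sketched in a few lines. This is the textbook procedure only, not FINDOR itself: how FINDOR derives its weights from functional annotations is not reproduced here, and the P-values and weights below are invented. Each test i is declared significant if p_i <= alpha * w_i / m, where the weights w_i are non-negative and average to 1, so that the family-wise error rate stays at alpha. (With equal weights and m = 1,000,000 tests, alpha = 0.05 gives the familiar 5.0e-08 threshold.)

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return indices of tests rejected under a weighted-Bonferroni rule.

    Weights are rescaled to average 1, so the family-wise error rate
    remains bounded by alpha. Test i is rejected if p_i <= alpha * w_i / m.
    """
    m = len(p_values)
    if m == 0 or m != len(weights):
        raise ValueError("need equal, non-zero numbers of p-values and weights")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    mean_weight = sum(weights) / m
    if mean_weight == 0:
        raise ValueError("at least one weight must be positive")
    base_threshold = alpha / m
    return [
        i
        for i, (p, w) in enumerate(zip(p_values, weights))
        if p <= base_threshold * (w / mean_weight)
    ]

# Invented example: four SNVs; the second lies in a (hypothetically) enriched category.
p_values = [0.004, 0.02, 0.03, 0.5]
unweighted = weighted_bonferroni(p_values, [1.0, 1.0, 1.0, 1.0])
weighted = weighted_bonferroni(p_values, [0.5, 2.5, 0.5, 0.5])
print(unweighted, weighted)
```

In this toy example, up-weighting the second SNV lets it pass a threshold that it would miss under the ordinary (equal-weight) Bonferroni correction, which is the sense in which functional weighting can increase power. The price is a stricter threshold for the down-weighted SNVs.
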
These GEITP pages have previously discussed this polygenic functional enrichment of GWAS data to enhance identification of relevant SNVs in multifactorial phenotypes, and we expect to see this approach, eventually, in studies of drug efficacy, risk of toxicity caused by environmental agents, etc.

DwN

Am J Hum Genet 3 Jan 2019; 104: 65–75


Low-dose A-bomb radiation lengthens lifespan and decreases cancer mortality, compared with un-irradiated individuals

COMMENT:

Agreed, Jim. It is important for toxicologists and epidemiologists to realize that (very often, if not always): a small amount of a chemical — or in the case of this article, atom bomb radiation — can be beneficial; a larger dose will no longer be beneficial but still not toxic; and then a further increase in dose will be toxic and/or carcinogenic.

Examples that come to mind, off the top of my head, include carbon monoxide (CO), nitric oxide (NO), hydrogen sulfide (H2S), reactive oxygen species (ROS; O•), and 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD; the lay term is ‘dioxin’). Each of the first four of these five “environmental signals” is now appreciated to function (in small amounts) as a signaling molecule in critical life processes, such as neuromodulation in the brain, smooth muscle relaxation in the vascular system, gut motility, electrical conductance in the heart, etc.

DwN


Transcriptomics analysis to study dysregulation in autism spectrum disorder, schizophrenia, and bipolar disorder

Many large whole-genome sequencing (WGS) consortia are searching for genetic pathways in clinical disorders such as autism spectrum disorder (ASD), schizophrenia (SCZ), and bipolar disorder (BD), hoping to develop novel drugs to treat these three common psychiatric disorders, which usually result in lifelong disability. Genome-wide association studies (GWAS) have identified hundreds of genetic variants that are robustly (in the statistical sense) associated with these disorders, plus thousands more that likely contribute to their pathogenesis. However, the neurobiological mechanisms through which genetic variation imparts risk of ASD, SCZ and/or BD remain largely unknown.

The majority of disease-associated genetic variation lies in noncoding regions (segments of DNA that do not encode protein products) enriched for noncoding RNAs (ncRNAs) and cis-regulatory elements [cis = on the same DNA molecule as, and usually near, the gene(s) regulated] that regulate gene expression and splicing of their related coding gene targets. Such regulatory relationships show substantial heterogeneity across human cell types, tissues, and developmental stages, and sometimes are even species-specific. Recognizing the importance of understanding transcriptional regulation and noncoding genome function, several consortia have undertaken large-scale efforts to provide maps of the transcriptome (the full set of RNA transcripts, coding and noncoding, produced from the genome) and its genetic and epigenetic regulation across human tissues.

Although some studies have included central nervous system (CNS) tissues, a more comprehensive analysis focusing on the brain in both healthy and disease states is necessary to accelerate our understanding of the molecular mechanisms of these disorders. Authors [see attached article] present results of the analysis of RNA-sequencing (RNA-seq) data from the PsychENCODE Consortium, integrating genetic and genomic data from more than 2000 well-curated, high-quality postmortem brain samples from individuals with SCZ, BD, and ASD (as well as controls).

Coexpression networks were able to isolate disease-specific neuronal alterations, as well as microglial, astrocyte, and interferon-response modules, defining previously unidentified neural-immune mechanisms. Authors integrated genetic and genomic data to perform a transcriptome-wide association study, prioritizing disease loci that are likely mediated by cis effects on brain expression. This thorough transcriptome-wide characterization of molecular pathology across three major psychiatric disorders is an excellent start in providing a baseline resource for future studies of mechanism and therapeutic development.

DwN

Science 14 Dec 2018; 362: 1265


Sphynolactone-7 is a potent stimulant — that induces parasitic plant germination, causing it to die

This topic might seem a little bit bizarre, or obscure, as far as the gene-environment interactions theme of these GEITP pages is concerned. On the other hand, it might reflect my subconscious desire to have been a plant molecular biologist.

However, in my opinion, “the environment” here is the parasite and “the genes” are those in the roots of the host crop that respond to that environmental signal. Striga hermonthica (Striga) belongs to a group of parasitic weeds (e.g. broomrapes, witchweeds) that infect roots of crops such as sorghum, millet, maize, rapeseed, tomato, sunflower, and legumes. These obligate parasites (i.e. those dependent on a host for their survival) need their host in order to grow and reproduce. Striga seeds germinate only in the presence of a germination stimulant originating from the host root. So, here we even have a gene-environment interaction within a gene-environment interaction [i.e. genes of the parasite are responding to a germination signal (their environment) from the host; and genes of the host are responding to a signal (the environment) from the parasite].

There is an agricultural need to find some efficient means of protecting crops from the tiny Striga seeds buried in the soil. If infestation can be suppressed, millions of dollars in the food industry could be saved. A group of host-generated small-molecule hormones, called strigolactones (SLs), are known to induce germination of Striga seeds. If Striga germinates in the absence of the host, the germination process is lethal; this knowledge has prompted researchers to develop SL agonists (‘agonists’ are substances that initiate a physiological response when bound to a receptor; an agonist substitutes for the ‘normal’ signal and elicits an action, whereas an antagonist blocks an action). Such agonists could therefore act as inducers of suicidal germination, to purge the soil of viable Striga seeds.

Of course, any potent and accessible compound that is developed must act only on Striga, without impeding normal crop development. Authors [see attached article & editorial] describe the development of a Striga-selective SL agonist that acts as a potent synthetic germination stimulant in the femtomolar range (i.e. at concentrations on the order of 10⁻¹⁵ moles per liter). Thus, this might be the successful development of an agrochemical (sphynolactone-7) that can be used to germinate parasitic weeds in the absence of a host (so that they will die; this is called suicidal germination) and, hopefully, to provide insight into what subcellular mechanism(s) determine(s) the specificity of these parasites.
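
To put “femtomolar” in perspective, here is a small unit-conversion sketch (my own illustration, not taken from the article). It converts a concentration of exactly 1 femtomolar into a count of molecules, using the Avogadro constant; the function name is made up.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)

def molecules_per_liter(molar_concentration):
    """Convert a molar concentration (moles per liter) to molecules per liter."""
    return molar_concentration * AVOGADRO

FEMTOMOLAR = 1e-15  # moles per liter
per_liter = molecules_per_liter(FEMTOMOLAR)
per_microliter = per_liter / 1e6  # one liter is a million microliters
print(f"{per_liter:.3g} molecules per liter")
print(f"{per_microliter:.0f} molecules per microliter")
```

In other words, at 1 femtomolar there are only about six hundred molecules of the compound in each microliter of solution.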

DwN

Science 14 Dec 2018; 362: 1301–1305 [article] & pp 1248–1249 [editorial]


Receptor for IRISIN — the exercise-induced hormone, has now been identified

One might consider the topic for today’s gene-environment interactions a bit unusual: “exercise” in this case is “the environment”, and the response to this environmental signal is “the genes” in the signaling pathways that aid in bone formation and burning of adipose tissue.

Physical activity has been shown to benefit several metabolic disorders (e.g. obesity, diabetes, and fatty liver disease). Earlier studies suggest that exercise might prevent age-related bone loss. Loss of bone mass with age has significant socioeconomic and medical implications, because of heightened susceptibility to fractures. Osteopenia — and the more serious disorder osteoporosis — impair mobility, increase comorbidities, reduce quality of life, and can even shorten lifespan.

Evidence that an exercise program can prevent bone loss is somewhat controversial, in part because different types of physical activity affect the skeleton at distinct sites in different ways. Sclerostin (a local modulator of bone remodeling) is produced almost exclusively by osteocytes (bone cells that are formed when an osteoblast becomes embedded in the matrix that it has secreted) — which might be considered the ‘‘command and control’’ cells of the bone-remodeling unit. Osteocytes arise from mature osteoblasts (cells that secrete the matrix for bone formation), are imbedded in the cortical matrix, and comprise nearly 90% of the cellular composition of bone; therefore, osteocytes are thought to be transducers of mechanical signals arising from physical activity.

In turn, osteocytes — through an elaborate network of canaliculi — communicate with both osteoblasts and osteoclasts (cells that break down bone and are responsible for bone resorption), thereby regulating bone remodeling (under tight control). Emerging evidence suggests that osteocytes can also directly resorb bone during periods of excessive calcium demand, or after ovariectomy (removal of ovaries); therefore, osteocytes have become a prime target for treating osteoporosis with parathyroid hormone (PTH) and/or monoclonal anti-sclerostin antibodies. Anti-sclerostin antibodies increase bone mass dramatically in humans, but may also have cardiovascular side-effects that could limit their use in clinical practice.

Authors [see attached paper] demonstrate that irisin functions by binding to a subset of αV-integrin receptors to promote osteocyte survival and sclerostin secretion. Biophysical studies identified interacting surfaces between irisin and αV/β5 integrin. Chemical inhibition of the αV integrins blocks signaling and inhibits irisin’s function in osteocytes and adipocytes (fat cells). Irisin increases both osteocyte survival and production of sclerostin. Moreover, genetic deletion of the Fndc5 gene (encoding irisin) in C57BL/6 mice results in complete resistance, at the trabecular and cortical compartments, to bone loss caused by ovariectomy. Thus, these data identify the functioning receptor for irisin. These findings should facilitate future studies of irisin as a possible treatment for bone loss, and perhaps studies of its effects in other tissues that respond to physical activity.

DwN

Cell 2018; 175: 1756–1768


In Remembrance of: Luigi Luca Cavalli-Sforza (1922–2018)

This is just a brief GEITP note to report the passing of Luigi Luca Cavalli-Sforza, who died at home in Belluno, Italy, 31 Aug 2018; next week would have been his 97th birthday. Professor Cavalli-Sforza was past president of the American Society of Human Genetics (1989) and winner of the ASHG’s Allan Award (1987), and he was awarded the Balzan Prize (1999). He is survived by his four children — Matteo, Francesco, Luca Tommaso, and Violetta — and their families; his wife of more than 60 years predeceased him in 2015.

For several decades Professor Cavalli-Sforza was internationally recognized as one of the world’s foremost pioneers of human population genetics, enjoying an active research career that spanned 70 years. Cavalli-Sforza initiated a new field of research by combining the concrete findings of demography with a newly available analysis of blood groups in an actual human population. He also studied the relationships between migration patterns and blood groups.

Writing in the mid-1960s with Anthony W D Edwards, Cavalli-Sforza pioneered statistical methods for estimating evolutionary trees (phylogenies). To estimate evolutionary trees, they used maximum likelihood estimation (MLE) [see this web site for more information: https://en.wikipedia.org/wiki/Maximum_likelihood_estimation]. Edwards and Cavalli-Sforza wrote about trees of populations within the human species, where genetic differences are affected both by tree-like patterns of historical separation of populations, and by spread of genes among populations by migration and admixture.

I was first introduced to Professor Cavalli-Sforza’s far-reaching ideas by my friend and colleague, Anil Menon (University of Cincinnati), who gave me as a gift Cavalli-Sforza’s 1995 book, titled “The Great Human Diasporas – The History of Diversity and Evolution”, which was coauthored by one of his sons, Francesco Cavalli-Sforza. I have read that book more than once and highly recommend it to anyone interested in the migration (diaspora) of humans across the planet, and the development of geographically isolated groups that represent today’s five major branches of populations on Earth — African, East Asian, Oceanian, Caucasian and Amerindian.

DwN

Am J Hum Genet 3 Jan 2019; 104: 11–12


Millions of People Wrongly Believe They Have Food Allergies

This article just appeared in Time magazine. What genetic composition (genes) is it that people have — when they respond to foods (environment) — resulting in a “true food allergy” for some, versus a “psychogenic food allergy” for just about an equal number of people? There are gene-environment interactions going on, somewhere, in this picture. 😉

DwN

Millions of People Wrongly Believe They Have Food Allergies

By JAMIE DUCHARME

January 8, 2019

These days, it can seem like just about everybody has a food allergy. But according to a new study, about 11% of American adults actually do. Yet 19% of adults believe they have a food allergy, even though some don’t have the diagnosis or symptoms to back it up, according to findings published in J Am Med Assoc (JAMA) Network Open.

This discrepancy suggests that quite a few adults are conflating allergies with less-severe food intolerances, which typically come with minimal digestion-related symptoms, the researchers write. If someone is truly allergic to a food, eating it can trigger a potentially life-threatening immune response. (People who are lactose intolerant, for example, may experience bloating, stomach pain and gas after eating dairy products, while those with a true milk allergy can experience wheezing, hives and anaphylaxis.)

The new estimates were based on survey responses from almost 40,500 American adults who were asked if they had any diagnosed allergies, symptoms or hospitalizations. The researchers couldn’t independently confirm whether each survey respondent actually had a food allergy, but allergies were considered “convincing” if the person reported a physician’s diagnosis or significant symptoms such as swelling, trouble breathing, chest pain, vomiting or fainting after eating a certain food. Reports of an allergy that were only backed by milder symptoms, such as itching, stomach pain and rashes, did not meet the researchers’ criteria.

They found that almost 11% of people had at least one convincing food allergy. The most common allergens were shellfish (2.9%), milk (1.9%), peanuts (1.8%), tree nuts (1.2%) and fin fish (0.9%), according to the study. Extrapolated to the national level, that means an estimated 7.2 million American adults are allergic to shellfish, 4.7 million are allergic to milk, 4.5 million are allergic to peanuts, 3 million are allergic to tree nuts and 2.2 million are allergic to fin fish. About 45% of the adults with a food allergy had more than one, the researchers found.

Food allergies are also common in children, affecting about 8% of American kids, but many childhood allergies can be outgrown. Other times, allergies can start in adulthood: About 48% of people with allergies in the new study reported developing at least one of their conditions as an adult.

Despite the relatively high rates of food allergy — and the fact that 40% of people with an allergy had visited the emergency room because of it — the researchers found that only 47.5% of those with an allergy had been officially diagnosed by a doctor. Getting verification from a doctor, the authors write, could help true allergy sufferers get the treatment and avoidance tips they need — and spare those with only intolerances from a lifetime of unnecessary precautions.

