A HORRIFIC STORY OF DISHONESTY IN PUBLISHING SCIENTIFIC PAPERS

COMMENTS

 

Hi Dan, I see that China is also trying to tackle the predatory journals and fake articles!

Olavi
China updates naughty list of journals

China has updated its Early Warning Journal List — a list of journals that are deemed to be untrustworthy, predatory or not serving the Chinese research community’s interests. The latest edition includes 24 journals and, for the first time, takes note of misconduct called citation manipulation, in which authors try to inflate their citation counts. Scholarly literature researcher Yang Liying heads up the team that produces the influential list and spoke to Nature about how it’s done.


Posted in Center for Environmental Genetics | Comments Off on A HORRIFIC STORY OF DISHONESTY IN PUBLISHING SCIENTIFIC PAPERS

Photoreceptor Evolution–from Water Animals to Land Animals

This topic is among the most obvious of “gene-environment interactions” topics…!! 😊 During evolution, when vertebrates first left the water and ventured onto land, they encountered a visual world that was radically different from that of their aquatic ancestors. In order to survive successfully, “new” land species were required to adapt visually, in terms of being able to find food, avoid predators, and sexually reproduce. “The need to see acutely” — represents the environmental pressure; and necessary (and relatively evolutionarily “quick”) changes in visual networks of the eye (and/or brain) — represents the response by genes…

Fish exploit the strong wavelength-dependent interactions of light with water by differentially sending visual image signals from as many as five spectral photoreceptor types into distinct behavioral programs. However, out of the water — the same spectral rules do not apply, and this adaptation required rapid changes in response to this environmental pressure.

Early tetrapods [e.g., Kenichthys from China ~395 million years ago (MYA), Gogonasus & Panderichthys ~380 MYA, and then salamanders with four legs] soon evolved the double cone, a still poorly understood pair of new photoreceptors that increased the “ancestral terrestrial” complement from five to seven photoreceptors. Subsequent non-mammalian lineages differentially adapted this highly parallelized retinal input strategy for their diverse visual ecologies. In contrast, mammals (first appearing ~225 MYA) shed most of their ancestral photoreceptors and converged on an input strategy that is extraordinarily general. In eutherian mammals (i.e., animals born via placenta), including humans, parallelization emerged gradually during evolution, as the visual signal began to traverse the layers of the retina and onwards, into the brain.

Vertebrate vision first evolved in the water, where (for >50 million years) it was consistently based on visual signals from five anatomically and molecularly distinct types of photoreceptor neurons: rods, as well as ancestral red, green, blue, and UV cones (expressing RH, LWS, RH2, SWS2, and SWS1 opsin, respectively). In the water, these five input streams are probably best thought of as parallel feature channels that deliver distinct types of information to distinct downstream circuits. This is because water absorbs and scatters light in a wavelength-dependent manner (see Fig 1A in attached pdf file), which means that “beyond color,” different spectral photoreceptor channels inherently deliver different types of visual information.

Aquatic visual systems have recently been proposed to evolutionarily reach “answers” that exploit these differences. In this view, photoreceptors represent parallel channels that are differentially wired to drive and/or regulate distinct behavioral programs (see Fig 1B in attached pdf file): First, rods and ancestral red cones are the eyes’ primary brightness sensors; they are used for general-purpose vision and to drive circuits for body stabilization and navigation. Second, ancestral UV cones are used as a specialized foreground system, primarily wired into circuits related to predator–prey interactions and general threat detection. Third, ancestral green and blue cones probably represent an auxiliary system, tasked with regulating, rather than driving, the primary red/rod and UV circuits.

This ancestral strategy exploits the specific peculiarities of aquatic visual worlds; however, in air the same rules do not necessarily apply. For example, in water, object vision can be a relatively easy task, because background structure tends to be heavily obscured by an approximately homogeneous aquatic backdrop. At short wavelengths, including in the UV range, this effect can be so extreme that no background is visible at all. Many small fish exploit this fact of physics to find their food. Out of water, this and many other “ancestral visual tricks” no longer work, because in air, contrast tends to be largely independent of viewing distance: everything is visible at high contrast. Accordingly, when early would-be tetrapods started to crawl out of the water, strong selection pressures would have favored a functional reorganization of some of these inherited aquatic circuits; nowhere is this more evident than at the level of the photoreceptors themselves.

One of the earliest and perhaps most important retinal circuit changes was the emergence of the double cone, which took the “aquatic ancestral” photoreceptor complement of five to a “terrestrial ancestral” complement of seven (see Fig 1 in attached pdf file). The visual systems of all extant tetrapods, including humans, directly descend from this early “terrestrialized” retinal blueprint. However, from there, different descendant lineages have taken this highly parallelized retinal input strategy and embarked upon radically different visual paths. Most lineages, including those that led to modern-day amphibians, reptiles, and birds, have retained the terrestrialized ancestral blueprint, modifying it to suit their unique visual ecologies.

Mammals, however, have ended up on a very different path. Their early synapsid ancestors gradually shifted some of their visual systems’ “heavy lifting” out of the eye and into the brain. Along this path (whether as cause or consequence), descendant lineages gradually decreased their photoreceptor complements from seven types to six, then five, and eventually to the mere three that we see in eutherians today (see Fig 1C in attached pdf file): rods (RH), as well as ancestral red (LWS) and UV cones (SWS1).

Primates, including humans, have then taken this eutherian strategy to the extreme: >99.9% of all photoreceptors in our eyes are either rods or ancestral red cones (including both “red-” and “green-shifted LWS variants”), the ancestral “general purpose” system of the eye. The remaining 0.1% is what is left of the ancestral UV system, today expressing a blue-shifted variant of the SWS1 opsin (hence, often called “blue cones,” not to be confused with ancestral blue cones that express SWS2). In concert, the “three” cone variants drive achromatic vision (although with limited contribution from ancestral UV cones), while in opposition they serve color vision.

However, this “textbook strategy” is far removed from the original aquatic circuit design and probably quite unique to our own lineage. Accordingly, for understanding vision in a general sense, and to understand our own visual heritage, it will be critical to respect the vertebrates’ shared evolutionary past. Here, vision is built on a retinal circuit design that begins with major parallelization — right from the original evolutionary input. 😉😊

DwN

PLoS Biol Jan 2024; 22: e3002422


A HORRIFIC STORY OF DISHONESTY IN PUBLISHING SCIENTIFIC PAPERS

Below are two responses to the Feb 7th GEITP blog email about “companies churning out fake papers are now bribing journal editors; and some editors are agreeing to accept large sums of cash ‘under the table’ to help fraudulent academicians get their ‘fake paper’ published.” ☹

DwN

From: Christine Curran
Sent: Friday, February 9, 2024

Thanks for sharing! As course coordinator for our Advanced Writing in Biology course, we always devote the early part of the course to discussion about scientific misconduct and publishing ethics.

It was interesting that the concept of predatory journals didn’t connect well with students, even though we tried to emphasize that when they moved into writing their literature reviews — they needed to rely on indexed databases (such as PubMed) to find credible papers from credible sources. Too often, they just Google whatever and end up with crap.

This year, our focus was on generative Artificial Intelligence (AI), which just makes the whole problem worse. If ChatGPT can’t find what you need, it just hallucinates and makes it up. These are troubling times for scientists who wish to remain honest!

Chris

Christine Curran, PhD

Professor, Northern Kentucky University, Highland Heights, KY

From: Olavi Pelkonen Sent: Friday, February 9, 2024

Dan,

Here is an interesting alert from Nature Briefing—on the same theme, or topic, as your recent blog!

Olavi
Co-authors point the way to paper mills

A new approach looks at authors, rather than the content of papers, to help identify journal articles that originate from ‘paper mills’ — factories for fake research. It looks for unusual patterns of co-authorship and peculiar networks of researchers, which could be a sign that authorship was paid for, rather than earned. The approach could be crucial as artificial intelligence (AI) systems make it all too easy to churn out convincing fake manuscripts. “This is the kind of signal that is much more difficult to work around, or outcompete, by clever use of generative AI,” says Hylke Koers of the International Association of Scientific, Technical, and Medical Publishers.

Nature | 5 min read

Reference: arXiv preprint

​Olavi Pelkonen

eProfessor of Pharmacology

University of Oulu, Finland

From: Nebert, Daniel (nebertdw)
Sent: Wednesday, February 7, 2024

It was about 2004 that publishing companies began publishing scientific manuscripts online, rather than in paper journals. GEITP is guessing that PLoS Publishing Company was first (and it remains honest and legitimate). But it didn’t take long before somewhat shady, to downright fraudulent, “predatory online open-access journals” began to pop up. By 2014, there were at least 4,500 “predatory journals” and today there are probably more than 18,000.(!!)

Over the past 15 years, GEITP has discussed many of these fraudulent publisher stories (https://genewhisperer.com/). One extreme example was a “family of four, living in a tiny house in a small village in Turkey, using their kitchen table as their ‘publishing company’, and raking in $1.2 million in one year (without paying any taxes).” The modus operandi is always similar: [a] recruit for “papers” (even if they’re only one or two pages in length), [b] pretend they are quickly “peer-reviewed” (which may or may not be the case), [c] accept the manuscript quickly, almost always without any need for modifications, and [d] charge an exorbitant amount of money in “page charges” to have “your manuscript published quickly online.”

One major factor in considering an academic PhD or MD for a position, or promotion to a higher position — is the “number of publications” the applicant reports. [In some circles, the “number of publications only in highly prominent journals” is an important criterion, but that’s not the case for the vast majority of hiring and promoting of individuals in academia, worldwide.]

And we should all remember the 1996 Sokal Hoax [ https://physics.nyu.edu/faculty/sokal/ ], in which a physicist wrote a completely gibberish paper and submitted it to what was considered one of the better journals in the field (Social Text). And the paper supposedly got peer-reviewed and published anyway. The editors later backtracked by saying that they thought the paper “lacked originality, that it wasn’t well written, that they just accepted it as a favor to Dr. Sokal, a physicist, visiting their rigorous area of study, and so on”; but the fact remains that an editor should be able to distinguish a valid paper from a pile of garbled nonsense.

During the last 6-8 years, it has become popular to “tack on the names of coauthors from the same institute or hospital who were not actually involved with the research,” to help these individuals in getting hired and/or promoted (i.e., maybe five scientists did all the work and wrote the manuscript, but another 18 physicians, in need of “more publications”, had their names inserted in the middle of the co-authorship list). GEITP has also covered such fraudulent stories in the recent past.

Now comes the latest [see attached Jan 2024 editorial]: shady “companies,” churning out fake papers, have decided to bribe journal editors.(!!) Exploiting the growing pressure on scientists worldwide to amass publications — even if they lack resources to undertake quality research — these sneaky intermediary “companies” (by some accounts) pump out tens, or even hundreds, of thousands of articles every year. Many contain fictional data; others are plagiarized, or of low quality. Regardless, authors pay to have their names on them, and these “paper mills” can make tidy profits.

Nicholas Wise, a fluid dynamics researcher at the University of Cambridge (England) who moonlights as a scientific fraud buster, was digging around on shady Facebook groups and saw something new. Rather than targeting potential authors and reviewers, someone (who calls himself “Jack Ben”, from a firm whose Chinese name translates as “Olive Academic”) was approaching journal editors and offering them large sums of cash in return for accepting papers for publication. [Even a spokesperson for Elsevier said every week its editors are offered cash in return for accepting manuscripts.]

“Sure, you will make money from us,” “Ben” promises prospective collaborators in a document linked to the Facebook posts, along with screenshots showing transfers of as much as $20,000 or more. More than 50 journal editors have already signed on, he wrote. There was even an online form for interested editors to fill out.

According to a new preprint, more than half of medical residents in one country admit they have engaged in research misconduct — such as buying papers or fabricating results. One reason is that publications, although no longer always a strict requirement for career advancement, are still the easiest path to promotion in a range of professions — including doctors, nurses, and teachers at vocational schools, according to sources. Yet these groups may have neither the time nor the training to do serious research. In such a setting, paying a few hundred or even a thousand dollars to see one’s name in print may seem a worthwhile investment.

Everyone is invited to read the complete amazing story in the attached pdf file.(!!) 😊

For scientists about to submit their manuscript and who are wondering how to select an honest journal versus a “predatory online open-access journal”: you are encouraged to check whether the journal is listed in the “Directory of Open Access Journals” and whether its publisher is a member of the “Open Access Scholarly Publishers Association.” Such a listing or membership is a good indicator that a journal is not predatory. You can check these sites to help you determine that the journal in which you are interested is legitimate. Also, please read this interesting 2020 publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7237319/ and these 2024 updated library guidelines as to “how to determine whether your selected journal is legitimate or predatory”: https://nuim.libguides.com/openaccess/predatory 😊😊

DwN

Science 19 Jan 2024; 383: 252-255


Breaking Through: My Life in Science by Katalin Karikó review – real-life lessons in chemistry

This book might become mandatory reading for some high school or college courses designed to encourage women to go into science and to persist with tenacity and self-belief. 😊😊

DwN

Breaking Through: My Life in Science by Katalin Karikó review – real-life lessons in chemistry

This vivid account of the Hungarian biochemist — who endured decades of derision before pioneering Pfizer’s Covid vaccine — is a tribute to her tenacity and self-belief

Robin McKie

Sun 11 Feb 2024

In May 2013, Katalin Karikó turned up for work at her laboratory at the University of Pennsylvania and found her belongings piled in the hallway. “There were my binders, my posters, my boxes of test tubes,” she recalls. Nearby a lab technician was shoving things into a trash bin. “My things!” Karikó realised.

Despite having worked at the tiny lab for years, the scientist – then in her 50s – was cast out, without notice, for failing to bring in “sufficient dollars per net square footage”. In short, she had not attracted enough grants to justify the meagre space she occupied.

“That lab is going to be a museum one day,” Karikó hissed at the manager who had ousted her. These were odd but prophetic words, as is made clear in this engrossing, touching tale of the tribulations of a scientist now recognised as one of the world’s greatest biochemists, a woman who helped create the vaccines that saved millions during the Covid-19 pandemic.

Karikó comes from a humble background in central Hungary, growing up in a single-roomed house that was heated in winter by a solitary stove and had no running water. Her father had to work as a labourer when he was dismissed from his job as a master butcher after falling foul of local Communist party officials.

It was a harsh life but a loving one, as Breaking Through reveals. Her family was close-knit and the state at least encouraged education. And Karikó was a worker. “I don’t consider myself especially smart, but what I lacked in natural ability, I could make up for in effort,” she says.

A vial of Pfizer’s Covid vaccine. Photograph: Rogelio V Solis/AP

She took summer science classes, became a biology student at Szeged University and eventually obtained a PhD there. Aged 22, she fell in love with Béla Francia, a trainee mechanic five years her junior. They married, and in 1982 Karikó gave birth to their daughter, Susan. Two years later they moved to the US with their entire savings – about £900 – that were sewn inside Susan’s teddy bear to avoid Hungary’s currency restrictions.

By this time, Karikó had become obsessed with messenger RNA (mRNA), the material responsible for translating our DNA into proteins, the molecules from which we are constructed. Crucially, mRNA is extremely difficult to work with because it is fragile and short-lived. But Karikó was convinced it could play a major role in medicine and constantly fought for it to be a research focus. Few colleagues agreed, dubbing her “the crazy mRNA lady”.

Such epithets were a minor headache, however. At Temple University, in Philadelphia, where she began her work in the US, her chief, Robert Suhadolnik – after initially being supportive – tried to have her deported because she had had the temerity to seek a post at another university.

Eventually she moved to the University of Pennsylvania. Again, things went well at first, but as she maintained her mRNA obsession, the university began criticising her failure to attract grants. She was demoted, refused tenure, had her pay cut and finally found her possessions dumped in a hallway.


Fortunately for Karikó – and the rest of the world – her obsession with mRNA was now shared by several other scientists and she was snapped up by the German company BioNTech to begin work on mRNA medicines.

The rest is scientific history. When Covid-19 struck, BioNTech and Karikó realised they were in a prime position to tackle the pandemic, and with the backing of the pharmaceutical giant Pfizer, developed a vaccine that played a key role in helping to protect the planet against the worst vicissitudes of coronavirus.

How this success affected Karikó is explained in one of the most moving moments in Breaking Through. She returned to Penn to be given one of the first Covid shots to be administered in the US. Karikó was spotted in the crowd and she was hailed, as a vaccine inventor, with roars of approval. “My eyes grew misty,” she recalls.

This is a vividly written, absorbing memoir of a life filled with triumphs (including her daughter Susan’s own successes as an Olympic gold medal-winning rower) over near-constant adversity. The precise reasons for the continual undermining of her research and academic prestige are left open, though Breaking Through hints that science today suffers because it requires its practitioners to publish papers in numbers rather than merit and to seek grants for safe research, as opposed to risky but potentially groundbreaking work. Quantity not quality has become a career driver.

Ironically, the last laugh for Karikó is missing from Breaking Through. Along with Drew Weissman, she won the Nobel prize in physiology or medicine in October 2023 – too late for inclusion in her book. What those who thwarted her research must think about this final success can only be guessed. One thing is clear, however. Her old laboratory may not yet be a museum, but it surely will be one day.

Breaking Through: My Life in Science by Katalin Karikó is published by Bodley Head (£22).

COMMENTS:
Everyone is saying that this book should be widely read — by administrators as well as scientists. I might add that we need to include study section members and federal administration officials who decide who does vs who does not receive funding for their proposed project.

Instead of being approved to repeat and confirm some established finding, principal investigators need to be approved for “cutting-edge, risky proposals” (“outside the box” proposals) designed to move the field forward to the next level. More than several times, I’ve seen a grant turned down because “the proposed research is novel and therefore too risky” to be funded. ☹
DwN

Daniel
Probably just as important, there should be mandatory reading for Division/Department Directors and Research Administrators.

Ray
I have read the book, and it should be mandatory reading — starting with all academic supervisors and administrators. Not to mention the editor of Science, editor of Nature, etc.

Doron
Touching story! Thank you for sharing this with us. I knew some of the details, but this is a really nice detailed report. Who knows how many potential Nobels give up and never get the prize… but she didn’t!


The Scientific Method and Critical Thinking

What I see is government-funded scientists forgetting the progress in scientific thought in the 19th century, especially in Germany. Hermann Helmholtz, about the most noteworthy, denounced the way Goethe and Hegel and Aristotle processed scientific thought. Aristotle, unquestionably one of the greatest minds that ever lived, believed that all truth can come from reasoning alone. Francis Bacon, Voltaire, Descartes, and Galileo introduced the need for observation and experiment in science. The whole point of science is to understand nature, and if nature doesn’t abide by your theories, nature can’t be wrong. You are wrong and need to revisit your research.

Nobel Laureate physicist Richard Feynman, who pinpointed the cause of the Challenger space shuttle disaster, emphasized the importance of experiment:

If you’re doing an experiment, you should report everything that you think might make it invalid—not only what you think is right about it…Details that could throw doubt on your interpretation must be given, if you know them.

So, a scientist, to be worth anything, must want to understand nature.


SCIENTIFIC AUTOBIOGRAPHY

I received an invitation in July 2022 to write “my scientific autobiography,” to be published in the 2024 issue of Annual Review of Pharmacology and Toxicology (ARPT). The rules involved a “20-page maximum, which should include proposed figures and tables.”

My original draft was submitted at the end of Jan 2023, exactly at the 20-page limit. Each draft was peer-reviewed (by I don’t know how many people) multiple times, and I kept receiving comments: “Let’s talk less about this, and expand more about that.” “Please add a ‘family origins’ section at the start.” “You must add a ‘Legacy’ section.” “Did you plan your career?” “What future directions do you suggest for each of your projects?” “What fundamental pharmacological and toxicological rules did you learn over your 50-year career?”

For each question, my initial response was “Then, I’ll have to delete something in order to add that section to my article.” Their reply was “Go ahead and just add the section; we will give you some wiggle room (beyond that initially-proposed 20-page limit).” 2023 was a roller-coaster year of additions, modifications, and subtractions, but the preprint was completed in Oct, and this Jan 2024 final version is considerably modified and updated from that preprint (my article spills over onto page 26). The attached also includes the Table of Contents of the 600+ page 2024 volume, plus a list of “Related Articles.”

Only one or two invited scientific autobiographies are planned for each year in the ARPT, so this has been quite a special honor. 😊😊

DwN


A Mammalian DNA Methylation Landscape

As this GEITP group has discussed numerous times, each person’s overall genetic architecture (landscape) represents the combination of differences in our: [a] genetics (e.g., DNA sequence changes); [b] epigenetics (chromosomal but not involving DNA sequence); [c] environmental effects (cigarette smoking, occupation); [d] endogenous influences (cardiovascular or renal disease); and [e] each individual’s microbiome. [Except for stem cell DNA sequence differences, the other four categories continue to be subject to change in all cell-types throughout one’s lifetime.]

Epigenetic processes include DNA-methylation, RNA-interference (RNAi), histone-modifications, and chromatin-remodeling.

Genome-wide association studies (GWASs) are used to determine associations of individual DNA nucleotide changes (single-nucleotide variants, SNVs) with phenotypes (traits such as height, longevity, schizophrenia or risk of autism spectrum disorder, or risk of lung cancer from cigarette smoking, or cancer from asbestos exposure). In contrast to looking at the genome (DNA nucleotide sequence) in a GWAS, the attached article and editorial examine “the methylome” (DNA sites that are methylated vs not methylated, comparing 348 mammalian species simultaneously). Or subsets of the entire methylome… By the time you finish reading this, you probably will learn more about DNA-methylation than you really cared to know. 😉

“Life span” is an example of a phenotype. Mammals vary greatly in life span (e.g., the bowhead whale can live up to 200 years, whereas the giant Sunda rat lives only about 6 months). This disparity is encoded in the genomes of each species; however, which genes are linked to these traits is still poorly understood. Because all mammals have (approximately) the same genes, variation in how these genes are regulated should be important in determining longevity.

Authors (see attached; I count more than 200 coauthors!!) present a large-scale study of DNA methylation (more methyl groups usually result in down-regulation of gene expression; fewer methyl groups generally mean up-regulation of gene expression) in a diverse range of mammalian species. Authors identified genomic regions that (e.g.,) might control life-span variation among lineages, which could help uncover the molecular drivers of life span and other traits in mammals.

DNA methylation is a chemical modification (addition of CH3- group) that almost always occurs in cytosines that are followed by a guanine (CpGs) in mammalian genomes. DNA methylation information is inherited after mitosis; however, it is constantly changing during development or among tissues — and over the lifetime of every organism. DNA methylation differences occur mostly at “enhancers” [i.e., stretches of DNA that dictate the expression of a nearby gene(s)]. Thus, each cell-type and tissue in the body has a precise DNA methylation signature (like a barcode).
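To make the “CpG” idea concrete, here is a minimal Python sketch that scans a DNA string for cytosines immediately followed by a guanine. The function name `find_cpg_sites` and the example sequence are made up purely for illustration; nothing in this snippet comes from the attached article.

```python
def find_cpg_sites(sequence):
    """Return the 0-based positions of every C that is immediately followed by a G."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# Hypothetical sequence, chosen only to show the idea.
example = "ATCGGCGTACGCTA"
sites = find_cpg_sites(example)
print(len(sites), "CpG sites at positions", sites)
```

Each position reported this way is a candidate site where a methyl group could be attached; whether it actually is methylated in a given cell-type is what the microarray measures.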

Although DNA methylation is frequently not the main factor that dictates gene regulation, it is a robust biomarker for gene activity and cell identity. [To make things more mind-boggling, in the human body, there are about 208-212 estimated cell-types; consequently, the number of “DNA methylomes” each of us has would be 208-212 epigenomes.]

DNA methylation is easier to measure than other classic (epigenetic) gene regulatory mechanisms, such as histone modifications or transcription factors. However, reliable quantification of DNA methylation across the genome is not trivial because current gold-standard methods (such as whole-genome bisulfite sequencing) require a reference genome and large amounts of data. This makes studying DNA methylation across a large number of samples difficult, and large sample sizes are required to find significant associations between DNA methylation and complex traits such as life span or body weight. This limitation can be overcome by using microarrays to probe for specific subsets of CpGs. Such microarrays have previously been used for studies in humans and mice.

Authors [see attached] used a recently designed pan-mammalian DNA methylation microarray that captures a subset of the CpGs that are conserved across all (available) mammals — including marsupials and egg-laying mammals, at high confidence and for a fraction of the cost of other methods. Such a microarray does not need a reference genome, and the CpG islands are directly comparable across samples and species. Authors profiled the DNA methylation of 15,456 samples from 348 species, including up to 70 tissues/cell-types per species. They used data from blood (a tissue comparable across all species) to obtain species relationships solely on the basis of DNA methylation. This clustering largely recapitulated the mammalian Tree of Life (which indicates that phylogeny and species relatedness is a major factor that underlies variation in DNA methylation).

To disentangle the variation in DNA methylation explained by phylogeny from that explained by other traits such as age or tissue of origin, authors performed unsupervised clustering of all the CpGs (in all species and all tissues studied) according to their co-variation. CpGs that gained or lost methylation in a coordinated manner across many samples were grouped together into modules. Authors then looked for associations between these modules and a range of features (including species traits such as taxonomy or life span and individual traits such as age, sex, or body size). As expected, many CpG modules had methylation patterns that were specific to a taxonomic group. However, other modules included groups of CpGs whose methylation status was enough to discriminate the organ or sex of the sample, regardless of species.
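To make the idea of "modules" concrete, here is a minimal sketch of grouping CpGs by co-variation. It is only an illustration under stated assumptions: the CpG names and methylation values are made up, and the grouping rule (Pearson correlation above a threshold, linked together transitively) is a simple stand-in, not necessarily the clustering method the authors used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length methylation profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def co_methylation_modules(profiles, threshold=0.9):
    """Group CpGs whose profiles correlate at or above `threshold`.

    CpGs are linked transitively: if A correlates with B and B with C,
    all three land in the same module.
    """
    names = list(profiles)
    unassigned = set(names)
    modules = []
    for seed in names:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        module, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [n for n in names if n in unassigned
                      and pearson(profiles[current], profiles[n]) >= threshold]
            for n in linked:
                unassigned.remove(n)
            module.extend(linked)
            frontier.extend(linked)
        modules.append(sorted(module))
    return modules

# Hypothetical methylation levels (0 to 1) of four CpGs across four samples.
profiles = {
    "cpg_1": [0.1, 0.2, 0.8, 0.9],
    "cpg_2": [0.2, 0.3, 0.9, 1.0],  # rises and falls together with cpg_1
    "cpg_3": [0.9, 0.8, 0.2, 0.1],  # moves in the opposite direction
    "cpg_4": [0.8, 0.7, 0.1, 0.0],  # rises and falls together with cpg_3
}
modules = co_methylation_modules(profiles)
```

With these toy values, the two CpGs that gain methylation together form one module and the two that lose it together form another; each module could then be tested for association with a trait such as life span.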

Several CpG modules were associated with life span. Variation in DNA methylation in these genomic regions explained, to some extent, differences in life span across species. This finding is linked to the discovery that, as humans and mice age, DNA methylation changes in many genomic regions. This has allowed the construction of so-called “epigenetic clocks,” which are mathematical models that enable the prediction of biological age on the basis of methylation status of specific CpGs. Because the relative onset of aging could be a major factor in determining species maximum life span, identifying CpG modules that are linked to cross-species variation in life span might identify gene-regulatory events that are responsible for differential aging processes in mammals.
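As a rough illustration of what an "epigenetic clock" looks like as a mathematical model, here is a toy linear version: predicted age is an intercept plus a weighted sum of methylation levels at selected CpGs. The CpG names, weights, and intercept below are invented for the example; published clocks are fitted to real data and use many more CpGs.

```python
# Toy epigenetic clock: a linear model over CpG methylation levels (0 to 1).
# All CpG names, weights and the intercept are hypothetical.
CLOCK_WEIGHTS = {"cpg_A": 40.0, "cpg_B": -25.0, "cpg_C": 15.0}
CLOCK_INTERCEPT = 20.0

def predict_age(methylation):
    """Return the predicted age (in years) for one sample.

    `methylation` maps each clock CpG to its methylation level.
    """
    return CLOCK_INTERCEPT + sum(
        weight * methylation[cpg] for cpg, weight in CLOCK_WEIGHTS.items()
    )

sample = {"cpg_A": 0.5, "cpg_B": 0.2, "cpg_C": 0.4}
predicted_age = predict_age(sample)  # 20 + 20 - 5 + 6 = 41 years
```

A CpG with a positive weight gains methylation with age in this toy model, and one with a negative weight loses it.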

Among the genomic regions that were associated with life-span variation, some were predicted to be regulated by transcription factors (TFs) important for pluripotency (i.e., the ability to differentiate into any cell-type). These pluripotency factors include proteins, such as octamer-binding protein-4 (OCT4) or SRY-box transcription factor-2 (SOX2), whose expression can revert an adult differentiated cell to an embryonic-like cell. OCT4 and SOX2 belong to a group of transcription factors known as the Yamanaka factors, the experimental reactivation of which decreases markers of aging in mice. Authors found that experimental re-expression of the Yamanaka factors in adult mice affected the methylation status of some CpG modules associated with life-span variation. Therefore, regulation of these factors across the life of mammals might drive different life spans, with some species expressing them for longer.

In conclusion, this amazing study shows that DNA methylation can be a powerful biomarker across mammals. Furthermore, it exemplifies one of the directions in which the field of evolutionary genomics is headed…!! 😊😊

DwN

Science 11 Aug 2023; 381: eabq5693 (text 15 pages) + editorial, pp 602-603

Posted in Center for Environmental Genetics | Comments Off on A Mammalian DNA Methylation Landscape

Power of inclusion: Enhancing polygenic prediction with admixed individuals (simplified)

I apologize. Yesterday’s GEITP blog was regarded by some “as a bit difficult to understand” (i.e., “more basic background” is needed, please). So, here goes:

For more than two decades, genome-wide association studies (GWASs) have unequivocally shown that common complex disorders have a polygenic genetic architecture — which has allowed researchers to identify genetic variants (changes in DNA sequence) that are associated with specific diseases. For some traits, dozens or hundreds of single-nucleotide variants (SNVs) have been found to be associated. The “winner” to date is the trait (phenotype) for HEIGHT in which 12,111 SNVs are involved…!!

Many traits can be dissected by GWASs, and the hope is that the discovery of unexpected genes might help explain etiology or improve treatment. An intriguing (gene-environment) example I received today is a genetic test to forewarn the physician (and Parkinsonian patient) that dopamine agonists can cause, in a subgroup of patients, an unwanted adverse reaction: an impulse control disorder, or ICD (after all, who wants to treat a horrible disease like PD by giving a drug that makes things worse??):

Impulse control disorders (ICDs) often appear in people with Parkinson disease (PD), specifically those treated with a class of drugs called dopamine agonists. Newly published research, funded in part by The Michael J. Fox Foundation (MJFF), suggests genetic data can help provide warnings to those at the highest risk. If doctors are able to assess ICD risk consistently, it would help them warn people about ICDs and personalize treatments to minimize that risk.

Currently, doctors often use dopamine agonists to treat Parkinson’s disease. These agonists stimulate activity when binding with dopamine receptors, which can help alleviate Parkinson’s symptoms like motor challenges. However, the rise in use of dopamine agonists has caused ICDs to appear more commonly.

Knowing a person’s risk for developing an impulse control disorder can help chart their treatment path. For example, a doctor might choose a dopamine replacement like levodopa if their patient is at high risk for an ICD, while they might choose a dopamine agonist (which mimics dopamine, rather than replacing it) for someone with a lower risk.

The authors of a paper recently published in the Annals of Clinical and Translational Neurology, led by a team at the University of Pennsylvania, say they can now use genetic data (along with other risk factors) to determine the risk that a person with Parkinson’s will develop ICDs. Knowing that risk allows for more individualized approaches (“precision medicine”) to their treatment, such as substituting dopamine replacements for dopamine agonists.

The variants (DNA nucleotide changes) across an individual patient’s whole genome can further be combined into a polygenic risk score (PRS) that captures part of that individual’s susceptibility to a specific disease. PRSs have been widely applied in research studies, confirming the association between the scores and disease status, but their clinical utility has yet to be established. PRSs may be used to estimate an individual’s lifetime genetic risk of disease, but their current discriminative ability is low in the general population.
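For readers who like to see the arithmetic, a polygenic risk score is usually computed as a weighted sum: for each variant, the number of risk alleles a person carries (0, 1, or 2) is multiplied by that variant's effect size, and the products are added up. The sketch below assumes made-up variant names, effect sizes, and cohort scores purely for illustration.

```python
# Hypothetical per-variant effect sizes (e.g., as estimated in a GWAS).
EFFECT_SIZES = {
    "variant_1": 0.12,
    "variant_2": -0.05,
    "variant_3": 0.30,
}

def polygenic_risk_score(dosages, effect_sizes=EFFECT_SIZES):
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant).

    Variants missing from `dosages` are treated as zero copies.
    """
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in effect_sizes.items())

def percentile_in_cohort(score, cohort_scores):
    """Percentage of cohort scores that fall at or below `score`."""
    at_or_below = sum(1 for s in cohort_scores if s <= score)
    return 100.0 * at_or_below / len(cohort_scores)

# One hypothetical patient: two copies of variant_1, one of variant_2, none of variant_3.
patient = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
patient_score = polygenic_risk_score(patient)  # 0.24 - 0.05 + 0 = 0.19

# Made-up scores for a small reference cohort.
cohort = [0.05, 0.10, 0.15, 0.35, 0.60]
patient_percentile = percentile_in_cohort(patient_score, cohort)
```

Note that the percentile only ranks the patient within the reference cohort; it is not, by itself, a lifetime disease risk.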

Clinical implementation of PRSs may be useful in cohorts (the larger the N of genomes, the better) where there is a higher prior probability of disease (e.g., in early stages of diseases to assist in diagnosis or to inform treatment choices). Important considerations are the weaker evidence base in application to non-European ancestry and the challenges in translating an individual’s PRS from a percentile of a normal distribution to a lifetime disease risk. In the attached review, it was confusing that the authors used “polygenic scores” (PGSs) instead of “polygenic risk scores” (PRSs). But the authors emphasized that larger numbers of non-European samples are needed, and they demonstrated by simulation that including larger numbers of “admixed” individuals (two or more ethnicities in the same person, which is becoming increasingly common these days) will increase the power of statistical correlations (the larger the N of admixed genomes, the better).

DwN

From: Nebert, Daniel (nebertdw)
Sent: Wednesday, January 17, 2024 4:29 PM

Polygenic scores (PGSs) are used for combining genetic effects into the individual-level genetic liability of diseases or non-disease traits (e.g., risk of type-2 diabetes or schizophrenia; risk of lung cancer as a function of cigarettes smoked, or skin cancer as a function of arsenic exposure in drinking water). PGSs have attracted substantial research interest — as a result of the recent expansion of genotyped cohort sample sizes, increased appreciation of the polygenicity of complex traits, and recent methodological innovations and advances in PGS training. For some traits, the predictive performance has improved the potential clinical relevance of PGS.

However, most PGS models suffer from limited transferability across populations, despite the fact that some complex traits manifest substantial trans-ancestry genetic correlation (i.e., correlation of genetic effects across ancestry groups). The limited transferability is partly due to the underrepresentation of non-European individuals in genetic studies, and it delays the realization of equitable healthcare benefits from advancements in genetic research.

Several efforts are underway to improve the transferability of PGS models. First, active recruitment of non-European individuals in genetic studies, along with global partnerships and capacity building, is increasing significantly. However, most genome-wide association study (GWAS) cohorts do not yet encompass diversity that proportionally represents global populations. Second, the development of computational methods can complement these efforts and provide immediate benefits to individuals of diverse ancestry groups. Existing efforts include performing PGS modeling by prioritizing variants present in diverse populations and in cell-type-specific regulatory elements, and combining multiple polygenic predictors characterized for multiple ancestry groups.

Admixed individuals (whose genomes consist of haplotypes from more than one ancestry group and account for one in seven newborns in the U.S.) are often excluded in PGS model training, given the technical limitations. Most modern PGS methods apply Bayesian multivariate regression by including GWAS summary statistics and ancestry-matched linkage disequilibrium (LD) reference panels. Although methods of applying GWAS analysis to admixed individuals exist, dependencies on the LD reference panels and computational complexities in representing LD for admixed individuals present challenges in the estimation of variant-effect sizes in PGS modeling.

However, including admixed individuals offers valuable insights into the genomic basis of common complex traits. A recent study indicates that the individual-level PGS performance shows linear decay as a function of genomic distance, defined as the Euclidean distance from the PGS training set on the genotype principal-component analysis (PCA) projection; this highlights the importance of considering the continuum of genomic ancestry in PGS evaluation. Given the substantial trans-ancestry genetic correlation in some complex traits, one might expect that admixed individuals can also offer unique opportunities to train PGS models with improved transferability.
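The "genomic distance" idea can be sketched in a few lines. The example assumes that distance "from the PGS training set" means distance to the centroid (average position) of the training individuals in principal-component space; the source does not spell this out, and the two-dimensional PC coordinates below are invented for illustration.

```python
import math

def centroid(points):
    """Average position of a set of points, one coordinate per PC."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def genomic_distance(individual_pcs, training_pcs):
    """Euclidean distance from one individual to the training-set centroid.

    Assumption: 'distance from the training set' is measured to its centroid.
    """
    return math.dist(individual_pcs, centroid(training_pcs))

# Hypothetical coordinates of four training individuals on the first two PCs.
training_pcs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]

near = genomic_distance([1.0, 1.0], training_pcs)  # sits on the centroid
far = genomic_distance([4.0, 5.0], training_pcs)   # 3 and 4 units away per axis
```

Here `near` is 0.0 and `far` is 5.0. Under the finding described above, one would expect the PGS to predict less well for the "far" individual than for the "near" one.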

Authors [see attached pdf] presented inclusive PGS (iPGS) — which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data. This approach is naturally applicable to admixed individuals. Authors validated their approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to N = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans (by 48.9%) on average across 60 quantitative traits and up to 50-fold improvements for some traits (e.g., “neutrophil count”, R2 = 0.058) over the baseline model trained on the same number of European individuals.
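To give a feel for "penalized regression on individual-level data", here is a minimal sketch that fits an L1-penalized (lasso) regression by coordinate descent on a tiny made-up genotype matrix. The choice of an L1 penalty and of this particular solver are assumptions made for illustration; the summary above does not say which penalty or algorithm iPGS uses, and real analyses involve hundreds of thousands of individuals and variants.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return 0.0 if it does not exceed it."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimize (1/2n) * sum((y - X @ beta)^2) + penalty * sum(|beta|).

    X is a list of rows (individuals) by columns (variants); y is the trait.
    Each pass updates one variant's effect while holding the others fixed.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of variant j with the partial residual
            z = 0.0    # sum of squares of variant j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (z / n) if z > 0 else 0.0
    return beta

# Hypothetical centered genotypes for four individuals and two variants.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]  # variant 1 has a large effect, variant 2 a tiny one

unpenalized = lasso_coordinate_descent(X, y, penalty=0.0)  # about [2.0, 0.1]
penalized = lasso_coordinate_descent(X, y, penalty=0.1)    # about [1.8, 0.0]
```

The penalty shrinks the large effect slightly and sets the tiny one exactly to zero, which is how this kind of model selects a manageable set of variants. Because it works directly on each person's genotypes, no ancestry-matched LD reference panel is required, which is why such an approach extends naturally to admixed individuals.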

When authors allowed iPGS to use N = 284,661 individuals, they observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for “other individuals”. Authors further developed iPGS + refit — to jointly model the ancestry-shared and ancestry-dependent genetic effects when heterogeneous genetic associations were present. For “neutrophil count”, for example, iPGS + refit showed the highest predictive performance in the African group (R2 = 0.115), which exceeds the best predictive performance for the White British group(!!) (R2 = 0.090 in the iPGS model) — even though only 1.49% of individuals used in the iPGS training are of African ancestry. Authors declared that their data shows the power of including diverse individuals for developing more equitable PGS models. 😊

DwN

Am J Hum Genet 2 Nov 2023; 110, 1888–1902

Posted in Center for Environmental Genetics | Comments Off on Power of inclusion: Enhancing polygenic prediction with admixed individuals (simplified)

Power of inclusion: Enhancing polygenic prediction with admixed individuals

Polygenic scores (PGSs) are used for combining genetic effects into the individual-level genetic liability of diseases or non-disease traits (e.g., risk of type-2 diabetes or schizophrenia; risk of lung cancer as a function of cigarettes smoked, or skin cancer as a function of arsenic exposure in drinking water). PGSs have attracted substantial research interest — as a result of the recent expansion of genotyped cohort sample sizes, increased appreciation of the polygenicity of complex traits, and recent methodological innovations and advances in PGS training. For some traits, the predictive performance has improved the potential clinical relevance of PGS.

However, most PGS models suffer from limited transferability across populations, despite the fact that some complex traits manifest substantial trans-ancestry genetic correlation (i.e., correlation of genetic effects across ancestry groups). The limited transferability is partly due to the underrepresentation of non-European individuals in genetic studies, and it delays the realization of equitable healthcare benefits from advancements in genetic research.

Several efforts are underway to improve the transferability of PGS models. First, active recruitment of non-European individuals in genetic studies, along with global partnerships and capacity building, is increasing significantly. However, most genome-wide association study (GWAS) cohorts do not yet encompass diversity that proportionally represents global populations. Second, the development of computational methods can complement these efforts and provide immediate benefits to individuals of diverse ancestry groups. Existing efforts include performing PGS modeling by prioritizing variants present in diverse populations and in cell-type-specific regulatory elements, and combining multiple polygenic predictors characterized for multiple ancestry groups.

Admixed individuals (whose genomes consist of haplotypes from more than one ancestry group and account for one in seven newborns in the U.S.) are often excluded in PGS model training, given the technical limitations. Most modern PGS methods apply Bayesian multivariate regression by including GWAS summary statistics and ancestry-matched linkage disequilibrium (LD) reference panels. Although methods of applying GWAS analysis to admixed individuals exist, dependencies on the LD reference panels and computational complexities in representing LD for admixed individuals present challenges in the estimation of variant-effect sizes in PGS modeling.

However, including admixed individuals offers valuable insights into the genomic basis of common complex traits. A recent study indicates that the individual-level PGS performance shows linear decay as a function of genomic distance, defined as the Euclidean distance from the PGS training set on the genotype principal-component analysis (PCA) projection; this highlights the importance of considering the continuum of genomic ancestry in PGS evaluation. Given the substantial trans-ancestry genetic correlation in some complex traits, one might expect that admixed individuals can also offer unique opportunities to train PGS models with improved transferability.

Authors [see attached pdf] presented inclusive PGS (iPGS) — which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data. This approach is naturally applicable to admixed individuals. Authors validated their approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to N = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans (by 48.9%) on average across 60 quantitative traits and up to 50-fold improvements for some traits (e.g., “neutrophil count”, R2 = 0.058) over the baseline model trained on the same number of European individuals.

When authors allowed iPGS to use N = 284,661 individuals, they observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for “other individuals”. Authors further developed iPGS + refit — to jointly model the ancestry-shared and ancestry-dependent genetic effects when heterogeneous genetic associations were present. For “neutrophil count”, for example, iPGS + refit showed the highest predictive performance in the African group (R2 = 0.115), which exceeds the best predictive performance for the White British group(!!) (R2 = 0.090 in the iPGS model) — even though only 1.49% of individuals used in the iPGS training are of African ancestry. Authors declared that their data shows the power of including diverse individuals for developing more equitable PGS models. 😊

DwN

Am J Hum Genet 2 Nov 2023; 110, 1888–1902

Posted in Center for Environmental Genetics | Comments Off on Power of inclusion: Enhancing polygenic prediction with admixed individuals

**HGNC Newsletter** Autumn 2023

The Human Genome Organization (HUGO) was conceived in 1988 at the first meeting on genome mapping and sequencing at Cold Spring Harbor. Its original purpose was to promote international collaborative efforts to study the human genome and to address the myriad issues raised by knowledge of the genome — including ethical and societal questions and issues involving nomenclature. Beginning with 42 scientists from 17 countries, HUGO has increased its membership base today to more than 1,200 members from 69 countries.
In 2008, HUGO passed its 20th anniversary and decided on a change in its direction. With the original goal of sequencing the human genome accomplished, HUGO decided to focus on two outstanding issues: first, to explore the medical implications of genomic knowledge (i.e., to seek the biological and medical meaning of genomic information, or genomic medicine); and second, to enhance the genomic capabilities and to help fulfill the genomic aspirations of the emerging scientific countries of the world. The excitement and interest in genomic sciences in Asia, Latin America, the Middle East, and Africa are palpable; and the hope is that these technologies will help in national development and health, worldwide.
So, it is on these two areas that HUGO will focus over the ensuing years: the expansion of genomic medicine and greater engagement with the emerging scientific countries. This also includes the HUGO Gene Nomenclature Committee (HGNC), which details its progress in reports four times a year. Instead of stripping-and-pasting into an email (as I’ve always done), it is now more convenient to simply provide the URL, and interested GEITP’ers can click on it and learn the latest in gene nomenclature, if they so wish. Please click on the URL [below] to see the Autumn 2023 issue.
DwN

https://blog.genenames.org/hgnc/2023/11/23/Autumn_newsletter_2023

If you have questions or comments on our newsletter or on any human gene nomenclature issue, please email us at: hgnc@genenames.org

————————————————————————-
HUGO Gene Nomenclature Committee (HGNC)
European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridgeshire
CB10 1SD, UK

Posted in Center for Environmental Genetics | Comments Off on **HGNC Newsletter** Autumn 2023