The Language of Science and the Science of Language: Unseen Barriers in Research

Exposome Perspectives Blog by Robert O. Wright, MD, MPH

An often forgotten aspect of science is communication. If a research finding is important, we should communicate its meaning and value to others in easy-to-understand terms. We often think about how scientists talk to the public, but what about how scientists talk to each other? The general public may not realize that there is no uniform terminology that scientists use to communicate with each other. I don’t suggest this is intentional, but it definitely happens. Let’s start with the use of English in science. Like it or not, all of the major scientific journals are published in English, the articles are written in English, and the authors are required to master English. A very large percentage of scientists are not native English speakers, however. Furthermore, we forget that language and culture are intertwined. The way we speak, our interactions with our readers, and our feelings are influenced by culture and language. As much as we want to believe science is just about truth and biology, no scientist can escape the influence of language and culture.

Science has a vast number of fields, each with its own language and terminology: genetics, environmental health, immunology, neurology, microbiology, metabolism, cardiology, physiology, and molecular biology, to name a few. Much like world languages, the words and grammatical styles used in different scientific fields overlap. Psychology and psychiatry are about as close as Italian and Spanish, while epidemiology and cell biology are as different as Mandarin and English. Nonetheless, just as different countries may have the same political system, different scientific disciplines may study the same diseases despite using dramatically different languages to do so.

Lost in translation

Take, for example, studies on the genetic risk factors for depression versus studies on air pollution as a risk factor for depression. There are a number of studies in each area, but how do these findings relate to one another? It is, after all, the same disease. Connections seem likely, yet attempts to connect findings from studies across disciplines are rare. While by no means the only barrier, the lack of a common language plays a role. We all believe that genes and environment play key roles in human health, but do the fields really talk to each other? We may think that two scientists working at the same institution can communicate effortlessly about their work, but that is often not the case. Science is a series of different languages, and scientists always use a language that is specific to their field.

As in many languages, the words used may overlap but have slightly different meanings. Or the same words may be used to convey completely different concepts, leading to miscommunication and siloed research. The problem is heightened because, unlike a tourist visiting China who is aware that his or her language is different when asking for directions, an environmental epidemiologist speaking to a geneticist may not be aware that they are speaking different languages. Just as Spanish and Italian share many words, environmental medicine and genetics share many words as well, but the problem lies in different meanings for the same word. “Burro” means “donkey” in Spanish and “butter” in Italian. Perhaps even more problematic, “gift” means a present or offering in English, but it means “poison” in German.

Science is a Tower of Babel

In the Book of Genesis in the Bible, the Babylonians build a tower to reach God, who becomes dismayed at their arrogance and stops the work by making them all speak different languages, so that they can no longer work together effectively. Science is a Tower of Babel; as scientific disciplines evolve and expand, they develop distinct languages, grammar, and cultures. For example, the words “bias” and “unbiased” are used very differently in environmental epidemiology and genomics/genetics. Geneticists often describe their study as “unbiased” to convey the strength of a study and even to emphasize that there is no hypothesis. The “unbiased” approach in genomics means that all parts of the genome are treated equally in the analysis of results. In the language of genomics, “bias” has nothing to do with how other information was collected, who is being studied, or how people were enrolled in a study.

In contrast, environmental epidemiologists use “bias” to convey the weaknesses of their study. They may write something like: “we found exposure ‘X’ is associated with health outcome ‘Y’, but we can’t be sure that the results apply to people living elsewhere, or that they were not affected by how we selected people for the study or how we applied the questionnaires.” Epidemiologists almost never describe their study as “unbiased” in any way, shape, or form. Sometimes the list of biases in an environmental epidemiology study gets so long, you wonder if the authors even believe their results. This is a major cultural difference between the two fields and perhaps explains in part why genomics research funding dwarfs environmental health funding. If you were the funder, wouldn’t you be more impressed by a proposal that tells you it is “unbiased” as opposed to one that seems obsessed about not making a mistake? Genomics even uses the word “bias” to describe past knowledge of limited numbers of genes; unbiased studies go beyond measuring those genes and measure thousands more. In that usage, bias is synonymous with “limited by past knowledge”.

Methods can be a barrier, too

A justification for the genomics approach is that because we understand the function of only a small fraction of our DNA, doing hypothesis testing on one genetic variant at a time would take centuries. To address this problem, scientists scan thousands to millions of genetic variations to more quickly identify the variants that matter most to health. This kind of research is discovery driven and has no hypothesis. Once we have finished discovering all the relevant variants, we can then form testable hypotheses, as we will have complete information on what the relevant biological factors are. In some cases, human traits like impatience, hubris, and desperation to publish interfere, leading to the publication of discovery results as if a hypothesis had been tested. Post-hoc analysis of a genomic, proteomic, transcriptomic, or even an exposomic study is akin to shooting an arrow into a wall and then drawing a bullseye around the arrow. If you measure a million “things”, whether they are genetic variants, chemicals, RNA sequences, or widgets, and use a p-value threshold of 0.05, as is common practice, then about 5% (50,000) of those “things” will be statistically significant just by chance, even if none of them has any real effect. In many cases, this “unbiased” genomic approach leads to the publication of false findings that can’t be reproduced, and by epidemiologic standards it is biased.
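The arithmetic behind that 50,000 figure can be checked with a few lines of Python. This is only a rough sketch under stated assumptions: every one of the million “things” is assumed to have no real effect, so each p-value is drawn as an independent uniform random number between 0 and 1. The function names are made up for illustration, and the Bonferroni threshold at the end is one common correction for this problem, shown for context rather than as anything this post prescribes.

```python
import random


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of 'significant' results when no test has a real effect."""
    return n_tests * alpha


def simulate_null_hits(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Count p-values below alpha when every p-value is pure noise.

    Assumption: under the null, p-values are independent and uniform on [0, 1).
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test threshold that keeps the chance of any false positive near alpha."""
    return alpha / n_tests


n_tests = 1_000_000  # a million "things" measured
alpha = 0.05         # the conventional significance threshold

expected = expected_false_positives(n_tests, alpha)
simulated = simulate_null_hits(n_tests, alpha)
threshold = bonferroni_threshold(n_tests, alpha)

print(f"Expected chance findings:  {expected:,.0f}")
print(f"Simulated chance findings: {simulated:,}")
print(f"Bonferroni per-test threshold: {threshold:.1e}")
```

The simulated count should land close to the expected one, which is the point: with no correction, a million noise-only measurements still produce tens of thousands of “discoveries”.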

But that doesn’t mean hypothesis testing is superior. The flip side of the genomics approach has its own problems. Environmental health has steadfastly avoided “-omics” approaches to research, which is why exposomics is still in its infancy. Because we rely too much on hypothesis testing and too often fear that discovery research is biased, we tend to study the same thing over and over in a seemingly never-ending quest to make sure we haven’t made a mistake. We are still studying lead poisoning in 2022, after all, even though almost no one would argue that it is not toxic. There are thousands of studies on organophosphate pesticides that are barely used today and very few on the new ones taking over the pesticide market. An overreliance on hypothesis testing has made environmental scientists stick to familiar topics instead of looking for the unknown.

We know there are thousands, perhaps even millions, of chemicals in our bodies, many of them man-made, many of them sourced through food, water, and air, yet we’ve only identified about 200-300 of them. The vast majority are unmeasured, uncatalogued, and forever sliding under our radar; we can’t possibly tackle them one at a time unless we become immortal. In the words of the late Donald Rumsfeld, there are “known unknowns”, although some of them may be “unknown unknowns”, or “unknown knowns”, or magic beans. In essence, we know there are millions of chemicals out there, but we don’t know what the vast majority of them do, and most of the time we ignore them. The environmental literature may have fewer mistakes than the genomic literature, but it is also minuscule in comparison and has enormous gaps in just measuring what is in our bodies, much less understanding what those chemicals do to our bodies.

Collaboration is Multi-Lingual

Perhaps what really needs to happen is that we learn each other’s perspectives and engage in cross-disciplinary collaboration. To do that, we need either a common language or researchers who are multilingual. I believe that common language would be “gene-environment interaction”. Exposomics is about applying some of the language of genomics to the environment, so we can discover which environmental factors matter most to our bodies. We also need genomics to test hypotheses on which environmental factors interact with different genetic variants. After all, nature and nurture aren’t babbling; they speak to each other constantly through time-varying complex equations and reactions that manifest as physiologic shifts, endocrine regulation, metabolism, movements, and the thoughts and emotions that shape our lives.