Cancer Featured Studies

FEATURED STUDY: A New Data Science Approach for Prioritizing Cancer Risks

New methodology improves cancer hazard prioritization using an integrated approach of database fusion and text mining


Research by

Dinesh Kumar Barupal, PhD, Assistant Professor, Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, NY, USA


Dr. Dinesh Kumar Barupal

The International Agency for Research on Cancer (IARC), based in Lyon, France, is the cancer research branch of the World Health Organization. The agency publishes reports on environmental factors that are carcinogenic hazards to human health, including chemical and non-chemical agents (biological agents, pharmaceuticals, complex mixtures, occupational exposures, and other exposures of everyday life). These reports, known as the IARC Monographs, guide cancer prevention research and policies around the world.

Generating these reports is a multi-stage, complex, and systematic process of agent nomination, prioritization and evidence evaluation. Consequently, there is a need to develop new strategies to mine and summarize the vast volume of literature and chemical data available on cancer research for IARC Monographs evaluations.


In collaboration with IARC researchers, Dinesh K. Barupal, PhD, and his team at the Institute for Exposomic Research Laboratories at the Icahn School of Medicine at Mount Sinai developed an innovative approach for searching and analyzing global literature and chemical databases with the aim of prioritizing potential carcinogenic agents for evaluation.

“We used advanced techniques such as chemoinformatics (a combination of physical chemistry theory with computer and information science techniques), text mining, and database fusion to integrate literature data on cancer epidemiology and mechanistic evidence about carcinogenicity,” says Dr. Barupal. He and his collaborators grouped and ranked literature pertaining to 119 agents prioritized for evaluation by the IARC Monographs during 2020-24. The researchers have published this prioritization approach, which has been made available at

What does it mean?

To lower the cancer burden through prevention strategies, it is essential to identify cancer hazards. Dr. Barupal’s novel database fusion and chemoinformatics approach provides new insights into the evidence on human cancer and the key characteristics of carcinogens. This strategy gives the IARC and other cancer hazard identification programs a logical framework for prioritizing which agents should be evaluated. The new methodology will allow scientists to identify agents that need further study and evaluation for carcinogenicity.

An overview of IARC Monographs agent identification and selection process

Publication link :
Citation: Barupal DK, Schubauer-Berigan MK, Korenjak M, Zavadil J, Guyton KZ. Prioritizing cancer hazard assessments for IARC Monographs using an integrated approach of database fusion and text mining. Environ Int. Published online 10 May 2021.