

Tohdai-shiki Seimei Data Science Sokusenryoku Kouza (University of Tokyo Data Scientist Training / Education Program - R & Python toolbox for writing papers on large-scale data analysis, from genomes, epigenomes, and transcriptomes to single cells)


DSTEP Teaching Material Creation Committee Member (ed.)


344 pages, AB format




November 29, 2021



Published by



This book adapts the bioprogramming exercises of the Data Scientist Training / Education Program “DSTEP” (http://dstep.cbms.k.u-tokyo.ac.jp/), offered by the University of Tokyo Graduate School of Medicine and Graduate School of Frontier Sciences. DSTEP was established in cooperation with pharmaceutical companies, biotech companies, and others to train people who can respond quickly to the challenges posed by large-scale biodata, such as the genome analysis data of recent years. The shortage of such people in this field is becoming a serious issue.
It has been 20 years since the first complete sequence of the human genome was announced. Since next-generation sequencing technology emerged 10 years ago, biodata production technology has advanced at a rate that would have been unimaginable 20 years ago. Public and private databases now hold human genome data for hundreds of thousands and even millions of people. So-called “personalized medicine,” which uses carcinogenic mutations (driver mutations) detected by precise sequence analysis of cancer genomes to formulate the optimal treatment strategy for each patient, has been implemented in current medical practice and is covered by national health insurance. In basic research, genome decoding technology has been extended to multi-omics analysis such as gene expression analysis, and in recent years the analysis resolution has reached the single-cell level (i.e., single-cell analysis). Further developments have made possible multi-omics analysis that retains spatial information. For example, omics analysis is being conducted at each spot of pathological image data in order to elucidate how cancer cells evolve and how they interact with immune cells. Large-scale data production and analysis are expected to accelerate further. The rapid accumulation of data is dramatically expanding our understanding of molecular biology and basic medicine. In response, there have been increasing attempts to use artificial intelligence to create biological knowledge. If the essence of machine learning lies in the use of large amounts of accumulated data, then a time may come when a sufficient volume of data drives a major transformation in biology as well.
Producing biological data on an unprecedented scale and deepening its analysis are essential if we are to fully understand humans, at least in the real world, and to advance health and medical treatment for humans and beyond. Many technological innovations toward this future are still underway. I believe that we are approaching a future in which large-scale, high-precision omics analysis, on a scale sufficient for understanding human cell systems, will be put into practice: the arrival of the “era of bio-data science” in its essential sense. Today, at the dawn of such an era, I hope that this book will support the reader’s first steps toward becoming a young researcher who will lead the future.

(Written by SUZUKI Yutaka, Professor, Graduate School of Frontier Sciences / 2022)

Related Info

Related Website:
The Data Scientist Training / Education Program “DSTEP” by the University of Tokyo Graduate School of Medicine and Graduate School of Frontier Sciences
