Lifeboat Foundation Bios: Professor Serafim Batzoglou

Professor Serafim Batzoglou

The ScienceDaily article “New Gene Prediction Method Capitalizes On Multiple Genomes” said:

“Researchers at Stanford University report in the online open access journal, Genome Biology, a new approach to computationally predicting the locations and structures of protein-coding genes in a genome. Gene finding remains an important problem in biology as scientists are still far from fully mapping the set of human genes. Furthermore, gene maps for other vertebrates, including important model organisms such as mouse, are much more incomplete than the human annotation. The new technique, known as CONTRAST (CONditionally TRAined Search for Transcripts), works by comparing a genome of interest to the genomes of several related species.

CONTRAST exploits the fact that protein-coding genes play a specific functional role within a cell and are therefore subjected to characteristic evolutionary pressures. For example, mutations that alter an important part of a protein’s structure are likely to be deleterious and thus selected against. On the other hand, mutations that preserve a protein’s amino acid sequence are normally well tolerated. Thus, protein-coding genes can be identified by searching a genome for regions that show evidence of such patterns of selection. However, learning to recognize such patterns when more than two species are compared has proved difficult.”
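The selection signal described in the quote can be illustrated with a small sketch. This is not CONTRAST itself, only a toy example under stated assumptions: two sequences that are already aligned, gap-free, and in the same reading frame, translated with the standard genetic code. The function name count_substitutions and the example sequences are hypothetical. It counts how many differing codons keep the same amino acid (synonymous) and how many change it (nonsynonymous); a region where synonymous changes dominate is consistent with the pattern expected of protein-coding sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_substitutions(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes both sequences are uppercase DNA, aligned without gaps,
    in frame, and of equal length divisible by three.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")

    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no substitution signal
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # amino acid preserved
        else:
            nonsynonymous += 1   # amino acid changed


    return synonymous, nonsynonymous


# Hypothetical aligned fragments from two species.
species_a = "ATGGCTCTGAAA"  # Met Ala Leu Lys
species_b = "ATGGCCTTGAGA"  # Met Ala Leu Arg
print(count_substitutions(species_a, species_b))  # (2, 1)
```

Real gene finders must also handle unknown reading frames, alignment gaps, and more than two species at once, which is the difficulty the quoted article points to.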

Serafim Batzoglou, Ph.D. is one of these researchers. He is Assistant Professor in the Computer Science Department, James H. Clark Center, Stanford University. His research area is computational genomics, and the broad goal of his research is to develop efficient and accurate methodologies for the analysis of genomic data.

In recent years, genomics has rapidly become one of the main ways in which to study biology. Key milestones along this development were the draft release of the human genome in 2000, the subsequent sequencing of tens of vertebrate genomes including chimpanzee, mouse, rat, dog, and chicken, with which the human genome can be compared, sequencing of tens of fruit fly, yeast, plant, and hundreds of bacterial species, and the availability of global datasets on the expression and interactions of genes in human and key laboratory organisms.

To enable the analysis of such genomic data, he has built methods and systems that draw on string algorithms, combinatorial optimization, machine learning, and efficient software development. They address a range of bioinformatics problems: sequence assembly, whole-genome multiple alignment, protein sequence alignment, RNA structure prediction, gene finding, motif finding, protein association network alignment, and population ancestry inference on genotype data. All his tools are publicly available as source code and executables.

Serafim is on the Scientific Advisory Boards of NextBio and 23andMe. NextBio is a web-based scientific data search engine that offers instant access, search and collaboration across a vast repository of life sciences information. Their query interface makes it easy to ask questions about genes, pathways, study results, disease areas, compound treatments and biomarkers, just to name a few. 23andMe is a privately held biotechnology company based in Mountain View, California that is developing new ways to help people make sense of their own genetic information. Google has invested $3.9M in 23andMe, whose cofounder Anne Wojcicki is married to Google cofounder Sergey Brin.

Serafim coauthored “Effect of Genetic Divergence in Identifying Ancestral Origin using HAPAA”, “Automatic parameter learning for multiple network alignment”, “CONTRAST: A Discriminative, Phylogeny-free Approach to Multiple Informant De Novo Gene Prediction”, “Current progress in network research: toward reference networks for key model organisms”, “Whole-genome sequencing and assembly with high-throughput short-read technologies”, and “Multiple alignment of protein sequences with repeats and rearrangements”. Read the full list of his publications!

He earned a B.S. in Mathematics, a B.S. in Computer Science, and his M.Eng. from MIT, all in 1996, and his Ph.D. in Computer Science from MIT in 2000 with the thesis “Computational Genomics: Mapping, Comparison, and Annotation of Genomes”.

Read MIT Technology Review Magazine, 100 Top Young Technology Innovators 2003.