Professor Serafim Batzoglou
The ScienceDaily article "New Gene Prediction Method Capitalizes On Multiple Genomes" said:
Researchers at Stanford University report in the online open access journal, Genome Biology, a new approach to computationally predicting the locations and structures of protein-coding genes in a genome. Gene finding remains an important problem in biology as scientists are still far from fully mapping the set of human genes. Furthermore, gene maps for other vertebrates, including important model organisms such as mouse, are much more incomplete than the human annotation. The new technique, known as CONTRAST (CONditionally TRAined Search for Transcripts), works by comparing a genome of interest to the genomes of several related species.
CONTRAST exploits the fact that protein-coding genes play a specific functional role within a cell and are therefore subjected to characteristic evolutionary pressures. For example, mutations that alter an important part of a protein’s structure are likely to be deleterious and thus selected against. On the other hand, mutations that preserve a protein’s amino acid sequence are normally well tolerated. Thus, protein-coding genes can be identified by searching a genome for regions that show evidence of such patterns of selection. However, learning to recognize such patterns when more than two species are compared has proved difficult.
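The selection signal described above can be illustrated with a toy sketch. This is not CONTRAST's method, only a minimal illustration of the idea that coding regions tend to tolerate substitutions that leave the amino acid unchanged. The function name count_substitutions and the example sequences are hypothetical, and the sketch assumes two already-aligned, in-frame DNA sequences and the standard genetic code.

```python
# Toy illustration only: NOT the CONTRAST algorithm.
# Assumes two aligned, in-frame coding sequences of equal length.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_substitutions(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if len(seq_a) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")

    synonymous = 0
    nonsynonymous = 0
    for pos in range(0, len(seq_a), 3):
        codon_a = seq_a[pos:pos + 3].upper()
        codon_b = seq_b[pos:pos + 3].upper()
        if codon_a == codon_b:
            continue  # no change at this codon
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: skip
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # amino acid preserved
        else:
            nonsynonymous += 1  # amino acid altered


    return synonymous, nonsynonymous


# Hypothetical example: GCT -> GCC keeps alanine, AAA -> AGA changes
# lysine to arginine.
syn, nonsyn = count_substitutions("ATGGCTAAA", "ATGGCCAGA")
print(syn, nonsyn)
```

In a region under the kind of selection the article describes, one would expect amino-acid-preserving changes to make up a larger share of the observed differences than in non-coding sequence. A real gene predictor combines many more signals than this single count.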
Serafim Batzoglou, Ph.D., is one of these researchers. He is
Assistant Professor in the Computer Science
Department, James H. Clark Center, Stanford University.
His research area is computational genomics. The broad goal of his
research is to develop efficient and accurate methodologies for the
analysis of genomic data.
In recent years, genomics has rapidly become one of the main ways in
which to study biology. Key milestones along this development were the
draft release of the human genome in 2000, the subsequent sequencing of
tens of vertebrate genomes including chimpanzee, mouse, rat, dog, and
chicken, with which the human genome can be compared, sequencing of tens
of fruit fly, yeast, plant, and hundreds of bacterial species, and the
availability of global datasets on the expression and interactions of
genes in human and key laboratory organisms.
To enable the analysis of such genomic data, drawing from string
algorithms, combinatorial optimization, machine learning, and efficient
software development, he built methods and systems for a range of
bioinformatics problems: sequence assembly, whole-genome multiple
alignment, protein sequence alignment, RNA structure prediction, gene
finding, motif finding, protein association network alignment, and
population ancestry inference on genotype data. All his tools are
publicly available as source code and executables.
Serafim is on the Scientific Advisory Boards of
NextBio and 23andMe.
NextBio is a web-based scientific data search engine that offers instant
access, search and collaboration across a vast repository of life
sciences information.
Their query interface makes it easy to ask questions about genes,
pathways, study results, disease areas, compound treatments and
biomarkers, just to name a few.
23andMe is a privately held biotechnology company based in Mountain
View, California that is developing new ways to help people make
sense of their own genetic information. Google has invested $3.9M in
23andMe, whose cofounder Anne Wojcicki is married to Google cofounder
Sergey Brin.
Serafim coauthored
"Effect of Genetic Divergence in Identifying Ancestral Origin using
HAPAA",
"Automatic parameter learning for multiple network alignment",
"CONTRAST: A Discriminative, Phylogeny-free Approach to
Multiple Informant De Novo Gene Prediction",
"Current progress in network research: toward reference networks for
key model organisms",
"Whole-genome sequencing and assembly with high-throughput short-read
technologies", and
"Multiple alignment of protein sequences with repeats and
rearrangements".
Read the
full list of his publications!
He earned his B.S. in Mathematics from MIT in 1996, his B.S. in
Computer Science from MIT in 1996, his M.Eng. from MIT in 1996, and his
Ph.D. in Computer Science from MIT in 2000 with the
thesis
"Computational Genomics: Mapping, Comparison, and Annotation of
Genomes".
Read
MIT Technology Review Magazine, 100 Top Young Technology Innovators
2003.