Genetics Defense Seminar of Sahra Uygun

“Using Transcriptome and Data Science Methods to Uncover Gene Regulatory and Functional Information” by Sahra Uygun

January 25, 2017; 1425 BPS; 1:00 PM-2:00 PM

Committee Members:
Dr. Shin-Han Shiu, Dr. Robert Last, Dr. Jin Chen, Dr. Christina Chan


Even in model organisms, some genomic regions still have unknown functions. These regions include protein-coding genes and regulatory elements that are key components of transcriptional regulation. With technological advances, more biological data are being generated, including spatial, temporal, developmental, and condition-specific gene expression data. Gene expression data, and co-expression analyses in particular, have been widely used to predict gene function through guilt-by-association. However, it remains unclear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the choice of gene expression dataset and clustering algorithm affects inferences about gene function. To answer these questions, I evaluated best practices for using co-expression data to identify novel genes that function in a biological process, and I assessed how different clustering algorithms affect the ability to identify genes that function in the same pathway.
Gene co-expression analyses can also be used to identify putative cis-regulatory elements that are over-represented in the promoters of co-expressed genes and to build models of gene regulation under changing environments. Genome-wide models of how gene expression in different organs and cell types is regulated under changing environments have not yet been built in plants. I used stress-responsive gene expression data and co-expression clusters from Arabidopsis thaliana organs and cell types to identify putative cis-regulatory elements. Using these elements and machine learning models, I predicted high-salinity-responsive gene expression in shoots, roots, and six root cell types. I found that the transcriptional responses of plant organs and cell types to high salinity are likely regulated by a core set of the identified elements, and I built predictive models of spatial transcriptional responses to environmental stress in plants.
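The two computational steps described above, guilt-by-association from co-expression and detection of over-represented promoter elements, can be illustrated with a minimal Python sketch. The abstract does not state which correlation measure or enrichment test was used; Pearson correlation for ranking candidate genes and a one-sided hypergeometric test for element over-representation are assumed here only as common choices. All function names, gene IDs, expression values, promoter sequences, and the k-mer are hypothetical toy data, not results from this work.

```python
from math import comb, sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.
    Assumes neither profile is constant (otherwise the denominator is zero)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_candidates(expr, pathway, candidates):
    """Guilt-by-association (assumed scoring): rank candidate genes by their
    mean correlation with genes already known to act in the pathway."""
    scores = {g: mean(pearson(expr[g], expr[p]) for p in pathway) for g in candidates}
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


def element_counts(promoters, cluster, element):
    """Return (N, K, n, k): background size, background promoters containing
    the element, cluster size, cluster promoters containing the element.
    Only the given strand is scanned in this sketch."""
    K = sum(element in seq for seq in promoters.values())
    k = sum(element in promoters[g] for g in cluster)
    return len(promoters), K, len(cluster), k


def enrichment_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k): the chance that k or more of
    n cluster promoters contain an element found in K of N background promoters."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy usage with made-up data.
expr = {
    "p1": [1, 2, 3, 4],  # known pathway genes
    "p2": [2, 4, 6, 8],
    "c1": [1, 2, 3, 5],  # candidate that rises with the pathway genes
    "c2": [4, 3, 2, 1],  # candidate anti-correlated with the pathway genes
}
print(rank_candidates(expr, ["p1", "p2"], ["c2", "c1"]))  # ['c1', 'c2']

promoters = {
    "g1": "TTACGTGAA", "g2": "CCACGTGTT", "g3": "GGACGTGCC", "g4": "ATACGTGTA",
    "g5": "TTTTAAAACC", "g6": "GGGCCCAAAT", "g7": "ATATATATAT",
    "g8": "CCCCGGGGTT", "g9": "AAAAATTTTT", "g10": "GCGCGCGCAA",
}
N, K, n, k = element_counts(promoters, ["g1", "g2", "g3"], "ACGTG")
print((N, K, n, k), enrichment_pvalue(N, K, n, k))  # (10, 4, 3, 3) 0.0333...
```

When many candidate elements are tested, the resulting p-values would normally be corrected for multiple testing before an element is called over-represented.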
Overall, this research contributes to understanding the role of “big data” in biology, provides guidelines for effectively using gene co-expression to infer functional associations, and shows how computational approaches help identify gene regulatory information.