Joshua Ho's Research Areas
All projects in this laboratory involve integrative analysis of diverse genome-wide datasets, especially next-generation sequencing (NGS) data such as RNA-seq, ChIP-seq, DNase-seq, and whole genome sequencing data. Multiple projects are available under each theme.
1. Systems developmental biology of mammalian organ formation
Many organs form via intercellular exchange of signaling molecules and an intracellular network of transcription factors. These interactions can be summarised as a gene regulatory network (GRN). We recently reconstructed a GRN from more than 1,000 pieces of gene perturbation evidence and identified a feedback circuit associated with epithelial-mesenchymal signaling interactions during embryonic development of the mouse molar tooth (O’Connell et al., 2012). In silico simulation suggests that the observed reciprocal tissue signaling interactions could be an intrinsic property of the circuit structure. This finding has significant implications for our understanding of this important class of signaling interactions in organ formation and malformation. We are now extending this approach to study the development of other organs, such as the salivary gland, pancreatic islet, ocular lens and heart valve.
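To illustrate what simulating a circuit in silico can look like, here is a minimal sketch of a synchronous Boolean network in Python. It is not the published tooth GRN: the four node names and their update rules are hypothetical, chosen only to show how a signal can pass back and forth between two tissues purely because of how the circuit is wired.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical toy circuit (NOT the network from O'Connell et al., 2012):
# an epithelial signal switches on a mesenchymal transcription factor, which
# switches on a mesenchymal signal, which switches on an epithelial
# transcription factor, which in turn switches the epithelial signal back on.
RULES: Dict[str, Callable[[State], bool]] = {
    "epi_signal": lambda s: s["epi_tf"],
    "mes_tf": lambda s: s["epi_signal"],
    "mes_signal": lambda s: s["mes_tf"],
    "epi_tf": lambda s: s["mes_signal"],
}


def step(state: State) -> State:
    """Apply every update rule to the current state at once (synchronous update)."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def simulate(initial: State, n_steps: int) -> List[State]:
    """Return the list of states visited, starting with the initial state."""
    trajectory = [dict(initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


initial: State = {
    "epi_signal": True,
    "mes_tf": False,
    "mes_signal": False,
    "epi_tf": False,
}
trajectory = simulate(initial, 4)

for t, state in enumerate(trajectory):
    active = [gene for gene, on in state.items() if on]
    print(t, active)
```

In this toy circuit the active node moves from epithelium to mesenchyme and back again without any external input, which is the kind of behaviour one would look for when asking whether reciprocal signaling follows from circuit structure alone.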
2. Decoding the language of life
It is curious that stimulation of the same signaling pathway (e.g., the Wnt pathway) can often lead to expression of different genes in different cell types (e.g., embryonic stem cells vs. differentiated intestinal cells). Recent findings based on genome-wide chromatin analysis (e.g., ChIP-chip/ChIP-seq) suggest that both the chromatin environment and DNA sequence composition play an important role in genomic targeting of transcription factors, opening up the possibility that we could learn a ‘grammar’ to describe and predict cell-type-specific signaling responses. To test this hypothesis, this project will compile and perform a meta-analysis of published genome-wide datasets (for example, ChIP-chip, ChIP-seq, DNase-seq, and RNA-seq data from the ENCODE/modENCODE consortia) as well as in-house data generated by local and international collaborators. We will adapt advanced methods from computational linguistics and machine learning to build biologically meaningful models of signaling-responsive transcription factor binding in mammalian cells.
3. Genome-wide chromatin landscape analysis of fungal epigenomes
Fungi have wide medical and agricultural relevance because of their ability to cause diseases in humans and plants. Moreover, many fungi have long been used in the biotechnology (e.g., industrial enzyme production) and food (e.g., wine, cheese, soy sauce fermentation) industries, and some fungi (e.g., mushrooms) are a valuable food source with high nutritional and medicinal value. To better understand the potential of fungi and the diseases they cause, many representative fungal genomes have already been sequenced, and an ongoing joint initiative aims to have 1000 fungal genomes sequenced over the next few years. Despite these efforts at the genomic level, the epigenomes of most fungal species remain largely uncharted. Together with an international collaborator, we will use ChIP-seq to study the genome-wide chromatin landscape of several closely related species of medical, agricultural and industrial importance. We will identify and systematically analyse the chromatin states in these species using advanced data mining and machine learning techniques, and use this information to gain insight into key aspects of their physiology.
4. Causal disease mutation identification in whole genome sequencing data
Whole genome sequencing is now highly cost-effective. It is possible to identify sequence or structural variants in the genome of an individual within weeks. This has opened up enormous possibilities for personalized genomic medicine and the identification of causal genes of both rare and common diseases. Nonetheless, while a large number of sequence or structural variants can be identified in each individual, it is often difficult to pinpoint the disease-causing mutation. In this project, we will develop a bioinformatic pipeline that integrates diverse functional genomic data to prioritise likely causal mutations that underlie a disease.
5. Bioinformatics software testing
Many bioinformatics programs have large input data (e.g., gigabyte-sized sequence data) and often implement sophisticated computational procedures (e.g., network simulation, string matching, machine learning, and combinatorial optimization). As a result, it is difficult to systematically test the correctness of these programs beyond the use of a few trivial test cases. Most faults in these programs are very difficult to detect, but once they occur, they may lead to an incorrect biological conclusion or the design of a misguided follow-up experiment. This project will develop tools to help bioinformaticians perform systematic software testing. We have observed that many practicing bioinformaticians lack proper training in software testing, and their programs are often not subjected to sufficient testing. One immediate goal of this project is to develop a software package that will help developers of bioinformatics programs design, execute and report test cases. This project will fill an important need in bioinformatics that has not been fully addressed previously.
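One way to test a program when the correct output for a realistic input is unknown is metamorphic testing, which checks relations that must hold between the outputs of related inputs. The description above does not name this technique, so the sketch below is only an illustration of the general idea. The functions `gc_content`, `reverse_complement` and `check_metamorphic_relations` are hypothetical examples, not part of any package from this project.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (the program under test)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def check_metamorphic_relations(seq: str, tol: float = 1e-12) -> Dict[str, bool]:
    """Check relations that must hold even when the true GC content is unknown.

    Relation 1: the reverse complement has the same GC content as the input.
    Relation 2: the sequence concatenated with itself has the same GC content.
    """
    base = gc_content(seq)
    return {
        "reverse_complement": abs(gc_content(reverse_complement(seq)) - base) <= tol,
        "self_concatenation": abs(gc_content(seq + seq) - base) <= tol,
    }


results = check_metamorphic_relations("ACGTGGCCATTAGC")
for relation, holds in results.items():
    print(relation, "holds" if holds else "VIOLATED")
```

A violated relation points to a fault without anyone having to compute the expected output by hand, which is why this style of check suits programs that run on gigabyte-sized inputs.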
6. A cloud-based approach for incorporating scalability in genome informatics
The advent of Next Generation Sequencing (NGS) is transforming the landscape of biomedical research – ranging from disease gene discovery to clinical application of genomic medicine. NGS enables low-cost, high-throughput sequencing for a wide variety of genome-wide analyses of the genome, epigenome and transcriptome. However, this vast quantity of data presents unprecedented technical challenges in computational analysis and data storage. The goal of this research project is to investigate the use of cloud-based technology to deal with these challenges. In particular, we plan to utilise the unique strengths of cloud technology – adaptive scalability and remote distributed data storage – to overcome them. We will develop new bioinformatics pipelines and apply them to two cutting-edge applications: (i) single-cell transcriptomic analysis, and (ii) disease gene discovery using whole genome sequencing data.
For postdocs, students and RAs who want to join this laboratory:
All projects require proficiency in at least one programming/scripting language (R, Perl, Python, Java, C++, C, Matlab…). Familiarity with the Unix operating system is desirable but not required. Individual projects can be tailored to fit each student’s personal interests and skill set. Most projects involve close interactions with local and international collaborators. This is a highly interdisciplinary laboratory. We welcome prospective group members from diverse backgrounds, such as medicine, biology, physics, computer science, mathematics, statistics, and engineering. Expressions of interest, along with your CV, can be sent to Dr Ho: email@example.com