"Using cutting-edge computational methods, research and discoveries will become far cheaper, less time consuming and more powerful"

- Dr Joshua Ho

Dr Joshua Ho

Acting Head, Molecular, Structural & Computational Division

research overview

Key Research Areas

  • Bioinformatics
  • Computational Systems Biology
  • Cardiovascular Genomics

Research Overview

Bioinformatics is the application of computer science, mathematics and statistics to understand how complex biological systems work. We can now answer biological questions that were previously out of reach by analysing large amounts of genetic and molecular data using cutting-edge computational methods, such as big data technology, artificial intelligence, computational linguistics, software testing and cloud computing.

This laboratory develops fast and accurate bioinformatics methods and software to tackle longstanding problems in basic and translational medicine. Through bioinformatics, our scientists can rapidly sift through large amounts of genetic data using advanced computational technology, which means research and discoveries will be far cheaper, less time-consuming and more powerful.

research projects

There are 8 key projects underway in the Bioinformatics & Systems Laboratory, led by Dr Joshua Ho:

1. Bioinformatics algorithms for single cell RNA-seq analysis

Single-cell RNA sequencing (scRNA-seq) enables researchers to study heterogeneity among tens of thousands of individual cells and define cell types from a transcriptomic perspective. scRNA-seq offers a means to precisely quantify the state of individual cells, enabling high-resolution mapping of cell cycle progression, cell differentiation and other trajectories. However, fast and reliable analysis of these large and noisy data requires new statistical and computational considerations. In this project, we will develop cutting-edge bioinformatics methods to analyse a range of scRNA-seq data to answer important biological questions.

2. Integrative metabolomic data analysis

Analysis of high-throughput mass spectrometry-based metabolomic data is challenging because metabolites are difficult to identify quickly and accurately. Integrating other omic data, such as genomic, transcriptomic and proteomic data, has been found to help metabolite identification in a metabolomics analysis. In this project, we will develop a fast integrative bioinformatics pipeline for metabolomic data analysis.

3. Scalable 3D virtual reality visualisation of biological data

Visualisation of biological data is critical to the analysis and interpretation of large omic data, such as single-cell RNA-seq data. In this project, we will use state-of-the-art virtual reality technology to construct effective and scalable 3D visualisations of various omic data.

4. A cloud-based approach for incorporating scalability in genome informatics

The advent of Next Generation Sequencing (NGS) is transforming the landscape of biomedical research, from disease gene discovery to the clinical application of genomic medicine. NGS enables low-cost, high-throughput sequencing for a wide variety of genome-wide analyses of the genome, epigenome and transcriptome. However, this vast quantity of data brings unprecedented technical challenges in computational analysis and storage. The goal of this research project is to investigate the use of cloud-based big data technologies to deal with these challenges. In particular, we plan to utilise the unique strengths of cloud technology – adaptive scalability and remote distributed data storage – to overcome the technical challenges. We will develop new bioinformatics pipelines and apply them to two cutting-edge applications: (i) single-cell transcriptomic analysis, and (ii) disease gene discovery using whole genome sequencing data.

5. Bioinformatics software testing

Many bioinformatics programs have large input data (e.g., gigabyte-sized sequence data) and often implement sophisticated computational procedures (e.g., network simulation, string matching, machine learning, and combinatorial optimization). As a result, it is difficult to systematically test the correctness of these programs beyond the use of a few trivial test cases. Most faults in these programs are very difficult to detect, but once they occur, they may lead to an incorrect biological conclusion or the design of a misguided follow-up experiment. This project will develop tools to help bioinformaticians perform systematic software testing. We have observed that many practicing bioinformaticians lack proper software testing training, and their programs are often not subjected to sufficient testing. One immediate goal of this project is to develop a software package that will help bioinformatics program developers to design, execute and report test cases. This project will fill an important need in bioinformatics that has not been fully addressed previously.
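
As a rough illustration of what a systematic test can look like when no known-correct output is available, the sketch below applies metamorphic testing (a technique named in the publication highlights) to a toy k-mer counter. The toy program, the function names and the two relations are assumptions chosen for illustration; they are not the laboratory's software.

```python
import random
from collections import Counter


def count_kmers(reads, k):
    """Toy program under test (hypothetical): count every k-mer in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def check_permutation_relation(program, reads, k, seed=0):
    """Metamorphic relation 1: reordering the reads must not change the counts."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return program(reads, k) == program(shuffled, k)


def check_duplication_relation(program, reads, k):
    """Metamorphic relation 2: supplying every read twice must double every count."""
    original = program(reads, k)
    doubled = program(list(reads) + list(reads), k)
    expected = Counter({kmer: 2 * n for kmer, n in original.items()})
    return doubled == expected


# Illustrative input only; real sequence data would be far larger.
reads = ["ACGTAC", "GTACGT", "TTACGA"]
print(check_permutation_relation(count_kmers, reads, 3))
print(check_duplication_relation(count_kmers, reads, 3))
```

Neither check needs the true k-mer counts. Each one only compares two runs of the same program on related inputs, so a violation points to a fault even when the correct output is unknown.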

6. Causal disease mutation identification in whole genome sequencing data

Whole genome sequencing is now highly cost-effective. It is possible to identify sequence or structural variants in the genome of an individual within weeks. This has opened up enormous possibilities for personalized genomic medicine and the identification of causal genes of both rare and common diseases. Nonetheless, while a large number of sequence or structural variants can be identified in each individual, it is often difficult to pinpoint the disease-causing genetic mutation. In this project, we will develop fast and accurate bioinformatics programs to integrate diverse functional genomic data to prioritise likely causal mutations that underlie a disease. This project will utilise state-of-the-art machine learning and statistical approaches.

7. Systems developmental biology of mammalian organ formation and congenital disease

Many organs form via intercellular exchange of signaling molecules and an intracellular network of transcription factors. These interactions can be summarised as a gene regulatory network (GRN). In this project, we will use cutting-edge machine learning, text mining, and statistical approaches to reconstruct the mammalian GRNs, with the goal of discovering the regulatory pathways in embryonic organ formation and congenital diseases, such as congenital heart disease.

8. Decoding the language of life

It is curious that stimulation of the same signalling pathway (e.g., Wnt pathway) can often lead to expression of different genes in different cell types (e.g., embryonic stem cells vs. differentiated intestinal cells). Recent findings based on genome-wide chromatin analysis (e.g., ChIP-chip/ChIP-seq) suggest that both the chromatin environment and DNA sequence composition play an important role in genomic targeting of transcription factors, opening up the possibility that we could learn a grammar to describe and predict cell-type-specific signalling responses. To test this hypothesis, this project will compile and perform meta-analysis on published genome-wide datasets (for example, ChIP-chip, ChIP-seq, DNase-seq, and RNA-seq from the ENCODE/modENCODE consortia) as well as in-house data generated by local and international collaborators. We will adapt advanced methods from computational linguistics and machine learning to build biologically meaningful models for signalling-responsive transcription factor binding in mammalian cells.

laboratory members & collaborators


David Humphreys, Senior Research Scientist

Djordje Djordjevic, PhD Student

Tomasz Szczesnik, PhD Student

Xin Wang, PhD Student

Andrian Yang, PhD Student

Michael Troup, Research Assistant 

Joseph Godbehere, Research Assistant 


Dr Richard Sherwood, Harvard Medical School

Dr Koon Ho Wong, University of Macau

Dr Michael O’Connor, Western Sydney University

Professor Kenro Kusumi, Arizona State University

Professor T. Y. Chen, Swinburne University of Technology

publication highlights

1. Lin P, Troup M, Ho JWK (2017) CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biology, 18, 59

2. Yang P^, Oldfield A, Kim T, Yang A, Yang JYH, Ho JWK^ (2017) Integrative analysis identifies co-dependent gene expression regulation of BRG1 and CHD7 at distal regulatory sites in embryonic stem cells. Bioinformatics (in press)

3. Szot PS, Yang A, Wang X, Parsania C, Röhm W, Wong KH, Ho JWK (2017) PBrowse: A web-based platform for real-time collaborative exploration of genomic data. Nucleic Acids Research (in press)

4. Yang A, Troup M, Lin P, Ho JWK (2017) Falco: A quick and flexible single-cell RNA-seq processing framework on the cloud. Bioinformatics, 33(5), 767-769

5. Djordjevic D, Kusumi K, Ho JWK (2016) XGSA: A statistical method for cross-species gene set analysis. Bioinformatics, 32(17), i620-i628

6. Troup M, Yang A, Kamali AH, Giannoulatou E, Chen TY, Ho JWK (2016) A cloud-based framework for applying metamorphic testing to a bioinformatics pipeline. In Proceedings of the IEEE/ACM 1st International Workshop on Metamorphic Testing, 33-36

7. Kamali AH, Giannoulatou E, Chen TY, Charleston MA, McEwan AL, Ho JWK (2015) How to test bioinformatics software? Biophysical Reviews 7(3), 343-352

8. Djordjevic D, Deshpande V, Szczesnik T, Yang A, Humphreys DT, Giannoulatou E, Ho JWK (2015) Decoding the complex genetic causes of heart diseases using systems biology. Biophysical Reviews 7, 141-159

9. Djordjevic D, Yang A, Zadoorian A, Rungrugeecharoen K, Ho JWK (2014) How difficult is inference of mammalian causal gene regulatory networks? PLoS One, 9(11), e111661

10. Giannoulatou E, Park SH, Humphreys DT, Ho JWK (2014) Verification and validation of bioinformatics software without a gold standard: A case study of BWA and Bowtie. BMC Bioinformatics, 15(Suppl 16), S15

11. Ho JWK*, Jung YL*, Liu T* et al. (2014) Comparative analysis of metazoan chromatin organization. Nature, 512(7515), 449-52

12. O'Connell DJ*, Ho JWK*, Mammoto T, Turbe-Doan A, O'Connell JT, Haseley PS, Koo S, Kamiya N, Ingber DE, Park PJ, Maas RL (2012) A Wnt-Bmp feedback circuit controls intertissue signaling dynamics in tooth organogenesis. Science Signaling, 5, ra4

13. Ho JWK, Bishop E, Kharchenko PV, Negre N, White K, Park PJ (2011) ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis. BMC Genomics, 12, 134

14. Ho JWK, Stefani M, dos Remedios CG, Charleston MA (2008) Differential variability analysis of gene expression and its application to human diseases. Bioinformatics, 24, i390-i398

Acknowledgement of Country

The Victor Chang Cardiac Research Institute acknowledges the traditional custodians of the land, the Gadigal of the Eora nation, on which we meet, work, and discover.
Our Western Australian laboratories pay their respect to the Whadjuk Noongar who remain as the spiritual and cultural custodians of their land.