“To make sense of a single patient’s genome, you need to put it in the context of many people’s genomes.”

-Daniel MacArthur

Quoted in The Atlantic, September 9, 2015


Introducing the BioGraph™ Engine

When you run a large genome project with next-generation sequencing data, you want to spend your time understanding the biology, not wrestling with how to compare groups of genomes to one another.

The BioGraph™ Sequencing Analysis Engine allows you to access, analyze and unify data from large-scale genome sequencing projects. Our team of scientists, bioinformaticians and developers designed BioGraph from the ground up to tackle genomic data mining at scale.

In the past, bioinformatics tools were built to analyze individual genomes. When we had more compute resources and time than we had samples, we built a separate tool for each analysis. Today, the rate of data generation, combined with the need to reanalyze data over time, has created challenges that cannot be solved simply by adding more computing resources. We need to analyze large populations while also understanding the genetics of each individual at a deeper level.

BioGraph is a novel bioinformatics approach that addresses the unique challenge of comparing large genomic datasets. BioGraph converts read data into an efficient graph structure that can be queried quickly. The BioGraph engine facilitates the next level of genomic analysis, from large-scale, population-level analysis to structural variant calling to a platform for machine learning.

How is BioGraph different?

The BioGraph analysis engine provides simplicity and flexibility: it solves the problem of organizing your data so that it is most useful for the analysis you want to do.

BioGraph takes a fundamentally new approach: it represents the raw reads while making as few assumptions as possible, so that data from different sources can be harmonized. Most file formats index data by reference position; the BioGraph Format indexes the reads by sequence. Because this representation is uniform across samples, queries are extremely fast. Given how much you sequence, you should not have to trade off storage footprint, compute and quality of analysis against one another.
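To make the difference between indexing by reference position and indexing by sequence concrete, here is a toy sketch. It is not the BioGraph Format, whose internals are not described on this page. It sorts every suffix of every read so that any query sequence can be located by binary search, with no reference genome or alignment step involved. The class name SequenceIndex and the "~" sentinel character are our own illustrative choices, and reverse complements are ignored for simplicity.

```python
from bisect import bisect_left


class SequenceIndex:
    """Toy index of reads keyed by sequence (all read suffixes, sorted)."""

    def __init__(self, reads):
        # Pair every suffix of every read with the id of the read it came from,
        # then sort lexicographically so lookups can use binary search.
        pairs = sorted(
            (read[start:], read_id)
            for read_id, read in enumerate(reads)
            for start in range(len(read))
        )
        self.suffixes = [suffix for suffix, _ in pairs]
        self.read_ids = [read_id for _, read_id in pairs]

    def reads_containing(self, query):
        """Return the sorted ids of reads that contain `query` as a substring."""
        # Suffixes starting with `query` form one contiguous block in sorted order.
        # "~" sorts after A, C, G, T and N, so query + "~" bounds that block.
        lo = bisect_left(self.suffixes, query)
        hi = bisect_left(self.suffixes, query + "~")
        return sorted(set(self.read_ids[lo:hi]))


if __name__ == "__main__":
    index = SequenceIndex(["ACGTAC", "TTACGG", "GGGTTT"])
    print(index.reads_containing("TAC"))  # reads 0 and 1 contain "TAC"
```

Storing every suffix explicitly grows quadratically with read length, so real sequence-indexed tools generally rely on compressed structures such as suffix arrays or the FM-index instead. The point of the sketch is only that the query is a sequence, not a reference coordinate.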

Want more details? See our publications and documentation.


Machine Learning Optimized

Machine learning benefits from a uniform data representation. Start your machine learning study with a uniform set of variation.

Population Graph Genomes

Create updatable, population-level graph genomes that reads can be mapped to quickly. This way you can capture the variation in each individual and across an entire population, which is what you actually want from whole-genome data.

Structural Variations

Structural variants are known to be associated with a range of disorders, including neurological and cardiovascular conditions. Discovering structural variants with accurate sequences makes it possible to produce a uniform set of calls across the individuals in a study, which dramatically reduces analysis time.

Data Harmonization - The N+1 Problem

The BioGraph Format inherently harmonizes datasets, making it easy to add new individuals without committing to a specific reference genome or variant caller. This means the data need to be converted only once, saving both time and computational resources.


Ready to see something new in your data?