We build tools to meet the challenges of analyzing genomic data at scale.
For the most part, current genomic analysis practices were developed when only a small number of samples were being collected and sequenced. Today, dramatically increased sequencing capacity has produced hundreds of thousands of sequenced genomes worldwide, and these methods are neither efficient nor optimized for the questions we want to ask of large genomic datasets. Spiral's products were developed to address the unique challenges of storing and comparing large datasets, and of doing so over time as references and variant calling algorithms change.
BioGraph™ technology addresses the unique challenge of comparing large genomic datasets. The volume of raw read data, combined with the need to reanalyze that data over time, makes such comparisons computationally expensive. BioGraph converts read data into a highly efficient graph structure, making it possible to query the data quickly as genomic references or variant detection algorithms change.
Structural variants are becoming increasingly important as research expands to populations that are not well represented by the "reference genome." Popular open-source variant callers have false discovery rates (FDRs) as high as 40%, making it difficult to compare variants. BioGraph Assembly harmonizes data from different callers without requiring reanalysis and has an FDR of 3%.
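For readers less familiar with the metric, the false discovery rate is the fraction of reported variant calls that turn out to be false positives. The sketch below uses the standard definition, FDR = FP / (TP + FP). The function name and the counts in the usage note are illustrative assumptions, not Spiral's benchmark data.

```python
def false_discovery_rate(true_positives, false_positives):
    """Return the fraction of reported calls that are false positives.

    Standard definition: FDR = FP / (TP + FP).
    """
    total_calls = true_positives + false_positives
    if total_calls == 0:
        # With no calls at all, the rate is undefined.
        raise ValueError("no variant calls to evaluate")
    return false_positives / total_calls
```

With hypothetical counts, a caller that reports 100 variants of which 40 are false has `false_discovery_rate(60, 40) == 0.4`, while one that reports 100 variants of which 3 are false has a rate of 0.03.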
Most large sequencing projects want to store the original BAM files for possible future reanalysis. However, these files take up a lot of space. Spiral Encrypted Compression (SpEC) is a lossless compression format that reduces file sizes by up to 60%. SpEC files can easily be converted back into BAM files whose records have MD5 checksums identical to those of the original BAM file.
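One way to check a round trip like this independently is to compare checksums of the decompressed BAM contents rather than of the files themselves, since two BAM files can hold identical data while differing in their compressed bytes. The following is a minimal sketch using only the Python standard library. It relies on BAM's BGZF container being readable as gzip, and it hashes the whole decompressed stream (header plus records), which is a stricter check than records alone. The function names and file paths are hypothetical, and this is not part of SpEC itself.

```python
import gzip
import hashlib


def payload_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the decompressed contents of a gzip/BGZF file."""
    digest = hashlib.md5()
    with gzip.open(path, "rb") as handle:
        # Read in chunks so large BAM files are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(original_bam, restored_bam):
    """True if both files decompress to byte-identical content."""
    return payload_md5(original_bam) == payload_md5(restored_bam)
```

A call such as `same_contents("original.bam", "restored.bam")` would then confirm that the restored file carries the same data as the original, regardless of how each file was compressed on disk.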