Q&A: Why HPC is bioinformatics’ best friend

We interviewed Jorge Balcells, Director of Technical Services at Verne Global.

With more than 6.4 billion bases in a single person’s DNA, it’s easy to understand why data analytics is such a huge part of medical science. As data sets (and what we do with them) get more complex, the scientific community is relying heavily on complex computing to achieve its advances.

We interviewed Jorge Balcells, Director of Technical Services at Verne Global – a 100 per cent renewably powered data centre campus in Iceland – to understand how high performance computing is changing the world of science and why the data centre industry is gravitating to the heart of modern bioinformatics.

What role does HPC have to play in the bioinformatics sector?

High performance computing (HPC) has always had a significant role to play within computational science. Its ability to accomplish compute-intensive tasks, from molecular modelling to processing and analysing the DNA of animals and microbes, makes HPC the ideal solution for delivering complex, fast-paced data analytics and insights.

As data in this sector continues to grow – it is predicted that by 2025 between 100 million and as many as 2 billion human genomes will have been sequenced – the effective application and management of HPC clusters is fast becoming a necessity. HPC is, and will continue to be, essential for the effective storage, processing, analysis and sharing of scientific data, all of which are crucial for unlocking the value of this data for the broader bioinformatics sector.

How can HPC contribute to the advancement of genome/life sciences research, in particular?

Genomic analyses are incredibly complex – often involving the comparison of new data against multiple large-scale external datasets, or integrating it with data from other sources (e.g. public records, other collaborators and research partners).

The resulting analyses generate massive amounts of data – anything from a few hundred gigabytes to several terabytes per run – which require substantial processing and storage capacity to translate into usable insights. HPC is the answer to doing so quickly and at scale, matching complex data and processing requirements with complex compute capabilities. Scientific advances in the genome space are therefore an inevitable result of the increasing focus on sequencing bioinformatics data in this way – as has been apparent in recent developments made by a number of medical centres and organisations.

The uptake of HPC in the bio-IT world is also proving beneficial to medical and genome communities by delivering efficiencies on a number of other levels – from driving down data analysis costs to speeding up sequencing results, both of which are indirectly assisting medical advances.

Is there a trend for ERLS (Education Research Life Sciences) organisations seeking out HPC facilities?

As more life sciences organisations turn to HPC to process their large data sets, demand is growing for scalable and secure data centre solutions that can deal with their HPC requirements.

For some institutes, the solution is within reach – but as the volume of data grows and the need for higher compute capacity rises, even research centres with incredibly large computational platforms are feeling the strain. The result is a trend for organisations seeking out external data centre providers to help support HPC operations, by supplementing compute capacity and improving operational costs.

These specialised HPC facilities present the ERLS sector with a solution that delivers the power infrastructure, resiliency levels and computing resources needed to drive HPC loads cost-effectively. Moving data to remote campuses that benefit from these qualities – often found in regions like the Nordics – provides research centres with the medium and high power computing density they require at significantly lower energy costs. These locations simultaneously deliver excellent global network communications and data centre security.

An example of this trend can be seen with the Earlham Institute – a leading genomics and bioinformatics research organisation, which runs some of Europe’s largest life sciences computational platforms. Earlier this year the institute migrated one of its strategic collaborative bioinformatics analysis platforms to Verne Global’s Icelandic data centre campus, in order to take advantage of the benefits outlined above. It should be expected that other research institutes will follow a similar path in future – and the data centre industry must be prepared to deliver the HPC-ready facilities the bioinformatics industry needs.

Why is Verne Global – and campuses like it – suited to delivering HPC to the broader bioinformatics industry?

Verne Global’s location in Iceland accounts for many of the reasons why its facility is ideally suited to HPC operations – stable and plentiful power infrastructure, lower energy costs and excellent network connectivity among them. However, additional qualities also make the Verne Global campus ‘HPC-ready’.

Its ‘variable resiliency’ model and flexible power structure, for example, allow companies to disaggregate applications across tiered data halls and choose the appropriate level of power protection for their data – meaning customers can ensure their HPC workloads are appropriately protected, minimising the risk of downtime and of critical research programmes stalling.

That said, these are benefits that can be felt by any organisation in any sector looking to run HPC applications.

Why is Iceland an ideal location for HPC and data storage?

When it comes to HPC and data storage, businesses should be sourcing a robust site that benefits from a stable power infrastructure, where the risk of the grid going down – and taking data operations down with it – is much lower. For compute-intensive applications such as those used in bioinformatics, regions with renewable energy resources are optimal.

With this in mind, Iceland’s credentials as an HPC and bioinformatics data hub are incredibly strong. According to Cushman & Wakefield’s latest Data Risk Index, Iceland’s power grid is ranked the lowest risk in the world, operating at just 10 per cent capacity and to four 9s (99.99 per cent) of availability (meaning it experiences an average of just 52 minutes’ downtime a year, if any at all).
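The 52-minute figure follows directly from the availability percentage: 0.01 per cent of the 525,600 minutes in a non-leap year. As a minimal sketch of that arithmetic (the function name and the other availability levels shown for comparison are illustrative, not taken from the Data Risk Index):

```python
# Illustrative only: converts an availability percentage into expected
# downtime per year. The helper name is hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the expected downtime per year, in minutes."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Compare four 9s with the neighbouring availability levels.
    for label, pct in [("three 9s", 99.9), ("four 9s", 99.99), ("five 9s", 99.999)]:
        minutes = annual_downtime_minutes(pct)
        print(f"{label} ({pct} per cent): {minutes:.1f} minutes per year")
```

For four 9s this works out at roughly 52.6 minutes a year, in line with the figure quoted above.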

Iceland’s proximity to 100 per cent renewable power resources also makes data centres in this region – and Verne Global’s campus in particular – ideally located to meet the needs of not only bioinformatics centres, but any industry requiring HPC to process its data.

Image Credit: Wichy / Shutterstock