What is needed 'under the bonnet' for interpreting the human genome?

Silicon Mechanics is a leading manufacturer of rackmount servers, storage and HPC solutions aimed at the medical, lab, university and research sectors. They have just released a new product solution called Knome knoSYS 100, and the scenarios it is designed for contain some interesting insights for all organisations that deal with Big Data. When we last spoke to Silicon Mechanics on this podcast earlier in the year, we found out just how much hard drive space the human genome takes up, so it is always a fascinating conversation!

Tim Groen, Strategic Account Manager at Silicon Mechanics, joins us to tell us more.

Give us an overview of what Knome knoSYS 100 is and what led you to develop it?

Knome is a clever play on “genome”, of course, and their goal was to build a complete solution not for performing the genome sequencing but for managing what you do with the sequence once you have it. They developed an application that updates the entire genome with the known research to date on a given set of genes in that sequence. So, for example, it will automatically sort through your genome data and say this person has blue eyes, and then it will link that to academic research confirming that yes, this gene does indicate blue eyes.

Genome research is a key aspect of the future of the healthcare industry and pharmaceutical research, so Knome came to Silicon Mechanics to develop a black box solution that would turn a genome sequence into this usable information. One of their market advantages with this product is that it addresses the privacy and security concerns that are prevalent in medical lab environments. The human genome is obviously confidential data, so you can’t upload it to the cloud where you would risk exposing any of that private consumer data. There are also logistical issues associated with moving this type of data outside of the lab environment in which it originates.

It can take weeks to upload a given genome sequence because these can be multiple terabytes of data, so it is actually faster and cheaper to send a truck full of hard drives all the way from New York to LA than to upload a single genome to the cloud over a gigabit connection. So we needed something that sat in the lab, to which they could upload the data quickly, that kept the data private and secure and that, of course, was usable for the customer. The other aspect of our design was to make it quiet and easy to use. Our clients are lab scientists, not necessarily computer science engineers or IT professionals who want to mess with computer hardware, so we had to make it an “appliance.”
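As a rough illustration of the upload-time point, the arithmetic can be sketched in a few lines of Python. The function name, the dataset size and the `efficiency` factor below are assumptions made for this example, not figures from Silicon Mechanics or Knome. The nominal link speed is only an upper bound, and the sustained throughput a lab actually gets to a remote cloud service can be a small fraction of it, which is what stretches hours into days or weeks.

```python
def transfer_time_hours(size_terabytes: float, link_gbps: float,
                        efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move a dataset over a network link.

    size_terabytes: dataset size in decimal terabytes (1 TB = 1e12 bytes).
    link_gbps:      nominal link speed in gigabits per second.
    efficiency:     assumed fraction of the nominal speed actually sustained
                    (1.0 is the ideal, rarely reached over a shared uplink).
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = size_terabytes * 1e12 * 8            # bytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Hypothetical scenario: 5 TB of sequence data over a gigabit link that
# only sustains 5% of its nominal rate end to end.
ideal_hours = transfer_time_hours(5, 1)
realistic_days = transfer_time_hours(5, 1, efficiency=0.05) / 24
print(f"ideal: {ideal_hours:.1f} hours, at 5% efficiency: {realistic_days:.1f} days")
```

Changing the assumed size or efficiency shows how quickly a local appliance becomes the more practical option for a lab that produces many sequences.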

Give us an idea of the sort of applications Knome will be used for and what sort of organisations are likely to take advantage of this technology?

This takes a genome sequence and makes it consumable by a non-technical researcher. Certainly if you have a PhD in genome research and biology you can probably look at a genome sequence and understand what is going on, but someone like me cannot. So this appliance makes it easy to read the sequence, if you will. It gives you that annotation and tells you which part of the genome relates to which characteristic. Obviously we are talking about a sequence that defines an entire human being, so it is very complicated and we don’t understand everything, but it is important to be able to recognize quickly what we know and what we don’t know. By running a sequence through this appliance you can quickly get the real information that you need to continue your research.

The amount of data that this kind of research generates must be one of the main challenges when designing any data handling technology. This solution uses 45 applications all running simultaneously, so give us an idea of how you configured the systems to deal with that volume of data and tasks.

This is sort of an ecosystem of multiple applications running beneath the surface. For the user it is an appliance: you have an input of a sequence and an output of the annotated data. Under the hood, though, there are multiple applications performing multiple functions, and the end user really doesn’t need to see those. These applications run on a high performance computing cluster, which we call HPC. An HPC cluster links heterogeneous servers and storage, through different types of hardware, into a unified system that is generally managed by software of some kind. In this case Knome selected open source software called Lustre, a parallel file system that unifies storage across multiple hardware components within the HPC cluster. This enables all 45 or more applications to run in harmony without the customer or the user even realizing it. What the customer will see is the Knome software on top that makes it really easy to use.

To design this system we really had to start from the ground up because there were a number of applications and each of them had their own performance criteria and requirements and we had to look at each one and make sure that we were building in enough performance at all levels.

The performance requirements pull in several directions. For example, one application might use more CPU, one might need more RAM and one might need more storage, so we had to perform a thorough cost analysis and make sure that we were putting money and resources into the things that mattered to make this appliance perform extremely well.
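A minimal sketch of that kind of sizing exercise is shown below. Every application name and number in it is invented for illustration; none of it describes the real knoSYS 100 workloads or hardware. The idea is simply to add up what each application needs per resource and see where a candidate configuration falls short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical peak resource needs of one application."""
    name: str
    cpu_cores: int
    ram_gb: int
    storage_tb: float


def total_demand(apps):
    """Sum the peak demand of all applications, per resource."""
    return {
        "cpu_cores": sum(a.cpu_cores for a in apps),
        "ram_gb": sum(a.ram_gb for a in apps),
        "storage_tb": sum(a.storage_tb for a in apps),
    }


def shortfalls(demand, capacity):
    """Return each resource where demand exceeds capacity, and by how much."""
    return {k: demand[k] - capacity[k] for k in demand if demand[k] > capacity[k]}


# Made-up profiles: one CPU-heavy, one RAM-heavy, one storage-heavy.
apps = [
    AppProfile("aligner", cpu_cores=16, ram_gb=32, storage_tb=1.0),
    AppProfile("annotator", cpu_cores=4, ram_gb=128, storage_tb=0.5),
    AppProfile("archive", cpu_cores=2, ram_gb=8, storage_tb=20.0),
]
# Made-up candidate hardware configuration.
capacity = {"cpu_cores": 32, "ram_gb": 128, "storage_tb": 40.0}

demand = total_demand(apps)
print("demand:", demand)
print("shortfalls:", shortfalls(demand, capacity))
```

In this invented case only RAM comes up short, which is the sort of result that would steer budget towards memory rather than more cores or disks.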

You have a partnership with Intel. How does that relationship work when developing technology like this?

Intel has been a major partner for Silicon Mechanics. They are clearly a leader in technology generally and certainly in CPUs. Knome had been a customer of Silicon Mechanics and had been purchasing servers from us, while Silicon Mechanics had been growing into more of a solutions-oriented company. Knome had decided, from their own independent conversations, to use Intel for this next generation. It became clear that Knome needed a complete solution, not just a CPU vendor, so Intel recommended that Knome work with Silicon Mechanics. It was a great harmony: all three companies were already working together, and this project really helped us come together with a common goal and develop this unique product.

Given that there are so many choices of CPUs, even within Intel’s product line, we had to achieve a few different goals. We had to create a product that would stay current for years to come, and we also had to make sure that it would match the performance criteria, the power usage criteria and the pricing criteria that we had previously established. We built it from the ground up with multiple options. We had to cast a very broad net: try a lot of different CPU options, run the pricing, power numbers and performance estimates, set up lab environments, and test and test until we came out with a really good bundle of options that still gives the customer flexibility but also makes it easy to deploy this appliance.

How can the challenges that you have overcome here and the solutions you have employed be applied to other types of industries?

A lot of people talk about clouds and private clouds. A cloud is basically an HPC cluster, and when you rent a cloud service you are simply renting a portion of another person’s HPC cluster. So it is really a question of scale: if you are operating on a big enough scale, it makes sense to have your own HPC cluster instead of going onto the cloud. In this case we have customized a very specific bundle of components for a very specific function, but the nature of this technology is that it can be applied to many uses. In manufacturing, for example, you could design moulds or model aerodynamics and water dynamics. We have had Wall Street type customers who have been using HPC to do high frequency trading. We are doing a lot more pharmaceutical research. We are doing a lot more with government. We are selling into universities and labs around the country doing all kinds of research, because this HPC cluster gives you a base framework from which you can really do anything that you could do in the cloud or with any other kind of product. We can customize it to meet your exact needs, so this type of template can be applied to any industry under the sun.