Robert Witoff, NASA's first data scientist: How to build a successful data science team

At AWS re:Invent, ITProPortal spoke to Robert Witoff, the first IT data scientist to be hired by NASA. Robert works in a startup built within NASA's Jet Propulsion Laboratory, tasked with analysing the vast amounts of data produced by the space agency.

So you're currently the only data scientist employed by NASA?

I think NASA is flush with data scientists, but I'm the only one fortunate enough to have that title. I think right now we're on the cusp of appreciating the data scientist and the data itself for the value inherent in it.

What skills does a good data scientist need?

I think a data scientist needs creativity and versatility, and also it's important to have a consulting mindset. We're here to help people, not to do work for them, or take work from them. I think it can be a subtle but really important distinction. It's also what makes the job so much fun - we get to help people rethink how they're analysing, and what's possible. And I love that!

Can you tell me a little about some of the mission data you've been analysing?

We do so many things on our team - we have so many projects. Some have gone big, some have failed miserably. But it's about moving fast. Our most exciting project right now is one we're doing for the Curiosity Rover team. They've been pulling telemetry down from the Red Planet for over an Earth year now, and they've got a lot of data.

Now there are trends manifesting themselves in some of those telemetry streams that are hard to analyse with traditional tools. So we're bringing in new search technology and new web-based visualisation technology that the project hasn't used before. But we've seen what's happening outside of the JPL, and it's about bringing that experience in, to allow the Curiosity team to more effectively analyse their data. It's a really amazing project.

That sounds great - a very successful mission. But could you give me an example of one of the projects that failed?

I think the failure that's taught me the most also helped us understand more about what it means to be a data scientist and what it means to lead a data science team. It was the project that was meant to analyse some of the infrastructure data of one of our networking teams. Very smart people, very robust tools, and they've been doing what they do very well for many years. But they've collected a lot of data that we posit has new value when looked at together.

We wanted to help those engineers respond to failures faster and justify a larger investment in time. We thought why not look at the data and find some examples. The mistake I made was that we had a data scientist working on that project who wasn't in the same room as those people. We didn't start the project with real data, we started the project with an idea. Every week, we got 90 per cent of the way toward finishing that project, and at the end of every week, some problem would come up and set us back 100 per cent.

So we spent 10 weeks on that project, always so close to the finish line. The big lesson learned for us was that if you're working on a project as a newcomer to a mature area of any domain, you need to work closely with the people in it. You need to sit with them.

But the next piece of advice we learned was a little more profound, a little more impactful in a way we didn't expect. It was a mistake I made: we started the project without getting access to the data. We got access to a system that we were told had data inside of it. What we've shifted our focus to now is that we're happy to help people analyse their data if they hand us a data file, but if it's locked inside a system, we'll now sit patiently until they exfiltrate that data for us.

We never actually got that data out - but I think it was one of our biggest successes in terms of how much we learnt. We spent about eight weeks going back and forth with people being on vacation, not having the right usernames or passwords or credentials. And about eight weeks in, we called it quits and rethought the project.

When you set up your data analytics startup within the JPL, did you have any structure in mind?

So I wanted to structure it based on my background in the Y Combinator up in San Francisco or Silicon Valley. So I took a company through that in 2009, and being in an environment focused on building things and not planning things - you know, getting down to the ground floor and removing distractions so you can focus on a solution. I think that coupled with the experience I have at the JPL - I've been there almost five years - so I've got some domain experience there, which I think I was lucky to have. So I'm able to bridge the gap to know when we need to focus on a problem, but also when and where we need to speak to someone else.

People are always talking about the "three V's of big data": volume, velocity and variety. Which of these is the biggest problem for NASA?

I think they're all mixed together. I know people like to pinpoint what it means to be big data, but to me and my team, it's more of an era that we're in. Even if you take the smallest bit of data, say a piece of information like the two of us sitting here now - well that piece of information can tie back to as much data as you want it to. From your connections on LinkedIn, to my friends on Facebook, to the emails we've exchanged, to the imagery that the overhead cameras took when we walked in. So the smallest bit of data is tied to everything else now. I think big data is more of a state than a definition.

So is the JPL involved with new techniques like machine learning and neural networking?

Yeah - we're excited by graph technology right now. We've been working with technologies like [graph database software] Neo4j, and their chief scientist Jim Webber, and introducing that to the laboratory. It's now easier than ever to spin up a graph database, and so we've done things like bring in speakers like Jim. But not just that - before he came in, we built up a graph of all the publications that had come out of the lab, and all the people that had collaborated on projects. For us, it's still a tool of inspiration right now, and we're still working towards operation finance.

Yesterday we also spoke to Tom Soderstrom, chief technology officer of the JPL, about how he chose his first data scientist. Make sure to tune in to our live coverage of the Amazon Web Services re:Invent conference here in Las Vegas, for more interviews, photos and analysis.

Images: NASA