Optimising machine learning

A great deal of manual work goes into building and training intelligent machine learning algorithms.

Machine learning seems to be the hot topic these days. Everybody has been talking about it since machines began beating human players at chess, Jeopardy! and now even Go. Our cars will be driven by artificial intelligence in the future, and our jobs will be taken over by robots. There’s a lot of hype, and a lot of fear and uncertainty, as so often happens when new technology has the potential to disrupt our societies.

However, when you talk to the people who are actually involved in developing these new types of intelligent algorithms, you get quite a different picture. Today, there’s a lot of manual work involved in automating decision processes. Developing algorithms that can make decisions in a “weakly” intelligent way is hard work. I call them “weakly” intelligent because so far we’ve only been able to develop algorithms that can do one thing. They might be able to do this one thing extraordinarily well, like playing Go or chess. However, if you ask the algorithm that can play Go to drive your car, it will fail. So, we’re still a long way from being able to develop a “highly” intelligent machine.

Cumbersome trial-and-error approach   

What we can do, though, is apply algorithms to almost any kind of digital data to extract information automatically and make decisions in a seemingly intelligent way. Developing these algorithms, a discipline known as machine learning, remains a cumbersome journey. This is because the usual approach is trial and error: trying out candidates until one works well for the problem at hand. Usually, a data scientist will choose algorithms based on practical experience and personal preference. That is okay, because there is usually no single correct way to build a machine learning model. Many algorithms have been developed to automate the manual and tedious steps of the machine learning pipeline: for example, to loosen the prerequisites under which machine learning theories and approaches apply, to create input features automatically and select the best predictors, and to test different modelling algorithms and choose the best model. But still, there’s a lot of lab work required to build a machine learning model with trustworthy results.

A big chunk of this manual work is related to finding the optimal set of hyperparameters for a chosen modelling algorithm. Hyperparameters are the settings chosen before training that define how the model is built, as opposed to the values the model learns from the data. For example, if I decide to build a machine learning model to predict which customers should be granted credit, I need to make many decisions during the training process. I need to choose which modelling approaches to test, which data to train the model on and which data to test the results on, how to tune the parameters of the chosen model and how to validate the results. All these choices will affect the outcome of my model building exercise, and eventually the final model selected. When you consider that this model will be used to decide which customers get credit, it’s important that we have high confidence in the model to make decisions we can trust.

A large portion of the model building process, besides the analytical data preparation that still takes the lion’s share of the time, is taken up by experiments to identify the optimal set of parameters for the model algorithm. Here we quite quickly run into the curse of dimensionality. Modern machine learning algorithms have a large number of parameters that need to be tuned during the model training process. There’s also a trend towards more and more complex algorithms that can automatically drill deeper into the data to find more subtle patterns.

For example, we’re seeing a development from shallow neural networks to deep neural networks, and from simple decision trees to random forests and gradient boosting algorithms. While these algorithms improve the chances of building accurate, stable predictive models for more complex business problems (such as fraud detection, image processing, speech recognition and cognitive computing), they also require a much larger number of parameters to be tuned during training. (There is no free lunch.) So, if I have 10 parameters that need to be tuned to an optimal setting and each parameter can take 10 different values (these are very conservative numbers), I end up with 10 to the power of 10, or 10 billion, different combinations to test. And this only applies to a single modelling approach. If I’d like to test different algorithms, this number grows very quickly.
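
To make the arithmetic concrete: with 10 parameters and 10 candidate values each, a full grid contains 10 multiplied by itself 10 times. The short Python sketch below checks that count. The helper name grid_size and the cost of one second per trained model are assumptions made purely for illustration.

```python
from math import prod

def grid_size(values_per_parameter):
    """Number of combinations in a full grid: the product of the per-parameter counts."""
    return prod(values_per_parameter)

# The example from the text: 10 parameters, 10 candidate values each.
combinations = grid_size([10] * 10)
print(combinations)  # 10000000000, i.e. 10**10

# Assumption for illustration only: each model takes one second to train.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
print(round(combinations / SECONDS_PER_YEAR))  # 317 (years of sequential training)
```

Even under that optimistic training time, testing every combination one after another is clearly out of reach.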

Speedy autotuning approach 

So, what can we do? There are several ways to support the data scientist in this cumbersome lab work of tuning machine learning model parameters. These approaches are called hyperparameter optimisation.   

In general, there are three different types: parameter sweep, random search and parameter optimisation.   

Parameter sweep: this is an exhaustive search through a pre-defined set of parameter values. The data scientist selects the candidate values for each parameter to tune, trains a model with each possible combination and selects the best-performing model. Here, the final outcome depends very much on the experience of the data scientist and the values they select.
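
A parameter sweep can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: validation_score is a hypothetical stand-in for "train a model and measure it on validation data", and the parameter names learning_rate and depth are assumptions chosen for the example.

```python
from itertools import product

def validation_score(params):
    """Hypothetical stand-in for training a model and scoring it on validation data.
    Higher is better; this toy function peaks at learning_rate=0.1, depth=6."""
    return -((params["learning_rate"] - 0.1) ** 2) - 0.01 * (params["depth"] - 6) ** 2

def parameter_sweep(grid, score):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score

# The candidate values are picked by hand, which is where experience matters.
grid = {
    "learning_rate": [0.01, 0.1, 0.5],
    "depth": [2, 4, 6, 8],
}
best_params, best_score = parameter_sweep(grid, validation_score)
print(best_params)  # {'learning_rate': 0.1, 'depth': 6}
```

The sweep can only ever return a combination that is in the grid. Had 0.1 been left out of the learning_rate candidates, the best setting would have been missed.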

Random search: this is a search through randomly selected sets of values for the model parameters. With modern computers, this can provide a less biased approach to finding an optimal set of parameters for the selected model. As the search is random, there is a chance of missing the optimal set unless a sufficient number of experiments are conducted, which can be expensive.
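
The same idea as a minimal Python sketch. Again, validation_score is a hypothetical stand-in for training and validating a model, and the sampling ranges are assumptions made for illustration.

```python
import random

def validation_score(params):
    """Hypothetical stand-in for training a model and scoring it on validation data.
    Higher is better; this toy function peaks at learning_rate=0.1, depth=6."""
    return -((params["learning_rate"] - 0.1) ** 2) - 0.01 * (params["depth"] - 6) ** 2

def sample_params(rng):
    """Draw one random setting from assumed ranges."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform between 0.001 and 1
        "depth": rng.randint(2, 10),                # integer from 2 to 10 inclusive
    }

def random_search(sample, score, n_trials, seed=0):
    """Score n_trials randomly drawn settings and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample(rng)
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score

few_params, few_score = random_search(sample_params, validation_score, n_trials=5)
many_params, many_score = random_search(sample_params, validation_score, n_trials=500)
print(few_score, many_score)
```

Because both runs share a seed, the longer run starts with the same five trials and can only match or improve on the shorter one. That is the trade-off described above: more experiments improve the odds, at a higher cost.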

Parameter optimisation: again there are different approaches here, but they all apply modern optimisation techniques to find the optimal solution. In my opinion, this is the best way to find the most appropriate set of parameters for any predictive model, and any business problem, in the least expensive way. The “Optimal Solution” so to speak. 
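
One simple member of this family is a local search, sketched below in Python as plain hill climbing over a grid of candidate values. It is a generic textbook illustration under stated assumptions, not the routine used in any particular product: validation_score is a hypothetical stand-in for training and validating a model, and the grid values are invented for the example.

```python
def validation_score(params):
    """Hypothetical stand-in for training a model and scoring it on validation data.
    Higher is better; this toy function peaks at learning_rate=0.1, depth=6."""
    return -((params["learning_rate"] - 0.1) ** 2) - 0.01 * (params["depth"] - 6) ** 2

def local_search(grid, score, start):
    """Hill climbing: step to any neighbouring grid point that improves the score,
    and stop once no neighbour is better. Returns (params, score, evaluations)."""
    names = list(grid)
    position = dict(start)  # parameter name -> index into grid[name]
    best = score({n: grid[n][position[n]] for n in names})
    evaluations = 1
    improved = True
    while improved:
        improved = False
        for name in names:
            for step in (-1, 1):
                index = position[name] + step
                if not 0 <= index < len(grid[name]):
                    continue  # neighbour would fall outside the grid
                candidate = dict(position, **{name: index})
                current = score({n: grid[n][candidate[n]] for n in names})
                evaluations += 1
                if current > best:
                    position, best, improved = candidate, current, True
    return {n: grid[n][position[n]] for n in names}, best, evaluations

grid = {
    "learning_rate": [0.001, 0.01, 0.05, 0.1, 0.2, 0.5],
    "depth": [2, 3, 4, 5, 6, 7, 8, 9, 10],
}
start = {"learning_rate": 0, "depth": 0}  # begin at the first value of each parameter
best_params, best_score, evaluations = local_search(grid, validation_score, start)
print(best_params, evaluations)
```

On this toy problem the search reaches the best grid point while scoring fewer settings than the 54 a full sweep would need. On real problems a plain hill climber can get stuck in a local optimum, which is one reason practical tuners use more elaborate strategies.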

SAS has conducted lots of research into hyperparameter tuning, which we call Autotuning. It’s now possible to quickly and easily find the optimal parameter settings for diverse machine learning algorithms such as Decision Trees, Random Forests, Gradient Boosting, Neural Networks, Support Vector Machines and Factorisation Machines by simply selecting the option you want. In the background, complex local search optimisation routines are hard at work tuning the models efficiently and effectively. I’m convinced this new capability will be a great help to the modern data scientist. They will find the best model much more quickly and with more confidence. For the business, this means getting to value with machine learning faster.

Sascha Schubert, business solutions manager at SAS 
