How to lay the foundations for machine learning


Machine learning feels new to many of us because it has only recently become feasible for the mass market, but its roots span decades. The concept of machines learning from data materialised in the 1950s, and in 1988 IBM revolutionised the industry by introducing probability-based algorithms into what had previously been the rule-based field of machine learning.

Now, we have our very own virtual PAs – Siri, Alexa, Google Now – that use machine learning to analyse information from our interactions, anticipate our needs and tailor services to our preferences. Social media sites use the technology to suggest new friends, while facial recognition in photo apps saves us time, energy and effort. But beyond the obvious, machine learning is also now protecting us from fraud by detecting patterns in card payments and improving the delivery of our online shopping.

Businesses today want their data to do the heavy lifting and work for them, driven by the desire to cut costs, improve consistency and streamline operations. Machine learning helps to make this possible at scale, and a Deloitte survey found that 57 per cent of businesses increased spending on related technology in 2018. While the technology was previously perceived as an excessive expenditure, today it is understood as an investment in the business’s future and a competitive revenue driver.

Data expert and author Bernard Marr says that, now that developers have experimented with and trialled the algorithms and technology, machine learning is set to take centre stage in business plans and budgets across the UK this year. Recent research supports this claim, revealing that 48 per cent of European organisations now regard ML as crucial to their business’s near future.

As the likes of Amazon, Facebook and Google continue to push the boundaries of ML technology, how can you make the most of the latest and greatest algorithms? The most successful businesses will be those that invest in their technology and intelligently utilise the skills and data systems already at their disposal. Forget the hype and get back to basics with these tips.

Get your data(base) in order

One of machine learning technology’s greatest features is its flexibility; it can be applied to everything from supply chain and stock control to factory automation and repetitive data-entry tasks. Each application requires a single repository in which the data can be collected and manipulated so that the algorithm can derive value from it. For ML algorithms to offer informed judgements and recommendations, the underlying database must provide a steady supply of clean, accurate, detailed data.

Nearly half of the organisations studied in recent research conducted by Vanson Bourne said they had invested in data quality services to ensure that their data is serviceable for all ML applications. Without data quality and consolidation, AI technologies would not have gone on to improve cancer survival rates, beat humans at chess and Go, and change the face of biochemistry.

This shift in investment – focusing on ensuring that captured data is of the highest possible quality rather than simply casting the data net as wide as possible – is a stark industry change. Less than a decade ago, dedicated data quality services and tools were a niche offering, largely underused by data-heavy businesses. Now, they are front and centre in the C-suite’s future plans.

As ML continues to advance at an increasing rate, businesses must ensure that they support their data scientists and invest in the technology necessary to run these algorithms. A solid database of high-quality data takes organisations a step closer to integrating ML into their business, but if their data scientists do not have the right resources then this momentum will falter.

Talk the (data scientist’s) talk

Before walking the walk, businesses must consider the various programming languages they wish to add to their ecosystems of software, taking into consideration the business’ end goals, the programming skills available, and the qualities of each language.

Research reveals that 64 per cent of organisations cite predictive analytics, which relies upon ML to mine large datasets and predict the outcome of future events, as a key motivator for investing in ML. This predictive analytics function relies upon data scientists’ mastery of the appropriate programming language – and just how does one master anything? By studying, experimenting and learning from others.

This is where Python, one of the most popular programming languages in the world according to the 2018 TIOBE Index, really stands out. Python has overtaken its rivals in popularity, largely thanks to its simplicity, readability, versatility and flexibility. As millions of people globally learn and use the language, more and more individuals and groups share programs, tips and entire algorithms with each other online. Python’s network of users gives businesses hoping to use and experiment with the language countless learning materials right at their fingertips.

Python-based technologies also continue to proliferate. On 4 March, the alpha of TensorFlow 2.0, the next major version of the popular deep learning library, was released. TensorFlow 2.0 promises to continue its predecessor’s reign as one of the world’s most popular machine learning projects, with an even more extensive Python library. Using Python scripts, it is much easier to tap into the wealth of knowledge and rapid advances in the data science community.
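
To give a flavour of that accessibility, here is a minimal sketch of the kind of workflow TensorFlow 2’s Keras API enables in Python; the dataset, layer sizes and training settings are illustrative choices rather than recommendations:

    # A deliberately small example: load a benchmark dataset bundled with
    # Keras, define a tiny feed-forward network and train it.
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test)

A handful of declarative lines is enough to define, train and evaluate a working model, which is precisely why the shared tips and tutorials of Python’s community transfer so readily between teams.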

Ultimately, businesses should aim to provide one underlying data infrastructure that everyone across all teams should feed into and take from. For the BI team, this will typically be SQL (even if their tool generates it), but to succeed in this goal data scientists must be allowed to run scripts on the data using their preferred languages – notably Python. This standardisation and democratisation of data means that businesses can apply ML across any and all parts of the business in more creative and experimental ways.
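
As a sketch of that pattern, a data scientist might pull governed data from the shared SQL infrastructure into Python and hand it straight to an ML library. The connection string, table and column names below are hypothetical:

    import pandas as pd
    from sqlalchemy import create_engine
    from sklearn.linear_model import LogisticRegression

    # The same governed warehouse the BI team's SQL tools report against
    # (hypothetical connection details)
    engine = create_engine("postgresql://user:password@warehouse-host/analytics")

    # Fetch features with plain SQL into a DataFrame
    df = pd.read_sql(
        "SELECT basket_value, visits, churned FROM customer_features", engine)

    # ...then experiment freely in Python's ML ecosystem
    model = LogisticRegression()
    model.fit(df[["basket_value", "visits"]], df["churned"])
    print(model.predict_proba(df[["basket_value", "visits"]])[:5])

Because both the dashboards and the models draw on the same source of truth, experiments like this stay consistent with what the rest of the business sees.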

Look to the cloud

While on-premise IT infrastructure is capable of hosting many of the open-source frameworks used to build ML solutions, the on-premise estates of many businesses lack the power and scalability to support these solutions efficiently. For example, most businesses do not currently have significant GPU compute because they plan capacity for operational x86 workloads – whereas training a deep learning algorithm can be dramatically accelerated by a cluster of GPU servers working in parallel.
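
To illustrate why that matters, frameworks such as TensorFlow make it straightforward to spread training across whatever GPUs are available. The sketch below uses MirroredStrategy, which replicates a model across the GPUs of a single server and synchronises gradients automatically (multi-worker variants extend the same idea across a cluster); the model and data here are synthetic stand-ins:

    import numpy as np
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()  # detects the local GPUs
    print("Replicas in sync:", strategy.num_replicas_in_sync)

    # Building and compiling inside the strategy's scope replicates the
    # model's variables across the detected devices
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # Training now runs data-parallel: each replica processes a slice of
    # every batch (synthetic data for illustration)
    x, y = np.random.rand(1024, 20), np.random.rand(1024, 1)
    model.fit(x, y, epochs=2, batch_size=256)

The code itself barely changes as GPUs are added, which is what makes consumption-based cloud GPU capacity so attractive for training workloads.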

If a business is evaluating machine learning for a project, hyperscale cloud offers consumption-based access to GPU compute. It also offers additional x86 compute with which to build out a performant infrastructure for in-database analytics that algorithms can then feed from.

When the requirement shifts from batch analysis to real time (or at least business time), the flow of relevant data must keep pace with the needs of ML algorithms working in near real time. Cloud elasticity can be exploited to ensure that workloads are supported throughout a project’s lifecycle and to give enterprises the freedom to experiment with ML capabilities without being held back by CAPEX decisions.

It has never been so easy for organisations to expand into the cloud, as the big three public cloud providers – Amazon, Google and Microsoft – all fight for ML business. Despite this, last year’s BI to DA Analytics study found only 30 per cent of organisations had exploited the elastic scalability of the cloud to derive value from their data with machine learning.

Data analytics and ML infrastructure is business critical for data-centric organisations. Businesses looking to invest in new technology strategies should ensure that their analytical database infrastructure runs across both on-premise and cloud environments in unison – giving them the freedom to migrate workloads between third-party datacentres and on-premise to optimise for cost and to plan for evolving data governance requirements in the regions where they operate.

While ML may seem daunting in its complexity and application, providing the infrastructure for launching ML projects is more achievable than many think. In fact, businesses are already utilising the technologies they need in their standard IT processes: databases, programming languages, Infrastructure as a Service. To take the next step towards optimising for ML, these technologies must simply be employed in a different, less passive capacity.

As more organisations prioritise the quality of their data and come to understand the benefits of applying machine learning, they will enjoy better decisions and reduced costs. As margins become ever tighter, and profits harder to achieve, machine learning will become the route to success.

Mathias Golombek, CTO, Exasol