Deep learning in KNIME analytics platform

(Image credit: Geralt / Pixabay)

Recently, deep learning has become very popular in the field of data science or, more specifically, in the field of artificial intelligence (AI).

Deep learning covers a subset of machine learning algorithms, mostly stemming from neural networks.

Much has already been written on the subject of neural networks and their training algorithms. Briefly, a neural network is an architecture of interconnected artificial neurons, each neuron performing a basic computation via its activation function. An architecture of interconnected neurons can thus implement a more complex transformation on the input data. The complexity of the transformation depends on the single neurons, on the connection structure, and on the learning algorithm.
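
To make the idea of a neuron and its activation function concrete, here is a minimal sketch in plain Python. The weights, biases and the choice of a sigmoid activation are hypothetical values picked only for illustration; they do not come from any trained network.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """One artificial neuron: weighted sum of the inputs, then the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Hypothetical weights and biases, for illustration only.
hidden = dense_layer([1.0, 0.5], [[0.4, -0.6], [0.3, 0.8]], [0.0, -0.1])
output = dense_layer(hidden, [[1.2, -0.7]], [0.05])
print(hidden, output)
```

Chaining the two layers is what lets the network compute a transformation that no single neuron could compute on its own.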

While past neural network architectures consisted of just a few simple layers, mainly due to limitations in computational power, deep learning architectures nowadays take advantage of neurons and layers of neurons dedicated to specific tasks, such as convolutional layers for image segmentation or LSTM units for sequence analysis. Deep learning architectures also rely on increased computational power, which allows for relatively fast training of multilayer and recurrent networks.

Figure 1. An example of a neural network. Each circle represents an artificial neuron. Each artificial neuron implements a basic mathematical function. A neural network uses a number of diverse neurons organised in layers and inter-connected.

Python – TensorFlow – Keras – KNIME

The Python community has long made a set of machine learning algorithms available to the general public through the scikit-learn library.

In more recent years, Google has also open sourced its TensorFlow libraries, including a number of deep learning neural networks. TensorFlow functions can run on single devices as well as on multiple CPUs and multiple GPUs. This parallel calculation feature is the key to speeding up the computationally intensive training required for deep learning networks.

However, using the TensorFlow library within Python can prove quite complicated, even for an expert Python programmer or a deep learning pro. Thus, a number of simplified interfaces have been developed on top of TensorFlow, exposing a subset of its functions and parameters. The most successful of such TensorFlow-based libraries is Keras. However, even though Keras integration presents a lower difficulty than the original TensorFlow framework, it still requires some programming skills.

KNIME Analytics Platform, on the other hand, is an open source GUI-based platform for data science. It covers all your data needs without requiring any coding skills. This makes it very intuitive and easy to use, considerably reducing the learning time. KNIME Analytics Platform has been designed to be open to different data formats, data types, data sources, and data platforms as well as external tools, for example, Python and R.

Because of its graphical interface, computing units in KNIME Analytics Platform are small colourful blocks, named “nodes.” Assembling nodes in a pipeline, one after the other, implements a data processing application. This pipeline is called a “workflow” (Fig. 2).

Figure 2. A KNIME workflow. KNIME Analytics Platform is based on visual programming. Each node implements a specific task. A pipeline of nodes implements a data analysis application. This is called a “workflow.”

KNIME Analytics Platform consists of a software core and a number of community-provided extensions and integrations. These extensions and integrations greatly enrich the core functionality and give access, among other things, to the most advanced algorithms for artificial intelligence. This is the case, for example, with deep learning.

One of the KNIME Deep Learning extensions integrates functionalities from Keras libraries, which in turn integrate functionalities from TensorFlow within Python (Fig. 3).

KNIME Deep Learning – Keras Integration

In general, KNIME deep learning integrations bring deep learning capabilities to KNIME Analytics Platform. These extensions allow users to read, create, edit, train and execute deep learning neural networks within KNIME Analytics Platform.

In particular, the KNIME Deep Learning – Keras integration utilises the Keras deep learning framework to read, write, create, train and execute deep learning networks. This KNIME Deep Learning – Keras integration has adopted the KNIME GUI as much as possible. This means that a number of Keras library functions have been wrapped into KNIME nodes, most of them providing a visual dialog window to set the required parameters.

The advantage of using the KNIME Deep Learning – Keras integration within KNIME Analytics Platform is the drastic reduction in the amount of code to write. Just by dragging and dropping a few nodes, you can build the desired neural architecture, which you can subsequently train with the Keras Network Learner node and apply with the DL Python Network Executor node. In short, you configure a few nodes instead of writing calls to functions in Python code.

Installation

In order to make the KNIME Deep Learning – Keras integration work, a few pieces of the puzzle need to be installed:

  • Python (including TensorFlow)
  • Keras
  • KNIME Deep Learning – Keras extension

More information on how to install and connect all of these pieces can be found on the “KNIME Deep Learning – Keras Integration” documentation page.

A useful video explaining how to install KNIME extensions can be found on the KNIME TV channel on YouTube.

Available nodes

After installing the KNIME Deep Learning – Keras extension, you will find a category KNIME Labs / Deep Learning / Keras in the Node Repository of KNIME Analytics Platform (Fig. 4).

Here, you can see all of the nodes available for deep learning built on Keras. A large number of nodes implement neural layers: input and dropout layers in Core, LSTM layers in Recurrent, and Embedding layers in the Embedding subcategory. Then, of course, there are the Learner, Reader and Writer nodes to respectively train, retrieve and store a network. A few nodes are dedicated to the conversions between network formats.

Two important nodes are the DL Python Network Executor and the DL Python Network Editor. These two nodes respectively enable custom execution and custom editing of a Python-compatible deep learning network via a Python script, including Jupyter notebooks. They effectively bridge the KNIME Keras nodes with other Keras/TensorFlow library functions not yet available in the KNIME Deep Learning integration.

Figure 4. Some of the nodes available in KNIME deep learning integrations. Notice the DL4J integration and the Keras integration. Especially within the Keras integration, notice the many nodes available to build specific network layers. A number of nodes are also available to train networks in Keras, TensorFlow and Python.

Available example workflows

KNIME offers a public EXAMPLES server with a large selection of example workflows. Some workflows are simple and illustrate the usage of a specific node or feature. Some workflows are more complex and show a possible solution to a classic data science use case, such as demand prediction in IoT, customer segmentation in customer intelligence, or sentiment analysis in social media. These example workflows help you jump-start the resolution of your own use case.

This public EXAMPLES server is accessible from within the KNIME Analytics Platform workbench. In the top left corner, you can see the KNIME Explorer panel, listing the content of your LOCAL workspace as well as the content of mounted KNIME Servers. One KNIME Server is mounted from the start: the EXAMPLES server (Fig. 5). The EXAMPLES server can be accessed only in read-only mode. Double-click on it to open the list of example workflows.

Under 04_Analytics/14_Deep_Learning/02_Keras, a few examples are available that use the KNIME Deep Learning – Keras integration. Some use only Python scripts, some use only KNIME Keras nodes, and some use a mix of the two. Some solve an image processing problem, some a text processing problem, and some a classic data analytics problem. If you intend to use KNIME for deep learning, you should start from one of these example workflows, ideally the one closest to your current use case.

Figure 5. KNIME offers a public EXAMPLES server with a list of example workflows. Some illustrate just a node or a feature. Some are more complex and illustrate how to solve a data science use case.

The same list of example workflows is also available on the Node Guide page on the KNIME site. Each workflow comes with a detailed description of task and implementation.

To see how to build, train, deploy, import, customise and speed up a deep learning network, let’s go through the workflow 08_Sentiment_Analysis_with_Deep_Learning_KNIME_nodes.

This workflow extracts the sentiment of movie reviews from the IMDb dataset, using a relatively simple LSTM-based neural network built solely with KNIME Keras nodes and no Python script.

Figure 6. Workflow 08_Sentiment_Analysis_with_Deep_Learning_KNIME_nodes predicts the sentiment of movie reviews using a codeless implementation of an LSTM-based deep learning network.

How to build a deep learning network

The network for sentiment analysis should have:

  • An input layer to accept the word sequence in each review
  • An embedding layer to transform the text words into a numerical space with lower dimensionality than the dictionary size
  • An LSTM layer to learn and predict the review sentiment from the text word sequence
  • A dense layer with sigmoid output function to produce the sentiment prediction

The category Keras/Layers shown in Figure 4 offers a wide selection of nodes to build specific layers in deep learning networks. Thus, a pipeline of such nodes builds the desired neural architecture.

In order to build the network, we use a specific node for the input layer, a node for the embedding layer, then the LSTM Layer node, and the Dense Layer node for the output neurons.

The input layer is configured to have N=80 neurons, where N is the maximum number of words in a document. Input documents are then zero-padded if the maximum number of words is not reached. Notice that this layer does not perform any transformation on the input data; it just collects them.
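
As a rough illustration of the zero-padding step, the sketch below brings every index-encoded review to exactly N=80 values. The function name is hypothetical, and two details are assumptions not stated in the text: padding is appended at the end, and reviews longer than N words are truncated.

```python
MAX_WORDS = 80  # N in the text: maximum number of words per document


def pad_or_truncate(word_indices, length=MAX_WORDS, pad_value=0):
    """Return a list of exactly `length` word indices.

    Assumption: zeros are appended at the end and longer inputs are cut off.
    """
    clipped = word_indices[:length]
    return clipped + [pad_value] * (length - len(clipped))


short_review = [12, 7, 431]           # a three-word review, already index-encoded
padded = pad_or_truncate(short_review)
print(len(padded), padded[:5])
```

Reserving the index 0 for padding means that no real word in the dictionary should be encoded as 0.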

The embedding layer is configured to embed each word of the input sequence into a numerical vector. The embedding vector dimension is set to 128, which is well below the dictionary size (~20K words). The dictionary size would be the vector dimension if a one-hot encoding text representation were adopted. The output of the embedding layer is a tensor [128x80], covering a sequence of 80 word vectors of size 128 each.
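
The saving in dimensionality can be checked with simple arithmetic. The sketch below uses the figures from the text (80 words, 128 embedding dimensions, roughly 20,000 dictionary entries) and adds a toy lookup table to show what an embedding does. The toy table values are made up, and in a real network they would be learned during training.

```python
VOCAB_SIZE = 20_000    # approximate dictionary size from the text
EMBEDDING_DIM = 128    # embedding vector dimension from the text
SEQUENCE_LENGTH = 80   # words per review from the text

# Number of values needed to represent one review under each encoding.
one_hot_values = SEQUENCE_LENGTH * VOCAB_SIZE
embedded_values = SEQUENCE_LENGTH * EMBEDDING_DIM
print(one_hot_values, embedded_values, one_hot_values / embedded_values)

# An embedding is essentially a lookup table with one row per word index.
# Toy table (5 words, 3 dimensions) with made-up values, for illustration only.
toy_table = [
    [0.0, 0.0, 0.0],   # index 0: padding
    [0.2, -0.1, 0.4],
    [0.3, 0.1, -0.2],
    [-0.5, 0.6, 0.1],
    [0.1, 0.1, 0.9],
]


def embed(sequence, table):
    """Replace each word index with its row in the lookup table."""
    return [table[index] for index in sequence]


embedded = embed([1, 4, 0], toy_table)
```

Note that this sketch stores the result as one row per word (80 rows of 128 values), which holds the same values as the [128x80] tensor described above.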

The LSTM layer processes this input tensor and produces a sentiment class (1-positive vs. 0-negative) as output.

The dense layer applies a sigmoid function to the predicted sentiment to make sure the predicted sentiment class falls in the [0,1] range.

The whole network was built using four nodes and without writing a single line of code. Easy, right?
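
For readers curious about what the four nodes correspond to in code, a roughly equivalent Keras definition might look like the sketch below. It assumes a working Keras installation. The number of LSTM units (128) is an assumption, since the text does not state it, and the vocabulary size of 20,000 is an approximation of the ~20K words mentioned above.

```python
from keras.layers import Dense, Embedding, Input, LSTM
from keras.models import Model

MAX_WORDS = 80        # input length from the text
VOCAB_SIZE = 20_000   # approximation of the ~20K-word dictionary
EMBEDDING_DIM = 128   # embedding dimension from the text
LSTM_UNITS = 128      # assumption: not stated in the text

# Input layer: one sequence of 80 word indices per review.
inputs = Input(shape=(MAX_WORDS,), dtype="int32")
# Embedding layer: each word index becomes a 128-dimensional vector.
x = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBEDDING_DIM)(inputs)
# LSTM layer: reads the sequence and summarises it in a single vector.
x = LSTM(LSTM_UNITS)(x)
# Dense output layer: one sigmoid neuron producing a value between 0 and 1.
outputs = Dense(1, activation="sigmoid")(x)

model = Model(inputs=inputs, outputs=outputs)
model.summary()
```

Each line that adds a layer plays the role of one node in the KNIME workflow.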

Figure 7. Detail of neural network assembling in the top part of workflow 08_Sentiment_Analysis_with_Deep_Learning_KNIME_nodes: input layer, embedding layer, LSTM layer, dense output layer.

Notice that these are not the only neural layers available in the KNIME Deep Learning – Keras extension. There, you can find a number of convolutional layers, the dropout layer, the simple recurrent layer, and many more.

How to train a deep learning network

After transforming each text into a sequence of index-encoded words, we split the original data set into training and test sets, via the Partitioning node. The training set is then used to train the deep learning neural network we have just created.
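
The two preparation steps, index encoding and partitioning, can be sketched in plain Python as follows. This is only an illustration of the idea, not what the KNIME nodes do internally: the function names, the whitespace tokenisation, the 80/20 split and the toy reviews are all assumptions.

```python
import random


def build_index(documents, first_index=1):
    """Map each distinct word to an integer; 0 is left free for padding."""
    index = {}
    for document in documents:
        for word in document.lower().split():
            if word not in index:
                index[word] = first_index + len(index)
    return index


def encode(document, index):
    """Turn a text into a sequence of word indices, skipping unknown words."""
    return [index[word] for word in document.lower().split() if word in index]


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into a training and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy reviews, made up for illustration.
reviews = ["great movie", "terrible movie", "great acting", "boring plot", "loved it"]
word_index = build_index(reviews)
encoded = [encode(review, word_index) for review in reviews]
train_set, test_set = partition(encoded)
print(encoded, len(train_set), len(test_set))
```

Fixing the random seed makes the split reproducible, which helps when comparing different network configurations on the same data.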

To train a neural network, you need only one node: the Keras Network Learner node. This node takes three inputs: a previously built neural network, the training set, and, optionally, a validation data set.

This node has four configuration setting groups: one for the input data and their format; one for the target data and the loss function; a third one for the training epochs, batch size, and optimisation parameters; and, finally, a fourth one to handle stagnation in learning (Fig. 8).
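
One common way to handle stagnation in learning is early stopping: training ends once the loss has stopped improving for a given number of epochs. The text does not say which mechanisms the fourth group of settings offers, so the sketch below is an assumed illustration of the general technique, with a hypothetical function name and made-up loss values.

```python
def epochs_until_stop(losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which patience-based early stopping ends training.

    Training stops after `patience` consecutive epochs in which the loss did not
    improve on the best value seen so far by more than `min_delta`.
    """
    best = float("inf")
    stale_epochs = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch
    return len(losses)


# Made-up loss values: the loss improves for three epochs, then stalls.
loss_history = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.59]
stop_epoch = epochs_until_stop(loss_history, patience=3)
print(stop_epoch)
```

A larger patience value tolerates longer plateaus at the cost of more training time.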

In the third group of settings, the following optimisation algorithms are available:

  • Adadelta
  • Adagrad
  • Adam
  • Adamax
  • Nadam
  • RMSProp
  • Stochastic gradient descent
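
All of these optimisers refine the same basic idea: nudge each weight in the direction that reduces the loss. As a minimal sketch of the simplest one, stochastic gradient descent, the example below fits a single weight to made-up data where y = 2x. The function name, learning rate and data are assumptions chosen for illustration; real networks update many weights at once, usually on mini-batches.

```python
import random


def sgd_fit_slope(xs, ys, learning_rate=0.01, epochs=50, seed=0):
    """Fit y = w * x with stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w = 0.0
    samples = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the samples in a new random order each epoch
        for x, y in samples:
            gradient = 2.0 * (w * x - y) * x  # derivative of (w*x - y)**2 w.r.t. w
            w -= learning_rate * gradient     # one update per sample
    return w


# Made-up data that follows y = 2x exactly.
slope = sgd_fit_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(slope)
```

The other optimisers in the list differ mainly in how they adapt the step size during training.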

Figure 8. The four tabs in the configuration window of the Keras Network Learner node. The first tab sets the input data, the second tab the target data, the third tab deals with the training parameters, and the fourth tab with the emergency exits.

How to deploy a deep learning network

Similar to training a deep learning neural network, only one node is needed to apply it: the DL Network Executor node.

The DL [Python] Network Executor node is a very versatile node. It executes a deep learning network on a compatible external back-end platform, selectable in the node configuration window.

The configuration window also requires the format of the input data under the menu “Conversion.” Here, you need to specify the type of encoding that has been used to convert the words into numbers to feed the network, e.g., just numbers or collections of numbers.

Figure 9. Configuration window of the DL Network Executor node.

How to speed up training and execution of a deep learning network

What would deep learning be without fast execution of its networks?

As anticipated earlier on in this article, one of the most successful features of TensorFlow (and therefore of Keras) is the parallelisation of neural network training and execution on multiple CPUs and GPUs.

Execution of Keras libraries within the KNIME Deep Learning integration is automatically parallelised across multiple CPUs. If the GPU-based libraries of Keras are installed, execution of Keras libraries also runs on available GPUs, as explained in the Python-Keras-KNIME installation instructions.

How to import and modify a deep learning network

Previously trained deep learning neural networks can also be imported via:

  • KNIME Model Reader node, if the network has been stored using KNIME
  • Keras Network Reader node, if the network has been stored using Keras
  • TensorFlow Network Reader, if the network has been stored using TensorFlow

Sometimes, networks need to be modified after training. For example, we might need to get rid of a dropout layer, separate the original network into subnetworks, or add additional layers for deployment. Whatever the reason, the DL Python Network Editor node can help. The DL Python Network Editor node allows custom editing of a Python-compatible deep learning network via a user-defined Python script, including Jupyter notebooks.

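This simple Python code snippet, for example, extracts the encoder network from an encoder-decoder architecture.

from keras.models import Model
from keras.layers import Input

# input_network is supplied by the DL Python Network Editor node.
# New input layer: variable-length sequences of 70-dimensional vectors.
new_input = Input((None, 70))
# Select the encoder, here the third layer from the end of the original network.
encoder = input_network.layers[-3]
output = encoder(new_input)

# output_network is the edited network that the node passes on.
output_network = Model(inputs=new_input, outputs=output)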

Conclusions

We have finished the exploration tour of the KNIME deep learning integration via Keras libraries.

We’ve seen that the Keras deep learning integration in KNIME Analytics Platform relies on the Keras deep learning library, which in turn relies on the TensorFlow deep learning library within Python. Therefore, installation requires a few pieces to make the puzzle complete: Python, Keras, and the KNIME Deep Learning – Keras extension.

This particular KNIME Deep Learning extension takes advantage of the KNIME GUI and thus allows you to build, train and apply deep learning neural networks with a minimal amount of Python code, if any at all.

Dedicated nodes implement specific layers in a neural network, and a neural architecture can be assembled with just a few drag-and-drop actions. A Learner node takes care of the training and an Executor node takes care of the application. Additional support nodes are available for network editing, storage, retrieval, and format conversion.

While not eliminating the mathematical complexity of deep learning algorithms, the KNIME-Keras integration allows you, at least, to implement, train and execute networks with little to no experience in Python coding.

Rosaria Silipo, Ph.D., principal data scientist, KNIME