Implications of blockchain in data science

(Image credit: Zapp2Photo / Shutterstock)

Emerging technologies such as big data and blockchain are touted as the next big things, set to revolutionise the way organisations do business. Most of us are under the impression that these technologies are mutually exclusive, each following its own path and used separately. However, that would be off the mark.

While data science deals with using data for proper administration, blockchain secures data with its decentralised ledger.

These technologies have vast untapped potential to increase efficiency and enhance productivity. The question is whether there is a point where they can come together. What will be achieved when blockchain and data science are applied together? Why is it said that blockchain is the future of data science?

Before we answer these questions, let’s first look at each of these technologies separately to understand them better.

What is Blockchain?

Blockchain is fundamentally a digital ledger that records every transaction that takes place. As it is decentralised, there is no single authority, meaning that no one party can manipulate the transactions recorded in the ledger. The information stored in a blockchain's data structure is tamper-evident: each block includes a cryptographic hash of the block before it, so altering one past block invalidates every block that follows it. Thus, a change to even one block cannot escape notice.
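
The tamper-evidence described above can be illustrated with a minimal Python sketch. It is only a toy, not how any particular blockchain is implemented: the block layout and the function names (block_hash, build_chain, first_invalid_block) are assumptions made for this example, and there is no network or consensus involved.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Build a list of blocks, each linked to its predecessor by hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder "previous hash" for the first block
    for index, data in enumerate(records):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dan 1"])
print(first_invalid_block(chain))  # None: the untouched chain verifies

chain[1]["data"] = "Bob pays Carol 200"  # tamper with a past block
print(first_invalid_block(chain))  # 1: the stored hash no longer matches
```

If the person tampering also recomputes the hash of the altered block, the mismatch simply moves to the next block, whose stored previous hash no longer agrees. That is why hiding a change means rewriting every block after it.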

Blockchain technology rose to prominence with the increasing interest in cryptocurrencies such as Bitcoin. However, today it has found relevance not just in recording cryptocurrency transactions, but also in recording anything of value.

A study conducted by Upwork found blockchain skills to be one of the hottest commodities in the freelance marketplace. The same report also highlighted that blockchain job postings have seen an exponential increase in recent years.

The applications of blockchain go far beyond monetary use cases such as Bitcoin. The blocks in a blockchain can hold different kinds of information, which makes the technology very useful and versatile. Medical records, land deeds and car titles are among the many things that can be stored. In short, blockchain is valuable in virtually any case where recording things in a transparent, decentralised, secure and tamper-free manner is essential. Some more use cases of blockchain are as follows:

  • Creating digital identity systems
  • Keeping a record of physical products
  • Developing various financial instruments
  • Enabling voting to be more transparent

What is data science?

Data science is one of the trending fields in technology today. It sees a lot of innovation in subdomains such as predictive analytics, diagnostic analytics and descriptive analytics.

The aim of data science is to extract insights and other information from data, both structured and unstructured. The field encompasses machine learning, data analysis, statistics, and other advanced methods that are employed to gain an understanding of the actual processes that use data.

Corporate giants such as Facebook, Google, Apple, and Amazon are mining volumes of data every day. The vast field of data science has spurred the demand for data scientists, who are tasked with deriving meaning from data and assisting in solving real-world problems. This demand is also fed by big data, an advanced area of data science that deals with volumes of data too large to be handled by conventional data handling techniques.

The relationship between blockchain and data science, if there is any, has not been researched much. Looking at it simplistically, both technologies have data at the centre. While blockchain validates and records data, data science focuses on deriving meaningful insights from data for problem-solving. Both employ algorithms to control interactions with different data segments. In essence, data science is for predicting and blockchain is for validating data.

How can blockchain help big data?

You can say that if big data refers to the quantity of data, blockchain refers to the quality of data.

With blockchain, a new way of handling data is possible. It removes the need for data to be brought together in one place and paves the way to a decentralised structure where data analysis is possible right at the edge, on individual devices. Additionally, data generated through blockchain is validated, structured and immutable. Because blockchain assures the integrity of the data it provides, it enhances big data.

Today, most businesses are looking towards deeper, more advanced analytics as data has become more accessible and robust. Currently, the data that businesses use is mostly scattered, and it can take weeks or months of effort to sort out. Human error of any sort can greatly affect the integrity of the data and, with it, the end analysis. Data also faces the risk of being compromised when it is stored in one centralised location, and there is the possibility of data centres being breached and their data released to the public. Everyone needs data, but it is a huge chore to ensure that it is accurate and secure. For executing data analysis and predictive modelling, data science needs a functional and solid data set. With a decentralised blockchain, data scientists can strengthen their ability to manage data and also build a solid infrastructure.

Did you know that a consortium of 47 Japanese banks recently signed up with a blockchain startup called Ripple to use blockchain for facilitating money transfers between bank accounts? The motive behind this move was to significantly reduce costs while performing real-time transfers. Traditional real-time transfers are expensive because the potential risk factors are huge. One real problem is double-spending, a form of transaction failure in which the same security token gets used twice. This can be curbed by using blockchain technology.
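
To make double-spending concrete, here is a minimal Python sketch of a ledger that refuses to accept the same token twice. The class and its names (TokenLedger, transfer) are hypothetical and have nothing to do with Ripple's actual system. A real network has to reach this agreement across many independent nodes, whereas this toy runs in a single process.

```python
class TokenLedger:
    """Toy append-only ledger that refuses to spend the same token twice."""

    def __init__(self):
        self.spent = set()   # token ids that have already been used
        self.transfers = []  # accepted transfers, in order

    def transfer(self, token_id, sender, receiver):
        """Record the transfer and return True, or return False on a repeat."""
        if token_id in self.spent:
            return False  # double-spend attempt: reject it
        self.spent.add(token_id)
        self.transfers.append((token_id, sender, receiver))
        return True


ledger = TokenLedger()
print(ledger.transfer("token-1", "alice", "bob"))    # True
print(ledger.transfer("token-1", "alice", "carol"))  # False
```

Because every accepted transfer is recorded in one shared history, the second attempt to use token-1 can be checked against that history and rejected.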

Beyond banking, many other industries have adopted blockchain with security in mind. Diverse companies, from retail and healthcare to public administration, have started on their blockchain journey to prevent data leaks and hacks. Blockchain is the future of data science.

How blockchain will enhance data science

Enables Data Traceability

Blockchain facilitates peer-to-peer relationships. For example, if a published account does not explain its methodology properly, any peer can review the entire process and identify how the results were obtained.

With the ledger’s transparent channels, anyone can find out which data is reliable to use, where it comes from, how to store it, how to update it, and how to use it properly. To summarise, blockchain technology will enable users to trace data from the point of entry to exit.

Makes Real-Time Analysis Possible

Data analysis in real time is very difficult. Being able to monitor changes in real time is considered the most effective way of identifying fraudsters. However, for a long time real-time analysis was not possible. Today, thanks to blockchain’s distributed nature, companies are able to detect anomalies in the database from the start.

The ability to see changes in data in real time is a feature that is present in spreadsheets. Similarly, blockchain enables two or more people to work on the same information at the same time.

Guarantees Data Quality 

The information in blockchain’s digital ledger is stored across different nodes, both private and public. Before it is added to a block, the information is cross-checked and analysed at the entry point itself. This process is in itself a way to verify the data.
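
As a rough illustration of checking data at the entry point, the following Python sketch validates a record before it is appended to a ledger. The field names and rules are assumptions chosen for the example. Real blockchains apply their own validation rules, and many nodes run them independently.

```python
def validate(record):
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    for field in ("sender", "receiver", "amount"):
        if field not in record:
            problems.append(f"missing field: {field}")
    amount = record.get("amount")
    if "amount" in record and not (isinstance(amount, (int, float)) and amount > 0):
        problems.append("amount must be a positive number")
    return problems


def append_if_valid(ledger, record):
    """Append the record only if it passes every check; return the problems found."""
    problems = validate(record)
    if not problems:
        ledger.append(record)
    return problems


ledger = []
print(append_if_valid(ledger, {"sender": "alice", "receiver": "bob", "amount": 5}))
print(append_if_valid(ledger, {"sender": "alice", "amount": -1}))
print(len(ledger))  # only the first record was stored
```

Rejecting a bad record before it is stored means that everything that does end up in the ledger has already passed the same checks.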

Makes Data Sharing Easier

Organisations gain a lot when data flows smoothly and easily. That is very difficult with paper records, and the difficulty is compounded when the data in them is required elsewhere. The files will reach the other department eventually, but it might take a long time and they risk being lost in transit.

Today, most data scientists are fascinated with blockchain because it makes it possible for two or more people to access the data at the same time and in real time.

Thus, when information flows without any restriction, the administration process becomes streamlined.

Ensures Trust

As you must be aware, biases are often an issue when there is a single authority. Placing too much trust in a single person can prove hazardous. Many companies do not allow any third party access to their data because of trust issues, which makes information sharing practically impossible. With blockchain technology, the issue of trust does not come in the way of information sharing. Organisations are able to collaborate effectively by sharing the information they have at their disposal.

Improves Data Integrity

In the previous decade, the main focus of organisations was on improving data storage capacity. Towards the end of 2017, that was resolved. Now, the new concern for most organisations is protecting and verifying the integrity of their data.

The main reason for this is that organisations harvest data from many different sources. Data pulled from government offices or produced internally can be prone to errors. Additionally, other sources of data such as social media can also prove to be inaccurate.

Today, data scientists are using blockchain technology to ensure the authenticity of data and to track it at every point on the chain. One of the reasons for its large-scale adoption is its immutable security. With blockchain’s decentralised ledger, data is protected at every step through multiple signatures. In order for anyone to gain access to the data, the exact signatures have to be provided. This significantly reduces instances of data hacking and leaks.
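
The idea of requiring several signatures before access is granted can be sketched in Python as follows. This is an assumption-laden toy: it uses HMAC tags from the standard library as a stand-in for the public-key digital signatures that blockchains typically use, and the party names, keys and functions (sign, access_granted) are hypothetical.

```python
import hashlib
import hmac


def sign(secret_key, message):
    """Produce a keyed tag for the message (stand-in for a digital signature)."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def access_granted(message, signatures, required_keys):
    """Grant access only if every required party supplied a valid tag."""
    for party, key in required_keys.items():
        supplied = signatures.get(party, "")
        if not hmac.compare_digest(supplied, sign(key, message)):
            return False
    return True


keys = {"data_owner": b"owner-secret", "auditor": b"auditor-secret"}
request = b"read:dataset-42"

tags = {party: sign(key, request) for party, key in keys.items()}
print(access_granted(request, tags, keys))  # True: all signatures are valid

tags["auditor"] = "forged"
print(access_granted(request, tags, keys))  # False: one signature is wrong
```

A single missing or incorrect signature is enough to deny the request, which is the property the paragraph above relies on.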

Following are some of the security features of blockchain that are invaluable to data science:

Encoded transactions

Blockchain uses complex cryptographic algorithms to secure every transaction that takes place in its ledger. These transactions exist as immutable and irreversible digital contracts between parties.

Data lakes

Data scientists usually record their organisation’s details in data lakes. When blockchain is used to track the origin of the data, it is recorded in a specific block with a specific cryptographic key. This means that anyone who uses the data must hold the right key from the person who originated it, which indicates that the information is accurate, of good quality and genuine.
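
One simple way to picture origin tracking for a data lake is to record a fingerprint of each dataset and check copies against it later. The Python sketch below does this with a SHA-256 hash rather than the per-block key described above, and it keeps the records in an ordinary list standing in for on-chain entries. All names (register, is_genuine, the example dataset) are hypothetical.

```python
import hashlib

provenance = []  # stands in for records written to a blockchain


def fingerprint(data):
    """Return the SHA-256 fingerprint of a dataset given as bytes."""
    return hashlib.sha256(data).hexdigest()


def register(dataset_name, data, originator):
    """Record who produced a dataset and what its contents hashed to."""
    entry = {
        "dataset": dataset_name,
        "originator": originator,
        "sha256": fingerprint(data),
    }
    provenance.append(entry)
    return entry


def is_genuine(dataset_name, data):
    """True if the data matches the most recently registered fingerprint."""
    for entry in reversed(provenance):
        if entry["dataset"] == dataset_name:
            return entry["sha256"] == fingerprint(data)
    return False  # never registered, so nothing vouches for it


original = b"id,reading\n1,20.5\n2,21.0\n"
register("sensor-readings", original, "lab-a")

print(is_genuine("sensor-readings", original))                      # True
print(is_genuine("sensor-readings", b"id,reading\n1,99.9\n2,21.0\n"))  # False
```

Only the small fingerprint needs to be recorded, so the dataset itself can stay in the data lake while its origin and integrity remain checkable.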

My verdict

Data science is a field that is constantly evolving. With the integration of blockchain technology, transparent record keeping and robust security will become a reality, and data scientists will be able to achieve a number of milestones that were previously considered impossible. Though blockchain is a relatively new technology, the preliminary results from some of the companies that have been experimenting with it suggest that it can be used effectively.

Blockchain is still in its nascent stage, though this is not very evident because of the hype surrounding it. As the technology matures and more innovations take place, more concrete use cases will emerge, and data science can be one of the areas that will greatly benefit. Having said that, some questions have been raised regarding its impact on data science, particularly in big data, where large volumes of data need to be handled. One major concern is that implementing blockchain applications in this area will be expensive, because storing data on a blockchain is costlier than traditional means of data storage. Blocks can hold only relatively small volumes of data, which might prove a hindrance when large volumes of data are collected every second for big data and data analysis tasks.

It remains to be seen how blockchain will evolve to address these concerns and go on to disrupt the data science space. One thing is for sure: this technology has huge potential to transform how data is handled and used.

Vibhuthi Viswanathan, content curator, SpringPeople