Data science workflows come in many shapes. However complex the analysis, every data scientist faces the same important final step: communicating the findings to stakeholders such as decision-makers, managers or clients. This step is vital: if the findings cannot be understood, trusted or valued, the entire analysis will be discarded and forgotten.
Besides the usual set of soft skills, data scientists can use data visualization to deliver a clear message in just a few slides. Data visualization uses color, shape, position and other visual channels to encode information, so that people can grasp the data much faster than by reading text or scanning an Excel spreadsheet.
Below you will find my five favorite charts for visualizing data.
Scatter plot
A scatter plot represents input data rows as points in a two-dimensional plot. It is useful for bivariate visual exploration, because a strong relationship between two features (columns) in the data shows up clearly in the two-dimensional space. Interactively trying different input columns on the x and y axes, along with different graphical properties, can be an efficient strategy for finding those relationships.
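As a minimal sketch of the idea, the Python snippet below generates synthetic rows in which one feature depends on the other and draws them as a scatter plot. The data, the function names (`make_points`, `pearson`, `draw_scatter`) and the axis labels are all illustrative assumptions, and drawing assumes matplotlib is installed. The Pearson correlation is included only as a numeric companion to what the eye picks up in the plot.

```python
import random


def make_points(n=200, seed=0):
    """Generate synthetic (x, y) rows where y depends linearly on x plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1.5) for x in xs]
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def draw_scatter(xs, ys):
    """Draw one point per data row; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for drawing

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=12, alpha=0.6)
    ax.set_xlabel("feature_a")  # hypothetical column names
    ax.set_ylabel("feature_b")
    ax.set_title(f"r = {pearson(xs, ys):.2f}")
    return fig
```

Calling `draw_scatter(*make_points())` returns a figure; swapping in other column pairs is the manual version of the interactive exploration described above.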
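Sunburst chart
A sunburst chart displays categorical features through a hierarchy of rings. Each ring is sliced according to the nominal values of the corresponding feature and the selected hierarchy, so every slice in an outer ring sits within the angular span of its parent slice. This makes it a powerful chart for multivariate analysis of nominal features.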
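To make the ring slicing concrete, here is a small sketch that computes the angular span of every slice from a list of rows, using row counts as slice sizes. It only computes the geometry and does not draw anything. The function name `sunburst_segments` and the sample values are hypothetical, and sizing by row count is an assumption (a chart could just as well size slices by a summed metric).

```python
from collections import Counter


def sunburst_segments(rows):
    """Compute slice angles for a sunburst chart.

    rows: list of equally long tuples of nominal values, ordered from the
          innermost ring's feature to the outermost ring's feature.
    Returns a list of (ring_index, path, start_deg, end_deg) tuples.
    """
    total = len(rows)
    depth = len(rows[0])
    segments = []
    for ring in range(depth):
        # Count rows per path prefix, e.g. ("EU",) on ring 0, ("EU", "DE") on ring 1.
        counts = Counter(row[: ring + 1] for row in rows)
        start = 0.0
        # Sorting the paths keeps children contiguous and aligned under their parent.
        for path in sorted(counts):
            sweep = 360.0 * counts[path] / total
            segments.append((ring, path, start, start + sweep))
            start += sweep
    return segments
```

For example, with the rows `("EU", "DE")`, `("EU", "DE")`, `("EU", "FR")` and `("US", "NY")`, the inner ring gives `EU` three quarters of the circle, and the outer ring splits that same span between `DE` and `FR`.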
Stacked area chart
The stacked area chart plots multiple numerical features on top of each other, with each line using the line below it as its baseline. The areas between the lines are colored for easier comparison. This chart is commonly used to visualize how features evolve over time, for example trending topics.
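The stacking rule can be sketched in a few lines of Python: each series gets a lower and an upper boundary, and the upper boundary becomes the baseline of the next series. The names `stack_series` and `draw_stacked_area` are illustrative, and drawing assumes matplotlib is installed.

```python
def stack_series(series):
    """Compute the (lower, upper) boundary of each stacked band.

    series: dict mapping a feature name to a list of numbers; all lists
            must have the same length. Insertion order is the stacking order.
    """
    names = list(series)
    baseline = [0.0] * len(series[names[0]])
    bands = {}
    for name in names:
        # The upper line of this band is the baseline plus the feature's values.
        upper = [b + v for b, v in zip(baseline, series[name])]
        bands[name] = (baseline, upper)
        baseline = upper  # the next feature is stacked on top of this one
    return bands


def draw_stacked_area(x, series):
    """Fill the area of each band; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for drawing

    fig, ax = plt.subplots()
    for name, (lower, upper) in stack_series(series).items():
        ax.fill_between(x, lower, upper, label=name)
    ax.legend(loc="upper left")
    return fig
```

matplotlib also offers `Axes.stackplot`, which performs the stacking in a single call. Spelling it out here just makes the baseline logic visible.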
Bar chart
A bar chart visualizes one or more aggregated metrics for different data partitions with rectangular bars where the heights are proportional to the metric values. The partitions are defined by the values in a categorical feature.
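As a rough illustration, the sketch below partitions rows by a categorical feature, aggregates a metric per partition (the mean, chosen here as an assumption; a sum or count works the same way), and draws one bar per partition. The function names and the `region` and `sales` columns are made up for the example, and drawing assumes matplotlib is installed.

```python
from collections import defaultdict


def aggregate_mean(rows, category_key, value_key):
    """Group rows by a categorical feature and average a numeric feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[value_key])
    # Sort the categories so the bars appear in a stable order.
    return {cat: sum(vals) / len(vals) for cat, vals in sorted(groups.items())}


def draw_bar(means, metric_label):
    """Draw one bar per partition; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for drawing

    fig, ax = plt.subplots()
    ax.bar(list(means), list(means.values()))
    ax.set_ylabel(metric_label)
    return fig


# Hypothetical sample rows.
sample_rows = [
    {"region": "north", "sales": 10},
    {"region": "south", "sales": 30},
    {"region": "north", "sales": 20},
]
sample_means = aggregate_mean(sample_rows, "region", "sales")
```

Passing `sample_means` to `draw_bar(sample_means, "mean sales")` would produce two bars, one per region.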
Line plot
The line plot maps numerical values in one or more data features (y-axis) against values in a reference feature (x-axis). Data points are connected via colored lines. If the reference column on the x-axis contains sorted time values, the line plot graphically represents the evolution of a time series.
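The following sketch shows the time series case: rows are sorted by their time value first, so that the connecting line follows the evolution over time, and each feature is then drawn as its own colored line. The names `to_time_series` and `draw_lines` are hypothetical, and drawing assumes matplotlib is installed.

```python
from datetime import date


def to_time_series(rows):
    """Sort (date, value) rows by date and split them into x and y lists."""
    ordered = sorted(rows, key=lambda row: row[0])
    return [d for d, _ in ordered], [v for _, v in ordered]


def draw_lines(series_by_name):
    """Draw one line per feature; assumes matplotlib is installed.

    series_by_name: dict mapping a feature name to a list of (date, value) rows.
    """
    import matplotlib.pyplot as plt  # imported lazily, only needed for drawing

    fig, ax = plt.subplots()
    for name, rows in series_by_name.items():
        xs, ys = to_time_series(rows)
        ax.plot(xs, ys, marker="o", label=name)
    ax.set_xlabel("date")
    ax.legend()
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig
```

Without the sorting step the line would zigzag back and forth along the x-axis whenever the rows arrive out of order.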
I have shown you my favorite charts for visualizing data: scatter plot, sunburst chart, stacked area chart, bar chart, and line plot. They are basic charts, yet very powerful. Each one reveals a different kind of information: the scatter plot supports bivariate analysis of the relationship between pairs of input features; the sunburst chart supports multivariate analysis of nominal input features; the stacked area chart shows how features evolve over time, as with trending topics; the bar chart compares aggregated metrics rather than single data points; and the line plot shows the evolution of a time series.
These are, of course, not the only available charts to visualize and gain insights about the data we are analyzing. What are your preferred charts to visualize data? Share your favorites below in the comments.
Paolo Tamagnini, Data Scientist, KNIME
Image Credit: IT Pro Portal