Data Visualization with pandas & Dash Part 2

In my last post I went into some detail about how to use pandas to take data stored in a flat file format (e.g. CSVs/TXTs) and perform various manipulations to wrangle the raw data. Pandas makes many of these various manipulations incredibly easy, but it can take a while to get used to their API. I would suggest taking some time to just dive into their documentation and user guides, or simply browse through the variety of instructional walkthroughs that are on Youtube. Brandon Rhodes has a great PyCon 2015 presentation entitled Pandas From The Ground Up that can get almost anybody up and running.

Today, I’ll quickly show how we can quickly display our prepped data using Dash’s html & core components! Never heard about Dash? It is a framework that is suited for anyone that wants to build data visualization apps with highly custom user interfaces in pure Python! To quote Dash’s official website:

Through a couple of simple patterns, Dash abstracts away all of the technologies and protocols that are required to build an interactive web-based application. Dash is simple enough that you can bind a user interface around your Python code in an afternoon.

In order to follow along, you will need to make sure you have imported the following modules into python file.

#!/usr/bin/env python
import dash
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
import plotly.graph_objs as go
import datetime

First, I create an instance of a Dash object.

app = dash.Dash()

Next, I begin to build the layout of my application. The layout of a Dash app describes what the app is going to look like.

app.layout = html.Div([
    html.Br(),
    dcc.Slider(
        id = "month-slider",
        min = data_df.dateordered.dt.month.min(),
        max = data_df.dateordered.dt.month.max(),
        marks = {str(i): months[str(i)] for i in data_df.dateordered.dt.month.unique()},
        value = data_df.dateordered.dt.month.min(),
        step = None
    ),
    html.Br(),
    html.H1(["Sales & Returns 2016"],
        style = {"textAlign": "center"}),
    dcc.Graph(
        id = "data-graph",
        animate=True),
    html.Div([
        html.H3(["Monthly Totals"],
            style = {"textAlign": "center"}),
        pandas_gen_html_table(month_df)
    ]),
],
className = "container")

There is a lot to unpack here…much more than I am going to go into in this post, but to summarize, the dash_html_components library provides numerous classes for many of the well-known HTML tags and keyword arguments.

To give a quick visual example, the python code above produces the following HTML tags (the tags have been collapsed to make the HTML model the python code as much as possible).

Dash to HTML

Once I have established the general HTML structure of the page, I begin to build the code that is going to be driving the interactivity of our application.

@app.callback(
    dash.dependencies.Output("data-graph", "figure"),
    [dash.dependencies.Input("month-slider", "value")])
def update_df_graph(in_month):
    traces = []
    for i in data_df.orderstatus.unique():
        df_by_status = data_df[data_df.orderstatus == i]
        if i == "returned":
            filtered_df = df_by_status[df_by_status.datereturned.dt.month == in_month]
            filtered_df = filtered_df.groupby("datereturned")["returns"].sum().to_frame()
            traces.append(go.Scatter(
                x = filtered_df.index,
                y = filtered_df.returns,
                mode = "lines+markers",
                opacity = .7,
                marker = {
                    "line": {"width": .5, "color": "white"},
                    "symbol": "square"
                },
                name = i
            ))
        else:
            filtered_df = df_by_status[df_by_status.dateordered.dt.month == in_month]
            filtered_df = filtered_df.groupby("dateordered")["orders"].sum().to_frame()
            traces.append(go.Scatter(
                x = filtered_df.index,
                y = filtered_df.orders,
                mode = "lines+markers",
                opacity = .7,
                marker = {
                    "line": {"width": .5, "color": "white"},
                    "symbol": "201"
                },
                name = i
            ))

    return {
        "data" : traces,
        "layout" : go.Layout(
            title = "Returns & Orders by Month",
            xaxis={"range": [datetime.date(2016,8,1), datetime.date(2017,1,1)], "type": "date", "title": "Date"},
            yaxis = {"type": "Linear", "title": "# of Transactions"},
            legend = {"x": 1, "y": 1},
            hovermode = "closest"
        )
    }

Let’s break this excerpt down! The decorator will be ignored for now, and we’ll start with the update_df_graph function definition. As detailed in the first part of this series, our data_df dataframe currently has the following columns:

dataframe2

The function will take an integer value representing a Month. This month integer will be used to filter the dataframe by the given month, and create two dataframes that sum returns and orders grouped by their date. Each of these resulting dataframes are then used to populate a Scatter plot object (a class of a Dash graph_objs) which is then appended to a list of Scatter plot objects traces. For each of these Scatter plot objects I chose to plot the respective dates (either “dateordered” or “datereturned”) on the X axis , and the amount of “orders” or “returns” on the Y axis.

mode, opacity, marker, name attributes let a user define what kind of style the plot should be implemented with. For example, when plotting the returns data, I chose to use have my individual points connected by a line, and my actual points are to be square-shaped.

Lastly, the function will return a dictionary that has two keys: data and layout. The layout value will detail how the graph is going to look (e.g. which data is on the X & Y axes, whether or not I want the graph object to be titled, etc.), and the data value will be graphed ONTO the graph described by the layout value.

There, now that we have the quick function walkthrough out of the way we can discuss the use of the @ decorator immediately prior to our update_df_graph. This decorator is essentially what allows our graph to be dynamically updated based on the user’s interactions with the Slider object in our app.layout object. In Dash, the inputs and outputs of an application are the properties of a particular component. In this example, our input is the “value” property of the component that has the ID “month-slider”. Our output is the “figure” property of the component with the ID “data-graph”. You probably have noticed that I did not set a “figure” property of the “data-graph” component in my layout, this is because when the Dash app starts, it automatically calls all of the callbacks with the initial values of the input components in order to populate the initial state of the output components. For example, if I had specified something like html.Div(id='data-graph') with a “figure” property to start, it would get overwritten when the app starts. In this application the initial output in the “data-graph” component was explicitly defined in the our layout code.

value = data_df.dateordered.dt.month.min()

Reactive Programming! Pretty neat stuff.

#TODO data engineering

Data Visualization with pandas & Dash Part 2

Related Links: