Data Visualization with pandas & Dash Part 2
In my last post I went into some detail about how to use pandas to take data stored in a flat file format (e.g. CSVs/TXTs) and perform various manipulations to wrangle the raw data. Pandas makes many of these manipulations incredibly easy, but it can take a while to get used to its API. I would suggest taking some time to dive into the documentation and user guides, or simply browsing through the variety of instructional walkthroughs on YouTube. Brandon Rhodes has a great PyCon 2015 presentation entitled Pandas From The Ground Up that can get almost anybody up and running.
Today, I’ll show how we can quickly display our prepped data using Dash’s html & core components! Never heard of Dash? It is a framework suited for anyone who wants to build data visualization apps with highly custom user interfaces in pure Python! To quote Dash’s official website:
Through a couple of simple patterns, Dash abstracts away all of the technologies and protocols that are required to build an interactive web-based application. Dash is simple enough that you can bind a user interface around your Python code in an afternoon.
In order to follow along, you will need to make sure you have imported the following modules into your Python file.
#!/usr/bin/env python
import dash
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
import plotly.graph_objs as go
import datetime
First, I create an instance of a Dash object.
app = dash.Dash()
Next, I begin to build the layout of my application. The layout of a Dash app describes what the app is going to look like.
app.layout = html.Div([
    html.Br(),
    dcc.Slider(
        id = "month-slider",
        min = data_df.dateordered.dt.month.min(),
        max = data_df.dateordered.dt.month.max(),
        marks = {str(i): months[str(i)] for i in data_df.dateordered.dt.month.unique()},
        value = data_df.dateordered.dt.month.min(),
        step = None
    ),
    html.Br(),
    html.H1(["Sales & Returns 2016"],
            style = {"textAlign": "center"}),
    dcc.Graph(
        id = "data-graph",
        animate=True),
    html.Div([
        html.H3(["Monthly Totals"],
                style = {"textAlign": "center"}),
        pandas_gen_html_table(month_df)
    ]),
],
className = "container")
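One detail worth isolating from the layout is the slider's marks argument, which maps each slider position to a label. The months lookup it relies on is not shown in this post, so the sketch below makes an assumption about its shape: a dictionary keyed by the month number as a string. The list of month numbers is a made-up stand-in for data_df.dateordered.dt.month.unique().

```python
import calendar

# Assumption: `months` maps a month number (as a string) to a short label.
# The real lookup is defined outside this post and may look different.
months = {str(i): calendar.month_abbr[i] for i in range(1, 13)}

# Made-up stand-in for data_df.dateordered.dt.month.unique().
months_in_data = [8, 9, 10, 11, 12]

# Same comprehension as in the layout: one slider mark per month in the data.
marks = {str(i): months[str(i)] for i in months_in_data}
print(marks)
```

Because step is set to None in the layout, the slider can only rest on the positions listed in marks, so a user cannot select a month that has no data.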
There is a lot to unpack in this layout…much more than I am going to go into in this post, but to summarize, the dash_html_components library provides classes for many of the well-known HTML tags, and the keyword arguments passed to those classes describe HTML attributes such as style, className, and id.
To give a quick visual example, the layout code above produces the following HTML tags (the tags have been collapsed to make the HTML model the Python code as much as possible).
Once I have established the general HTML structure of the page, I begin to build the code that is going to be driving the interactivity of our application.
@app.callback(
    dash.dependencies.Output("data-graph", "figure"),
    [dash.dependencies.Input("month-slider", "value")])
def update_df_graph(in_month):
    traces = []
    # Build one trace per order status found in the data.
    for i in data_df.orderstatus.unique():
        df_by_status = data_df[data_df.orderstatus == i]
        if i == "returned":
            # Returns are filtered and grouped by the date they were returned.
            filtered_df = df_by_status[df_by_status.datereturned.dt.month == in_month]
            filtered_df = filtered_df.groupby("datereturned")["returns"].sum().to_frame()
            traces.append(go.Scatter(
                x = filtered_df.index,
                y = filtered_df.returns,
                mode = "lines+markers",
                opacity = .7,
                marker = {
                    "line": {"width": .5, "color": "white"},
                    "symbol": "square"
                },
                name = i
            ))
        else:
            # Everything else is filtered and grouped by the date it was ordered.
            filtered_df = df_by_status[df_by_status.dateordered.dt.month == in_month]
            filtered_df = filtered_df.groupby("dateordered")["orders"].sum().to_frame()
            traces.append(go.Scatter(
                x = filtered_df.index,
                y = filtered_df.orders,
                mode = "lines+markers",
                opacity = .7,
                marker = {
                    "line": {"width": .5, "color": "white"},
                    "symbol": "201"
                },
                name = i
            ))
    return {
        "data" : traces,
        "layout" : go.Layout(
            title = "Returns & Orders by Month",
            xaxis={"range": [datetime.date(2016,8,1), datetime.date(2017,1,1)], "type": "date", "title": "Date"},
            yaxis = {"type": "linear", "title": "# of Transactions"},
            legend = {"x": 1, "y": 1},
            hovermode = "closest"
        )
    }
Let’s break this excerpt down! The decorator will be ignored for now, and we’ll start with the
update_df_graph function definition. As detailed in the first part of this series, our data_df
dataframe currently has the following columns:
The update_df_graph function takes an integer value representing a month. This month integer is used
to filter the dataframe by the given month and create two dataframes that sum returns and orders
grouped by their date. Each of these resulting dataframes is then used to populate
a Scatter plot object (a class from plotly’s graph_objs module), which is then appended to a list of
Scatter plot objects, traces. For each of these Scatter plot objects I chose
to plot the respective dates (either “dateordered” or “datereturned”) on the X axis
and the amount of “orders” or “returns” on the Y axis. The mode, opacity, marker, and name
attributes let a user define what kind of style the plot
should be drawn with. For example, when plotting the returns data, I chose to have my
individual points connected by a line, and the points themselves are square-shaped.
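The filter-then-aggregate step can be tried in isolation. The sketch below uses a small made-up dataframe as a stand-in for data_df (only the column names come from this series), and orders_for_month is a hypothetical helper name used purely for illustration.

```python
import pandas as pd

# Made-up stand-in for data_df; the values are for illustration only.
toy_df = pd.DataFrame({
    "dateordered": pd.to_datetime(
        ["2016-08-01", "2016-08-01", "2016-08-02", "2016-09-05"]),
    "orders": [1, 1, 1, 1],
})


def orders_for_month(df, month):
    """Sum the orders per day for one month number (1-12)."""
    # Keep only the rows whose order date falls in the requested month.
    in_month = df[df.dateordered.dt.month == month]
    # Collapse to one row per date, as the callback does before plotting.
    return in_month.groupby("dateordered")["orders"].sum().to_frame()


august = orders_for_month(toy_df, 8)
print(august)
```

The resulting frame has the dates as its index and a single orders column, which is why the callback can pass filtered_df.index as x and filtered_df.orders as y.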
Lastly, update_df_graph returns a dictionary that has two keys: data and layout. The layout
value details how the graph is going to look (e.g. how the X & Y axes are configured, whether
or not I want the graph object to be titled, etc.), and the data value will be graphed
ONTO the graph described by the layout value.
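To make the shape of that return value concrete, here is a minimal sketch that builds the same kind of dictionary from plain Python dicts and lists rather than go.Scatter and go.Layout objects. As far as I know a figure is serialized to JSON before it reaches the browser, so a plain dictionary of this shape should be accepted too, but treat that as an assumption to verify against your Dash version. build_figure is a hypothetical helper and the dates and counts are made up.

```python
def build_figure(dates, counts, name):
    """Return a figure-shaped dict holding one trace and a layout."""
    trace = {
        "type": "scatter",
        "x": dates,
        "y": counts,
        "mode": "lines+markers",
        "opacity": 0.7,
        "name": name,
    }
    layout = {
        "title": "Returns & Orders by Month",
        "xaxis": {"type": "date", "title": "Date"},
        "yaxis": {"type": "linear", "title": "# of Transactions"},
        "hovermode": "closest",
    }
    # "data" is a list because a figure can hold several traces.
    return {"data": [trace], "layout": layout}


# Made-up sample values, for illustration only.
figure = build_figure(["2016-08-01", "2016-08-02"], [3, 5], "returned")
print(sorted(figure))
```

The graph objects used in the callback add validation on top of this structure, which is a good reason to prefer them in a real app.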
There, now that we have the quick function walkthrough out of the way we can discuss the use of
the @app.callback decorator immediately prior to our update_df_graph function. This decorator is essentially what
allows our graph to be dynamically updated based on the user’s interactions with the Slider object
in our app.layout object. In Dash, the inputs and outputs of an application are the properties
of a particular component. In this example, our input is the “value” property of the component
that has the ID “month-slider”. Our output is the “figure” property of the component with the ID
“data-graph”. You have probably noticed that I did not set a “figure” property on the
“data-graph” component in my layout. This is because when the Dash app starts, it
automatically calls all of the callbacks with the initial values of the input components in
order to populate the initial state of the output components. For example, if I had specified
something like dcc.Graph(id='data-graph') with a “figure” property to start, it would get
overwritten when the app starts. In this application, the initial state of the “data-graph” component
is driven by the slider’s starting value, which was explicitly defined in our layout code:
value = data_df.dateordered.dt.month.min()
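If the decorator still feels like magic, a toy version of the pattern may help. The class below is not how Dash is implemented; ToyApp and all of its methods are hypothetical names in a stripped-down sketch that only mimics the two behaviours described above: callbacks run once at start-up with the initial input values, and again whenever an input property changes.

```python
class ToyApp:
    """Tiny stand-in for a reactive app; NOT Dash's real implementation."""

    def __init__(self):
        self.props = {}      # (component_id, property) -> current value
        self.callbacks = []  # (output key, input key, function)

    def callback(self, output, input_):
        # Returns a decorator that records which input feeds which output.
        def register(func):
            self.callbacks.append((output, input_, func))
            return func
        return register

    def start(self):
        # Run every callback once with the initial value of its input.
        for output, input_, func in self.callbacks:
            self.props[output] = func(self.props[input_])

    def set_prop(self, key, value):
        # A "user interaction": update the input, then re-run dependents.
        self.props[key] = value
        for output, input_, func in self.callbacks:
            if input_ == key:
                self.props[output] = func(value)


toy = ToyApp()
toy.props[("month-slider", "value")] = 8  # initial value set in the "layout"


@toy.callback(("data-graph", "figure"), ("month-slider", "value"))
def update_graph(in_month):
    return "figure for month {}".format(in_month)


toy.start()
initial_figure = toy.props[("data-graph", "figure")]

toy.set_prop(("month-slider", "value"), 9)
updated_figure = toy.props[("data-graph", "figure")]
print(initial_figure, "|", updated_figure)
```

The figure is never assigned directly anywhere in this sketch; it only ever appears as the result of the registered function, which is the same reason the layout can leave “figure” unset.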
Reactive Programming! Pretty neat stuff.