Data visualisation in Python

Making sense of data is complicated. We can often pick out patterns and trends in large amounts of data much more easily when the data is presented visually.

Take, for example, this CSV file. It’s a lot of numbers!

!head somedata.csv
x,s,c
0.0,0.0,1.0
0.06346651825433926,0.0634239196565645,0.9979866764718844
0.12693303650867852,0.12659245357374926,0.9919548128307953
0.1903995547630178,0.18925124436041021,0.9819286972627067
0.25386607301735703,0.2511479871810792,0.9679487013963562
0.3173325912716963,0.31203344569848707,0.9500711177409454
0.3807991095260356,0.3716624556603276,0.9283679330160726
0.4442656277803748,0.42979491208917164,0.9029265382866212
0.5077321460347141,0.4861967361004687,0.8738493770697849

We can read CSVs into pandas, which lets us view the data as a nice table and perform operations on it, but looking at the table doesn’t give us a lot more insight:

import pandas as pd

df = pd.read_csv("somedata.csv", index_col="x")
df
s c
x
0.000000 0.000000e+00 1.000000
0.063467 6.342392e-02 0.997987
0.126933 1.265925e-01 0.991955
0.190400 1.892512e-01 0.981929
0.253866 2.511480e-01 0.967949
... ... ...
6.029319 -2.511480e-01 0.967949
6.092786 -1.892512e-01 0.981929
6.156252 -1.265925e-01 0.991955
6.219719 -6.342392e-02 0.997987
6.283185 -2.449294e-16 1.000000

100 rows × 2 columns

But what if we plot it?

df.plot()
<Axes: xlabel='x'>
[Figure: line plot of s and c against x]

I recognize that! That’s \(\sin(x)\) and \(\cos(x)\)!
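If you'd like to follow along without the original file, here is a minimal sketch of how a table like this could be produced. It assumes 100 evenly spaced values of x between 0 and 2π, which matches the rows shown above, but the real somedata.csv may have been made differently. The function name make_wave_table is hypothetical.

```python
import numpy as np
import pandas as pd


def make_wave_table(n_points=100):
    """Build a table with columns x, s = sin(x) and c = cos(x).

    Hypothetical helper: the spacing and point count are assumptions
    inferred from the rows printed earlier.
    """
    x = np.linspace(0, 2 * np.pi, n_points)
    return pd.DataFrame({"x": x, "s": np.sin(x), "c": np.cos(x)})


table = make_wave_table()

# With no path given, to_csv returns the CSV as a string
csv_text = table.to_csv(index=False)

# Show the header and the first two rows
print("\n".join(csv_text.splitlines()[:3]))
```

To save it to disk instead, pass a file name of your choosing to to_csv.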

But the kind of plot we choose matters a lot! What if I’d chosen a bar chart instead:

df.plot(kind="bar")
<Axes: xlabel='x'>
[Figure: bar chart of s and c, one group of bars per x value]

or a histogram:

df.plot(kind="hist", stacked=True)
<Axes: ylabel='Frequency'>
[Figure: stacked histogram of the values in s and c]

There are lots of ways to visualise data, and different kinds of visualisations help answer different kinds of questions about the data.
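As a rough illustration of that point, the sketch below draws the same table twice, side by side, with each panel titled by the question it is best at answering. The data is a stand-in built to match the table above (an assumption, not the original file), and the panel titles are my own wording.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data with the same layout as the table above (an assumption)
x = np.linspace(0, 2 * np.pi, 100)
df = pd.DataFrame(
    {"s": np.sin(x), "c": np.cos(x)},
    index=pd.Index(x, name="x"),
)

# One figure, two panels: same data, two kinds of plot
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# A line plot keeps the ordering along x, so the shape of each curve is visible
df.plot(ax=ax_line)
ax_line.set_title("Line: how do s and c change with x?")

# A histogram discards x entirely and only counts how often each value occurs
df.plot(kind="hist", stacked=True, ax=ax_hist)
ax_hist.set_title("Histogram: which values are most common?")

fig.tight_layout()
```

The histogram isn't wrong, it just answers a question about the distribution of values rather than about how they vary with x.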

Data visualisation tools in Python

There are lots of general-purpose tools:

  • matplotlib (general purpose, extremely powerful)

  • altair (tabular data, declarative)

  • plotly (interactive)

  • bokeh (interactive)

  • bqplot (interactive, Jupyter-focused)

  • pyvista (vtk, 3d meshes)

  • lots more!

Domain-specific tools:

We are going to talk about matplotlib, which is what pandas’ DataFrame.plot uses by default, and altair, a dataframe-focused plotting library that makes heavy use of pandas.
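To make the link between pandas and matplotlib a little more concrete, here is a small sketch. It shows that DataFrame.plot hands back a matplotlib Axes that can be customised further, and then builds roughly the same picture with matplotlib calls directly. The data is again a stand-in for somedata.csv (an assumption), and the label and styling choices are only for illustration.

```python
import matplotlib.axes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data with the same layout as the table above (an assumption)
x = np.linspace(0, 2 * np.pi, 100)
df = pd.DataFrame(
    {"s": np.sin(x), "c": np.cos(x)},
    index=pd.Index(x, name="x"),
)

# DataFrame.plot returns a matplotlib Axes object...
ax = df.plot()
print(isinstance(ax, matplotlib.axes.Axes))

# ...so ordinary matplotlib methods can keep customising it
ax.set_ylabel("value")
ax.axhline(0, color="grey", linewidth=0.5)

# Roughly the same picture, built with matplotlib directly
fig, ax2 = plt.subplots()
for column in df.columns:
    ax2.plot(df.index, df[column], label=column)
ax2.set_xlabel(df.index.name)
ax2.legend()
```

The pandas call is shorter, while the direct version makes every step explicit, which helps once a plot needs more control than DataFrame.plot offers.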