{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Brief tour of Altair\n", "\n", "Adapted from https://altair-viz.github.io/altair-tutorial\n", "Copyright (c) 2018 Jake Vanderplas\n", "Used under MIT License\n", "\n", "Useful links:\n", "\n", "- altair tutorial: https://altair-viz.github.io/altair-tutorial\n", "- altair docs: https://altair-viz.github.io\n", "\n", "\n", "The goal of this section is to teach you the core concepts required to create a basic Altair chart; namely:\n", "\n", "- **Data**, **Marks**, and **Encodings**: the three core pieces of an Altair chart\n", "\n", "- **Encoding Types**: ``Q`` (quantitative), ``N`` (nominal), ``O`` (ordinal), ``T`` (temporal), which drive the visual representation of the encodings\n", "\n", "- **Binning and Aggregation**: which let you control aspects of the data representation within Altair.\n", "\n", "With a good understanding of these core pieces, you will be well on your way to making a variety of charts in Altair.\n", "\n", "```bash\n", "python3 -m pip install altair vega_datasets\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll start by importing Altair:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import altair as alt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A Basic Altair Chart\n", "\n", "The essential elements of an Altair chart are the **data**, the **mark**, and the **encoding**.\n", "\n", "The format by which these are specified will look something like this:\n", "\n", "```python\n", "alt.Chart(data).mark_point().encode(\n", " encoding_1='column_1',\n", " encoding_2='column_2',\n", " # etc.\n", ")\n", "```\n", "\n", "Let's take a look at these pieces, one at a time." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Data\n", "\n", "Data in Altair is built around the [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).\n", "For this section, we'll use the cars dataset that we saw before, which we can load using the [vega_datasets](https://github.com/altair-viz/vega_datasets) package:\n", "\n", "```\n", "pip install vega-datasets\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " Name Miles_per_Gallon Cylinders Displacement \n", "0 chevrolet chevelle malibu 18.0 8 307.0 \\\n", "1 buick skylark 320 15.0 8 350.0 \n", "2 plymouth satellite 18.0 8 318.0 \n", "3 amc rebel sst 16.0 8 304.0 \n", "4 ford torino 17.0 8 302.0 \n", ".. ... ... ... ... \n", "401 ford mustang gl 27.0 4 140.0 \n", "402 vw pickup 44.0 4 97.0 \n", "403 dodge rampage 32.0 4 135.0 \n", "404 ford ranger 28.0 4 120.0 \n", "405 chevy s-10 31.0 4 119.0 \n", "\n", " Horsepower Weight_in_lbs Acceleration Year Origin \n", "0 130.0 3504 12.0 1970-01-01 USA \n", "1 165.0 3693 11.5 1970-01-01 USA \n", "2 150.0 3436 11.0 1970-01-01 USA \n", "3 150.0 3433 12.0 1970-01-01 USA \n", "4 140.0 3449 10.5 1970-01-01 USA \n", ".. ... ... ... ... ... \n", "401 86.0 2790 15.6 1982-01-01 USA \n", "402 52.0 2130 24.6 1982-01-01 Europe \n", "403 84.0 2295 11.6 1982-01-01 USA \n", "404 79.0 2625 18.6 1982-01-01 USA \n", "405 82.0 2720 19.4 1982-01-01 USA \n", "\n", "[406 rows x 9 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from vega_datasets import data\n", "\n", "cars = data.cars()\n", "\n", "cars" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data in Altair is expected to be in a [tidy format](http://vita.had.co.nz/papers/tidy-data.html); in other words:\n", "\n", "- each **row** is an observation\n", "- each **column** is a variable\n", "\n", "See [Altair's Data Documentation](https://altair-viz.github.io/user_guide/data.html) for more information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The *Chart* object\n", "\n", "With the data defined, you can instantiate Altair's fundamental object, the ``Chart``. 
Fundamentally, a ``Chart`` is an object that knows how to emit a JSON dictionary representing the data and visualization encodings, which can be sent to the notebook and rendered by the Vega-Lite JavaScript library.\n", "Let's take a look at what this JSON representation looks like, using only the first row of the data:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " Name Miles_per_Gallon Cylinders Displacement \n", "0 chevrolet chevelle malibu 18.0 8 307.0 \\\n", "\n", " Horsepower Weight_in_lbs Acceleration Year Origin \n", "0 130.0 3504 12.0 1970-01-01 USA " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cars1 = cars.iloc[:1]\n", "cars1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}},\n", " 'data': {'name': 'data-36a712fbaefa4d20aa0b32e160cfd83a'},\n", " 'mark': {'type': 'point'},\n", " '$schema': 'https://vega.github.io/schema/vega-lite/v5.8.0.json',\n", " 'datasets': {'data-36a712fbaefa4d20aa0b32e160cfd83a': [{'Name': 'chevrolet chevelle malibu',\n", " 'Miles_per_Gallon': 18.0,\n", " 'Cylinders': 8,\n", " 'Displacement': 307.0,\n", " 'Horsepower': 130.0,\n", " 'Weight_in_lbs': 3504,\n", " 'Acceleration': 12.0,\n", " 'Year': '1970-01-01T00:00:00',\n", " 'Origin': 'USA'}]}}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars1).mark_point().to_dict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point the chart includes a JSON-formatted representation of the dataframe, the type of mark to use, and some metadata that is included in every chart output." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Mark\n", "\n", "We can decide what sort of *mark* we would like to use to represent our data.\n", "As in the previous example, we can choose the ``point`` mark to represent each row of the data as a point on the plot:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_point()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a visualization with one point per row in the data, though it is not particularly interesting: all the points are stacked right on top of each other!\n", "\n", "It is useful to again examine the JSON output here:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}},\n", " 'data': {'name': 'data-36a712fbaefa4d20aa0b32e160cfd83a'},\n", " 'mark': {'type': 'point'},\n", " '$schema': 'https://vega.github.io/schema/vega-lite/v5.8.0.json',\n", " 'datasets': {'data-36a712fbaefa4d20aa0b32e160cfd83a': [{'Name': 'chevrolet chevelle malibu',\n", " 'Miles_per_Gallon': 18.0,\n", " 'Cylinders': 8,\n", " 'Displacement': 307.0,\n", " 'Horsepower': 130.0,\n", " 'Weight_in_lbs': 3504,\n", " 'Acceleration': 12.0,\n", " 'Year': '1970-01-01T00:00:00',\n", " 'Origin': 'USA'}]}}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars1).mark_point().to_dict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that, in addition to the data, the specification includes information about the mark type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a number of available marks that you can use; some of the more common are the following:\n", "\n", "* ``mark_point()`` \n", "* ``mark_circle()``\n", "* ``mark_square()``\n", "* ``mark_line()``\n", "* ``mark_area()``\n", "* ``mark_bar()``\n", "* ``mark_tick()``\n", "\n", "You can get a complete list of ``mark_*`` methods using Jupyter's tab-completion feature: in any cell just type:\n", "\n", " alt.Chart.mark_\n", " \n", "followed by the tab key to see the available options." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Encodings\n", "\n", "The next step is to add *visual encoding channels* (or *encodings* for short) to the chart. An encoding channel specifies how a given data column should be mapped onto the visual properties of the visualization.\n", "Some of the more frequently used visual encodings are listed here:\n", "\n", "* ``x``: x-axis value\n", "* ``y``: y-axis value\n", "* ``color``: color of the mark\n", "* ``opacity``: transparency/opacity of the mark\n", "* ``shape``: shape of the mark\n", "* ``size``: size of the mark\n", "* ``row``: row within a grid of facet plots\n", "* ``column``: column within a grid of facet plots\n", "\n", "For a complete list of these encodings, see the [Encodings](https://altair-viz.github.io/user_guide/encoding.html) section of the documentation.\n", "\n", "Visual encodings can be created with the `encode()` method of the `Chart` object. For example, we can start by mapping the `y` axis of the chart to the `Origin` column:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_point().encode(y=\"Origin\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a one-dimensional visualization representing the values taken on by `Origin`, with the points in each category on top of each other.\n", "As above, we can view the JSON data generated for this visualization:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}},\n", " 'data': {'name': 'data-36a712fbaefa4d20aa0b32e160cfd83a'},\n", " 'mark': {'type': 'point'},\n", " 'encoding': {'y': {'field': 'Origin', 'type': 'nominal'}},\n", " '$schema': 'https://vega.github.io/schema/vega-lite/v5.8.0.json',\n", " 'datasets': {'data-36a712fbaefa4d20aa0b32e160cfd83a': [{'Name': 'chevrolet chevelle malibu',\n", " 'Miles_per_Gallon': 18.0,\n", " 'Cylinders': 8,\n", " 'Displacement': 307.0,\n", " 'Horsepower': 130.0,\n", " 'Weight_in_lbs': 3504,\n", " 'Acceleration': 12.0,\n", " 'Year': '1970-01-01T00:00:00',\n", " 'Origin': 'USA'}]}}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars1).mark_point().encode(y=\"Origin\").to_dict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is the same as above with the addition of the `'encoding'` key, which specifies the visualization channel (`y`), the name of the field (`Origin`), and the type of the variable (`nominal`).\n", "We'll discuss these data types in a moment." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualization can be made more interesting by adding another channel to the encoding: let's encode the `Miles_per_Gallon` as the `x` position:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_point().encode(\n", " x=\"Miles_per_Gallon\",\n", " y=\"Origin\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can add as many encodings as you wish, with each encoding mapped to a column in the data.\n", "For example, here we will color the points by *Origin*, and plot *Miles_per_Gallon* vs *Year*:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_point().encode(\n", " x=\"Year\",\n", " y=\"Miles_per_Gallon\",\n", " color=\"Origin\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: Exploring Data\n", "\n", "Now that you know the basics (data, marks, and encodings), take some time and try making a few plots!\n", "\n", "In particular, I'd suggest trying various combinations of the following:\n", "\n", "- Marks: ``mark_point()``, ``mark_line()``, ``mark_bar()``, ``mark_text()``, ``mark_rect()``...\n", "- Data Columns: ``'Acceleration'``, ``'Cylinders'``, ``'Displacement'``, ``'Horsepower'``, ``'Miles_per_Gallon'``, ``'Name'``, ``'Origin'``, ``'Weight_in_lbs'``, ``'Year'``\n", "- Encodings: ``x``, ``y``, ``color``, ``shape``, ``row``, ``column``, ``opacity``, ``text``, ``tooltip``...\n", "\n", "\n", "Use various combinations of these options, and see what you can learn from the data! In particular, think about the following:\n", "\n", "- Which encodings go well with continuous, quantitative values?\n", "- Which encodings go well with discrete, categorical (i.e. nominal) values?\n", "\n", "If you want a prompt, try to answer some specific questions:\n", "\n", "- Can you visualize how miles per gallon relates to properties such as displacement, horsepower, and cylinders?\n", "- How much information can you fit in one chart with different encodings?\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name object\n", "Miles_per_Gallon float64\n", "Cylinders int64\n", "Displacement float64\n", "Horsepower float64\n", "Weight_in_lbs int64\n", "Acceleration float64\n", "Year datetime64[ns]\n", "Origin object\n", "dtype: object" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cars.dtypes" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_point().encode(\n", " x=\"Acceleration\",\n", " y=\"Miles_per_Gallon\",\n", " color=\"Cylinders:N\",\n", " size=\"Weight_in_lbs\",\n", " shape=\"Origin\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Encoding Types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the central ideas of Altair is that the library will **choose good defaults for your data type**.\n", "\n", "The basic data types supported by Altair are as follows:\n", "\n", "| Data Type | Code | Description |\n", "|---|---|---|\n", "| quantitative | Q | Numerical quantity (real-valued) |\n", "| nominal | N | Name / Unordered categorical |\n", "| ordinal | O | Ordered categorical |\n", "| temporal | T | Date/time |\n", "\n", "When you specify data as a pandas dataframe, these types are **automatically determined** by Altair.\n", "\n", "When you specify data as a URL, you must **manually specify** data types for each of your columns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at a simple plot containing three of the columns from the cars data:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_tick().encode(\n", " x=\"Miles_per_Gallon\", y=\"Origin\", color=\"Cylinders:O\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Questions:\n", "\n", "- what data type best goes with ``Miles_per_Gallon``?\n", "- what data type best goes with ``Origin``?\n", "- what data type best goes with ``Cylinders``?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's add the shorthands for each of these data types to our specification, using the one-letter codes above\n", "(for example, change ``\"Miles_per_Gallon\"`` to ``\"Miles_per_Gallon:Q\"`` to explicitly specify that it is a quantitative type):" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_tick().encode(\n", " x=\"Miles_per_Gallon:Q\",\n", " color=\"Origin:N\",\n", " y=\"Cylinders:O\",\n", ")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " Name Miles_per_Gallon Cylinders Displacement Horsepower \n", "78 mazda rx2 coupe 19.0 3 70.0 97.0 \\\n", "118 maxda rx3 18.0 3 70.0 90.0 \n", "250 mazda rx-4 21.5 3 80.0 110.0 \n", "341 mazda rx-7 gs 23.7 3 70.0 100.0 \n", "\n", " Weight_in_lbs Acceleration Year Origin \n", "78 2330 13.5 1972-01-01 Japan \n", "118 2124 13.5 1973-01-01 Japan \n", "250 2720 13.5 1977-01-01 Japan \n", "341 2420 12.5 1980-01-01 Japan " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cars[cars.Cylinders == 3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the plot changes if we change the data type for ``'Cylinders'`` to ordinal.\n", "\n", "As you use Altair, it is useful to get into the habit of always specifying these types explicitly, because this is *mandatory* when working with data loaded from a file or a URL." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: Adding Explicit Types\n", "\n", "Following are a few simple charts made with the cars dataset. For each one, try to add explicit types to the encodings (e.g. change ``\"Horsepower\"`` to ``\"Horsepower:Q\"``) so that the plot doesn't change.\n", "\n", "Are there any plots that can be made better by changing the type?" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_bar().encode(\n", " y=\"Origin:N\",\n", " x=\"mean(Horsepower):Q\",\n", ")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_line().encode(\n", " x=\"Year:T\",\n", " y=\"mean(Miles_per_Gallon):Q\",\n", " color=\"Origin:N\",\n", ")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_bar().encode(\n", " y=\"Cylinders:O\",\n", " x=\"count():Q\",\n", " color=\"Origin:N\",\n", ")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(cars).mark_rect().encode(\n", " x=\"Cylinders:O\",\n", " y=\"Origin:N\",\n", " color=\"count():Q\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Back to weather\n", "\n", "We can load our forecasting data and visualize it again with Altair:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [] }, "outputs": [], "source": [ "from forecasting import city_forecast" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ " time air_pressure_at_sea_level air_temperature \n", "0 2023-10-25 07:00:00+00:00 1015.5 2.9 \\\n", "1 2023-10-25 08:00:00+00:00 1015.4 3.3 \n", "2 2023-10-25 09:00:00+00:00 1015.2 3.9 \n", "3 2023-10-25 10:00:00+00:00 1015.1 4.6 \n", "4 2023-10-25 11:00:00+00:00 1014.4 5.3 \n", ".. ... ... ... \n", "80 2023-11-03 06:00:00+00:00 1001.9 3.6 \n", "81 2023-11-03 12:00:00+00:00 994.9 5.6 \n", "82 2023-11-03 18:00:00+00:00 997.6 4.5 \n", "83 2023-11-04 00:00:00+00:00 995.8 4.3 \n", "84 2023-11-04 06:00:00+00:00 995.0 3.8 \n", "\n", " cloud_area_fraction relative_humidity wind_from_direction wind_speed \n", "0 25.4 78.9 47.1 4.2 \\\n", "1 16.8 76.8 45.6 4.8 \n", "2 11.6 73.1 47.7 4.6 \n", "3 30.4 70.9 49.8 4.5 \n", "4 59.3 68.0 46.6 5.0 \n", ".. ... ... ... ... \n", "80 100.0 70.9 128.2 4.2 \n", "81 98.8 67.3 116.6 4.2 \n", "82 95.7 74.1 129.1 4.2 \n", "83 100.0 74.1 135.3 4.1 \n", "84 100.0 76.2 120.3 4.1 \n", "\n", " next_12_hours_symbol_code next_1_hours_symbol_code \n", "0 partlycloudy fair \\\n", "1 partlycloudy fair \n", "2 partlycloudy clearsky \n", "3 partlycloudy fair \n", "4 partlycloudy partlycloudy \n", ".. ... ... \n", "80 cloudy NaN \n", "81 cloudy NaN \n", "82 cloudy NaN \n", "83 NaN NaN \n", "84 NaN NaN \n", "\n", " next_1_hours_precipitation_amount next_6_hours_symbol_code \n", "0 0.0 partlycloudy \\\n", "1 0.0 partlycloudy \n", "2 0.0 partlycloudy \n", "3 0.0 partlycloudy \n", "4 0.0 partlycloudy \n", ".. ... ... \n", "80 NaN cloudy \n", "81 NaN cloudy \n", "82 NaN cloudy \n", "83 NaN cloudy \n", "84 NaN NaN \n", "\n", " next_6_hours_precipitation_amount city \n", "0 0.0 Oslo \n", "1 0.0 Oslo \n", "2 0.0 Oslo \n", "3 0.0 Oslo \n", "4 0.0 Oslo \n", ".. ... ... \n", "80 0.0 Trondheim \n", "81 0.0 Trondheim \n", "82 0.0 Trondheim \n", "83 0.0 Trondheim \n", "84 NaN Trondheim \n", "\n", "[340 rows x 13 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "forecasts = pd.concat(\n", " city_forecast(city) for city in (\"Oslo\", \"Bergen\", \"Tromsø\", \"Trondheim\")\n", ")\n", "forecasts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recalling our columns and data types:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "time datetime64[ns, UTC]\n", "air_pressure_at_sea_level float64\n", "air_temperature float64\n", "cloud_area_fraction float64\n", "relative_humidity float64\n", "wind_from_direction float64\n", "wind_speed float64\n", "next_12_hours_symbol_code object\n", "next_1_hours_symbol_code object\n", "next_1_hours_precipitation_amount float64\n", "next_6_hours_symbol_code object\n", "next_6_hours_precipitation_amount float64\n", "city object\n", "dtype: object" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "forecasts.dtypes" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(forecasts).mark_line().encode(\n", " x=\"time\",\n", " y=\"air_temperature\",\n", " color=\"city\",\n", ")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "short_forecast = forecasts.dropna(subset=[\"next_1_hours_symbol_code\"])\n", "alt.Chart(short_forecast).mark_point().encode(\n", " x=\"time\",\n", " y=\"air_temperature\",\n", " color=\"city\",\n", " shape=\"next_1_hours_symbol_code\",\n", ")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(forecasts).mark_bar().encode(\n", " x=\"next_6_hours_symbol_code\",\n", " color=\"city\",\n", " y=\"count()\",\n", ")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.FacetChart(...)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(forecasts).mark_bar().encode(\n", " x=\"next_6_hours_symbol_code\",\n", " color=\"city\",\n", " y=\"count()\",\n", ").facet(\n", " facet=\"city\",\n", " columns=2,\n", ")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.FacetChart(...)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(short_forecast).mark_bar().encode(\n", " x=\"time\",\n", " y=\"next_1_hours_precipitation_amount\",\n", " color=\"next_1_hours_symbol_code\",\n", ").facet(facet=\"city:N\", columns=2)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.FacetChart(...)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(short_forecast).mark_point().encode(\n", " x=\"time\",\n", " y=\"next_1_hours_precipitation_amount\",\n", " color=\"next_1_hours_symbol_code\",\n", " shape=\"next_1_hours_symbol_code\",\n", ").facet(facet=\"city:N\", columns=2)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(forecasts).mark_point().encode(\n", " x=\"cloud_area_fraction\",\n", " y=\"next_6_hours_symbol_code\",\n", " shape=\"city\",\n", " color=\"next_6_hours_precipitation_amount\",\n", " # shape=\"next_1_hours_symbol_code\",\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }