Commit 3453a921 authored by wgallard

changed slide layout

parent 7fb4d4ff
1 merge request: !4 Visualization 2018
%% Cell type:markdown id: tags:
# Data Visualization with Python
## Part 1: Python + Matplotlib
### [Guy Allard](mailto://w.g.allard@lumc.nl)
%% Cell type:markdown id: tags:
# Matplotlib
- Plotting library for Python
- High quality figures suitable for publication
- Integrates with IPython, Jupyter and NumPy (in PyLab mode)
- Established and robust
- Large community / user base
%% Cell type:markdown id: tags:
# Interfaces
1. Object Oriented
   - Best for larger development projects
   - Have to keep track of figures and axes
   - Steep learning curve
<br><br>
2. Pyplot State Machine
   - For interactive plotting
   - Takes care of many housekeeping tasks
   - Easier to learn than the OO interface
<br><br>
3. Pylab
   - Modelled on MATLAB
   - Imports common modules
   - Handles most housekeeping tasks
   - Easiest to learn
   - The one we will be using!
%% Cell type:markdown id: tags:
# Interfaces Example
1. Object-oriented interface
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)
```
2. State-machine environment (pyplot)
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
plt.plot(x, y)
```
3. PyLab mode
```python
%pylab
x = arange(0, 10, 0.2)
y = sin(x)
plot(x, y)
```
%% Cell type:markdown id: tags:
# Getting help
Consult the built-in documentation, for example:
```
>>> help(subplot)
Help on function subplot in module matplotlib.pyplot:
subplot(*args, **kwargs)
    Return a subplot axes positioned by the given grid definition.
    ...
```
%% Cell type:markdown id: tags:
# Useful Resources
- Matplotlib Homepage
  - https://matplotlib.org/
<br><br>
- Gallery
  - https://matplotlib.org/gallery.html
  - Many examples with source code
<br><br>
- Online documentation
  - https://matplotlib.org/contents.html
  - Full API documentation
%% Cell type:markdown id: tags:
# First Steps
## Preparing the Jupyter Notebook
1. Open a new Jupyter Notebook
2. Run this code in the first empty cell:
```
%pylab inline
```
3. Now any pylab plotting commands will display in the notebook
%% Cell type:code id: tags:
``` python
%pylab inline
```
%% Output
Populating the interactive namespace from numpy and matplotlib
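%% Cell type:markdown id: tags:
Outside a notebook, the names that `%pylab` places in the namespace can be imported explicitly instead. The following is a minimal sketch, assuming NumPy and Matplotlib are installed; the `Agg` backend is selected here only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.2)
y = np.sin(x)

# plt.plot is the same function that PyLab mode exposes as the bare name plot
lines = plt.plot(x, y)
plt.close("all")
```

In a notebook the bare names `plot`, `arange` and `sin` remain the shortest route; the explicit prefixes mainly help when the code moves into a module.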
%% Cell type:markdown id: tags:
# Grab some data
Use Pandas to load a dataset which contains population data for four countries
%% Cell type:code id: tags:
``` python
import pandas as pd
populations = pd.read_csv(
    'https://git.lumc.nl/courses/programming-course/raw/visualization-2018/visualization/data/populations.csv'
)
```
%% Cell type:markdown id: tags:
Take a quick look at the data
%% Cell type:code id: tags:
``` python
populations.head()
```
%% Output
   Year  Belgium  Denmark  Netherlands   Sweden
0  1950  8.63930  4.28135     10.11365  7.01660
1  1951  8.67820  4.30370     10.26440  7.07040
2  1952  8.73040  4.33380     10.38210  7.12445
3  1953  8.77775  4.36930     10.49300  7.17145
4  1954  8.81940  4.40570     10.61535  7.21360
%% Cell type:markdown id: tags:
# Plot it!
Let's plot the population of the Netherlands on the y-axis and the year on the x-axis
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands']);
```
%% Output
%% Cell type:markdown id: tags:
# Add titles and label the axes
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'])
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)');
```
%% Output
%% Cell type:markdown id: tags:
# Change some properties of the line
How about a 5-point-thick orange line? (`linewidth` is measured in points.)
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
     linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)');
```
%% Output
%% Cell type:markdown id: tags:
# Change some properties of the x-axis
- Label at five-year intervals
- Display the labels vertically
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
     linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
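%% Cell type:markdown id: tags:
The tick positions come from Python's built-in `range`, which excludes its stop value. That is why the call uses 2016 rather than 2015 as the upper bound. A short check of what the two choices produce:

```python
# range(start, stop, step) never includes stop itself
ticks = list(range(1950, 2016, 5))
ticks_short = list(range(1950, 2015, 5))

print(ticks[0], ticks[-1], len(ticks))   # 1950 2015 14
print(ticks_short[-1])                   # 2010
```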
%% Cell type:markdown id: tags:
# Change which years are displayed
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
     linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990);
```
%% Output
%% Cell type:markdown id: tags:
# Change the y-axis scale
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
     linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990)
ylim(13, 15);
```
%% Output
%% Cell type:markdown id: tags:
# Clean up the number formatting on the y-axis
Integer tick labels
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
     linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990)
ylim(13, 15)
yticks(range(13, 16));
```
%% Output
%% Cell type:markdown id: tags:
# Plot multiple series
Calling **plot** multiple times within the same cell will add multiple series to the chart

Let's compare the Dutch with the Danes
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'], color='orange')
plot(populations['Year'], populations['Denmark'], color='red')
title('Historical Populations of The Netherlands and Denmark')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
%% Cell type:markdown id: tags:
# Add a legend
1. Give each plotted line a label
2. Add a legend to the figure
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
     color='orange', label='The Netherlands')
plot(populations['Year'], populations['Denmark'],
     color='red', label='Denmark')
legend(loc='upper left')
title('Historical Populations of The Netherlands and Denmark')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
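%% Cell type:markdown id: tags:
For comparison, here is roughly how the same labelled plot could look in the object-oriented interface. This is a sketch rather than part of the lesson: it uses only the first five rows printed by `populations.head()` so that it runs without downloading the CSV, and it selects the `Agg` backend as an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt

# First five rows of the populations table shown earlier (millions)
years = [1950, 1951, 1952, 1953, 1954]
netherlands = [10.11365, 10.26440, 10.38210, 10.49300, 10.61535]
denmark = [4.28135, 4.30370, 4.33380, 4.36930, 4.40570]

fig, ax = plt.subplots()
ax.plot(years, netherlands, color="orange", label="The Netherlands")
ax.plot(years, denmark, color="red", label="Denmark")
ax.legend(loc="upper left")  # the legend text comes from the label= arguments
ax.set_title("Historical Populations of The Netherlands and Denmark")
ax.set_xlabel("Year")
ax.set_ylabel("Population (Millions)")
ax.set_xticks(years)
ax.tick_params(axis="x", labelrotation=90)

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
plt.close(fig)
```

Each pylab call maps onto a method of the `ax` object, which is the bookkeeping that the pylab interface otherwise handles.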
%% Cell type:markdown id: tags:
# Other plot types
Let's load a different dataset and take a look at some different plot types
%% Cell type:code id: tags:
``` python
flowers = pd.read_csv('https://git.lumc.nl/courses/programming-course/raw/visualization-2018/visualization/data/iris.csv')
flowers.head()
```
%% Output
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
%% Cell type:markdown id: tags:
# Boxplots
A simple boxplot of the sepal-length distribution
%% Cell type:code id: tags:
``` python
boxplot(flowers['sepal_length'], labels=['Sepal_length']);
```
%% Output
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Boxplots
Distributions of multiple features
%% Cell type:code id: tags:
``` python
# make a list containing the numeric feature column names
features = list(flowers.columns[:-1])
features
```
%% Output
['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
%% Cell type:code id: tags:
``` python
# plot the data
boxplot([flowers[f] for f in features], labels=features);
```
%% Output
%% Cell type:markdown id: tags:
# Controlling the size of the plot
Let's change the shape of the boxplot
%% Cell type:code id: tags:
``` python
# make the figure 10 inches wide and 5 inches high
figsize(10, 5)
# plot the data
boxplot([flowers[f] for f in features], labels=features);
```
%% Output
%% Cell type:code id: tags:
``` python
# restore a smaller default size for the following plots
figsize(7, 4)
```
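%% Cell type:markdown id: tags:
Figure sizes in Matplotlib are given in inches, and the pixel size of the rendered image follows from the figure's DPI. The sketch below illustrates that relationship with the object-oriented API; the DPI of 100 is an arbitrary value chosen for the example, and the `Agg` backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt

# 10 x 5 inches at 100 dots per inch
fig = plt.figure(figsize=(10, 5), dpi=100)

width_in, height_in = fig.get_size_inches()
width_px, height_px = fig.get_size_inches() * fig.dpi

print(width_in, height_in)   # 10.0 5.0
print(width_px, height_px)   # 1000.0 500.0
plt.close(fig)
```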
%% Cell type:markdown id: tags:
# Histogram
%% Cell type:code id: tags:
``` python
hist(flowers['petal_length'])
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
%% Cell type:markdown id: tags:
# Histogram
Change the number of 'bins'
%% Cell type:code id: tags:
``` python
hist(flowers['petal_length'], bins=20)
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
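%% Cell type:markdown id: tags:
To see what the `bins` argument changes, it can help to look at the numbers behind the bars. The sketch below uses `numpy.histogram`, which takes the same `bins` argument, on a handful of made-up values (not the iris measurements): the data range is cut into `bins` equal-width intervals and the values in each are counted.

```python
import numpy as np

# Hypothetical measurements, for illustration only
values = np.array([1.0, 1.2, 1.4, 1.5, 4.0, 4.5, 5.0, 5.1, 6.0])

counts10, edges10 = np.histogram(values, bins=10)
counts20, edges20 = np.histogram(values, bins=20)

# n bins are delimited by n + 1 edges spanning min(values)..max(values)
print(len(edges10), len(edges20))
print(counts10.sum(), counts20.sum())
```

More bins give narrower intervals, so the same observations are spread over more, shorter bars.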
%% Cell type:markdown id: tags:
# Histogram
Some formatting
- Lines around the bars
- Color
%% Cell type:code id: tags:
``` python
hist(flowers['petal_length'], bins=20, facecolor='teal', edgecolor='black', alpha=0.7)
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
%% Cell type:markdown id: tags:
# Subplots
Separate plots with their own axes within a single figure

The syntax can be confusing!
%% Cell type:code id: tags:
``` python
for i in range(1, 5):
    subplot(2, 2, i)
    xticks([]), yticks([])
    text(0.5, 0.5, 'subplot(2, 2, %d)' % i, ha='center', size=18, alpha=0.75);
```
%% Output
%% Cell type:markdown id: tags:
subplot(2, 2, 1) indicates the first cell of a 2 row x 2 column matrix

subplot(2, 2, 4) indicates the fourth cell of a 2 row x 2 column matrix
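
Cells are numbered from 1, left to right and then top to bottom. The helper below is a hypothetical illustration of that numbering (it is not a Matplotlib function): it converts a subplot index into a zero-based row and column.

```python
def subplot_position(nrows, ncols, index):
    """Return the zero-based (row, column) of a 1-based subplot index."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must be between 1 and nrows * ncols")
    # Row-major order: fill a row completely before moving to the next
    return divmod(index - 1, ncols)


print(subplot_position(2, 2, 1))  # (0, 0) -> top left
print(subplot_position(2, 2, 4))  # (1, 1) -> bottom right
print(subplot_position(2, 3, 3))  # (0, 2) -> top right of a 2 x 3 grid
```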
%% Cell type:markdown id: tags:
# Subplots
More complicated layouts
%% Cell type:code id: tags:
``` python
subplot(1, 3, 1) # 1 row, 3 columns, cell 1
xticks([]), yticks([])
text(0.5, 0.5, '(1, 3, 1)', ha='center', size=18, alpha=0.75)
subplot(2, 3, 3) # 2 rows, 3 columns, cell 3
xticks([]), yticks([])
text(0.5, 0.5, '(2, 3, 3)', ha='center', size=18, alpha=0.75)
subplot(3, 2, 6) # 3 rows, 2 columns, cell 6
xticks([]), yticks([])
text(0.5, 0.5, '(3, 2, 6)', ha='center', size=18, alpha=0.75)
subplot(3, 3, 5) # 3 rows, 3 columns, cell 5
xticks([]), yticks([])
text(0.5, 0.5, '(3, 3, 5)', ha='center', size=18, alpha=0.75);
```
%% Output
%% Cell type:markdown id: tags:
# Subplots and Boxplots
Compare how the features are distributed by species
%% Cell type:code id: tags:
``` python
species = list(set(flowers.species))
print(species)
```
%% Output
['virginica', 'setosa', 'versicolor']
%% Cell type:code id: tags:
``` python
# make a dataset for each species
setosa = flowers[flowers.species == 'setosa']
versicolor = flowers[flowers.species == 'versicolor']
virginica = flowers[flowers.species == 'virginica']
```
%% Cell type:code id: tags:
``` python
figsize(10, 8)
for cell, feature in enumerate(features):
    subplot(2, 2, cell + 1)
    boxplot(
        [setosa[feature], versicolor[feature], virginica[feature]],
        # labels must follow the order of the data above; `species` was
        # built from a set, so its order is arbitrary and is not used here
        labels=['setosa', 'versicolor', 'virginica']
    )
    ylabel(feature)
```
%% Output
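%% Cell type:markdown id: tags:
When labels and data are kept in separate lists, they can drift out of order. One way to keep them paired, sketched here with a tiny made-up table rather than the iris file, is to build both lists in a single pass over a pandas `groupby`, which by default iterates over the groups in sorted key order.

```python
import pandas as pd

# Small illustrative table; the values are placeholders, not the course data
flowers_demo = pd.DataFrame({
    "sepal_length": [6.3, 5.1, 7.0, 4.9, 5.8, 6.4],
    "species": ["virginica", "setosa", "versicolor",
                "setosa", "virginica", "versicolor"],
})

labels = []
series = []
for name, group in flowers_demo.groupby("species"):
    labels.append(name)                            # label ...
    series.append(group["sepal_length"].tolist())  # ... and its data together

print(labels)
print(series)
```

The resulting `series` and `labels` could then be passed to `boxplot` as the data and the `labels` argument respectively.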
%% Cell type:code id: tags:
``` python
figsize(7, 4)
```
%% Cell type:markdown id: tags:
# Sketch-style drawing
using xkcd mode
%% Cell type:code id: tags:
``` python
with xkcd():
    hist(flowers['petal_length'], bins=20, facecolor='teal', edgecolor='black')
    title('Petal Length Distribution')
    xlabel('petal length')
    ylabel('count');
```
%% Output
%% Cell type:markdown id: tags:
# Saving to a file
Images can be saved to a file by calling `savefig` after the plotting commands, in the same cell:
```
savefig('myplot.pdf')
```
The format of the saved image will be inferred from the given file extension.
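
As a quick check of the extension rule, the following sketch saves one figure under two names and reads back the first bytes of each file. The temporary directory and the `Agg` backend are assumptions made so the example runs anywhere; in the notebook a plain `savefig('myplot.pdf')` is enough.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

out_dir = tempfile.mkdtemp()
headers = {}
for name in ("myplot.png", "myplot.pdf"):
    path = os.path.join(out_dir, name)
    fig.savefig(path)  # output format chosen from the file extension
    with open(path, "rb") as handle:
        headers[name] = handle.read(4)  # file signature ("magic bytes")
plt.close(fig)

print(headers)
```

The PNG file starts with the PNG signature and the PDF file with `%PDF`, although the plotting code is identical.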
%% Cell type:markdown id: tags:
# The End
This lesson was based on previous work by [Jeroen Laros](mailto://j.f.j.laros@lumc.nl) and Martijn Vermaat

License: [Creative Commons Attribution 3.0 License (CC-by)](http://creativecommons.org/licenses/by/3.0)
%% Cell type:code id: tags:
``` python
```