Commit 28858e97 authored by wgallard

changed css

parent 8871e2a8
1 merge request: !4 Visualization 2018
%% Cell type:code id: tags:
``` python
from IPython.display import HTML
def css_styling():
    # load the custom stylesheet (path as set by this commit)
    styles = open('styles/custom.css', 'r').read()
    return HTML('<style>' + styles + '</style>')
css_styling()
```
%% Output
<IPython.core.display.HTML object>
%% Cell type:markdown id: tags:
# Data Visualization with Python
## Part 1: Python + Matplotlib
### [Guy Allard](mailto:w.g.allard@lumc.nl)
%% Cell type:markdown id: tags:
# Matplotlib
- Plotting library for Python
- High quality figures suitable for publication
- Integrates with IPython, Jupyter and NumPy (in PyLab mode)
- Established and robust
- Large community / user base
%% Cell type:markdown id: tags:
# Interfaces
1. Object Oriented
   - Best for larger development projects
   - Have to keep track of figures and axes
   - Steep learning curve
<br><br>
2. Pyplot State Machine
   - For interactive plotting
   - Takes care of many housekeeping tasks
   - Easier to learn than the OO interface
<br><br>
3. Pylab
   - Modelled on MATLAB
   - Imports common modules
   - Handles most housekeeping tasks
   - Easiest to learn
   - The one we will be using!
%% Cell type:markdown id: tags:
# Interfaces Example
1. Object-oriented interface
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)
```
2. State-machine environment (pyplot)
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
plt.plot(x, y)
```
3. PyLab mode
```python
%pylab
x = arange(0, 10, 0.2)
y = sin(x)
plot(x, y)
```
%% Cell type:markdown id: tags:
# Getting help
Consult the built-in documentation, for example:
```
>>> help(subplot)
Help on function subplot in module matplotlib.pyplot:
subplot(*args, **kwargs)
Return a subplot axes positioned by the given grid definition.
...
```
%% Cell type:markdown id: tags:
# Useful Resources
- Matplotlib Homepage
  - https://matplotlib.org/
<br><br>
- Gallery
  - https://matplotlib.org/gallery.html
  - Many examples with source code
<br><br>
- Online documentation
  - https://matplotlib.org/contents.html
  - Full API documentation
%% Cell type:markdown id: tags:
# First Steps
## Preparing the Jupyter Notebook
1. Open a new Jupyter Notebook
2. Run this code in the first empty cell:
```
%pylab inline
```
3. Now any pylab plotting commands will display in the notebook
%% Cell type:code id: tags:
``` python
%pylab inline
```
%% Output
Populating the interactive namespace from numpy and matplotlib
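%% Cell type:markdown id: tags:
For readers who prefer explicit namespaces, here is a rough plain-Python counterpart of the setup above. It is a sketch rather than the exact set of imports the magic performs: the `np` and `plt` aliases are the conventional ones, and the `%matplotlib inline` magic is shown only as a comment because it works inside IPython/Jupyter alone.

```python
# In a notebook, run this magic first so figures are displayed inline:
#   %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# With %pylab these names are available unprefixed (arange, sin, plot);
# here they are reached through their modules instead.
x = np.arange(0, 10, 0.2)
y = np.sin(x)
lines = plt.plot(x, y)   # plot() returns a list of Line2D objects
```

The rest of this notebook keeps using the unprefixed PyLab names; prefixing them with `plt.` or `np.` should give the same results.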
%% Cell type:markdown id: tags:
# Grab some data
Use Pandas to load a dataset which contains population data for four countries
%% Cell type:code id: tags:
``` python
import pandas as pd
populations = pd.read_csv(
'https://git.lumc.nl/courses/programming-course/raw/visualization-2018/visualization/data/populations.csv'
)
```
%% Cell type:markdown id: tags:
Take a quick look at the data
%% Cell type:code id: tags:
``` python
populations.head()
```
%% Output
Year Belgium Denmark Netherlands Sweden
0 1950 8.63930 4.28135 10.11365 7.01660
1 1951 8.67820 4.30370 10.26440 7.07040
2 1952 8.73040 4.33380 10.38210 7.12445
3 1953 8.77775 4.36930 10.49300 7.17145
4 1954 8.81940 4.40570 10.61535 7.21360
%% Cell type:markdown id: tags:
# Plot it!
Let's plot the population of the Netherlands on the y-axis and the year on the x-axis
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands']);
```
%% Output
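%% Cell type:markdown id: tags:
If the CSV file cannot be downloaded, the same kind of plot can be tried offline. The sketch below rebuilds only the five Netherlands rows shown in the `head()` output; the name `sample` is made up for this illustration, and explicit `pd`/`plt` prefixes are used so the cell runs without PyLab.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First five rows of the Netherlands column, copied from the head() output
sample = pd.DataFrame({
    'Year': [1950, 1951, 1952, 1953, 1954],
    'Netherlands': [10.11365, 10.26440, 10.38210, 10.49300, 10.61535],
})

fig, ax = plt.subplots()
# x values first, y values second, exactly as in the plot() call above
(line,) = ax.plot(sample['Year'], sample['Netherlands'])
```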
%% Cell type:markdown id: tags:
# Add titles and label the axes
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'])
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)');
```
%% Output
%% Cell type:markdown id: tags:
# Change some properties of the line
How about a 5-point-thick orange line? (`linewidth` is measured in points, not pixels)
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)');
```
%% Output
%% Cell type:markdown id: tags:
# Change some properties of the x-axis
- Label at five-year intervals
- Display the labels vertically
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
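%% Cell type:markdown id: tags:
For comparison, the same tick settings can be written with the object-oriented interface. This is a minimal sketch: `years` and `values` are made-up stand-ins for the population columns, not the real data.

```python
import matplotlib.pyplot as plt

years = list(range(1950, 2016))                       # stand-in for populations['Year']
values = [10 + 0.1 * i for i in range(len(years))]    # made-up, illustrative values

fig, ax = plt.subplots()
ax.plot(years, values, linewidth=5, color='orange')
ax.set_xticks(range(1950, 2016, 5))          # one tick every five years
ax.tick_params(axis='x', labelrotation=90)   # draw the tick labels vertically
ticks = list(ax.get_xticks())
```

`range(1950, 2016, 5)` stops before 2016, so the last tick lands on 2015.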
%% Cell type:markdown id: tags:
# Change which years are displayed
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990);
```
%% Output
%% Cell type:markdown id: tags:
# Change the y-axis scale
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990)
ylim(13,15);
```
%% Output
%% Cell type:markdown id: tags:
# Clean up the number formatting on the y-axis
Integer tick labels
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990)
ylim(13,15)
yticks(range(13,16));
```
%% Output
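%% Cell type:markdown id: tags:
Listing the ticks by hand works when the range is known in advance. As an alternative sketch, Matplotlib's `MaxNLocator` can be asked to choose integer tick positions itself; the plotted values below are made up for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

fig, ax = plt.subplots()
ax.plot([1970, 1980, 1990], [13.0, 14.1, 14.9])   # made-up values
ax.set_ylim(13, 15)

# Let the locator pick the ticks, restricted to whole numbers
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
ticks = ax.get_yticks()
```

This keeps working if the y-limits change later, whereas `yticks(range(13, 16))` would need to be edited by hand.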
%% Cell type:markdown id: tags:
# Plot multiple series
Calling **plot** multiple times within the same cell adds multiple series to the chart.
Let's compare the Dutch with the Danes.
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'], color='orange')
plot(populations['Year'], populations['Denmark'], color='red')
title('Historical Populations of The Netherlands and Denmark')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
%% Cell type:markdown id: tags:
# Add a legend
1. Give each plotted line a label
2. Add a legend to the figure
%% Cell type:code id: tags:
``` python
plot(populations['Year'], populations['Netherlands'],
color='orange', label='The Netherlands')
plot(populations['Year'], populations['Denmark'],
color='red', label='Denmark')
legend(loc='upper left')
title('Historical Populations of The Netherlands and Denmark')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
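%% Cell type:markdown id: tags:
The legend takes its text from the `label` given to each line. The sketch below shows this with the object-oriented interface and only the first three rows of the `head()` output; reading the text back through `get_texts()` is just a way to check the result.

```python
import matplotlib.pyplot as plt

years = [1950, 1951, 1952]   # first three rows from the head() output
fig, ax = plt.subplots()
ax.plot(years, [10.11365, 10.26440, 10.38210],
        color='orange', label='The Netherlands')
ax.plot(years, [4.28135, 4.30370, 4.33380],
        color='red', label='Denmark')

# legend() collects every line that was given a label
legend = ax.legend(loc='upper left')
labels = [text.get_text() for text in legend.get_texts()]
```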
%% Cell type:markdown id: tags:
# Other plot types
Let's load a different dataset and take a look at some different plot types
%% Cell type:code id: tags:
``` python
flowers = pd.read_csv('https://git.lumc.nl/courses/programming-course/raw/visualization-2018/visualization/data/iris.csv')
flowers.head()
```
%% Output
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
%% Cell type:markdown id: tags:
# Boxplots
A simple boxplot of the sepal-length distribution
%% Cell type:code id: tags:
``` python
boxplot(flowers['sepal_length'], labels=['Sepal_length']);
```
%% Output
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Boxplots
Distributions of multiple features
%% Cell type:code id: tags:
``` python
# make a list containing the numeric feature column names
features = list(flowers.columns[:-1])
features
```
%% Output
['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
%% Cell type:code id: tags:
``` python
# plot the data
boxplot([flowers[f] for f in features], labels=features);
```
%% Output
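%% Cell type:markdown id: tags:
`boxplot` also returns a dictionary of the artists it drew, which can be handy for reading values back or restyling. The sketch below uses synthetic data and hypothetical feature names. Recent Matplotlib releases rename the `labels` keyword to `tick_labels`, so the tick labels are set directly here to stay version-neutral.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeded so the sketch is repeatable
data = [rng.normal(loc=m, size=50) for m in (5.8, 3.1)]   # synthetic, not the iris data

fig, ax = plt.subplots()
parts = ax.boxplot(data)         # boxes are drawn at x positions 1, 2, ...
ax.set_xticks([1, 2])
ax.set_xticklabels(['feature_a', 'feature_b'])   # hypothetical names

# One median line per box; its y-data holds the median value
medians = [line.get_ydata()[0] for line in parts['medians']]
```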
%% Cell type:markdown id: tags:
# Controlling the size of the plot
Let's change the shape of the boxplot
%% Cell type:code id: tags:
``` python
# make the figure 10 inches wide and 5 inches high
figsize(10, 5)
# plot the data
boxplot([flowers[f] for f in features], labels=features);
```
%% Output
%% Cell type:code id: tags:
``` python
figsize(7,4)
```
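%% Cell type:markdown id: tags:
`figsize` is a helper that comes with the PyLab/IPython environment. Outside of it, a minimal sketch of the two usual Matplotlib equivalents looks like this: a size for one figure, or a default for all later figures.

```python
import matplotlib.pyplot as plt

# Size of this figure only, in inches (width, height)
fig, ax = plt.subplots(figsize=(10, 5))
width, height = fig.get_size_inches()

# Default size for figures created from now on
plt.rcParams['figure.figsize'] = (7, 4)
default_size = list(plt.rcParams['figure.figsize'])
```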
%% Cell type:markdown id: tags:
# Histogram
%% Cell type:code id: tags:
``` python
hist(flowers['petal_length'])
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
%% Cell type:markdown id: tags:
# Histogram
Change the number of bins
%% Cell type:code id: tags:
``` python
hist(flowers['petal_length'], bins=20)
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
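%% Cell type:markdown id: tags:
To see what `bins` controls, it helps to look at what `hist` returns: the count per bin, the bin edges, and the drawn bars. The values below are synthetic stand-ins, not the iris measurements.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
values = rng.normal(loc=3.8, scale=1.7, size=150)   # synthetic stand-in data

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(values, bins=20)

# 20 bins need 21 edges, and every value falls into exactly one bin
n_bins = len(counts)
n_edges = len(edges)
total = int(counts.sum())
```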
%% Cell type:markdown id: tags:
# Histogram
Some formatting
- Lines around the bars
- Color
%% Cell type:code id: tags:
``` python
hist(flowers['petal_length'], bins=20, facecolor='teal', edgecolor='black', alpha=0.7)
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
%% Cell type:markdown id: tags:
# Subplots
Separate plots with their own axes within a single figure
The syntax can be confusing!
%% Cell type:code id: tags:
``` python
for i in range(1, 5):
    subplot(2, 2, i)
    xticks([]), yticks([])
    text(0.5, 0.5, 'subplot(2, 2, %d)' % i, ha='center', size=18, alpha=0.75);
```
%% Output
%% Cell type:markdown id: tags:
- subplot(2, 2, 1) indicates the first cell of a 2 row x 2 column grid
- subplot(2, 2, 4) indicates the fourth cell of the same 2 row x 2 column grid
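
The numbering can be made explicit with a small sketch. It assumes the object-oriented `plt.subplots` call, which returns all axes at once as an array, and converts the 1-based subplot index into a row and column with `divmod`.

```python
import matplotlib.pyplot as plt

nrows, ncols = 2, 2
fig, axes = plt.subplots(nrows, ncols)   # axes is a 2 x 2 array of Axes

for index in range(1, nrows * ncols + 1):
    # subplot(nrows, ncols, index) counts from 1, left to right, top to bottom
    row, col = divmod(index - 1, ncols)
    ax = axes[row, col]
    ax.set_xticks([])
    ax.set_yticks([])
    ax.text(0.5, 0.5, 'index %d = row %d, col %d' % (index, row, col),
            ha='center')
```

Note that rows and columns in the `axes` array are counted from 0, while the subplot index is counted from 1.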
%% Cell type:markdown id: tags:
# Subplots
More complicated layouts
%% Cell type:code id: tags:
``` python
subplot(1, 3, 1) # 1 row, 3 columns, cell 1
xticks([]), yticks([])
text(0.5, 0.5, '(1, 3, 1)', ha='center', size=18, alpha=0.75)
subplot(2, 3, 3) # 2 rows, 3 columns, cell 3
xticks([]), yticks([])
text(0.5, 0.5, '(2, 3, 3)', ha='center', size=18, alpha=0.75)
subplot(3, 2, 6) # 3 rows, 2 columns, cell 6
xticks([]), yticks([])
text(0.5, 0.5, '(3, 2, 6)', ha='center', size=18, alpha=0.75)
subplot(3, 3, 5) # 3 rows, 3 columns, cell 5
xticks([]), yticks([])
text(0.5, 0.5, '(3, 3, 5)', ha='center', size=18, alpha=0.75);
```
%% Output
%% Cell type:markdown id: tags:
# Subplots and Boxplots
Compare how the features are distributed by species
%% Cell type:code id: tags:
``` python
species = list(set(flowers.species))
print(species)
```
%% Output
['virginica', 'setosa', 'versicolor']
%% Cell type:code id: tags:
``` python
# make a dataset for each species
setosa = flowers[flowers.species == 'setosa']
versicolor = flowers[flowers.species == 'versicolor']
virginica = flowers[flowers.species == 'virginica']
```
%% Cell type:code id: tags:
``` python
figsize(10, 8)
for cell, feature in enumerate(features):
    subplot(2, 2, cell + 1)
    boxplot(
        [setosa[feature], versicolor[feature], virginica[feature]],
        # labels in the same order as the data; set() order is arbitrary
        labels=['setosa', 'versicolor', 'virginica']
    )
    ylabel(feature)
```
%% Output
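%% Cell type:markdown id: tags:
When splitting data by category, it is easy for the labels and the data to end up in different orders. One way to avoid this, sketched below under the assumption of a table with a `species` column, is to let pandas `groupby` produce the names and the data together. The `toy` table is a tiny illustrative sample, not the full dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Tiny illustrative table with the same column layout as the flowers data
toy = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'species': ['setosa', 'setosa', 'versicolor',
                'versicolor', 'virginica', 'virginica'],
})

# groupby yields (name, sub-table) pairs, so names and data stay paired
groups = {name: group['sepal_length'].to_numpy()
          for name, group in toy.groupby('species')}
names = list(groups)

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))
ax.set_xticks(range(1, len(names) + 1))   # boxes sit at positions 1, 2, 3
ax.set_xticklabels(names)
ax.set_ylabel('sepal_length')
```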
%% Cell type:code id: tags:
``` python
figsize(7,4)
```
%% Cell type:markdown id: tags:
# Sketch-style drawing
Using xkcd mode
%% Cell type:code id: tags:
``` python
with xkcd():
    hist(flowers['petal_length'], bins=20, facecolor='teal', edgecolor='black')
    title('Petal Length Distribution')
    xlabel('petal length')
    ylabel('count');
```
%% Output
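%% Cell type:markdown id: tags:
The `with` block matters: the sketch style applies only to what is created inside it. The following sketch checks this by reading the `path.sketch` setting before, inside and after the block; the histogram values are made up.

```python
import matplotlib.pyplot as plt

sketch_before = plt.rcParams['path.sketch']   # normally None

with plt.xkcd():
    # Inside the block the sketch-style settings are active
    sketch_inside = plt.rcParams['path.sketch']
    fig, ax = plt.subplots()
    ax.hist([1, 2, 2, 3, 3, 3], bins=3, facecolor='teal', edgecolor='black')
    ax.set_title('Sketch-style histogram')

# Leaving the block restores the previous settings
sketch_after = plt.rcParams['path.sketch']
```

Calling `xkcd()` without `with` would instead change the style for all figures created afterwards.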
%% Cell type:markdown id: tags:
# Saving to a file
Images can be saved to a file using savefig after the plotting commands:
```
savefig('myplot.pdf')
```
The format of the saved image will be inferred from the given file extension.
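
As a small illustration of the extension-based format selection, the sketch below saves one figure twice into a temporary directory and reads back the first bytes of each file. The file names and the `dpi` and `bbox_inches` settings are illustrative choices, not requirements.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

out_dir = tempfile.mkdtemp()   # scratch directory for this demo
png_path = os.path.join(out_dir, 'myplot.png')
pdf_path = os.path.join(out_dir, 'myplot.pdf')

# Same call, different extension: the extension selects the output format
fig.savefig(png_path, dpi=150, bbox_inches='tight')
fig.savefig(pdf_path, bbox_inches='tight')

# Each format starts with its own signature bytes
with open(png_path, 'rb') as fh:
    png_header = fh.read(4)
with open(pdf_path, 'rb') as fh:
    pdf_header = fh.read(4)
```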
%% Cell type:markdown id: tags:
# The End
This lesson was based on previous work by [Jeroen Laros](mailto:j.f.j.laros@lumc.nl) and Martijn Vermaat.
License: [Creative Commons Attribution 3.0 License (CC-by)](http://creativecommons.org/licenses/by/3.0)
%% Cell type:code id: tags:
``` python
```