Commit 28858e97
authored 6 years ago by wgallard
changed css
parent 8871e2a8
1 merge request: !4 Visualization 2018
Showing 1 changed file: visualization/DataVisualization1.ipynb, with 1 addition and 1 deletion
View file @ 28858e97
...
...
@@ -49,7 +49,7 @@
 "source": [
 "from IPython.display import HTML\n",
 "def css_styling():\n",
-" styles = open('../styles/custom.css', 'r').read()\n",
+" styles = open('styles/custom.css', 'r').read()\n",
 " return HTML('<style>' + styles + '</style>')\n",
 "css_styling()"
 ]
...
...
%% Cell type:code id: tags:
```python
from IPython.display import HTML
def css_styling():
    styles = open('styles/custom.css', 'r').read()
    return HTML('<style>' + styles + '</style>')
css_styling()
```
%% Output
<IPython.core.display.HTML object>
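%% Cell type:markdown id: tags:
The path given to `open` is resolved relative to the kernel's current working directory, so the correct value depends on where the notebook is run from. The following is a minimal sketch, not part of the commit, of a hypothetical helper `load_css` that closes the file after reading and reports a clearer error when the stylesheet cannot be found:

```python
from pathlib import Path


def load_css(path="styles/custom.css"):
    """Return the stylesheet wrapped in a <style> tag (hypothetical helper)."""
    css_file = Path(path)
    if not css_file.is_file():
        # Relative paths are resolved against the current working directory.
        raise FileNotFoundError(
            f"{css_file} not found relative to {Path.cwd()}"
        )
    # read_text opens, reads and closes the file in one call.
    return "<style>" + css_file.read_text() + "</style>"
```

In a notebook, the returned string could be passed to `HTML(...)` from `IPython.display`, as the cell above does.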
%% Cell type:markdown id: tags:
# Data Visualization with Python
## Part 1: Python + Matplotlib
### [Guy Allard](mailto://w.g.allard@lumc.nl)
%% Cell type:markdown id: tags:
# Matplotlib
- Plotting library for Python
- High quality figures suitable for publication
- Integrates with IPython, Jupyter and NumPy (in PyLab mode)
- Established and robust
- Large community / user base
%% Cell type:markdown id: tags:
# Interfaces
1. Object Oriented
   - Best for larger development projects
   - Have to keep track of figures and axes
   - Steep learning curve
<br><br>
2. Pyplot State Machine
   - For interactive plotting
   - Takes care of many housekeeping tasks
   - Easier to learn than the OO interface
<br><br>
3. Pylab
   - Modelled on MATLAB
   - Imports common modules
   - Handles most housekeeping tasks
   - Easiest to learn
   - The one we will be using!
%% Cell type:markdown id: tags:
# Interfaces Example
1. Object-oriented interface
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)
```
2. State-machine environment (pyplot)
```python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
plt.plot(x, y)
```
3. PyLab mode
```python
%pylab
x = arange(0, 10, 0.2)
y = sin(x)
plot(x, y)
```
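
As a rough check that the interfaces are different routes to the same result, the sketch below draws the same sine curve through the object-oriented interface and through pyplot, then compares the data held by the two lines. It uses explicit imports rather than `%pylab` so that it also runs outside IPython; the `Agg` backend and the variable names are choices made for this illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.2)
y = np.sin(x)

# Object-oriented interface: keep explicit references to figure and axes.
fig_oo, ax_oo = plt.subplots()
(line_oo,) = ax_oo.plot(x, y)

# Pyplot state machine: pyplot tracks the "current" figure and axes.
plt.figure()
(line_sm,) = plt.plot(x, y)
ax_sm = plt.gca()  # the axes pyplot created behind the scenes

# Both lines carry the same y values.
same_data = np.array_equal(line_oo.get_ydata(), line_sm.get_ydata())
```

The pyplot version is shorter because it creates and remembers the axes for you; the object-oriented version makes it explicit which axes each call acts on.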
%% Cell type:markdown id: tags:
# Getting help
Consult the built-in documentation, for example:
```
>>> help(subplot)
Help on function subplot in module matplotlib.pyplot:
subplot(*args, **kwargs)
Return a subplot axes positioned by the given grid definition.
...
```
%% Cell type:markdown id: tags:
# Useful Resources
- Matplotlib Homepage
  - https://matplotlib.org/
<br><br>
- Gallery
  - https://matplotlib.org/gallery.html
  - Many examples with source code
<br><br>
- Online documentation
  - https://matplotlib.org/contents.html
  - Full API documentation
%% Cell type:markdown id: tags:
# First Steps
## Preparing the Jupyter Notebook
1. Open a new Jupyter Notebook
2. Run this code in the first empty cell:
```
%pylab inline
```
3. Now any pylab plotting commands will display in the notebook
%% Cell type:code id: tags:
```python
%pylab inline
```
%% Output
Populating the interactive namespace from numpy and matplotlib
%% Cell type:markdown id: tags:
# Grab some data
Use Pandas to load a dataset which contains population data for four countries
%% Cell type:code id: tags:
```python
import pandas as pd
populations = pd.read_csv('https://git.lumc.nl/courses/programming-course/raw/visualization-2018/visualization/data/populations.csv')
```
%% Cell type:markdown id: tags:
Take a quick look at the data
%% Cell type:code id: tags:
```python
populations.head()
```
%% Output
Year Belgium Denmark Netherlands Sweden
0 1950 8.63930 4.28135 10.11365 7.01660
1 1951 8.67820 4.30370 10.26440 7.07040
2 1952 8.73040 4.33380 10.38210 7.12445
3 1953 8.77775 4.36930 10.49300 7.17145
4 1954 8.81940 4.40570 10.61535 7.21360
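%% Cell type:markdown id: tags:
To experiment without network access, the five rows shown above can be rebuilt by hand. This is only a sketch: the name `populations_head` is made up for the illustration, and the frame holds just the rows printed by `head()`, not the full dataset.

```python
import pandas as pd

# The first five rows, copied from the head() output above.
populations_head = pd.DataFrame({
    "Year": [1950, 1951, 1952, 1953, 1954],
    "Belgium": [8.63930, 8.67820, 8.73040, 8.77775, 8.81940],
    "Denmark": [4.28135, 4.30370, 4.33380, 4.36930, 4.40570],
    "Netherlands": [10.11365, 10.26440, 10.38210, 10.49300, 10.61535],
    "Sweden": [7.01660, 7.07040, 7.12445, 7.17145, 7.21360],
})

# Every column except Year holds one country's population.
countries = [c for c in populations_head.columns if c != "Year"]

# Year-on-year change per country, in the same units as the table.
growth = populations_head.set_index("Year")[countries].diff().dropna()
```

Selecting a column with `populations_head['Netherlands']` returns a Series, which is what the plotting calls in the following cells receive.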
%% Cell type:markdown id: tags:
# Plot it!
Let's plot the population of the Netherlands on the y-axis and the year on the x-axis
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands']);
```
%% Output
%% Cell type:markdown id: tags:
# Add titles and label the axes
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands'])
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)');
```
%% Output
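%% Cell type:markdown id: tags:
For comparison, here is a sketch of the same labelling through the object-oriented interface, where `title`, `xlabel` and `ylabel` become `set_title`, `set_xlabel` and `set_ylabel` on the axes. It assumes explicit imports instead of `%pylab inline` and uses only the five rows shown by `head()` earlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# First five rows of the dataset, copied from the head() output earlier.
years = [1950, 1951, 1952, 1953, 1954]
netherlands = [10.11365, 10.26440, 10.38210, 10.49300, 10.61535]

fig, ax = plt.subplots()
ax.plot(years, netherlands)
ax.set_title("Historical Population of The Netherlands")
ax.set_xlabel("Year")
ax.set_ylabel("Population (Millions)")
```

The labels are stored on the axes object, so they can be read back with `ax.get_title()`, `ax.get_xlabel()` and `ax.get_ylabel()`.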
%% Cell type:markdown id: tags:
# Change some properties of the line
How about a 5pt thick orange line? (`linewidth` is measured in points)
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands'], linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)');
```
%% Output
%% Cell type:markdown id: tags:
# Change some properties of the x-axis
Label at five-year intervals
Display the labels vertically
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands'], linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
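%% Cell type:markdown id: tags:
Because `range` excludes its stop value, `range(1950, 2016, 5)` ends at 2015. The sketch below, written with explicit imports and a placeholder line instead of the population data, shows which tick positions result and that the `rotation` keyword is applied to the tick labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

tick_years = list(range(1950, 2016, 5))  # the stop value 2016 is excluded

fig, ax = plt.subplots()
ax.plot([1950, 2015], [0, 1])  # placeholder line, not the real data
plt.xticks(tick_years, rotation=90)

# Read the settings back from the axes.
tick_positions = [int(t) for t in ax.get_xticks()]
label_rotations = {label.get_rotation() for label in ax.get_xticklabels()}
```

Passing 2015 as the stop value would drop the final tick, which is why the cell above uses 2016.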
%% Cell type:markdown id: tags:
# Change which years are displayed
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands'], linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990);
```
%% Output
%% Cell type:markdown id: tags:
# Change the y-axis scale
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands'], linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990)
ylim(13, 15);
```
%% Output
%% Cell type:markdown id: tags:
# Clean up the number formatting on the y-axis
Integer tick labels
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands'], linewidth=5, color='orange')
title('Historical Population of The Netherlands')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90)
xlim(1970, 1990)
ylim(13, 15)
yticks(range(13, 16));
```
%% Output
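%% Cell type:markdown id: tags:
Axis limits only change the visible window; the plotted data is untouched. A small sketch with a placeholder line (not the population data) and explicit imports shows how the limits and integer ticks from the last few cells can be read back from the axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([1950, 2015], [10, 17])  # placeholder line, not the real data

plt.xlim(1970, 1990)       # show only 1970 to 1990
plt.ylim(13, 15)           # show only 13 to 15 on the y-axis
plt.yticks(range(13, 16))  # integer ticks at 13, 14 and 15

x_limits = tuple(ax.get_xlim())
y_limits = tuple(ax.get_ylim())
y_ticks = [int(t) for t in ax.get_yticks()]
x_data = list(line.get_xdata())  # still the full, uncropped data
```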
%% Cell type:markdown id: tags:
# Plot multiple series
Calling **plot** multiple times within the same cell will add multiple series to the chart.
Let's compare the Dutch with the Danes.
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands'], color='orange')
plot(populations['Year'], populations['Denmark'], color='red')
title('Historical Populations of The Netherlands and Denmark')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
%% Cell type:markdown id: tags:
# Add a legend
1. Give each plotted line a label
2. Add a legend to the figure
%% Cell type:code id: tags:
```python
plot(populations['Year'], populations['Netherlands'], color='orange', label='The Netherlands')
plot(populations['Year'], populations['Denmark'], color='red', label='Denmark')
legend(loc='upper left')
title('Historical Populations of The Netherlands and Denmark')
xlabel('Year')
ylabel('Population (Millions)')
xticks(range(1950, 2016, 5), rotation=90);
```
%% Output
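%% Cell type:markdown id: tags:
The legend is built from the `label` given to each line, in the order the lines were plotted. A sketch with explicit imports, using only the five rows shown by `head()` earlier:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# First five rows of the dataset, copied from the head() output earlier.
years = [1950, 1951, 1952, 1953, 1954]
netherlands = [10.11365, 10.26440, 10.38210, 10.49300, 10.61535]
denmark = [4.28135, 4.30370, 4.33380, 4.36930, 4.40570]

fig, ax = plt.subplots()
ax.plot(years, netherlands, color="orange", label="The Netherlands")
ax.plot(years, denmark, color="red", label="Denmark")

# legend() collects the labelled lines that exist when it is called.
legend = ax.legend(loc="upper left")
legend_entries = [text.get_text() for text in legend.get_texts()]
```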
%% Cell type:markdown id: tags:
# Other plot types
Let's load a different dataset and take a look at some different plot types
%% Cell type:code id: tags:
```python
flowers = pd.read_csv('https://git.lumc.nl/courses/programming-course/raw/visualization-2018/visualization/data/iris.csv')
flowers.head()
```
%% Output
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
%% Cell type:markdown id: tags:
# Boxplots
A simple boxplot of the sepal-length distribution
%% Cell type:code id: tags:
```python
boxplot(flowers['sepal_length'], labels=['Sepal_length']);
```
%% Output
%% Cell type:code id: tags:
```python
```
%% Cell type:markdown id: tags:
# Boxplots
Distributions of multiple features
%% Cell type:code id: tags:
```python
# make a list containing the numeric feature column names
features = list(flowers.columns[:-1])
features
```
%% Output
['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
%% Cell type:code id: tags:
```python
# plot the data
boxplot([flowers[f] for f in features], labels=features);
```
%% Output
%% Cell type:markdown id: tags:
# Controlling the size of the plot
Let's change the shape of the boxplot
%% Cell type:code id: tags:
```python
# make the figure 10 'units' wide and 5 'units' high
figsize(10, 5)
# plot the data
boxplot([flowers[f] for f in features], labels=features);
```
%% Output
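%% Cell type:markdown id: tags:
The `figsize` function is supplied by IPython's pylab mode and sets Matplotlib's default figure size; the 'units' are inches. Outside pylab mode, a rough equivalent (sketched here with explicit imports) is to pass `figsize` when creating a figure or to set the `figure.figsize` default directly:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Size of a single figure, as (width, height) in inches.
fig, ax = plt.subplots(figsize=(10, 5))
width, height = fig.get_size_inches()

# Default size for figures created from now on.
plt.rcParams["figure.figsize"] = [7, 4]
default_fig = plt.figure()
default_size = tuple(default_fig.get_size_inches())
```

The size in pixels of a saved or displayed image also depends on the figure's resolution (dots per inch).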
%% Cell type:code id: tags:
```python
figsize(7,4)
```
%% Cell type:markdown id: tags:
# Histogram
%% Cell type:code id: tags:
```python
hist(flowers['petal_length'])
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
%% Cell type:markdown id: tags:
# Histogram
Change the number of 'bins'
%% Cell type:code id: tags:
```python
hist(flowers['petal_length'], bins=20)
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
%% Cell type:markdown id: tags:
# Histogram
Some formatting
- Lines around the bars
- Color
%% Cell type:code id: tags:
```python
hist(flowers['petal_length'], bins=20, facecolor='teal', edgecolor='black', alpha=0.7)
title('Petal Length Distribution')
xlabel('petal length')
ylabel('count');
```
%% Output
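%% Cell type:markdown id: tags:
`hist` also returns what it drew: the count in each bin, the bin edges, and the bar patches. The sketch below uses explicit imports and synthetic values (not the iris measurements) to show how `bins=20` relates to these return values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic placeholder values, NOT the iris petal lengths.
rng = np.random.default_rng(0)
values = rng.normal(loc=4.0, scale=1.5, size=150)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(
    values, bins=20, facecolor="teal", edgecolor="black", alpha=0.7
)
# 20 bins are delimited by 21 edges; the counts add up to the sample size.
```

With the default range, every value falls inside some bin, so `counts.sum()` equals the number of values passed in.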
%% Cell type:markdown id: tags:
# Subplots
Separate plots with their own axes within a single figure
The syntax can be confusing!
%% Cell type:code id: tags:
```python
for i in range(1, 5):
    subplot(2, 2, i)
    xticks([]), yticks([])
    text(0.5, 0.5, 'subplot(2, 2, %d)' % i, ha='center', size=18, alpha=0.75);
```
%% Output
%% Cell type:markdown id: tags:
subplot(2, 2, 1) indicates the first cell of a 2 row x 2 column matrix
subplot(2, 2, 4) indicates the fourth cell of a 2 row x 2 column matrix
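
Cells are numbered from 1, left to right and then top to bottom. The numbering can be sketched as a small helper; `cell_to_row_col` is a made-up name for this illustration and is not part of Matplotlib.

```python
def cell_to_row_col(nrows, ncols, index):
    """Map a 1-based subplot index to a 0-based (row, column) pair."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError("index must be between 1 and nrows * ncols")
    # Cells are numbered left to right, then top to bottom.
    return divmod(index - 1, ncols)


# The four cells of the 2 x 2 grid used above.
positions = {i: cell_to_row_col(2, 2, i) for i in range(1, 5)}
```

Here `positions[4]` is `(1, 1)`: second row, second column, counting rows and columns from zero.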
%% Cell type:markdown id: tags:
# Subplots
More complicated layouts
%% Cell type:code id: tags:
```python
subplot(1, 3, 1) # 1 row, 3 columns, cell 1
xticks([]), yticks([])
text(0.5, 0.5, '(1, 3, 1)', ha='center', size=18, alpha=0.75)
subplot(2, 3, 3) # 2 rows, 3 columns, cell 3
xticks([]), yticks([])
text(0.5, 0.5, '(2, 3, 3)', ha='center', size=18, alpha=0.75)
subplot(3, 2, 6) # 3 rows, 2 columns, cell 6
xticks([]), yticks([])
text(0.5, 0.5, '(3, 2, 6)', ha='center', size=18, alpha=0.75)
subplot(3, 3, 5) # 3 rows, 3 columns, cell 5
xticks([]), yticks([])
text(0.5, 0.5, '(3, 3, 5)', ha='center', size=18, alpha=0.75);
```
%% Output
%% Cell type:markdown id: tags:
# Subplots and Boxplots
Compare how the features are distributed by species
%% Cell type:code id: tags:
```python
species = list(set(flowers.species))
print(species)
```
%% Output
['virginica', 'setosa', 'versicolor']
%% Cell type:code id: tags:
```python
# make a dataset for each species
setosa = flowers[flowers.species == 'setosa']
versicolor = flowers[flowers.species == 'versicolor']
virginica = flowers[flowers.species == 'virginica']
```
%% Cell type:code id: tags:
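```python
figsize(10, 8)
# the labels must follow the same order as the datasets passed to boxplot;
# list(set(...)) does not guarantee any particular order
species_order = ['setosa', 'versicolor', 'virginica']
for cell, feature in enumerate(features):
    subplot(2, 2, cell + 1)
    boxplot(
        [setosa[feature], versicolor[feature], virginica[feature]],
        labels=species_order
    )
    ylabel(feature)
```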
%% Output
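%% Cell type:markdown id: tags:
A `set` has no guaranteed order, so labels taken from `list(set(...))` may not line up with the datasets passed to `boxplot`. One way to keep them aligned, sketched here on a tiny placeholder frame (the name `flowers_demo` and its handful of rows are for illustration only), is to build both the labels and the data from a single sorted list:

```python
import pandas as pd

# Tiny placeholder frame with the same column names as the iris data.
flowers_demo = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# One ordered list drives both the labels and the data.
species_order = sorted(flowers_demo["species"].unique())
groups = [
    flowers_demo.loc[flowers_demo["species"] == name, "sepal_length"]
    for name in species_order
]
```

The two lists can then be passed together, for example `boxplot(groups, labels=species_order)`.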
%% Cell type:code id: tags:
```python
figsize(7,4)
```
%% Cell type:markdown id: tags:
# Sketch-style drawing
Using xkcd mode
%% Cell type:code id: tags:
```python
with xkcd():
    hist(flowers['petal_length'], bins=20, facecolor='teal', edgecolor='black')
    title('Petal Length Distribution')
    xlabel('petal length')
    ylabel('count');
```
%% Output
%% Cell type:markdown id: tags:
# Saving to a file
Images can be saved to a file using savefig after the plotting commands:
```
savefig('myplot.pdf')
```
The format of the saved image will be inferred from the given file extension.
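
As a small illustration of the extension-based format selection, the sketch below saves one placeholder figure twice and checks the leading bytes of each file. It uses explicit imports and writes to a temporary directory; the file names and the `dpi` value are arbitrary choices for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data

out_dir = tempfile.mkdtemp()
pdf_path = os.path.join(out_dir, "myplot.pdf")
png_path = os.path.join(out_dir, "myplot.png")

fig.savefig(pdf_path)           # PDF, chosen from the .pdf extension
fig.savefig(png_path, dpi=150)  # PNG, chosen from the .png extension

# Each format starts with a recognisable signature.
with open(pdf_path, "rb") as handle:
    pdf_magic = handle.read(4)
with open(png_path, "rb") as handle:
    png_magic = handle.read(8)
```

Vector formats such as PDF scale without loss, while `dpi` controls the resolution of raster formats such as PNG.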
%% Cell type:markdown id: tags:
# The End
This lesson was based on previous work by [Jeroen Laros](mailto://j.f.j.laros@lumc.nl) and Martijn Vermaat
License: [Creative Commons Attribution 3.0 License (CC-by)](http://creativecommons.org/licenses/by/3.0)
%% Cell type:code id: tags:
```python
```