Python has many add-on libraries for making static or dynamic visualizations, but we will focus on:

  • Matplotlib: a comprehensive library for creating static, interactive, and animated visualizations in Python. It serves as the basis for more specialized libraries such as Seaborn and pandas plotting.
  • Seaborn: a library based on Matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics.

Matplotlib Object Hierarchy:¶

  • Figure: the outermost container for a matplotlib graphic, which can contain multiple Axes objects.
  • Axes: the actual plots. The name is a common source of confusion: an Axes is what we think of as an individual plot or graph, not the plural of “axis” as we might expect.
  • Smaller objects inside each Axes, such as tick marks, individual lines, legends, and text boxes.
In [1]:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 3)
axes
Out[1]:
array([[<Axes: >, <Axes: >, <Axes: >],
       [<Axes: >, <Axes: >, <Axes: >]], dtype=object)
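
The hierarchy above can be inspected directly. The following is a minimal sketch, assuming only a standard Matplotlib install; the variable names (n_axes, parent_is_fig, and so on) are illustrative and not part of the original notebook.

```python
import matplotlib.pyplot as plt

# A 2x3 grid: one Figure holding six Axes.
fig, axes = plt.subplots(2, 3)

# The Figure keeps a flat list of every Axes it contains.
n_axes = len(fig.axes)

# Each Axes knows its parent Figure and owns an x-axis and a y-axis object.
ax = axes[0, 0]
parent_is_fig = ax.figure is fig
axis_types = (type(ax.xaxis).__name__, type(ax.yaxis).__name__)

# Smaller artists such as lines live inside an individual Axes.
(line,) = ax.plot([0, 1, 2], [0, 1, 4])
n_lines = len(ax.lines)

print(n_axes, parent_is_fig, axis_types, n_lines)
```

Indexing the array returned by plt.subplots (here axes[0, 0]) is the usual way to pick the plot you want to draw on.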

Create first plot¶

In [2]:
import numpy as np
data = np.random.standard_normal(30).cumsum()
data[:10]
Out[2]:
array([-2.33669142, -1.65042042, -0.11951624,  0.29966236,  0.58489135,
        1.41668467,  1.77966819,  2.28160376,  2.27045329,  3.36295798])
In [3]:
# figsize=(width, height) sets the figure size in inches
fig, axes = plt.subplots(figsize=(10, 5))

# Simplest form: axes.plot(data)
# With a style:  axes.plot(data, color="black", linestyle="dashed")
axes.plot(data, color="black", linestyle="dashed", marker="o");

# Set axis label
axes.set_xlabel("Stages")

# Change ticks
axes.set_xticks([10, 20, 30, 40]);

# Set plot title
axes.set_title("My first matplotlib plot");

Legends¶

In [4]:
#Legends 
fig, ax = plt.subplots()
ax.plot(np.random.randn(1000).cumsum(), color="black", label="one");
ax.plot(np.random.randn(1000).cumsum(), color="black", linestyle="dashed", label="two");
ax.plot(np.random.randn(1000).cumsum(), color="black", linestyle="dotted", label="three");

# Add legend (entries come from the label= argument of each plot call)
ax.legend()
Out[4]:
<matplotlib.legend.Legend at 0x7feac57b90f0>

Save figure to file¶

In [5]:
# Saving to file
fig.savefig("figpath.png")
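
savefig also accepts options that control the output. The sketch below is an illustration under stated assumptions: the file names and the temporary output directory are hypothetical, chosen so that nothing on disk is overwritten.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="demo")
ax.legend()

# Hypothetical output location: a fresh temporary directory.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "figure_demo.png")
pdf_path = os.path.join(out_dir, "figure_demo.pdf")

# dpi sets the raster resolution; bbox_inches="tight" trims surrounding whitespace.
fig.savefig(png_path, dpi=300, bbox_inches="tight")

# The file extension selects the output format (here a vector PDF).
fig.savefig(pdf_path, bbox_inches="tight")

plt.close(fig)
```

A higher dpi is mainly useful for raster formats such as PNG; vector formats such as PDF stay sharp at any zoom level.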

Matplotlib global configuration¶

  • Default behavior can be customized via global parameters governing figure size, subplot spacing, colors, font sizes, grid styles, and so on.
In [6]:
plt.rc("figure", figsize=(10, 6))
  • The first argument to rc is the component you wish to customize, such as "figure", "axes", "xtick", "ytick", "grid", "legend", or many others.
In [7]:
plt.rc("font", family="monospace", weight="bold", size=10)
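
Settings made with plt.rc stay in effect for the rest of the session. When a change should apply to a single figure only, one option is the plt.rc_context context manager. The sketch below assumes the dotted key names "figure.figsize" and "font.size", which are the plt.rcParams entries behind plt.rc("figure", figsize=...) and plt.rc("font", size=...).

```python
import matplotlib.pyplot as plt

# Remember the current global default so we can compare afterwards.
original_size = tuple(plt.rcParams["figure.figsize"])

# Inside the with-block the overrides apply; they are undone on exit.
with plt.rc_context({"figure.figsize": (4, 3), "font.size": 8}):
    fig, ax = plt.subplots()
    inside_size = tuple(fig.get_size_inches())
    plt.close(fig)

# Back outside, the global setting is unchanged.
restored_size = tuple(plt.rcParams["figure.figsize"])

print(inside_size, restored_size == original_size)
```

To undo global changes made with plt.rc, plt.rcdefaults() restores Matplotlib's built-in defaults.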

Plotting with Pandas¶

In [8]:
import pandas as pd
data = pd.Series(np.random.uniform(size=16), index=list("abcdefghijklmnop"))
data.plot()
# data
Out[8]:
<Axes: >

How to use matplotlib functions?¶

In [9]:
fig, axes = plt.subplots(2, 1)
data.plot.bar(ax=axes[0], color="black", alpha=0.7)
data.plot.barh(ax=axes[1], color="black", alpha=0.7)
Out[9]:
<Axes: >
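
Because pandas draws on the Axes passed through ax= and returns that same object, the usual Matplotlib methods can be applied afterwards. A minimal sketch with synthetic data (the series and the labels are illustrative, not taken from the notebook):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data standing in for the Series used above.
rng = np.random.default_rng(0)
series = pd.Series(rng.uniform(size=5), index=list("abcde"))

fig, ax = plt.subplots(figsize=(6, 3))
returned = series.plot.bar(ax=ax, color="black", alpha=0.7)

# pandas returns the very Axes object we handed in.
same_axes = returned is ax

# From here on, plain Matplotlib calls customize the plot.
ax.set_title("Bar plot drawn by pandas")
ax.set_ylabel("Value")
ax.tick_params(axis="x", rotation=0)  # keep the category labels horizontal
fig.tight_layout()
```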

Histogram¶

In [10]:
sec_1 = pd.read_csv('section_1.csv').drop(columns=['Student ID']).rename(columns=lambda x: x.strip())
sec_1['Section'] = 1
sec_2 = pd.read_csv('section_2.csv').drop(columns=['Student ID']).rename(columns=lambda x: x.strip())
sec_2['Section'] = 2
df = pd.concat([sec_1, sec_2], ignore_index=True)
df['Ass.1'] = df['Ass.1'] * 20 
df['Ass.2'] = df['Ass.2'] * 10 
df['Ass.3'] = df['Ass.3'] * 10 
df['Ass.4'] = df['Ass.4'] * 10 
df['Ass.5'] = df['Ass.5'] * 10 
df['Q 1'] = df['Q 1'] * 3.3 


df.head(5)

df['Mid'] = df['Mid'].replace(' ', np.nan).astype(float)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107 entries, 0 to 106
Data columns (total 8 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Ass.1    107 non-null    float64
 1   Ass.2    107 non-null    float64
 2   Ass.3    107 non-null    float64
 3   Ass.4    107 non-null    float64
 4   Ass.5    107 non-null    float64
 5   Q 1      107 non-null    float64
 6   Mid      102 non-null    float64
 7   Section  107 non-null    int64  
dtypes: float64(7), int64(1)
memory usage: 6.8 KB
In [11]:
df['Mid'].hist(bins=20);
In [12]:
#Change bins, x ticks, label
In [13]:
fig, axes = plt.subplots()
df['Mid'].hist(ax = axes, bins=15);
axes.set_xlabel('Midterm score')
axes.set_xticks(np.arange(10,100,10));
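
Instead of a bin count, bins can also be given as explicit edges, which makes the bars line up with the tick marks. The sketch below uses synthetic scores (an assumption, since the course CSV files are not available here) and Matplotlib's own ax.hist, which returns the counts and bin edges it computed.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic scores between 0 and 100, standing in for the 'Mid' column.
rng = np.random.default_rng(0)
scores = rng.normal(loc=60, scale=15, size=100).clip(0, 100)

fig, ax = plt.subplots()

# Explicit bin edges: 0, 10, ..., 100 gives ten bins of width 10.
bin_edges = np.arange(0, 101, 10)
counts, edges, patches = ax.hist(scores, bins=bin_edges, color="black", alpha=0.7)

# Put a tick on every bin edge so bars and ticks line up.
ax.set_xticks(bin_edges)
ax.set_xlabel("Score")
ax.set_ylabel("Count")
```

The returned counts array is handy for checking that every observation fell inside the chosen bin range.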

Plot using seaborn¶

In [14]:
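# Blank midterm entries are stored as " "; convert them to NaN so the column is numeric
sec_1['Mid'] = sec_1['Mid'].replace(" ", np.nan).astype(float)
sec_2['Mid'] = sec_2['Mid'].replace(" ", np.nan).astype(float)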
In [15]:
import seaborn as sns
fig, axes = plt.subplots(2,1, figsize=(8,10))
sns.histplot(ax=axes[0], data=sec_1['Mid'], bins=10, color="black");
axes[0].set_title("Midterm score for section 1");
axes[0].set_xticks(np.arange(10,110,10));
axes[0].set_xlabel("Score");
sns.histplot(ax =axes[1], data=sec_2['Mid'], bins=10, color="black");
axes[1].set_title("Midterm score for section 2");
axes[1].set_xticks(np.arange(10,110,10));
axes[1].set_xlabel("Score");

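Note that seaborn labels the axes automatically: unless a label is set explicitly, the y-axis is named "Count" and the x-axis takes the name of the plotted column.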

Plot a histogram for each section and compare them¶

In [16]:
fig, axes = plt.subplots(2, 1)
sec_1 = df[df['Section'] == 1]
sec_2 = df[df['Section'] == 2]
sns.histplot(ax = axes[0], data=sec_1, x = 'Mid', bins=12)
sns.histplot(ax = axes[1], data=sec_2, x = 'Mid', bins=12)
Out[16]:
<Axes: xlabel='Mid', ylabel='Count'>
In [17]:
sns.histplot(data = df, x = 'Mid', hue='Section',  bins=20);

Scatter plot¶

In [18]:
sns.scatterplot(data=df, x = "Q 1", y = "Mid");
In [19]:
# Regression plot: scatter plot plus a fitted linear regression line
sns.regplot(data=df, x = "Q 1", y = "Mid");
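
By default, regplot draws the scatter points together with a fitted straight line and a confidence band. To see what that line represents, the sketch below fits a degree-1 polynomial with np.polyfit on synthetic data and draws it with plain Matplotlib. The data and the "true" slope and intercept are assumptions made for illustration, and the confidence band is not reproduced. With the real df, rows with a missing 'Mid' would have to be dropped first, since np.polyfit does not skip NaN values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 5.0 + rng.normal(scale=1.0, size=200)

# Least-squares fit of a straight line (degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, color="black", alpha=0.3)

# Draw the fitted line across the observed range of x.
xs = np.linspace(x.min(), x.max(), 100)
ax.plot(xs, slope * xs + intercept, color="black")
ax.set_xlabel("x")
ax.set_ylabel("y")
```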
In [20]:
sns.pairplot(df, plot_kws={"alpha": 0.2})
Out[20]:
<seaborn.axisgrid.PairGrid at 0x7feac9d06290>

Boxplot¶

In [21]:
# One box per section: x is the grouping column, y holds the values
sns.boxplot(data=df, x='Section', y='Mid')
Out[21]:
<Axes: xlabel='Section', ylabel='Mid'>
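
Each box summarizes a group by its quartiles: the box spans the first to the third quartile, the line inside marks the median, and with seaborn's default setting the whiskers reach to the most extreme points within 1.5 times the interquartile range. The sketch below computes these numbers with pandas on synthetic data. The demo frame is an assumption that mimics the 'Section' and 'Mid' columns, and the column names q1, median, q3, iqr, lower_fence and upper_fence are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 'Section' and 'Mid' columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Section": np.repeat([1, 2], 50),
    "Mid": np.concatenate([rng.normal(60, 10, 50), rng.normal(70, 15, 50)]),
})

# Quartiles per section: one row per section, one column per quantile.
q = demo.groupby("Section")["Mid"].quantile([0.25, 0.5, 0.75]).unstack()
q.columns = ["q1", "median", "q3"]

# Interquartile range and the limits beyond which points count as outliers.
q["iqr"] = q["q3"] - q["q1"]
q["lower_fence"] = q["q1"] - 1.5 * q["iqr"]
q["upper_fence"] = q["q3"] + 1.5 * q["iqr"]

print(q.round(1))
```

Points outside the two fences are the ones a boxplot draws as individual markers.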

Heatmap¶

In [22]:
sns.heatmap(data=df)
Out[22]:
<Axes: >
In [23]:
sns.heatmap(df.groupby('Section').mean())
Out[23]:
<Axes: ylabel='Section'>
In [24]:
sns.heatmap(df.corr())
Out[24]:
<Axes: >
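
A correlation heatmap is often easier to read with a fixed color scale and the values printed in the cells. The following is a sketch under stated assumptions: the demo frame and its columns A, B and C are synthetic, and the options shown (vmin, vmax, center, cmap, annot, fmt) are one reasonable choice rather than the notebook's own settings.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data: B is strongly related to A, C is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
demo = pd.DataFrame({
    "A": base,
    "B": base + rng.normal(scale=0.5, size=100),
    "C": rng.normal(size=100),
})

corr = demo.corr()

fig, ax = plt.subplots(figsize=(4, 3))
# Fix the color scale to [-1, 1] centered at 0 and print each coefficient.
sns.heatmap(corr, ax=ax, vmin=-1, vmax=1, center=0,
            cmap="coolwarm", annot=True, fmt=".2f")
ax.set_title("Correlation matrix")
fig.tight_layout()
```

Fixing vmin and vmax keeps colors comparable between heatmaps, since otherwise the scale adapts to the range of each matrix.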