## 35.    Seaborn examples gallery study


### description

• This topic covers my study of the seaborn examples gallery in March 2019.
• seaborn examples gallery
• I studied the API, tutorial, and introduction first.
• The gallery has 40 demos. Each one shows the plot and its code.
• These items help, as listed below:
• plotting
• plotting label
• plotting title
• The purpose of visualization is to make insights easy to see.
• My purposes for learning are as below:
• From each plot, find the pattern.
• From the code, find out how it is done.
• To get more experience using Python, pandas, and NumPy.
• To get more experience with pattern recognition.
• I used the following setup for all the tests in this topic:
• Windows 7
• Installed Anaconda.
• Following the instructions from Anaconda, installed Visual Studio Code.
• After the installation, I started Anaconda Navigator.
• In the navigator, I chose Visual Studio Code and clicked the launch button.
• Ready to test the Python code.
• On the gallery web site, there are 4 plots in each row and 10 rows in total.

### 01. row 1, plotting 1 - lmplot for linear regression

info
• plotting label: lmplot
• title: Anscombe's quartet
analysis
• lmplot is for regression, linear in this demo.
• scatter plot for the x-y relationship
• In the code, col="dataset" and col_wrap=2 give the 4 plots.
• It is a high-level, easy plotting method.
• The top-left plot fits properly.
• The top-right plot should use a non-linear model.
• The bottom-left plot has one outlier; it should be removed before the regression.
• The bottom-right plot looks like categorical data; a linear fit does not suit it.
You can examine the data more precisely with the following code:
df2 = df.loc[df['dataset'] == 'IV']
print(df2)

### 02. row 1, plotting 2, barplot

info
• plotting label: barplot
• title: Color palette choices
analysis
• There are three bar plots.
• # Set up the matplotlib figure
f, (ax1, ax2, ax3) = plt.subplots(3, 1,...)
# 3 rows, 1 column
• You can draw each subplot differently.
• sns.barplot is used to create the 3 plots with different styles.
• A bar plot shows an x-y relationship.
• Some numpy functions create the demo data.

### 03. row 1, plotting 3, Different cubehelix palettes

info
• plotting label: kdeplot
• title: Different cubehelix palettes
examine the plotting, data and application
• There are 9 subplots, 3 rows, 3 columns.
• Each subplot is a KDE density plot of an x-y relationship, each with a different color palette.
• This application addresses the following topics.
• For each subplot, how to prepare demo values with the numpy.linspace method.
• How to iterate over the subplots with the Python zip class.
• For each subplot, how to create a color palette.
• The plotting method sns.kdeplot is pretty typical.
```
# step 1:  Set up the matplotlib figure, 3 rows X 3 columns
f, axes = plt.subplots(3, 3, ...)

# step 2:  instantiate a python zip object with 2 parameters
#   parameter 1: axes.flat,
#      it has 9 matplotlib subplots.
#   parameter 2: np.linspace(0, 3, 10), evenly spaced
#      0.00,  0.33,  0.67,  1.00, 1.33, 1.67, 2.00, 2.33, 2.67, 3.00
#      zip stops after 9 pairs, so the first 9 values are used in the loop for palette color configuration.

# step 3:  loop the zip
#        for each iteration
#          - create a cmap, using s value
#          - Generate  a random bivariate dataset,
#            x, y = rs.randn(2, 50)
#                      two dimension nparray
#                      50 points

# step 4:  plot it with method kdeplot
#          - different data
#          - different palette
```
comment:
• I think that the purpose of this demo is to demonstrate the whole process.
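
The pairing done by zip in step 2 can be sketched without drawing anything. This is only an illustration: the `axes` array below holds made-up labels standing in for the real matplotlib subplots, and the name `starts` is my own.

```python
import numpy as np

# Stand-in for the 3 x 3 grid of matplotlib axes (hypothetical labels, not real Axes).
axes = np.array([[f"ax{r}{c}" for c in range(3)] for r in range(3)])

# Ten evenly spaced values, as in np.linspace(0, 3, 10) from the demo.
starts = np.linspace(0, 3, 10)

# zip stops at the shorter input, so only the first 9 values are paired.
pairs = list(zip(axes.flat, starts))

for ax, s in pairs:
    print(ax, round(float(s), 2))
```

The tenth value (3.0) is never used, because there are only 9 subplots to pair it with.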

### 04. row 1, plotting 4, scatterplot, typical

typical scatterplot demo
• scatterplot for x-y relationship
• dataset: diamonds, x-coordinate: carat, y-coordinate: price
• two semantic options: hue, size
• one small detail: the clarity categories are put in a sorted order.

### 05. row 2, plotting 1, distplot

info
• plotting label: distplot
• title: distribution plot options
analysis
• There are four plots, all distplot.
• # Set up the matplotlib figure
f, axes = plt.subplots(2, 2,...)
# 2 rows, 2 columns
• You can draw each subplot differently.
• univariate
• options
• histogram, distplot default, for subplot 1.1
• kde & rug for subplot 1.2
• filled kde for subplot 2.1
• histogram and kde for subplot 2.2
• Some numpy functions create the data as an ndarray.

### 06. row 2, plotting 2, Timeseries plot

info
• plotting label: lineplot
• title: Timeseries plot with error bands
analysis
• x-y relational plot
• x-coordinate is in time unit.
• There are many scenarios for timeseries as below:
• GDP vs year
• stock price vs date
• body sugar level vs month
• In this demo, with data from an MRI lab, the plot shows the relationship between signal and time for brain activity.
• sns.lineplot draws the one plot.
• Two semantic categorical variables.
• For one x-coordinate, there are many y values.

### 07. row 2, plotting 3, FacetGrid with Projection

info
• plotting label: FacetGrid
• title: FacetGrid with custom projection
a very different plotting style
• There are 3 plots for 3 different speeds of a moving object.
• Method FacetGrid is used.
• It is also for high-level plotting.
• semantic parameter: speed - slow, medium, fast.
• using the keyword parameter,
• subplot_kws=dict(projection='polar')
• The resulting plots are drawn in polar projection, as shown.
• For each plot,
• no x-coordinate or y-coordinate
• time points are presented as dots.
• The spacing between the dots represents the distance between consecutive points.
• Each dot is located by
• the distance from the center
• the angle
• The plot seems to be a good fit for this scenario.
code for data
• The following extracted code for the data is for demo purposes.
• In real scenarios, the data would come from a csv file from some lab.
```
import numpy as np
import pandas as pd

# ---    step 1, using numpy to generate data
r = np.linspace(0, 10, num=100)
print(' r = ' + str(r))
# r = [ 0.  0.1010101  0.2020202 ... 9.8989899  10. ]
# 100 numbers, evenly spaced (step = 10/99).

# ---     step 2, using pandas to create a dataframe - wide form
df = pd.DataFrame({'r': r, 'slow': r, 'medium': 2 * r, 'fast': 4 * r})
print('df = ' + str(df))
print('df.shape = ' + str(df.shape))
'''
r       slow       medium       fast
0    0.000000   0.000000   0.000000   0.000000
1    0.101010   0.101010   0.202020   0.404040

100 rows, 4 columns
'''

# ---     step 3, using pandas melt method to convert to long form
# The plotting method expects long-form format.
df2 = pd.melt(df, id_vars=['r'], var_name='speed', value_name='theta')
print('df2 = ' + str(df2))
print('df2.shape = ' + str(df2.shape))
'''
r      speed    theta
0     0.000000  slow   0.000000
1     0.101010  slow   0.101010
300 rows, 3 columns
'''

```

### 08. row 2, plotting 4, FacetGrid, typical

typical FacetGrid demo
• The code is as below.
• Method sns.FacetGrid
The row and col parameters give a 2 x 2 matrix of 4 subplots.
• Mapping with method plt.hist
• A histogram shows a distribution.
• a diagram consisting of rectangles
• whose area is proportional to the frequency of a variable
• and whose width is equal to the class interval
• x-coordinate: total_bill
• np.linspace(0, 60, 13) sets 13 bin edges from 0 to 60, which gives 12 bins of width 5.
```
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
bins = np.linspace(0, 60, 13)
g.map(plt.hist, "total_bill", color="steelblue", bins=bins)
```
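
To check how the bin edges behave, here is a small sketch that uses only numpy. The bill amounts are made up for illustration and are not taken from the tips dataset.

```python
import numpy as np

bins = np.linspace(0, 60, 13)   # 13 edges: 0, 5, 10, ..., 60

# Hypothetical bill amounts, used only to illustrate the binning.
total_bill = np.array([3.07, 10.34, 16.99, 21.01, 24.59, 48.27])

# np.histogram counts how many values fall between each pair of edges.
counts, edges = np.histogram(total_bill, bins=bins)
print(len(edges), len(counts))
print(counts)
```

Thirteen edges always give twelve bins, which is why each bar in the demo is 5 units wide.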

### 09. row 3, plotting 1, sns.relplot, kind="line"

typical demo for sns.relplot, kind="line"
• sns.relplot, kind="line", with the semantic parameter col.
• There are two facets because of the col parameter.
• A line plot shows the x-y relationship; the x-coordinate is time (a time series), and col is a categorical variable.

### 10. row 3, plotting 2, Grouped barplot

info
• plotting label: catplot
• title: Grouped barplots
analysis
• categorical variable - passenger class for the x-coordinate.
• x-y relationship
• Another categorical variable, sex, is also used for grouping in addition to passenger class.
• The y-coordinate is the grouped result for the variable survived: the survival probability for each group.
• Only the method sns.catplot is used. It is high-level, which makes this common pattern easy.

### 11. row 3, plotting 3, Grouped boxplot

info
• plotting label: boxplot
• title: Grouped boxplots
analysis
• distribution, boxplot
• grouped by two categorical variables
• day for x-coordinate
• smoker for semantic parameter
• total_bill is for y-coordinate.
• Only the method sns.boxplot is used. It is high-level, which makes this common pattern easy.

### 13. row 4, plotting 1, Annotated heatmaps

info
• plotting label: heatmap
analysis
• The heatmap is like a pivot table.
• The first step is to load a csv file into a pandas DataFrame.
• Each row has the columns year, month, passengers.
• The second step is to use the DataFrame's pivot method to convert it into a pivot table.
• column names like 1949, ..., 1960
• row names like January, ..., December
• Finally, render the table.
• Each cell shows the number of passengers.
• The background color of each cell is set automatically from its value.
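
The pivot step can be sketched on a tiny table. The rows below are hypothetical and only imitate the year, month, passengers layout described above; the passenger numbers are made up.

```python
import pandas as pd

# Hypothetical long-form rows in the same layout as the flights data
# (the passenger numbers here are made up for illustration).
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["January", "February", "January", "February"],
    "passengers": [100, 110, 120, 130],
})

# Rows become months, columns become years, cells hold passenger counts.
table = long_df.pivot(index="month", columns="year", values="passengers")
print(table)
```

A table in this shape is what a heatmap function needs: one cell per (month, year) pair.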

### 14. row 4, plotting 2, Hexbin plot with marginal distributions

info
• plotting label: jointplot
• title: Hexbin plot with marginal distributions
analysis
• Some numpy functions create the data for x, y as numpy.ndarray.
• The following function creates the plots:
sns.jointplot(x=x, y=y, kind="hex", color="#4CB391")
This is a high-level plotting function.
• The main plot is a hexbin plot of the x-y relationship. The kind is hex to show the density.
• The x and y marginal plots show the distributions.

### 15. row 4, plotting 3, Horizontal bar plots

info
• plotting label: barplot
• title: Horizontal bar plots
analysis
• A bar plot in horizontal orientation seems more popular.
• sns.barplot is used twice.
• The first call is for the total, in one color.
• The second call is for one of the causes, in another color.
• The second is drawn over the first.
• Each bar is a relationship between a state on y and car crash data on x.

### 16. row 4, plotting 4, Horizontal boxplot with observations

info
• plotting label: boxplot
• title: Horizontal boxplot with observations
examine the plot, and understand the application
• Continuing from the previous demos, this one is horizontally oriented.
• A categorical variable for the y-axis
• The value for the x-axis
• The purpose is to compare planet distances found by different methods.
• A boxplot shows the distribution, and points are added to show the observations.
data features
• Some distance values are very small, some are very large.
• A logarithmic x-axis is needed, set by the following:
• ax.set_xscale("log")
• The x-axis is in logarithmic units.
coding
• sns.boxplot for the distribution
• sns.swarmplot for each observation

### 17. row 5, plot 1, horizontal jitter stripplot

info
• plotting label: stripplot
• title: Conditional means with observations
application
• flower sampling
• Based on iris.shape, there are 150 samples.
• There are 3 species. For each one, there are 50 samples.
• To show the distributions of 4 variables
• sepal_length
• sepal_width
• petal_length
• petal_width
plot
• horizontal-oriented plots
• y-coordinate: the above 4 variables
• The jittered strip plot is for the distribution.
• The plot is like a horizontal strip.
• The mean value is presented as a diamond marker.
code implementation
• The iris dataset is used, like below:
sepal_length, sepal_width,petal_length,petal_width, species
5.1, 3.5,1.4,0.2, setosa
• After melting, it looks like below:
species,measurement, value
setosa, sepal_length, 5.1
• After melting, the shape of the new dataframe is (600, 3).
• One sample row is divided into 4 rows.
• Four column names are combined into one column, measurement.
• The reason for melting is that the plotting functions expect long-form data.
• sns.stripplot for the scatter plot.
• sns.pointplot for the mean marker.
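
The melt step and the resulting (600, 3) shape can be checked with a sketch like the one below. It assumes a synthetic stand-in for iris with random values, so only the shapes and column names are meaningful.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)
measurements = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Synthetic stand-in for iris: 3 species x 50 samples, random values.
wide = pd.DataFrame(rs.rand(150, 4), columns=measurements)
wide["species"] = np.repeat(["setosa", "versicolor", "virginica"], 50)

# Each sample row becomes four rows, one per measurement.
long_df = pd.melt(wide, id_vars="species", var_name="measurement")
print(long_df.shape)
print(long_df.head(3))
```

The 150 rows times 4 measurement columns give the 600 rows, and the three remaining columns are species, measurement, and value.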

### 18. row 5, plot 2, sns.jointplot, kind="kde"

info
• plotting label: jointplot
• title: Joint kernel density estimation
description
• The plotting method is high-level.
• It creates a multi-panel figure
• the bivariate (or joint) relationship between two variables, shown as a kernel density estimate
• the univariate (or marginal) distribution of each on separate axes.
• Some numpy functions create two ndarray variables for the plotting function.

### 20. row 5, plot 4, Plotting large distributions

info
• plotting label: boxenplot
• title: Plotting large distributions
analysis
• 53,940 rows in the dataframe.
• distribution presentation, y-coordinate is carat.
• x-coordinate is the diamonds' clarity.
• categorical variable
• Its order is defined in the plotting method.
• sns.boxenplot is the method name.

### 21. row 6, plot 1, logistic regression

info
• plotting label: lmplot
• title: Faceted logistic regression
data
• dataset: titanic
• variable survived
1.0 or 0.0
It is used to compute the value for the y-coordinate.
• variable age for the x-coordinate
• variable sex for the col parameter.
there are two plots
coding
• method: sns.lmplot
• y-coordinate is the survival probability
• logistic regression, not linear regression

### 22. row 6, col 2, FacetGrid and its map function, typical

description
• Some numpy functions help create the parts of a dataframe.
• The following function creates a FacetGrid object.
grid = sns.FacetGrid(df, col="walk", ...col_wrap=4)
• For each subplot, the grid uses its map function to plot, as below
grid.map(plt.plot, "step", "position", marker="o")
The map function is also used for other drawing steps.

### 23. row 6, col 3, heatmap on the left-bottom side

info
• plotting label: heatmap
• title: Plotting a diagonal correlation matrix
plotting
• When you look at the heatmap, it shows only the bottom-left side.
• If you examine the correlation of letter A or letter Z with itself, it is 1.
• The bottom-left side alone serves the purpose.
• The mask in the code is for this purpose.
• step 1: create a dataframe, 100 rows, 26 columns (for capital A to Z).
• step 2: create a correlation matrix from the dataframe
• step 3: configure the mask and the color map.
• step 4: sns.heatmap to plot.
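
Steps 1 to 3 can be sketched as follows. This is a minimal version under my own assumptions: the seed and the random values are arbitrary, and the mask is built with np.triu, which is one common way to hide the diagonal and the upper triangle.

```python
import numpy as np
import pandas as pd
from string import ascii_uppercase

rs = np.random.RandomState(33)

# Step 1: 100 rows, 26 columns named A to Z (random values for illustration).
d = pd.DataFrame(rs.normal(size=(100, 26)), columns=list(ascii_uppercase))

# Step 2: 26 x 26 correlation matrix.
corr = d.corr()

# Step 3: True marks cells to hide, here the diagonal and everything above it.
mask = np.triu(np.ones(corr.shape, dtype=bool))
print(corr.shape, mask[0, 0], mask[25, 0])
```

A mask of this kind is then passed to sns.heatmap through its mask parameter, so only the bottom-left cells are drawn.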

### 24. row 6, col 4, JointGrid - scatter and rug

info
• plotting label: JointGrid
• title: Scatterplot with marginal ticks
application
• There are four lines of code to generate a bivariate dataset
• mean = [0, 0]
• cov = [(1, 0),(0, 2)]
• For both of the above, the first part is for x and the second part is for y.
• cov stands for covariance.
• They are used to generate two ndarrays for the x, y coordinates.
• plotting
• sns.JointGrid
• plot_joint(plt.scatter,..)
• plot_marginals(sns.rugplot,..)
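
The data generation can be sketched like this. The mean and cov values come from the notes above; the seed and the sample count of 100 are my own assumptions for illustration.

```python
import numpy as np

rs = np.random.RandomState(9)  # seed chosen only for reproducibility
mean = [0, 0]                  # mean of x, mean of y
cov = [(1, 0), (0, 2)]         # var(x) = 1, var(y) = 2, no covariance

# Draw 100 (x, y) pairs, then transpose so x and y unpack as two arrays.
x, y = rs.multivariate_normal(mean, cov, 100).T
print(x.shape, y.shape)
```

The zero off-diagonal entries in cov mean x and y are drawn independently, so the scatter shows no slope.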

### 25. row 7, col 1, Multiple bivariate KDE plots

info
• plotting label: kdeplot
• title: Multiple bivariate KDE plots
application
• load iris dataset into a dataframe.
• Using dataframe's method query, create two subsets for two species.
• Using plt.subplots, set up the matplotlib figure.
• Using the subset dataframes, execute sns.kdeplot twice to plot two kdes.
• x-y relationship, estimation, not regression.

### 26. row 7, col 2, Multiple linear regression

info
• plotting label: lmplot
• title: Multiple linear regression
application
• load iris dataset into a dataframe.
• Method sns.lmplot for linear.
• semantic hue for species.
• The returned object type is seaborn.axisgrid.FacetGrid.
• regression, linear.

### 27. row 7, col 3, Paired density and scatterplot matrix

info
• plotting label: PairGrid
• title: Paired density and scatterplot matrix
application
• dataset: iris
• The matrix is the four variables against themselves, using PairGrid.
coding
• g = sns.PairGrid(df, diag_sharey=False)
g.map_lower(sns.kdeplot)
g.map_upper(sns.scatterplot)
g.map_diag(sns.kdeplot, lw=3)
plotting
• map_lower and map_upper are for the x-y relationships. You can choose a kde or a scatter plot.
• map_diag is for the distribution of each specific variable.

### 28. row 7, col 4, Paired categorical plots

info
• plotting label: PairGrid
• title: Paired categorical plots
application
• Using the Titanic data as an example, people want to know the differences.
• passenger class
• sex
• man, woman, child
• alone or group
• They are all categorical variables
plotting
• Four subplots for the above four categorical variables.
• Each plot is for an x-y relationship
• x-coordinate is one of the four categorical variables.
• y-coordinate is the survival probability.
• The dots are the mean values.
• There are lines between the dots to show the differences.
coding
• method: sns.PairGrid
y_vars="survived",
x_vars=["class", "sex", "who", "alone"],
• method for each pair
g.map(sns.pointplot,...)
It draws the mean points and the lines between them.

### 29. row 8, col 1, Pairgrid with dotplots

application, plotting, coding
• PairGrid
• Horizontally oriented, the y-coordinate is US states.
• There are many types of car crashes.
• Each subplot is for one type of crash
• x-coordinate: the number of the specific crash
• y-coordinate: all the states together, paired with all the crashes for all subplots.
• The plot is x-y relational.
• Method sns.PairGrid is used.
• column abbrev is used for the y-coordinate.
• The first five columns are used for crash types.
• Method sns.stripplot plots the dots for states.
• Using the zip class, set the titles for the subplots.

### 30. row 8, col 2, categorical variable for x-coordinate, x-y relationship

info
• plotting label: catplot
• title: Plotting a three-way ANOVA
application and data
• One topic is the study of the relationship between pulse and exercise time.
• x-coordinate is the variable time
• In the csv file, the values are 1 min, 15 min, 30 min with no quotes. They are str.
• After the contents are loaded into a dataframe, the column type is pandas Series, a hashable column.
A pandas one-dimensional array for one dataframe column.
• if you execute the following
print(str(df.time.unique))
the result is like below:
Name: time, Length: 90, dtype: category
Categories (3, object): [1 min, 15 min, 30 min]
• y-coordinate is variable pulse, like 130, 140, 150, 180... in int64.
• Other categorical variables are kind and diet.
coding and plotting
• x-y relationship
• The method: sns.catplot
• x-coordinate is time
• Two semantic parameters
• hue:"kind"
• col:"diet"
• The mean value for each kind and diet is used for comparison.

### 31. row 8, col 3, jointplot, kind="reg"

info
• plotting label: jointplot
• title: Linear regression with marginal distributions
description
• This is pretty typical.
• tips dataset is used for this demo.
• total_bill for x-coordinate
• tip for y-coordinate
• y depends on x. This is pretty apparent.
• Method jointplot, kind="reg"
• It is a high-level method.

### 32. row 8, col 4, Plot the residuals after fitting a linear model

description
• Using numpy functions to prepare test data for x-y relationship.
• Method residplot
• Plot the residuals after fitting a linear model
• If the residuals show a pattern, a nonlinear model may be a better fit.
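
What a residual is can be sketched with numpy alone. The data below is my own synthetic example (a straight line plus noise), and np.polyfit stands in for the linear fit that a residual plot performs internally.

```python
import numpy as np

rs = np.random.RandomState(7)
x = rs.normal(2, 1, 75)
y = 2 + 1.5 * x + rs.normal(0, 2, 75)   # linear signal plus noise

# Fit y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)

# Residual = observed value minus fitted value.
residuals = y - (slope * x + intercept)
print(residuals.shape)
```

For a least-squares line with an intercept, the residuals average out to zero; what matters in the plot is whether they scatter randomly around zero or follow a curve.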

### 33. row 9, col 1, relplot, default kind: scatter, typical

description
• relplot is for x-y relationship.
• default kind: scatter.
• two semantic parameters - hue and size (dot size) are used.
• typical example

### 34. row 9, col 2, one distribution for many different

• Dataset iris is used.
• After pd.melt, four columns are merged into one column.
• Method swarmplot is used for distribution.
• semantic parameter hue is used.

### 35. row 9, col 3, pairplot

• Method sns.pairplot
• high-level method
• data: iris
• plotting features
• matrix of four variables - sepal length, sepal width, petal length, petal width.
• The subplots on the diagonal are shaded kde plots.
This is a univariate plot.
The x-coordinate is the variable value.
The y-coordinate is the probability density. The total area under one kde curve is 1.
• The rest are scatter plots of the bivariate x-y relationships.

### 37. row 10, col 1, violinplot for high-level example

info
• plotting label: violinplot
• title: violinplots with observations
This is an example of using violinplot at a high level.
• Some numpy functions create a two-dimensional numpy.ndarray, 40 x 8.
• [0,1,2,3,4,5,6,7] will be the x-coordinates.
• The 40 values in each column will be the y values for each x.
• The array is passed as the data parameter of function violinplot.
• In example 40, another high-level violinplot demo uses dataframe data as input.
• plotting
• There are 8 violins.
• distribution
• Some dots are shown in the middle of the violin.
note: function parameter inner is set to point.

### 38. row 10, col 2, df.corr() and sns.clustermap for brain networks

info
• plotting label: clustermap
• title: Discovering structure in heatmap data
description
• It is a brain network hierarchy.
• There are three levels - network, node, and hemi (left or right).
• There are many rows in the input csv file, in wide-form format.
• Function corr creates the matrix of correlations
• all the pairs between each hemi and the rest.
• For itself, the correlation value is 1.
• The values are mapped to colors for easy visualization.
• The method is high-level.
• The plot shows the information in detail.
• In the later plot #40, a distribution plot is made for the same data, in less detail.
```
# -------------    step 1:   examine the data...  ------------------------------
used_networks = [1, 5, 6, 7, 8, 12, 13, 17]
used_columns = (df.columns.get_level_values("network")
.astype(int)
.isin(used_networks))
df = df.loc[:, used_columns]
print(df)
'''
network           1                      5                           ...17
node              1                      1                           ...
hemi             lh          rh         lh         rh                ...
0         56.055744   92.031036 -35.898861  -1.889181...
1         55.547253   43.690075  19.568010  15.902983...
'''

# note 1: There are 3 lines for the dataframe heading.
# note 2: The hierarchy of the brain networks data is network, node, hemi (left or right).
# note 3: The values are like 56.0, 15.90... for each hemi.

```
```
# -------------------- step 2:   generate the correlations from the data ----
# df.corr() is the function argument in method sns.clustermap.
# It generates correlations from the data.
'''
network                   1                   5
node                      1                   1
hemi                     lh        rh        lh        rh
network node hemi
1       1    lh    1.000000  0.881516  0.431619  0.418708
rh    0.881516  1.000000  0.431953  0.519916
5       1    lh    0.431619  0.431953  1.000000  0.822897
rh    0.418708  0.519916  0.822897  1.000000
'''

# note 1: The values are like 0.88, 0.5, 1...
# note 2: 1 means 100% correlation; each column correlates perfectly with itself.
# note 3: 0 means no correlation.
# note 4: negative values mean opposite correlation.

```
----------------- step 3: code for plotting -------
• method: sns.clustermap
• It maps all the hemis and converts their corr values to cell colors.
• Also, the hierarchy is provided to group the clusters.
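
How corr behaves on three-level column headings can be tried on a miniature table. The frame below is hypothetical (random values, 2 networks, 1 node, 2 hemis) and only imitates the layout shown in step 1.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)

# Hypothetical miniature of the brain-network table: 2 networks, 1 node, 2 hemis.
columns = pd.MultiIndex.from_product(
    [[1, 5], [1], ["lh", "rh"]], names=["network", "node", "hemi"]
)
df = pd.DataFrame(rs.normal(size=(50, 4)), columns=columns)

# One correlation per pair of columns; rows and columns keep the 3 levels.
corr = df.corr()
print(corr.shape)
print(np.allclose(np.diag(corr.to_numpy()), 1.0))
```

The result is square, with the same three-level labels on both the rows and the columns, which matches the layout printed in step 2.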

### 39. row 10, col 3, sns.lineplot, data: DataFrame

info
• plotting label: lineplot
• title: Lineplot from a wide-form dataset
description
• In the sns.lineplot documentation, lineplot has a more common usage,
like sns.lineplot(x="timepoint", y="signal", data=fmri)
x, y : names of variables in data
• In this demo, the method parameter data is a wide-form dataframe
• its 4 columns give 4 continuous lines
• It is a high-level function for plotting.
• Both use a time series for the x-coordinate.
• Some functions are used to prepare the time-series dataframe.
```
The following is the wide-form dataset.
A         B         C         D
2016-01-01  0.167921  0.523505  0.817376  1.703846
2016-01-02 -1.979026  1.237704  0.057230  2.743267
....
```
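
A wide-form time-series dataframe of this shape can be built as in the sketch below. The seed, the 365-day length, and the use of cumulative sums are my own assumptions, chosen to give a table with a date index and the columns A to D.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(365)

# Four random walks, one per column, 365 daily steps each.
values = rs.randn(365, 4).cumsum(axis=0)
dates = pd.date_range("2016-01-01", periods=365, freq="D")

# Wide form: the index is the date, each column is one line to draw.
data = pd.DataFrame(values, index=dates, columns=["A", "B", "C", "D"])
print(data.head(2))
```

Passing a frame like this as the data parameter is enough for lineplot to draw one line per column against the date index.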

### 40. row 10, col 4, multiple violin plots, dataframe, high-level method

info
• plotting label: violinplot
• title: Violinplot from a wide-form dataset
examine the plot
• The data is brain_networks, used in demo 38.
• There are 12 violin plots, one for each brain network.
• distribution purposes.
• x-coordinate: 12 networks
• y-coordinate: correlation coefficient
1 means 100 per cent, itself
negative means opposite direction
• Unlike demo 38, this does not show pairs for all hemis, so it is less detailed.
plotting method
• sns.violinplot
• using dataframe
• high-level function
steps to process data
• Load the data from the web, and filter it.
• Correlate it: 52 rows x 52 columns for all the hemis.
• Group the rows by network and take the means: 12 rows (network) x 52 columns (hemis).
• Transpose it: 52 rows (hemis) x 12 columns (network).
The plot expects this data format.
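
The correlate, group, and transpose steps can be traced on a small table. This sketch assumes a hypothetical miniature with 3 networks and 12 columns instead of the real 12 networks and 52 columns, so only the shape changes are meaningful.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)

# Hypothetical miniature: 3 networks x 2 nodes x 2 hemis = 12 columns.
columns = pd.MultiIndex.from_product(
    [[1, 3, 4], [1, 2], ["lh", "rh"]], names=["network", "node", "hemi"]
)
df = pd.DataFrame(rs.normal(size=(40, 12)), columns=columns)

corr = df.corr()                                    # 12 x 12
by_network = corr.groupby(level="network").mean()   # 3 x 12
wide = by_network.T                                 # 12 x 3
print(corr.shape, by_network.shape, wide.shape)
```

After the transpose each network is one column, so a wide-form violinplot call draws one violin per network.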