# Sharing some programming knowledge.


In Python, there are many plotting libraries, such as `matplotlib`, `pandas visualization`, `seaborn` and `plotly`. Among them, `matplotlib` and `pandas visualization` are the foundations, and they are the most common way to plot basic graphs. In this article, I would like to summarise the syntax for using them and offer a clear explanation in plain language.

## Preparation

| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | \$2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | \$3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | \$3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | \$2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | \$16.98 |
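As a rough sketch of the preparation step, the snippet below rebuilds the five rows shown above by hand and converts `item_price` from text such as `$2.39` into a number, since plotting needs numeric values. The variable name `data` is an assumption, the `choice_description` column is left out for brevity, and the real data set would normally be loaded from a file rather than typed in.

```python
import pandas as pd

# Hypothetical in-memory copy of the five rows shown above.
# The real data set is larger and would normally be read from a file.
data = pd.DataFrame({
    "order_id": [1, 1, 1, 1, 2],
    "quantity": [1, 1, 1, 1, 2],
    "item_name": [
        "Chips and Fresh Tomato Salsa",
        "Izze",
        "Nantucket Nectar",
        "Chips and Tomatillo-Green Chili Salsa",
        "Chicken Bowl",
    ],
    "item_price": ["$2.39", "$3.39", "$3.39", "$2.39", "$16.98"],
})

# Strip the leading "$" and convert the prices to floats so they can be plotted.
data["item_price"] = data["item_price"].str.lstrip("$").astype(float)

print(data.dtypes)
```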

## Plotting with matplotlib

Matplotlib is the most popular Python plotting library, and pandas visualization is built on top of it. It offers more freedom, at the cost of more code and a more complex syntax.

### Scatter Plot

| order_id | quantity | item_price |
|---|---|---|
| 1 | 4 | 11.56 |
| 2 | 2 | 16.98 |
| 3 | 2 | 12.67 |
| 4 | 2 | 21.00 |
| 5 | 2 | 13.70 |
`fg, ax = plt.subplots()`: create a figure and an axes

`fg` is a `Figure` object and `ax` is an `AxesSubplot` object (a plain `Axes` in recent matplotlib versions). So what are they?

A `Figure` is the top-level container for all the plot elements.

`fg` is a container; you can think of it as a picture frame. Remember that your actual graph is not `fg` (the `Figure` object). Instead, your actual graph is represented by `ax` (the `AxesSubplot` object). The figure is the part around your graph, and a figure can contain several subplots. Here we only have one, which is `ax`. So don’t be misled by the everyday meaning of “axes”: in matplotlib, an axes represents one graph in a figure.
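To make the frame-versus-graph distinction concrete, here is a small sketch that creates one figure holding two subplots. The 1×2 layout is only an illustration and is not used elsewhere in this article.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# One figure (the frame) that holds two axes (the graphs), side by side.
fg, axes = plt.subplots(nrows=1, ncols=2)

# The figure keeps track of every axes it contains.
print(type(fg).__name__)
print(len(fg.axes))

# Each element of `axes` is an individual graph you can draw on.
for ax in axes:
    print(isinstance(ax, Axes))
```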

`ax.scatter(x=scatter_data.quantity, y=scatter_data.item_price)`: scatter the quantity against the price

Since `ax` is the “actual graph”, we do the scattering on it. Use `x=scatter_data.quantity` and `y=scatter_data.item_price` to pass in the data to plot.

`ax.set_title()`, `ax.set_xlabel()`, `ax.set_ylabel()`: polish your graph

These functions are self-explanatory: they set the title and the axis labels.
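Putting the three steps together, a minimal sketch could look like the following. The `scatter_data` frame is typed in by hand from the per-order table in this section, and the title and x-axis label are illustrative assumptions; only the y-axis label `order price` comes from the original output.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-order totals, copied from the table in this section.
scatter_data = pd.DataFrame(
    {
        "quantity": [4, 2, 2, 2, 2],
        "item_price": [11.56, 16.98, 12.67, 21.00, 13.70],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="order_id"),
)

# Create the figure (the frame) and a single axes (the actual graph).
fg, ax = plt.subplots()

# Scatter the quantity against the price.
ax.scatter(x=scatter_data.quantity, y=scatter_data.item_price)

# Polish the graph; the title and x label are illustrative choices.
ax.set_title("Order quantity vs. order price")
ax.set_xlabel("order quantity")
ax.set_ylabel("order price")
```

In a notebook the figure is displayed automatically; in a script, call `plt.show()` afterwards.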

### Line Chart

`ax.plot(line_data.index, line_data[column], label=column)`: plot a line chart

We can plot multiple lines in one chart with different labels, which are “quantity” and “item_price” in this example. You just have to loop over the columns.

`ax.legend()`: show the legend

`ax.legend()` displays the “label” of each data series (i.e. each line in this example), so that readers can tell which line is which.
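The loop and the legend can be sketched as below. `line_data` is assumed to have the same shape as the per-order table used for the scatter plot (indexed by `order_id`, with `quantity` and `item_price` columns); the values are typed in by hand for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-order totals, indexed by order_id.
line_data = pd.DataFrame(
    {
        "quantity": [4, 2, 2, 2, 2],
        "item_price": [11.56, 16.98, 12.67, 21.00, 13.70],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="order_id"),
)

fg, ax = plt.subplots()

# Draw one line per column; the column name becomes the legend label.
for column in line_data.columns:
    ax.plot(line_data.index, line_data[column], label=column)

ax.set_xlabel("order_id")

# Show the labels so each line can be identified.
ax.legend()
```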

### Histogram

A histogram is a chart used for representing a frequency distribution; the heights of the bars represent observed frequencies. Its purpose is to roughly assess the probability distribution of a given variable by depicting the frequencies of observations occurring in certain ranges of values.

| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | 10.98 |
| 13 | 7 | 1 | Chicken Bowl | [Fresh Tomato Salsa, [Fajita Vegetables, Rice,... | 11.25 |
| 18 | 9 | 2 | Canned Soda | [Sprite] | 2.18 |
| 19 | 10 | 1 | Chicken Bowl | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 8.75 |

`ax.hist(hist_data['item_name'])`: plot a histogram

Note that `ax.hist()` automatically calculates how often each value occurs (i.e. its frequency) and uses the results as the y-axis values.
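A minimal sketch of this call follows, assuming a `hist_data` frame that holds only the `item_name` values from the five rows shown above. Matplotlib treats the strings as categories and counts them for us.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical subset: the item names from the five rows shown above.
hist_data = pd.DataFrame({
    "item_name": [
        "Chicken Bowl",
        "Chicken Bowl",
        "Chicken Bowl",
        "Canned Soda",
        "Chicken Bowl",
    ],
})

fg, ax = plt.subplots()

# hist() counts how often each value occurs and returns the counts,
# the bin edges and the drawn bar patches.
counts, bin_edges, patches = ax.hist(hist_data["item_name"])

ax.set_ylabel("frequency")
print(counts.sum())
```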

### Bar Chart

A bar chart is useful for representing categorical data.

## Plotting with pandas

Pandas visualization is a handy tool for creating plots from a pandas DataFrame or Series. Using it is like calling matplotlib functions through methods built into the pandas classes (i.e. `DataFrame` or `Series`), instead of calling matplotlib directly.

### Scatter Plot

To draw a scatter plot, we just have to use the `<data_set>.plot.scatter()` function and pass `x='quantity'`, `y='item_price'` to specify which columns of the data set to use. Optionally, we can also pass it a title.
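For comparison with the matplotlib version, here is a short sketch of the pandas call. The `scatter_data` frame is again typed in by hand from the per-order table, and the title text is an illustrative assumption.

```python
import pandas as pd

# Hypothetical per-order totals, indexed by order_id.
scatter_data = pd.DataFrame(
    {
        "quantity": [4, 2, 2, 2, 2],
        "item_price": [11.56, 16.98, 12.67, 21.00, 13.70],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="order_id"),
)

# One call creates the figure and axes, draws the points and sets the title.
ax = scatter_data.plot.scatter(
    x="quantity",
    y="item_price",
    title="Order quantity vs. order price",
)
```

The call returns the matplotlib axes it drew on, so the `ax.set_xlabel()` style of polishing shown earlier still applies.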

### Line Chart

### Histogram

To plot a histogram, you just need to use `<Series>.plot.hist()`.

Note that we can’t use `hist_data['item_name']` anymore, because numerical values are required here. This makes more sense, as the columns of a histogram are normally positioned over a label that represents a quantitative variable (i.e. a range of numbers), while the columns of a bar chart are usually positioned over a label that represents a categorical variable.
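As a sketch of the numeric case, the snippet below plots the `item_price` column instead. The `hist_data` frame is assumed to contain the `item_name` and `item_price` values from the five rows shown in the matplotlib histogram section.

```python
import pandas as pd

# Hypothetical subset copied from the five rows shown earlier.
hist_data = pd.DataFrame({
    "item_name": [
        "Chicken Bowl",
        "Chicken Bowl",
        "Chicken Bowl",
        "Canned Soda",
        "Chicken Bowl",
    ],
    "item_price": [16.98, 10.98, 11.25, 2.18, 8.75],
})

# A numeric Series can be binned directly; each bar covers a price range.
ax = hist_data["item_price"].plot.hist()
ax.set_xlabel("item_price")
```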

### Bar Chart

To plot a bar chart, you just need to use `<Series>.plot.bar()` or `<dataframe>.plot.bar(x='', y='')`. To plot a horizontal bar chart, replace `bar()` with `barh()`.

For more parameter options in each plot function, look them up in the official documentation. There is no point memorising all of them because there are too many.
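As a sketch of the `bar()` and `barh()` calls, the snippet below counts how often each item name occurs and draws the counts both vertically and horizontally. Using `value_counts()` to get one number per category, and placing both charts side by side, are illustrative choices rather than the only way to prepare the data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical item names copied from the five rows shown earlier.
item_names = pd.Series(
    ["Chicken Bowl", "Chicken Bowl", "Chicken Bowl", "Canned Soda", "Chicken Bowl"],
    name="item_name",
)

# One count per category: this is the categorical data a bar chart expects.
item_counts = item_names.value_counts()

# Two axes side by side so the two orientations do not overlap.
fg, (ax_v, ax_h) = plt.subplots(ncols=2)

item_counts.plot.bar(ax=ax_v)    # vertical bars
item_counts.plot.barh(ax=ax_h)   # horizontal bars
```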

matplotlib

pandas visualization