Simple Image Data Analysis Using Numpy

As we all know, the images we see on a screen are made up of many pixels. Each pixel has its own color, which the computer stores as binary numbers. Commonly, each pixel is represented by 24 bits (3 bytes), one byte each for red, green and blue, so each color channel takes a value from 0 to 255. In Python, a single pixel could be stored as a list such as [255,0,254]. For a whole image, however, it is far more convenient and efficient to let the imageio library read the file into a numpy array. Once the image is loaded, we can start to analyze it.
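To make the 3-byte layout concrete, here is a minimal sketch that stores the example pixel as a numpy uint8 array (one byte per channel) and packs it into a single 24-bit integer. The 0xRRGGBB byte order is an assumption chosen for illustration; real file formats and graphics APIs may order or pad the channels differently.

```python
import numpy as np

# A single pixel as a plain Python list: [red, green, blue]
pixel_list = [255, 0, 254]

# The same pixel as a NumPy array of unsigned 8-bit integers (one byte per channel)
pixel = np.array(pixel_list, dtype=np.uint8)
print(pixel.dtype, pixel.nbytes)  # uint8 3

# Pack the three bytes into one 24-bit integer laid out as 0xRRGGBB (assumed order)
packed = (int(pixel[0]) << 16) | (int(pixel[1]) << 8) | int(pixel[2])
print(hex(packed))  # 0xff00fe

# Unpack it again by shifting and masking off one byte at a time
unpacked = [(packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF]
print(unpacked)  # [255, 0, 254]
```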

Here I will use the Firefox icon, because Firefox is the browser I use most often, and I'm using it right now to write this Jupyter notebook.

First, import the relevant libraries:

import imageio
import matplotlib.pyplot as plt
import numpy as np

Then, read the image into an array using imageio:

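photo_data = imageio.imread("./firefox.png")
# imageio.core.util.Array is a subclass of numpy.ndarray, so every ndarray operation works on it
# (recent imageio releases deprecate imageio.imread; imageio.v2.imread keeps this behavior)
print(type(photo_data))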
<class 'imageio.core.util.Array'>

We can print a few properties to see what the data looks like:

print(photo_data.shape)
print(photo_data)
print(photo_data.size)
print(photo_data.mean())
print(photo_data.min(), photo_data.max())
plt.figure(figsize=(5,5))
plt.imshow(photo_data)
(341, 419, 3)
[[[ 0  9 52]
  [ 0  9 52]
  [ 0  9 52]
  ...
  [ 1  8 52]
  [ 1  8 52]
  [ 1  8 52]]]
428637
71.33186122523254
0 255
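Note that photo_data.mean() averages over every channel of every pixel at once. A small sketch with a hypothetical 2 x 2 image (standing in for the real icon) shows how passing axis=(0, 1) to mean() and max() reduces over the two spatial axes instead, leaving one value per color channel:

```python
import numpy as np

# Hypothetical 2 x 2 RGB image standing in for photo_data
image = np.array([[[0, 9, 52], [10, 20, 60]],
                  [[200, 100, 0], [30, 31, 32]]], dtype=np.uint8)

# mean() with no axis averages all 12 values (2 rows * 2 columns * 3 channels)
overall_mean = image.mean()
print(round(float(overall_mean), 2))  # 45.33

# Reducing over axes 0 (rows) and 1 (columns) leaves one value per channel: R, G, B
channel_means = image.mean(axis=(0, 1))
channel_max = image.max(axis=(0, 1))

print(channel_means.tolist())  # [60.0, 40.0, 36.0]
print(channel_max.tolist())    # [200, 100, 60]
```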

From photo_data.shape we can see that the image is a 3-dimensional ndarray (you can think of it as three stacked matrices). The first two numbers are the height (341 rows) and the width (419 columns), and the third number, 3, is the number of color channels: red, green and blue. That is also why photo_data.size is 341 * 419 * 3 = 428637. We use the first two indices to locate a pixel and the last index to pick a channel; photo_data[row, col] on its own returns that pixel's color as a 1-dimensional ndarray of three values.
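These indexing rules are easy to check on a tiny array. This sketch uses a hypothetical 4 x 5 black image instead of the Firefox icon:

```python
import numpy as np

# Hypothetical stand-in for photo_data: 4 rows x 5 columns x 3 channels, all black
image = np.zeros((4, 5, 3), dtype=np.uint8)
print(image.ndim, image.shape)  # 3 (4, 5, 3)

# Paint the pixel at row 1, column 2 pure red
image[1, 2] = [255, 0, 0]

# Two indices select one pixel: a 1-D array holding its three channel values
pixel = image[1, 2]
print(pixel.shape, pixel.tolist())  # (3,) [255, 0, 0]

# Fixing only the last index selects one channel for every pixel: a 2-D array
red_channel = image[:, :, 0]
print(red_channel.shape)  # (4, 5)

# size counts every element: rows * columns * channels
print(image.size)  # 60
```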

Now let's try changing some pixels. Suppose we want to zero out every color value below 100:

# build a mask: a boolean ndarray with the same shape as photo_data, True wherever a channel value is below 100
# to combine two conditions, use np.logical_and(), e.g. mask = np.logical_and(photo_data > 10, photo_data < 100)
low_value_filter = photo_data < 100
print(low_value_filter)
# set every channel value below 100 to 0, e.g. [50, 200, 255] -> [0, 200, 255]
photo_data[low_value_filter] = 0
plt.figure(figsize=(5,5))
plt.imshow(photo_data)
[[[ True  True  True]
  [ True  True  True]
  [ True  True  True]
  ...
  [ True  True  True]
  [ True  True  True]
  [ True  True  True]]]

The image became darker, and at the same time its contrast increased.
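Here is the same boolean-mask technique on a hypothetical two-pixel image, including the np.logical_and() variant mentioned in the code comment. The sketch works on copies, so each mask is applied to the original values:

```python
import numpy as np

# A tiny hypothetical image: 1 row x 2 pixels x 3 channels
pixels = np.array([[[50, 200, 255],
                    [120, 90, 10]]], dtype=np.uint8)

# Boolean mask with the same shape as pixels: True where a channel value is below 100
low_value_mask = pixels < 100
print(low_value_mask.shape)  # (1, 2, 3)

# Combine two conditions element-wise: 10 < value < 100
mid_low_mask = np.logical_and(pixels > 10, pixels < 100)

# Boolean-index assignment only touches the positions where the mask is True
low_copy = pixels.copy()
low_copy[low_value_mask] = 0
print(low_copy.tolist())  # [[[0, 200, 255], [120, 0, 0]]]

mid_copy = pixels.copy()
mid_copy[mid_low_mask] = 0
print(mid_copy.tolist())  # [[[0, 200, 255], [120, 0, 10]]]
```

The expression (pixels > 10) & (pixels < 100) is an equivalent shorthand; the parentheses are required because & binds more tightly than the comparison operators.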

We can also use slicing instead of a boolean mask to change the color of a rectangular region. Here we draw a 20-pixel red border around the image:

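# draw a 20-pixel red border; negative indices count back from the bottom/right edge,
# so the last 20 rows/columns are covered whatever the image size (the width is 419, so 400:420 would only cover 19)
photo_data[:20, :, :] = [255,0,0]    # top
photo_data[:, :20, :] = [255,0,0]    # left
photo_data[-20:, :, :] = [255,0,0]   # bottom
photo_data[:, -20:, :] = [255,0,0]   # right
plt.figure(figsize=(5,5))
plt.imshow(photo_data)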
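A scaled-down sketch of the border drawing, on a hypothetical 6 x 8 image with an arbitrary 2-pixel border, shows that assigning a three-element list to a slice broadcasts it to every pixel in that slice. It also shows that a basic slice is a view into the original array, so writing through it changes the image:

```python
import numpy as np

# Hypothetical 6 x 8 black RGB image and an arbitrary 2-pixel border
image = np.zeros((6, 8, 3), dtype=np.uint8)
border = 2

# A length-3 list is broadcast to every pixel in each slice
image[:border, :, :] = [255, 0, 0]    # top
image[-border:, :, :] = [255, 0, 0]   # bottom
image[:, :border, :] = [255, 0, 0]    # left
image[:, -border:, :] = [255, 0, 0]   # right

# Border pixels = all pixels minus the untouched interior: 6*8 - (6-4)*(8-4) = 40
red_pixels = int((image[:, :, 0] == 255).sum())
print(red_pixels)  # 40

# A basic slice is a view: writing through it changes the original image
top_rows = image[:border]
top_rows[:] = 0
print(image[0, 0].tolist())  # [0, 0, 0]
```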

In some cases, we may want to mask a geometric region of the image, for example everything outside a circle around its center:

total_rows, total_cols, total_layers = photo_data.shape
# np.ogrid returns two open grids: X is a column vector of row indices, Y is a row vector of column indices
X, Y = np.ogrid[:total_rows, :total_cols]
print(X.shape, Y.shape)
(341, 1) (1, 419)
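center_row, center_col = total_rows / 2, total_cols / 2
# dist_from_center is a 2D ndarray holding the *squared* distance from every pixel to the center;
# broadcasting the (341, 1) column against the (1, 419) row yields a (341, 419) result
dist_from_center = (X - center_row)**2 + (Y - center_col)**2
print(dist_from_center.shape)
# keep a circle of radius 20 around the center; comparing squared values avoids a square root
radius = 20
circular_mask = dist_from_center > radius**2
# the 2D mask selects whole pixels, so all three channels outside the circle are set to 0
photo_data[circular_mask] = 0
plt.figure(figsize=(5,5))
plt.imshow(photo_data)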
(341, 419)
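The open-grid trick is easier to see on a small grid. This sketch builds a hypothetical 7 x 7 gray image, keeps a disc of radius 2 around the center, and blacks out everything else:

```python
import numpy as np

rows, cols = 7, 7
# Open grids: X has shape (7, 1), Y has shape (1, 7)
X, Y = np.ogrid[:rows, :cols]
print(X.shape, Y.shape)  # (7, 1) (1, 7)

center_row, center_col = rows // 2, cols // 2  # (3, 3)
# Broadcasting the column against the row gives a full grid of squared distances
dist_sq = (X - center_row) ** 2 + (Y - center_col) ** 2
print(dist_sq.shape)  # (7, 7)

radius = 2
inside = dist_sq <= radius ** 2
print(int(inside.sum()))  # 13

# Hypothetical uniform gray image; black out every pixel outside the disc
image = np.full((rows, cols, 3), 200, dtype=np.uint8)
image[~inside] = 0
print(inside.astype(int))
```

np.mgrid[:rows, :cols] would produce the same coordinates as two dense (7, 7) arrays; np.ogrid avoids materializing them and relies on broadcasting instead.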

We may also want to mask particular colors instead of geometric regions:

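photo_data = imageio.imread('./firefox.png')  # reload the original image
# mask pixels whose blue channel (index 2) is above 150; the 2D mask blacks out all three channels of each match
blue_mask = photo_data[:, :, 2] > 150
photo_data[blue_mask] = 0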
plt.figure(figsize=(5,5))
plt.imshow(photo_data)

And the blue areas turned black, as expected.
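One caveat of the blue_mask test: it only looks at the blue channel, so white or light-gray pixels, whose blue value is also high, are blacked out too. The sketch below compares it with a stricter, hypothetical "blue-dominant" test on three made-up pixels:

```python
import numpy as np

# Three hypothetical pixels: pure blue, white, and orange
pixels = np.array([[[0, 0, 255],
                    [255, 255, 255],
                    [255, 128, 0]]], dtype=np.uint8)

# The simple test: blue channel above 150
simple_mask = pixels[:, :, 2] > 150

# A stricter, hypothetical test: blue above 150 AND larger than both red and green
r, g, b = pixels[:, :, 0], pixels[:, :, 1], pixels[:, :, 2]
blue_dominant = (b > 150) & (b > r) & (b > g)

print(simple_mask.tolist())    # [[True, True, False]]
print(blue_dominant.tolist())  # [[True, False, False]]

# A 2-D mask on a 3-D array zeroes all channels of each selected pixel
masked = pixels.copy()
masked[blue_dominant] = 0
print(masked.tolist())  # [[[0, 0, 0], [255, 255, 255], [255, 128, 0]]]
```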

Applications?

The operations above are just the basics of analyzing images with numpy. Real applications, such as satellite image analysis for detecting wildfires or tracking burnt areas, are far more complicated.

( PS: This notebook is basically my learning notes from the UCSanDiegoX course Python for Data Science. )