Python Quick Tip #2: Plotting Image Histograms

Trevor Lancon
May 22, 2020
4 min read

Updated: Jun 11, 2020

TL;DR

Here is the full script to load an image, inspect its histogram, then replot the histogram with vertical lines representing various percentages in the data:

import numpy as np
import matplotlib.pyplot as plt
from skimage import io
from skimage.util import img_as_ubyte
 
# You need to change this to a valid image file on your computer
input_image = 'C:/FILES/leaf.tif'
 
gray_image = io.imread(input_image, as_gray=True)
gray_image = img_as_ubyte(gray_image)
io.imshow(gray_image); io.show()
 
print(f"Image shape: {gray_image.shape} and size: {gray_image.size}")
print(f"Un'ravel'ed shape: {gray_image.ravel().shape} and size: {gray_image.ravel().size}")
 
ax = plt.hist(gray_image.ravel(), bins=256)
plt.show()
 
approx_median = np.percentile(gray_image, 50)
 
ax = plt.hist(gray_image.ravel(), bins=256)
plt.axvline(approx_median, color='orange')
plt.show()
 
quartiles = [25, 50, 75]
ax = plt.hist(gray_image.ravel(), bins=256)
for q in np.percentile(gray_image, quartiles):
    plt.axvline(q, color='orange')
plt.show()

You can also find this script on our GitHub.

Line by Line

Import the necessary functionality first. As a personal choice, I like to organize my imports using a consistent hierarchy:

Pure “import” statements
“import module as” statements
“from module import function” statement

Within each of these tiers I like to organize by increasing line length.

import numpy as np
import matplotlib.pyplot as plt
from skimage import io
from skimage.util import img_as_ubyte

Set a variable to a valid image file on your computer:

# You need to change this to a valid image file on your computer
input_image = 'C:/FILES/leaf.tif'

Load the image as grayscale and display it. This builds on what we did in Python Quick Tip #1.

gray_image = io.imread(input_image, as_gray=True)
gray_image = img_as_ubyte(gray_image)
io.imshow(gray_image); io.show()

NumPy provides a function called ravel that is used to decompose n-dimensional arrays into 1D arrays. Let’s use f-strings to inspect what happens when we use it with default parameters on our 2D image:

print(f"Image shape: {gray_image.shape} and size: {gray_image.size}")
print(f"Unraveled shape: {gray_image.ravel().shape} and size: {gray_image.ravel().size}")

This prints the following for my image:

Image shape: (1080, 1920) and size: 2073600
Unraveled shape: (2073600,) and size: 2073600

The size of the array is constant and equal to the product of the array dimensions regardless of whether it’s a 2D array or 1D. The important thing that is all elements of the original image are preserved!

We can create a histogram directly from the unraveled array by passing it to Matplotlib’s hist function. Just like when we used scikit-image’s imshow and show together in Python Quick Tip #1, we need to explicitly call Matplotlib’s show to ensure we see our plot object.

ax = plt.hist(gray_image.ravel(), bins=256)
plt.show()

Thresholding is an integral part of almost any quantitative image processing workflow. We’ll cover more ways to do this later, but for now, let’s assume that we want to show where a threshold of 50% would fall on our histogram. NumPy has a function perfect for this (again) called percentile. Let’s set a variable representing the median to the 50th percentile of our image:

approx_median = np.percentile(gray_image, 50)

Now we can add this value as a vertical line overlaid on our histogram using Matplotlib’s axvline to show what it looks like in our data!

First, create the histogram:

ax = plt.hist(gray_image.ravel(), bins=256)

Add the axvline by passing the variable containing our image’s median value. Also shown is an optional argument to color the line so that it stands out against the default histogram bar color:

plt.axvline(approx_median, color='orange')
plt.show()

This combination of NumPy’s percentile and Matplotlib’s axvline is even a bit more flexible than this. If you followed the link above to the documentation, you may have noticed that percentile also accepts lists as input. This returns a list of corresponding values from the array. For example, let’s inspect the quartiles of our histogram.

Make a list of quartiles:

quartiles = [25, 50, 75]

Build the histogram from your image:

ax = plt.hist(gray_image.ravel(), bins=256)

Iterate over each of the grayvalues corresponding to the 25th, 50th, and 75th percentiles of the image using NumPy’s percentile and add vertical lines to the plot:

for q in np.percentile(gray_image, quartiles):
    plt.axvline(q, color='orange')
plt.show()

Great! We see that the central 50% of the data is crowded around a main peak as expected.

Additional Thoughts

“Ravel” seems like a strange name to me, and I could not find documentation on why this function is named so. My own head canon to help me remember is that we use this function when we want to unravel an array.

In this case we do not care how the array is decomposed (i.e. in what “order” the axes are flattened). If you encounter a case in which you care about the decomposition order, you can pass a string to the “order” parameter as shown in the documentation. For this simple example of 2D arrays with the default unraveling order, you can see that the array is unraveled by rows:

Speaking of flattening, NumPy also has flatten and reshape methods that can achieve the same output. You can see a discussion about why ravel is more appropriate in most cases here and here.

You may wonder why you even need to flatten an array at all. This is based on how Matplotlib builds histograms. I am not the most well-versed in this, but if you pass a 2D array to matplotlib.pyplot.hist, you will get something resembling a histogram equivalent to a grouped column chart, where each row has its own accumulation in each bin. For actual images it takes a long time to build this type of plot, but you can make a small example array and try it yourself. See an example below where the 2D array is plotted on the left, and the equivalent 1D array is plotted on the right:

Percentiles may seem like an elementary method for calculating thresholds for images, but it’s more consistent than judging a manual threshold for each image. In the future we’ll revisit this exercise using other methods to set thresholds. In the meantime, percentiles are an interesting way to investigate your image histograms and may be of value on their own.

In the Context of Aivia

Our RGB to luminance recipe has an optional parameter "Show Histogram" that is displayed to the user before it runs. If the user decides to show the histogram, a snippet from the code explained above is used to show the histogram of the luminance channel before it is returned to Aivia:

if show_histogram==1:
    ax = plt.hist(gray_data.ravel(), bins=256)
    plt.show()

For a fun exercise, try adapting this recipe for yourself to show two histograms side by side: one with the combined RGB channels overlaid, and the other with the single luminance channel.

Another benefit of this within Aivia would be to visually check that you have not clipped any values when converting between bit depths within your Python recipe.