Tuesday, February 23, 2016

K-Means Clustering of a Satellite Images using Scipy

Scipy can be used nicely to apply k-means classification on a satellite image. Most blog entries I find (such as this here) describe how to apply k-means to x/y data but not how to get a image classified into these k-means clusters.

In a first step you load the satellite image into a raster, but then flatten this raster to convert the row x column raster into a one-dimensional array

>>> flatsarraster = sarraster.flatten()
>>> sarraster.shape
(1486, 1109)
>>> flatsarraster.shape
(1647974,)
>>> sarraster
array([[-17.93579102, -17.26957893, -17.6971817 , ...,  -3.9816401 ,
         -4.78030491,  -5.49047613],
       [-15.8222208 , -18.36615753, -17.06410408, ...,  -5.16635323,
         -4.9837966 ,  -5.192698  ],
       [-13.59493637, -15.24811172, -13.07635021, ...,  -3.1499095 ,
         -3.1499095 ,  -5.5106535 ],
       ..., 
       [-15.59587097, -15.20659828, -14.20381069, ...,  -6.38638067,
         -6.44060993,  -3.72703171],
       [-15.73380089, -15.21032333, -14.21171951, ...,  -5.8094368 ,
         -6.5339241 ,  -4.01910543],
       [-15.65506649, -14.98183632, -14.98183632, ...,  -6.26982498,
         -5.8094368 ,  -4.90749454]], dtype=float32)
>>> flatsarraster
array([-17.93579102, -17.26957893, -17.6971817 , ...,  -6.26982498,
        -5.8094368 ,  -4.90749454], dtype=float32)

The command centroids, variance = kmeans(flatsarraster, [number of cluster]) then uses the flattened raster and calculates the centroids of the wanted numbers of clusters, in this case 8 clusters:
>>> from scipy.cluster.vq import *
>>> centroids, variance = kmeans(flatsarraster, 8)
>>> centroids
array([  0.44888815, -11.17466545, -20.26683044,  -5.95262289,
       -13.68685532,  -8.66477108,  -3.06361866, -16.28946304], dtype=float32)

 Now the command code, distance = vq(flatsarraster, centroids) gives a one dimensional array "code" of the same length as the flattened raster, so each position does not contain the number of the original image, but the number of the cluster it belongs to.

>>> flatsarraster
array([-17.93579102, -17.26957893, -17.6971817 , ...,  -6.26982498,
        -5.8094368 ,  -4.90749454], dtype=float32)
>>> code
array([7, 7, 7, ..., 3, 3, 3])
>>> 

So in a third step we reshape this array "code" into the "row x column" raster and get a image showing the clusters.

codeim = code.reshape(sarraster.shape[0], sarraster.shape[1])

Here the code in total, in this case creating a loop to classify 2 to 8 clusters:

# -*- coding: utf-8 -*-


import gdal, gdalconst
from scipy.cluster.vq import *
import matplotlib.pyplot as plt

# Read SAR image into Numpy Array
filename = "//mnt//glaciology//processedSARimages//SARimage.tif"
sarfile = gdal.Open(filename, gdalconst.GA_ReadOnly)
sarraster = sarfile.ReadAsArray()

# Flatten image to get line of values
flatsarraster = sarraster.flatten()

# Create figure to receive results
fig = plt.figure()
fig.suptitle('K-Means Classification')

# In first subplot add original SAR image
ax = plt.subplot(241)
plt.axis('off')
ax.set_title('Original Image')
plt.imshow(sarraster, cmap = 'gray')

# In remaining subplots add k-means classified images
for i in range(7):
    print "Calculate k-means with ", i+2, " cluster."
    
    #This scipy code classifies k-mean, code has same length as flattened
    #SAR raster and defines which class the SAR value corresponds to
    centroids, variance = kmeans(flatsarraster, i+2)
    code, distance = vq(flatsarraster, centroids)
    
    #Since code contains the classified values, reshape into SAR dimensions
    codeim = code.reshape(sarraster.shape[0], sarraster.shape[1])
    
    #Plot the subplot with (i+2)th k-means
    ax = plt.subplot(2,4,i+2)
    plt.axis('off')
    xlabel = str(i+2) , ' clusters'
    ax.set_title(xlabel)
    plt.imshow(codeim)
    
plt.show()

The resulting images, each color indicating a cluster:

5 comments:

  1. nice example. I like the automated comparison of different cluster sizes and the fact that the whole thing doesn't need much lines of code. Thanks!

    ReplyDelete
  2. Nice GIS blog. Max, I am new to python and numpy. I have 4 arrays avgb, lcl, lcm, lch. What i want to do is replace values in avgb with corresponding values from lcl, lcm and lch based on conditions:

    10<avgb<30 -----replace corresponding cell values in avgb from lcl,

    30<avgb<=46 -----replace values from lcm,

    46<avgb<75-----replace values from lch.

    My current code is:
    -------------------------------------
    37 proj = srca.GetProjection()
    38 newdata.SetProjection (proj)
    39 newband = newdata.GetRaster5and(i)
    40 newband.WriteArray(avgb,0,0)
    41 outarr = numpy.zeros ((rows,cols),numpy.float32)
    42 dratio = (band_a/band_d)
    43 ici = 959.35_975.22*(dratio)
    44 1cm = 1226.58_1245.48*(dratio)
    45 ich = 1607.24_1624.i4*(dratio)
    46 checka = numpy.where((10<avgb) & (avgb<30))
    47 avgb[checka]=1
    48 checkb = numpy.where((30<avgb) a (avgb<=46))
    49 avgb[checkbj=2
    50 checkc = numpy.where((46<avgb) a (avgb<75))
    51 avgb[checkc)=3
    ------------------------------------
    Instead of 1,2,3 in line nos 47,49,51, what should i write to satisfy the above criteria/conditions? Can you give me some ideas/inputs on how to do it in python/numpy?

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. I see that you are not using numpy.where correctly. It is numpy.where ( [condition], [if TRUE do this], [if FALSE do this] )
      so it is something like
      numpy.where((10<avgb) & (avgb<30), avgb = lcl, avgb)

      I have described a similar problem at this blogpost where based on a condition in one raster, another is modified. With that you should be able to adjust your code: http://geoinformaticstutorial.blogspot.no/2014/01/raster-calculations-with-numpywhere.html

      Since this is a very general question, unrelated to this blog post, you should check out gis.stackexchange.com for posting such question, where you will have more readers and replies.

      Delete
  3. Awesome!!! Thanks Max, It worked!!!

    ReplyDelete