In a first step you load the satellite image into a raster, but then flatten this raster to convert the row x column raster into a one-dimensional array

>>> flatsarraster = sarraster.flatten() >>> sarraster.shape (1486, 1109) >>> flatsarraster.shape (1647974,) >>> sarraster array([[-17.93579102, -17.26957893, -17.6971817 , ..., -3.9816401 , -4.78030491, -5.49047613], [-15.8222208 , -18.36615753, -17.06410408, ..., -5.16635323, -4.9837966 , -5.192698 ], [-13.59493637, -15.24811172, -13.07635021, ..., -3.1499095 , -3.1499095 , -5.5106535 ], ..., [-15.59587097, -15.20659828, -14.20381069, ..., -6.38638067, -6.44060993, -3.72703171], [-15.73380089, -15.21032333, -14.21171951, ..., -5.8094368 , -6.5339241 , -4.01910543], [-15.65506649, -14.98183632, -14.98183632, ..., -6.26982498, -5.8094368 , -4.90749454]], dtype=float32) >>> flatsarraster array([-17.93579102, -17.26957893, -17.6971817 , ..., -6.26982498, -5.8094368 , -4.90749454], dtype=float32)

The command centroids, variance = kmeans(flatsarraster, [number of cluster]) then uses the flattened raster and calculates the centroids of the wanted numbers of clusters, in this case 8 clusters:

>>> from scipy.cluster.vq import * >>> centroids, variance = kmeans(flatsarraster, 8) >>> centroids array([ 0.44888815, -11.17466545, -20.26683044, -5.95262289, -13.68685532, -8.66477108, -3.06361866, -16.28946304], dtype=float32)

Now the command code, distance = vq(flatsarraster, centroids) gives a one dimensional array "code" of the same length as the flattened raster, so each position does not contain the number of the original image, but the number of the cluster it belongs to.

>>> flatsarraster array([-17.93579102, -17.26957893, -17.6971817 , ..., -6.26982498, -5.8094368 , -4.90749454], dtype=float32) >>> code array([7, 7, 7, ..., 3, 3, 3]) >>>

So in a third step we reshape this array "code" into the "row x column" raster and get a image showing the clusters.

codeim = code.reshape(sarraster.shape[0], sarraster.shape[1])

Here the code in total, in this case creating a loop to classify 2 to 8 clusters:

# -*- coding: utf-8 -*- import gdal, gdalconst from scipy.cluster.vq import * import matplotlib.pyplot as plt # Read SAR image into Numpy Array filename = "//mnt//glaciology//processedSARimages//SARimage.tif" sarfile = gdal.Open(filename, gdalconst.GA_ReadOnly) sarraster = sarfile.ReadAsArray() # Flatten image to get line of values flatsarraster = sarraster.flatten() # Create figure to receive results fig = plt.figure() fig.suptitle('K-Means Classification') # In first subplot add original SAR image ax = plt.subplot(241) plt.axis('off') ax.set_title('Original Image') plt.imshow(sarraster, cmap = 'gray') # In remaining subplots add k-means classified images for i in range(7): print "Calculate k-means with ", i+2, " cluster." #This scipy code classifies k-mean, code has same length as flattened #SAR raster and defines which class the SAR value corresponds to centroids, variance = kmeans(flatsarraster, i+2) code, distance = vq(flatsarraster, centroids) #Since code contains the classified values, reshape into SAR dimensions codeim = code.reshape(sarraster.shape[0], sarraster.shape[1]) #Plot the subplot with (i+2)th k-means ax = plt.subplot(2,4,i+2) plt.axis('off') xlabel = str(i+2) , ' clusters' ax.set_title(xlabel) plt.imshow(codeim) plt.show()

The resulting images, each color indicating a cluster:

nice example. I like the automated comparison of different cluster sizes and the fact that the whole thing doesn't need much lines of code. Thanks!

ReplyDeleteNice GIS blog. Max, I am new to python and numpy. I have 4 arrays avgb, lcl, lcm, lch. What i want to do is replace values in avgb with corresponding values from lcl, lcm and lch based on conditions:

ReplyDelete10<avgb<30 -----replace corresponding cell values in avgb from lcl,

30<avgb<=46 -----replace values from lcm,

46<avgb<75-----replace values from lch.

My current code is:

-------------------------------------

37 proj = srca.GetProjection()

38 newdata.SetProjection (proj)

39 newband = newdata.GetRaster5and(i)

40 newband.WriteArray(avgb,0,0)

41 outarr = numpy.zeros ((rows,cols),numpy.float32)

42 dratio = (band_a/band_d)

43 ici = 959.35_975.22*(dratio)

44 1cm = 1226.58_1245.48*(dratio)

45 ich = 1607.24_1624.i4*(dratio)

46 checka = numpy.where((10<avgb) & (avgb<30))

47 avgb[checka]=1

48 checkb = numpy.where((30<avgb) a (avgb<=46))

49 avgb[checkbj=2

50 checkc = numpy.where((46<avgb) a (avgb<75))

51 avgb[checkc)=3

------------------------------------

Instead of 1,2,3 in line nos 47,49,51, what should i write to satisfy the above criteria/conditions? Can you give me some ideas/inputs on how to do it in python/numpy?

This comment has been removed by the author.

DeleteI see that you are not using numpy.where correctly. It is numpy.where ( [condition], [if TRUE do this], [if FALSE do this] )

Deleteso it is something like

numpy.where((10<avgb) & (avgb<30), avgb = lcl, avgb)

I have described a similar problem at this blogpost where based on a condition in one raster, another is modified. With that you should be able to adjust your code: http://geoinformaticstutorial.blogspot.no/2014/01/raster-calculations-with-numpywhere.html

Since this is a very general question, unrelated to this blog post, you should check out gis.stackexchange.com for posting such question, where you will have more readers and replies.

Awesome!!! Thanks Max, It worked!!!

ReplyDeleteExcellent post.. :) but when I'm reading my .tif file , the dataset is getting the shape (3, 475, 527) instead of (475, 527) . Why that is happening ?

ReplyDelete