Friday, September 28, 2012

Reading Raster Data with Python and gdal

I am trying to learn Python for Geoprocessing. Here are some very basic notes on "playing" with Python/gdal/ogr. The online documentation is, I must say, rather confusing, so I try it step by step at the command line as below.

I follow the very useful lecture notes at and the tutorial at

Importing both gdal and gdalconst; just "import gdalconst" does not work (and I don't understand why right now...):

    >>> import gdal
    >>> from gdalconst import *

Defining the filename (assuming it's located in the current working directory of python -- use os.getced and os.chdir):

    >>> filename = 'ERS1PRI_19920430o04133tr481fr1989_AppOrb_Calib_Spk_SarsimTC_LinDB.tif'

Now the file can be opened, the driver only needs to be imported for write-access if I understand correctly:

    >>> dataset = gdal.Open(filename, GA_ReadOnly)

Typing "dataset" shows the pointer to the opened file:

    >>> dataset
    <osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x0000000003550EA0> >

Various information can be retrieved from the opened file:

    >>> cols = dataset.RasterXSize
    >>> rows = dataset.RasterYSize
    >>> bands = dataset.RasterCount
    >>> driver = dataset.GetDriver().LongName

    >>> cols
    >>> rows
    >>> bands


Geoinformation can be retrieved with GetGeoTransform().

>>> geotransform = dataset.GetGeoTransform()

The variable "geotransform" now contains a list with Geoinformation:

    >>> geotransform
    (368745.92379062285, 20.0, 0.0, 8828671.611738198, 0.0, -20.0) 

The answer to what these values mean are found in the documentation:

    adfGeoTransform[0] /* top left x */
    adfGeoTransform[1] /* w-e pixel resolution */
    adfGeoTransform[2] /* rotation, 0 if image is "north up" */
    adfGeoTransform[3] /* top left y */
    adfGeoTransform[4] /* rotation, 0 if image is "north up" */
    adfGeoTransform[5] /* n-s pixel resolution */

and one can retrieve a single value from this list for example with

    >>> originX = geotransform[0]
    >>> originY = geotransform[3]
    >>> pixelWidth = geotransform[1]
    >>> pixelHeight = geotransform[5]
    >>> originX
    >>> originY
    >>> pixelWidth
    >>> pixelHeight

But how to get the individual data values in the file?

Get the band and read the first line:

    >>> band = dataset.GetRasterBand(1)

    >>> bandtype = gdal.GetDataTypeName(band.DataType)
    >>> bandtype
    >>> scanline = band.ReadRaster( 0, 0, band.XSize, 1,band.XSize, 1, band.DataType)

Since I was not sure what the "ReadRaster" parameters meant, google led me to this useful page:

The ReadRaster() call has the arguments: def ReadRaster(self, xoff, yoff, xsize, ysize, buf_xsize = None, buf_ysize = None, buf_type = None, band_list = None ): The xoff, yoff, xsize, ysize parameter define the rectangle on the raster file to read. The buf_xsize, buf_ysize values are the size of the resulting buffer. So you might say "0,0,512,512,100,100" to read a 512x512 block at the top left of the image into a 100x100 buffer (downsampling the image).
which I found here

Typing "scanline" gives me long lines of this:

    >>> scanline

Note that the returned scanline is of type string, and contains xsize*4 bytes of raw binary floating point data. To convert this into readable values use struct.unpack and instead you get long lines of float numbers:

    >>> import struct
    >>> value = struct.unpack('f' * band.XSize, scanline)
    >>> value
    (-1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, ....

Now I can get individual values, but all are in "one line" and not in an array:

     >>> value[8]

I rather read the whole file into an array:

    >>>data = band.ReadAsArray(0, 0, cols, rows)
    >>> value = data[3500,4000]
    >>> value

Using the numpy library I can define the datatype in the array:
   >>> import numpy
   >>> data = band.ReadAsArray(0, 0, dataset.RasterXSize, dataset.RasterYSize).astype(numpy.float)
   >>> value = data[3500,4000]
   >>> value

One has  to be careful not confusing column and rows! Matrix is value = data[row, column], and it starts with '0', so the value -8.30476 is located at y=row=3501 and x=column=4001.

To be continued....


  1. Hey Max,

    Any chance you've got x and y swapped on the very last set of identities in this blog?

    Seems like it should be y=row=3501,x=column=4001

  2. oh yes, indeed, you are right!

    thanks for reading carefully!


  3. Haha, no problem ;-) I was actually trying to figure out why some of my own gdal arrays weren't behaving as I expected, so in the process of checking my own dimensions I ended up checking yours!

    Have you had any good experiences working with python/gdal and long raster time series?

  4. >Have you had any good experiences working with python/gdal and long raster time series?

    Works really fine for me and getting into it --- I do processing of large numbers of images with Python, gdal and ESA NEST called from Python. Like this one extracting images containing area of interest and creating then calibrated, map projected GeoTIFFS for all Radarsat images (>100) in a folder:

  5. oh cool! I hadn't seen the NEST library before; will keep that in mind for any future radar project.

    How are its interferometric tools?

  6. hello max,

    your code for read the tiff image using gdal was perfectly working.i want to detect the lunar's crater (TIFF image).and i am going to use the hough transform algorithm.please help me for to work with that.and also i am going to find the depth and diameter of the crater.

  7. Dear Prabakaran -- I don't think I can help you there, but try searching for example or post a question there. Greetings Max

  8. Hello Max,

    I tried to use your tutorial to read .bil data from BioClim, and unfortunately it did not work. When I tried to convert the data with struct.unpack, I got this error message:
    struct.error: unpack requires a string argument of length 172800
    Therefore, I checked the length of scanline and found 86400. Maybe this has something to do with the band type (which is Int16 and not Float32), what do you think?
    How can I solve this issue and read my data?

    1. Not sure about this, all formats for the method above are found here but not BIL. Have not heard about this one before

      Here they say to convert it to Geotiff ... Here I see how

      Apparently use gdal_translate filename.bil filename.tif and then use the tiff as I use it above

    2. Thanks for your quick answer, Unfortunately I translated the file thanks to gdal_translate, but I still get the same error message, the same band type and the length of the scanline remains the same. Are the tiffs you read tiled or stripped ? Or do you know if there is a method to convert the band type ?

    3. Hi again -- band type conversion is all gdal_translate --- this could be many things, maybe a corrupt file -- you could try posting your question at

    4. by accident I needed myself to read a *.bin" file these days, and information that helped me I found here

      I think I write a short post on how I did it

    5. You should use fmttypes in:

      values = struct.unpack(fmttypes[BandType] * band.YSize, scanline)

      The fmttypes could be defined in a dictionary:

      fmttypes = {'Byte':'B', 'UInt16':'H', 'Int16':'h', 'UInt32':'I', 'Int32':'i', 'Float32':'f', 'Float64':'d'}

      In my Blog (in spanish) there are examples with complete code and work perfectly.

  9. Hi Max
    your script works very good , but is there a way to get all this informations in a file .txt ?

    1. which information you want written into a text file? For rows and columns you could use

      textfile = open( outputtextfile, 'w')
      textfile.write( 'rows: ' + str(rows) + ' cols: '+ str(cols) + ' \n\n' )

  10. I need to convert a 2-d array,extracted from a raster image, into a text file. Text file should contain individual pixel values in different bands in that .tif image.

    1. the answer to the comment just above this one should give a hint on how to proceed? Otherwise check the mentioned tutorial at

  11. i need python code to show the values of each pixel in diffrent bands of a raster image (.tif file) into one single 2-d array...that new array will contain the bands as its columns..
    i am new to python so pls help.

    1. similar to the question answered before, loop through each pixel, read it and write it to a text-file ...

  12. please concatenate the python code to write that new array into a new textfile too.

    1. you may check at for such question ... as mentioned, with the comments above and the tutorial at you will find out ... I learned all what I need using this tutorial

  13. can you please provide me the code to carry on that looping. How would i read pixels and their values in four arrays of individual bands.

    1. as mentioned before, please check the tutorial above, in there exactly these things are explained ...

  14. This comment has been removed by the author.

  15. Hello Max
    Are the information contained into data array relative to height of terrain?

    1. Hi Alessandro,

      if I understand your question correctly ... then this depends on the kind of raster array you have. If it is a digital elevation model, than the vallue at a raster point is the height of the terrain. If it is an image, then it is a brightness or backscatter or albedo value ...

      is that what you meant?


    2. Thanks Max.
      I have a DEM file than the value contained into the array "data" represents a height map.
      More general, does it depend from the scope of specific application?

    3. Hi again, I am unfortunately not sure what your questioms aims at ... An array or raster in Python or gdal is simply a mathematical array like a matrix. What the array contains is up to your data source or interest. Using gdal, you are interested in a geographical array. But the values at a givem point can be anything. Surface temperature, elevation, albedo or backscatter in case of a visual image, wind speed etc etc ... The gdal array is simply meant to be an array where a given raster point at a geographic location has some physical property described by the array value ... Python allows to work with arrays, gdal adds geographical ability like applying map projection etc etc ... Is that what you were after?


  16. Hello Sir,
    How should I get all the pixel values of a tiff image? I m trying to get all the pixel values of land-use land cover image having 11 classes. I m new to the python programming.

    1. sorry, your question is not clear to me ... with the command above "data = band.ReadAsArray(0, 0, cols, rows)" the array "data" contains all the pixel values of the image. Otherwise have a look at the lecture notes linked at the beginning of this post

  17. hi....
    i want to show list of bands and their values. how to do this using gdal

  18. h I really find your blog useful, can you guide on extraction of calibrated value for radarsat 2 LUT files using this equation 𝑐𝑎𝑙𝑖𝑏𝑟𝑎𝑡𝑒𝑑 𝑣𝑎𝑙𝑢𝑒 = 𝑑𝑖𝑔𝑖𝑡𝑎𝑙 𝑣𝑎𝑙𝑢𝑒2 + 𝐵 /𝐴
    I have three LUT files (gamma, beta and alpha) and product.xml file, how do I extract calibrated value out of that? Any python liberty that you can suggest?

  19. Hi
    Thanks for such a great post. I have a question-

    band1_12 = dataset1_12.GetRasterBand(1)
    max = band1_12.GetMaximum() // this gives a value of 2425

    data1_12 = np.array(band1_12.ReadAsArray(0,0,band1_12.XSize, band1_12.YSize))
    print(np.amax(data1_12)) // this gives 1977

    Can you please tell why there is a difference and how to rectify it in python.