Sunday, September 30, 2012

Writing Raster Data with Python / gdal

So now I have a SAR image in a an array called "data" as described in the previous post I want to save it, let's say, in an ERDAS img data format? This is basically converting the file format:

 I first get and register the appropriate driver for the output file:

        >>> driver = gdal.GetDriverByName('HFA')
    >>> driver.Register()

I want to use the same size from the input file which is in "dataset" as in the previous post  

    >>> cols = dataset.RasterXSize
    >>> rows = dataset.RasterYSize
    >>> bands = dataset.RasterCount
    >>> datatype = band.DataType

Then Creating the output file:

    >>> outDataset = driver.Create('ERS1PRI_19920430.pix', cols, rows, bands, datatype)

I want my outputfile to have the same georeferencing and projection information as the input file (the "0" shows the execution went fine):

        >>> geoTransform = dataset.GetGeoTransform()
    >>> outDataset.SetGeoTransform(geoTransform )
    >>> proj = dataset.GetProjection()
    >>> outDataset.SetProjection(proj)

That has to be assigned before writing the data to the output band, otherwise it's not in the file!

Before writing the data, I have to get the band from the newly created file:

     >>> outBand = outDataset.GetRasterBand(1)

Now I can write the input image to the new output image:

    >>> outBand.WriteArray(data, 0, 0)

clean variables and close files
      >>> dataset = None
   >>> band = None
   >>> outBand = None
   >>> outDataset = None

[still to find out: got this to work for PCI format, with ENVI hdr the image but not the geoinformation was transferred and Erdas img did not work -- file too big?]

Friday, September 28, 2012

Reading Raster Data with Python and gdal

I am trying to learn Python for Geoprocessing. Here are some very basic notes on "playing" with Python/gdal/ogr. The online documentation is, I must say, rather confusing, so I try it step by step at the command line as below.

I follow the very useful lecture notes at and the tutorial at

Importing both gdal and gdalconst; just "import gdalconst" does not work (and I don't understand why right now...):

    >>> import gdal
    >>> from gdalconst import *

Defining the filename (assuming it's located in the current working directory of python -- use os.getced and os.chdir):

    >>> filename = 'ERS1PRI_19920430o04133tr481fr1989_AppOrb_Calib_Spk_SarsimTC_LinDB.tif'

Now the file can be opened, the driver only needs to be imported for write-access if I understand correctly:

    >>> dataset = gdal.Open(filename, GA_ReadOnly)

Typing "dataset" shows the pointer to the opened file:

    >>> dataset
    <osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x0000000003550EA0> >

Various information can be retrieved from the opened file:

    >>> cols = dataset.RasterXSize
    >>> rows = dataset.RasterYSize
    >>> bands = dataset.RasterCount
    >>> driver = dataset.GetDriver().LongName

    >>> cols
    >>> rows
    >>> bands


Geoinformation can be retrieved with GetGeoTransform().

>>> geotransform = dataset.GetGeoTransform()

The variable "geotransform" now contains a list with Geoinformation:

    >>> geotransform
    (368745.92379062285, 20.0, 0.0, 8828671.611738198, 0.0, -20.0) 

The answer to what these values mean are found in the documentation:

    adfGeoTransform[0] /* top left x */
    adfGeoTransform[1] /* w-e pixel resolution */
    adfGeoTransform[2] /* rotation, 0 if image is "north up" */
    adfGeoTransform[3] /* top left y */
    adfGeoTransform[4] /* rotation, 0 if image is "north up" */
    adfGeoTransform[5] /* n-s pixel resolution */

and one can retrieve a single value from this list for example with

    >>> originX = geotransform[0]
    >>> originY = geotransform[3]
    >>> pixelWidth = geotransform[1]
    >>> pixelHeight = geotransform[5]
    >>> originX
    >>> originY
    >>> pixelWidth
    >>> pixelHeight

But how to get the individual data values in the file?

Get the band and read the first line:

    >>> band = dataset.GetRasterBand(1)

    >>> bandtype = gdal.GetDataTypeName(band.DataType)
    >>> bandtype
    >>> scanline = band.ReadRaster( 0, 0, band.XSize, 1,band.XSize, 1, band.DataType)

Since I was not sure what the "ReadRaster" parameters meant, google led me to this useful page:

The ReadRaster() call has the arguments: def ReadRaster(self, xoff, yoff, xsize, ysize, buf_xsize = None, buf_ysize = None, buf_type = None, band_list = None ): The xoff, yoff, xsize, ysize parameter define the rectangle on the raster file to read. The buf_xsize, buf_ysize values are the size of the resulting buffer. So you might say "0,0,512,512,100,100" to read a 512x512 block at the top left of the image into a 100x100 buffer (downsampling the image).
which I found here

Typing "scanline" gives me long lines of this:

    >>> scanline

Note that the returned scanline is of type string, and contains xsize*4 bytes of raw binary floating point data. To convert this into readable values use struct.unpack and instead you get long lines of float numbers:

    >>> import struct
    >>> value = struct.unpack('f' * band.XSize, scanline)
    >>> value
    (-1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, ....

Now I can get individual values, but all are in "one line" and not in an array:

     >>> value[8]

I rather read the whole file into an array:

    >>>data = band.ReadAsArray(0, 0, cols, rows)
    >>> value = data[3500,4000]
    >>> value

Using the numpy library I can define the datatype in the array:
   >>> import numpy
   >>> data = band.ReadAsArray(0, 0, dataset.RasterXSize, dataset.RasterYSize).astype(numpy.float)
   >>> value = data[3500,4000]
   >>> value

One has  to be careful not confusing column and rows! Matrix is value = data[row, column], and it starts with '0', so the value -8.30476 is located at y=row=3501 and x=column=4001.

To be continued....