Monday, October 29, 2012

Reprojecting a shapefile with GDAL/OGR and Python

I needed to reproject shapefiles -- unprojected lat/long WGS84 -- and once again the page Geoprocessing with Python using Open Source GIS was of great help, so the credit goes to this page, which is very worthwhile studying as an introduction to using gdal/ogr and Python.

The code below is an adjustment of homework 2b in the page above for my case.

I tried to add explanations to the original code so that it's easy to follow by just reading it -- hope I'm right....



# -*- coding: utf-8 -*-  
# Import Modules 
import ogr, osr, os, sys 
 
# set file names 
infile = 'C:\Users\max\Documents\Icecharts\Data\IceChart_2012\ice20120608.shp' 
outfile = 'C:\Users\max\Documents\Icecharts\Data\IceChart_2012\ice20120608_epsg3575.shp' 
 
#get path and filename seperately 
(outfilepath, outfilename) = os.path.split(outfile) 
#get file name without extension            
(outfileshortname, extension) = os.path.splitext(outfilename)   #get file name without extension 
 
# Spatial Reference of the input file 
# Access the Spatial Reference and assign the input projection 
inSpatialRef = osr.SpatialReference() 
inSpatialRef.ImportFromEPSG(4326)         # unprojected WGS84 
 
# Spatial Reference of the output file 
# Access the Spatial Reference and assign the output projection 
# UTM 33N which we use for all Spitsbergen does not work since area too far  
# outside UTM 33 
outSpatialRef = osr.SpatialReference() 
 outSpatialRef.ImportFromEPSG(3575) # EPSG:3575: WGS 84 / North Pole LAEA Europe 
  
# create Coordinate Transformation 
coordTransform = osr.CoordinateTransformation(inSpatialRef, outSpatialRef) 
 
# Open the input shapefile and get the layer 
driver = ogr.GetDriverByName('ESRI Shapefile') 
indataset = driver.Open(infile, 0) 
if indataset is None: 
    print ' Could not open file' 
    sys.exit(1) 
inlayer = indataset.GetLayer() 
 
# Create the output shapefile but check first if file exists 
if os.path.exists(outfile): 
    driver.DeleteDataSource(outfile) 
 
outdataset = driver.CreateDataSource(outfile) 
 
if outfile is None: 
    print ' Could not create file' 
    sys.exit(1) 
outlayer = outdataset.CreateLayer(outfileshortname, geom_type=ogr.wkbPolygon) 
 
# Get the FieldDefn for attributes ICE_TYPE and ID and add to output shapefile 
# (which i know are in my file) 
feature = inlayer.GetFeature(0) 
fieldDefn1 = feature.GetFieldDefnRef('ICE_TYPE') 
fieldDefn2 = feature.GetFieldDefnRef('ID') 
 
outlayer.CreateField(fieldDefn1) 
outlayer.CreateField(fieldDefn2) 
 
# get the FeatureDefn for the output shapefile 
featureDefn = outlayer.GetLayerDefn() 
 
#Loop through input features and write to output file 
infeature = inlayer.GetNextFeature() 
while infeature: 
  
    #get the input geometry 
    geometry = infeature.GetGeometryRef() 
  
    #reproject the geometry, each one has to be projected seperately 
    geometry.Transform(coordTransform) 
  
    #create a new output feature 
    outfeature = ogr.Feature(featureDefn) 
  
    #set the geometry and attribute 
    outfeature.SetGeometry(geometry) 
    outfeature.SetField('ICE_TYPE', infeature.GetField('ICE_TYPE')) 
    outfeature.SetField('ID', infeature.GetField('ID')) 
  
    #add the feature to the output shapefile 
    outlayer.CreateFeature(outfeature) 
  
    #destroy the features and get the next input features 
    outfeature.Destroy 
    infeature.Destroy 
    infeature = inlayer.GetNextFeature() 
  
#close the shapefiles 
indataset.Destroy() 
outdataset.Destroy() 
 
#create the prj projection file 
outSpatialRef.MorphToESRI() 
file = open(outfilepath + '\\'+ outfileshortname + '.prj', 'w') 
file.write(outSpatialRef.ExportToWkt()) 
file.close() 
 
#end






The unprojected file:

The projected shape file:

Friday, October 26, 2012

Accessing vertices from a polygon with OGR/gdal and Python

It took me a bit to figure out how to access the single vertices of a polygon using gdal/ogr, but I found the missing key towards the answer finally on this page

The shapefile contains one or more layers. These layers again can contain one or more features, each having attributed and a geometry, which can be an polygon, for example. However, this geometry, being a polygon, contains another geometry, a "Linear Ring", and it is this geometry deep down I have to access to get individual points.

It's seen in the gml notation, where the polygon contains a Linear Ring:

     <gml:Polygon>
         <gml:outerBoundaryIs>
                 <gml:LinearRing>
                         <gml:posList>0,0 100,0 100,100 0,100 0,0</gml:posList>
                 </gml:LinearRing>
        </gml:outerBoundaryIs>
     </gml:Polygon>


So how to dig down to this Linear Ring and thus the individual vertices?

How to open shapefiles is nicely described in lecture one on this very helpful page  :

     import ogr, sys, os
     fn = 'C:/Users/max/Documents/GlacierMasks2000s_CryoClimMøte.shp'
     driver = ogr.GetDriverByName('ESRI Shapefile')
     dataSource = driver.Open(fn, 0)


I can check how many layers my file contains:

     >>> dataSource.GetLayerCount()
     1
     >>>


and since it is only one layer, I proceed to access this layer. Being one layer only, it has the index "0":

     >>> layer = dataSource.GetLayer(0)

I can check how many features this layer contains:

     >>> layer.GetFeatureCount()
     1560
     >>>


For this example I only want to get one of the features, also checking some info about it. The item list is the attribute table entry for this feature:

        >>> feature = layer.GetFeature(44)
    >>> feature.GetFieldCount()
    21
    >>> feature.items()
    {'Comment': None, 'Y_cent': 8869558.8657, 'SatelliteI': 'SPI 08-027    Svalbard', 'IDENT': 22210.0, 'OBJECTID_1': 45, 'NAME': 'Idunbreen', 'OBJECTID': 46, 'NumLines': 2, 'Shape_Area': 333164842.804, 'X_cent': 592561.435311, 'SOURCE2': 'SPIRIT-SPOT DEM, ASTER DEM (GDEM), NPI DEM, Unpubl SPRI Interferometric velocities', 'SOURCE': 'SPOT5HRS-IPY_SPIRIT_PROJECT (Korona et. al.,2009)', 'YEAR_': 2008, 'AvgLength': 24053.902741, 'DDMM': 1408, 'MaxLength': 25786.323167, 'New_additi': 0, 'Shape_Le_1': 87176.4155756, 'TIDEWATER': 1, 'ANALYST': 'Rickard Pettersson, Uppsala University, Sweden', 'Shape_Leng': 87176.415576}


But I want to access the geometry in this feature, finding out if it is a point, a polygon or...

       >>> geometry = feature.GetGeometryRef()
    >>> geometry.GetGeometryCount()
    1
    >>> geometry.GetGeometryName()
    'POLYGON'
    >>>


At this point I thought, I already could access the vertices of the polygon, but this does not work:

      >>> geometry.GetPointCount()
   0
   >>> geometry.GetX()
   0.0
   ERROR 6: Incompatible geometry for operation
   >>> 


I have to dig one level deeper and access the geometry contained within the polygon:

      >>> ring = geometry.GetGeometryRef(0)
   >>> ring.GetGeometryName()
   'LINEARRING'
   >>> ring.GetPointCount()
   578
   >>> ring.GetPoint(500)
   (599109.9332999997, 8882494.9067, 0.0)
   >>> 


The last line is the latitude, longitude and elevation information for point number 500.

Finally a code collecting all of this and looping through the points and writing them into a text file:


import ogr, os, sys


# Open Shapefile

fn = 'C:/Users/max/Documents/GlacierMasks2000s_CryoClimMøte.shp'
driver = ogr.GetDriverByName('ESRI Shapefile')
dataSource = driver.Open(fn, 0)

# Get Layer
print 'The file contains ', dataSource.GetLayerCount(), ' Layers'
layer = dataSource.GetLayer(0)

# Get Features
print layer.GetName(), ' contains ', layer.GetFeatureCount(), ' features'
feature = layer.GetFeature(44)

# Get Geometry
geometry = feature.GetGeometryRef()
print ' Feature contains the Geometry', geometry.GetGeometryName()
print ' It contains', geometry.GetGeometryCount(), geometry.GetGeometryName()

# Get Geometry inside Geometry
ring = geometry.GetGeometryRef(0)
print geometry.GetGeometryName(), ' contains the Geometry', ring.GetGeometryName()
print ' It contains', ring.GetPointCount(), ' points in a ', ring.GetGeometryName()

# Write points in Vectors and Textfile
pointsX = []; pointsY = []; pointsZ = []
numpoints = ring.GetPointCount()

file = open(r'C:\Users\max\Documents\PythonProjects\text.txt','w')

for p in range(numpoints):
        lon, lat, z = ring.GetPoint(p)
        pointsX.append(lon)
        pointsY.append(lat)
        pointsZ.append(z)
        x = [str(lon), str(lat), str(z), '\n']
        file.writelines(x)
  
print ' the first 5 points ', pointsX[0:10], pointsY[0:10] , pointsZ[0:10]      

dataSource = None
file.close()

#end



Matlab comparison

In Matlab I can read the same shapefile with

    >> shape = shaperead('C:/Users/max/Documents/GlacierMasks2000s_CryoClimMøte.shp')

   shape =

   1560x1 struct array with fields:
    Geometry
     BoundingBox
    X
    Y
    OBJECTID_1
    OBJECTID
    NAME
    Comment
    IDENT
    New_additi
    YEAR_
    DDMM
    ANALYST
    SOURCE
    SOURCE2
    SatelliteI
    Shape_Leng
    X_cent
    Y_cent
    TIDEWATER
    MaxLength
    AvgLength
    NumLines
    Shape_Le_1
    Shape_Area



To access the same polygon as in the example above, Matlab uses the following command. But be aware, Python counts the indices from "0" and Matlab from "1". THe "feature.item()" command in Python above shows the OBJECTID of 46 for feature 44, which in Matlab is feature 45: >> shape(45).OBJECTID

ans =

    46

>> length( shape(45).X )

ans =

   579 


The X and Y in the struct array corresponds to the vertices called pointX and pointY in the Python code above. The attribute fields are accessed in Python via feature.item and others, whil in Matlab with shape(featurenumber).OBJECTID and so on.


Wednesday, October 3, 2012

Creating a Subset of Raster Data with Python / gdal

Continuing to play with Python and gdal...  Reading a GeoTIFF, subsetting it and saving as ENVI file.

First importing the libraries and setting working directory:

   >>> import gdal
   >>> from gdalconst import *
   >>> import os
   >>> os.chdir('C:\Users\max\Documents\PythonProjects')


Opening the file:

   >>> filename = 'ERS1PRI_19920430o04133tr481fr1989_AppOrb_Calib_Spk_SarsimTC_LinDB.tif'
   >>> dataset = gdal.Open(filename, GA_ReadOnly)


I get the band from the opened file. The ReadAsArray with the offset in x and y direction and the x dimensions and y dimensions of the subset:

    >>> band = dataset.GetRasterBand(1)
  >>> subset = band.ReadAsArray(3500, 3500, 50, 50)


The subset can be inspected:

   >>> subset
  array([[-11.49956703, -10.51208401,  -9.58573532, ..., -11.87286568,
        -12.77604961, -15.28282642],
       [-13.6348238 , -14.13398838, -13.61361694, ..., -13.09877014,
        -13.3733263 , -15.26138687],
       [-14.01767826, -12.75454712, -12.39109707, ..., -13.30418301,
        -14.24573517, -15.0796051 ],
       ...,
       [ -9.57748413, -11.63577938, -14.19717503, ...,   3.46974254,
          0.29983696,   0.51451778],
       [-11.11393261, -12.06205177, -13.13371372, ...,   2.4157486 ,
          3.08015943,   0.57344818],
       [-11.68845177, -11.69376755, -11.36115456, ...,   1.95159626,
          1.82419062,   3.39742208]], dtype=float32)
 



Now I register the ENVI driver and Create the ENVI file using the dimensions of the subset:

    >>> driver = gdal.GetDriverByName('ENVI')
  >>> driver.Register()
  >>> outDataset = driver.Create('ERS1PRI_19920430_ENVI_subset', 50, 50, 1, gdal.GDT_Float32)


Finally I get the band of the created ENVI file and write the subset array into the file. After that I can close the files.
   >>> outBand = outDataset.GetRasterBand(1)
   >>> outBand.WriteArray(subset, 0, 0)
   >>> band = None

   >>> dataset = None
   >>> outDataset = None
   >>> subset = None


Closing files is not a "close" command but simply setting the variables to "None"

[I seem to have problems with WriteArray not always writing the file correctly -- unsure yet when and why, the one above works]