Friday, October 23, 2015

Batch Downloading Sentinel Images from ESA's Scihub with Python

ESA's Sentinel Satellite Images can be downloaded from https://scihub.esa.int/ with batch scripting as described here: https://scihub.esa.int/userguide/5APIsAndBatchScripting

The following describes a simple Python script for batch-downloading Sentinel images over an area of interest. In this version the search request is hard-coded into the script, but it solves the main task of downloading these files. This is a first attempt and will certainly be improved; I will update this page accordingly.

Please note that scihub now has two alternative download locations. The one called "dhus" currently contains data further back in time, but is not very stable. The one called "apihub" contains only the last 30 days, but is dedicated to scripted downloads and is more stable. To use apihub, replace "dhus" with "apihub" in all links below.
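Switching between the two is a plain string substitution on the base address (a minimal sketch; everything else in the links stays the same):

```python
# Base address used throughout this post
dhus_url = 'https://scihub.copernicus.eu/dhus/'

# The apihub variant: simply swap "dhus" for "apihub"
apihub_url = dhus_url.replace('dhus', 'apihub')
print(apihub_url)
# https://scihub.copernicus.eu/apihub/
```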


Authenticate at Webpage 

Since scihub requires registration, any web request passed to scihub needs authentication, so first you have to open your access to this page. The tools within Python's urllib2 library needed for this are described in the urllib2 HOWTO at https://docs.python.org/2.7/howto/urllib2.html.

    
    import urllib2

    # The lower level url to authenticate at
    url = 'https://scihub.copernicus.eu/dhus/'
    
    # Create a password manager
    # as described at 
    # https://docs.python.org/2.7/howto/urllib2.html#openers-and-handlers
    username = 'username'
    password = 'password'
    p = urllib2.HTTPPasswordMgrWithDefaultRealm() 
    p.add_password(None, url, username, password)
    handler = urllib2.HTTPBasicAuthHandler(p)

    # create "opener" (OpenerDirector instance)
    opener = urllib2.build_opener(handler)
    
    # Install the opener.
    # Now all calls to urllib2.urlopen use our opener.
    urllib2.install_opener(opener)


Create the query and save resulting xml-response

The query itself is a weblink created as described here: https://scihub.esa.int/twiki/do/view/SciHubUserGuide/5APIsAndBatchScripting#Open_Search

As an example, the following weblink searches for all GRD type products ingested in scihub between the given dates within the area of interest defined by the polygon:

    https://scihub.esa.int/dhus/search?q=productType:GRD AND ingestionDate:[2015-09-30T00:00:00.000Z TO 2015-09-30T23:59:59.999Z] AND footprint:"Intersects(POLYGON((-16.11154750824 69.190260085686,-2.7521725082398 69.190260085686,-2.7521725082398 76.370271635301,-16.11154750824 76.370271635301,-16.11154750824 69.190260085686)))"&rows=10000&start=0

The first issue, however, is to reformat the link above so that the urllib library can read it. Pasting the link into the browser gives you the list of results, but inspecting the request closely shows that what is actually passed on is a link in which some, but not all, special characters are replaced by their percent-encoded equivalents.

To replace the needed characters, you can use the urllib.quote(string[, safe]) function. The safe parameter defines all the characters that are not to be replaced, and after some trial and error I found that this string needs to be

    ':()[]/?=,&'
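As a quick check of what urllib.quote does with this safe string, here is a minimal sketch (the try/except import makes it runnable on both Python 2, as used in this post, and Python 3, where the function moved to urllib.parse):

```python
try:
    from urllib import quote        # Python 2, as used in this post
except ImportError:
    from urllib.parse import quote  # Python 3 equivalent

safe = ':()[]/?=,&'

# Characters listed in `safe` pass through unchanged; everything else
# outside the always-safe set (letters, digits, '_.-') is percent-encoded,
# so spaces become %20 and double quotes become %22.
encoded = quote('q=productType:GRD AND footprint:"Intersects(...)"', safe)
print(encoded)
# q=productType:GRD%20AND%20footprint:%22Intersects(...)%22
```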

So you pass the url through urllib.quote, open it, read the response, and write it to an xml file:


    urlrequest = urllib.quote('https://scihub.esa.int/dhus/search?q=productType:GRD AND ingestionDate:[2015-09-30T00:00:00.000Z TO 2015-09-30T23:59:59.999Z] AND footprint:"Intersects(POLYGON((-16.11154750824 69.190260085686,-2.7521725082398 69.190260085686,-2.7521725082398 76.370271635301,-16.11154750824 76.370271635301,-16.11154750824 69.190260085686)))"&rows=10000&start=0', ':()[]/?=,&')
    
    # Read page response into variable
    page = urllib2.urlopen(urlrequest).read()
    
    # Write returned xml list to textfile
    textfile = open('/home/max/Documents/test.xml', 'w')
    textfile.write(page)
    textfile.close()

This creates an xml file containing the list of matching products.

Parsing the xml file

The next step is to parse this xml file for the matching products and download them. I found this nicely described in "Dive Into Python"; it is worth playing a bit with the different examples from chapter 12 to understand how the ElementTree library works.

Basically, you create a list containing all elements called "entry", since these contain the matching scenes. For each entry tag you then need to find the identifier in the "id" element and the title.

The id element may, for example, contain the uuid "18f7993d-eae1-4f7f-9d81-d7cf19c18378", from which we then know that the file can be downloaded from the address
https://scihub.copernicus.eu/dhus/odata/v1/Products('18f7993d-eae1-4f7f-9d81-d7cf19c18378')/$value
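Putting that download address together is plain string concatenation (using the example uuid from above):

```python
# Example uuid taken from the search response above
uuid = '18f7993d-eae1-4f7f-9d81-d7cf19c18378'

# The OData download address for this product
sentinel_link = ("https://scihub.copernicus.eu/dhus/odata/v1/Products('"
                 + uuid + "')/$value")
print(sentinel_link)
# https://scihub.copernicus.eu/dhus/odata/v1/Products('18f7993d-eae1-4f7f-9d81-d7cf19c18378')/$value
```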

The name of the image itself is in the "title" element and is needed to name the downloaded Sentinel-file.
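To see how the namespace-prefixed tag names work in ElementTree, here is a minimal sketch run on a hand-made feed (the title below is a made-up placeholder, not a real product name; real scihub responses contain many more fields):

```python
import xml.etree.ElementTree as etree

# ElementTree expects the Atom namespace in curly braces before each tag name
ATOM = '{http://www.w3.org/2005/Atom}'

# Hand-made, stripped-down feed for illustration only
xml_text = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<entry>'
    '<id>18f7993d-eae1-4f7f-9d81-d7cf19c18378</id>'
    '<title>EXAMPLE_PRODUCT_NAME</title>'
    '</entry>'
    '</feed>'
)

root = etree.fromstring(xml_text)
for entry in root.findall(ATOM + 'entry'):
    print(entry.find(ATOM + 'id').text)     # the uuid
    print(entry.find(ATOM + 'title').text)  # the file name
```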

This code loops through all entries and finds the uuid element, the name of the Sentinel image and creates the link where the file can be downloaded from:

    import xml.etree.ElementTree as etree

    tree = etree.parse('/home/max/Documents/test.xml')
    entries = tree.findall('{http://www.w3.org/2005/Atom}entry')
    for entry in range(len(entries)):
        uuid_element = entries[entry].find('{http://www.w3.org/2005/Atom}id')
        sentinel_link = "https://scihub.copernicus.eu/dhus/odata/v1/Products('" + uuid_element.text + "')/$value"
        title_element = entries[entry].find('{http://www.w3.org/2005/Atom}title')

Downloading each matching image


The last task is now to download each file. Adding the download step to the for-loop above, I get this code:

    tree = etree.parse('/home/max/Documents/test.xml')
    entries = tree.findall('{http://www.w3.org/2005/Atom}entry')
    for entry in range(len(entries)):
        uuid_element = entries[entry].find('{http://www.w3.org/2005/Atom}id')
        sentinel_link = "https://scihub.copernicus.eu/dhus/odata/v1/Products('" + uuid_element.text + "')/$value"
        title_element = entries[entry].find('{http://www.w3.org/2005/Atom}title')

        # Full path where downloaded file is to be stored
        destinationpath = "/home/max/Documents/SentinelDownloads/" + title_element.text + '.zip'
        
        #Open the file and read the data
        print "Downloading ", title_element.text
        downloadfile = urllib2.urlopen(sentinel_link)
        data =  downloadfile.read()
        #Write the data to the destination zipfile
        with open(destinationpath, "wb") as code:
           code.write(data)
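Note that read() loads the whole product into memory at once, and Sentinel scenes can be large. A chunked variant may therefore be safer; here is a sketch (the helper name is my own invention; it works with any file-like response object such as the one urllib2.urlopen returns):

```python
def download_in_chunks(response, destinationpath, chunk_size=1024 * 1024):
    # Read the response in 1 MB pieces instead of a single read(),
    # so memory use stays flat regardless of the product size
    with open(destinationpath, 'wb') as outfile:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            outfile.write(chunk)
```

In the loop above you would then call download_in_chunks(urllib2.urlopen(sentinel_link), destinationpath) instead of reading and writing in one go.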

To use all this, create your own query and replace the urlrequest variable, add your username and password, and define your destinationpath:

The complete script


import urllib2, urllib
import os
import xml.etree.ElementTree as etree

# Hard coded link returning list of matching scenes
# https://scihub.esa.int/twiki/do/view/SciHubUserGuide/5APIsAndBatchScripting#Open_Search

# Authenticate at scihub webpage
url =  'https://scihub.copernicus.eu/dhus/'
username = 'your_username'
password = 'your_password'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

password_mgr.add_password(None, url, username, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)

# The request query
urlrequest = urllib.quote('https://scihub.copernicus.eu/dhus/search?q=productType:GRD AND ingestionDate:[2015-09-30T00:00:00.000Z TO 2015-09-30T23:59:59.999Z] AND footprint:"Intersects(POLYGON((-16.11154750824 69.190260085686,-2.7521725082398 69.190260085686,-2.7521725082398 76.370271635301,-16.11154750824 76.370271635301,-16.11154750824 69.190260085686)))"&rows=10000&start=0', ':()[]/?=,&')
# Read the response into page and write it to an xml-file
page = urllib2.urlopen(urlrequest).read()
textfile = open('/home/max/Documents/test.xml', 'w')
textfile.write(page)
textfile.close()

#Parsing the xml-file, the entry tag contains the results
tree = etree.parse('/home/max/Documents/test.xml')
entries = tree.findall('{http://www.w3.org/2005/Atom}entry')
print "Number of Scenes Found: ", len(entries)
for entry in range(len(entries)):
    #The uuid element allows to create the path to the file
    uuid_element = entries[entry].find('{http://www.w3.org/2005/Atom}id')
    sentinel_link = "https://scihub.copernicus.eu/dhus/odata/v1/Products('" + uuid_element.text + "')/$value"
    
    #the title element contains the corresponding file name
    title_element = entries[entry].find('{http://www.w3.org/2005/Atom}title')
    
    #Destinationpath with filename where download to be stored
    destinationpath = "/home/max/Documents/SentinelDownloads/" + title_element.text + '.zip'
    
    print
    print "Scene ", entry + 1, "of ", len(entries)
    
    #Check if file already was downloaded
    print sentinel_link
    if os.path.exists(destinationpath):
        print title_element.text, ' already downloaded'
        
        continue
    
    #Download file and read
    print "Downloading ", title_element.text
    downloadfile = urllib2.urlopen(sentinel_link)
    data =  downloadfile.read()
    
    # Write the download into the destination zipfile
    with open(destinationpath, "wb") as code:
        code.write(data)

print "Done downloading"