Fun With Shapefiles

Shapefiles are a format developed by ESRI (the makers of ArcGIS) to store and share geospatial data. Many interesting datasets are freely available in shapefile format. Shapefiles can be viewed with a number of freely available applications, such as ArcGIS Explorer (which requires the .NET framework or Silverlight) or ArcReader (which is multi-platform but closed-source). Open-source GIS packages such as Quantum GIS and GRASS can also view and edit shapefiles, and recent versions can be installed from the UbuntuGis PPA.

However GIS applications can be tricky to use without training. Moreover, sometimes you may want to use shapefile-data for some other purpose, outside of traditional GIS applications. Here are a few useful tricks you can do with shapefiles, and ways to get data out of a shapefile and into another application.

Reprojecting a shapefile

If you want to combine data from different sources, you'll have to make sure that they use the same coordinate system, reference point (datum), geographic projection and representation of the Earth's surface (geoid).
These days, it is common to convert everything to WGS84 latitude/longitude coordinates. To convert a given shapefile to WGS84 coordinates, you can use the GDAL utility ogr2ogr. In Ubuntu, first install the package gdal-bin and then execute the following command: ogr2ogr -t_srs EPSG:4326 output_wgs84.shp input.shp

The parameter -t_srs specifies the spatial reference system of the output file, which is usually specified through an EPSG code. In this case, it tells the program to convert all coordinates in the shapefile to geographic 2D WGS84 coordinates (EPSG:4326). Use EPSG:4979 for 3D coordinates.

Converting a shapefile to KML (Google Earth/Maps)

One of the more interesting ways to visualise data stored in a shapefile is to convert it into a KML file (Keyhole Markup Language), which can then be used in e.g. Google Earth (or with Google Maps). To do this in Ubuntu, first install the package gdal-bin and then execute the following command: ogr2ogr -f "KML" -t_srs EPSG:4326 output.kml input.shp

Converting a shapefile to KMZ (compressed Google Earth)

The problem with XML-based formats such as KML is that they can get rather big if you have a large shapefile with lots of data. To save some disk-space, you can compress the KML file into a KMZ file. A KMZ file is simply a ZIP-file that contains a KML-document (doc.kml) and optional directories with other files that are referenced from the KML-file. To directly convert a shapefile into KMZ, you can use this shell-script (shp2kmz). Just copy it to your home-directory or to /usr/local/bin and make it executable (e.g. chmod a+x shp2kmz. It uses ogr2ogr and zip, so in Ubuntu it requires that the gdal-bin package is installed. Usage: shp2kmz input.shp output.kmz

Merging shapefiles

Merging two or more shapefiles is easy with ogr2ogr. A merge of two shapefiles 'file1.shp' and 'file2.shp' into a new file 'file_merged.shp' is performed like this:

ogr2ogr file_merged.shp file1.shp
ogr2ogr -update -append file_merged.shp file2.shp -nln file_merged

The first command just copies the first shapefile to a new file (file_merged.shp).The second command opens file_merged.shp in update mode, tries to find existing layers and appends the features being copied. The -nln option sets the name of the layer to be copied to.

Converting a shapefile to a spreadsheet

You may be able to load some of the shapefile attributes into a spreadsheet, by opening the associated DBF file in a spreadsheet-application that supports dBASE-files, such as Gnumeric. However, often this will exclude interesting data, such as coordinates and some types of attributes. A better way is to load the shapefile into Quantum GIS, open the attribute table, select all the rows and copy the data to the clipboard. You can then paste everything into a spreadsheet. It may get big and messy though...

Converting a shapefile to CSV (textfile)

If your shapefile contains points rather than polygons, you can export the data as a CSV file using ogr2ogr and the GDAL CSV-driver, e.g: ogr2ogr -f "CSV" -t_srs EPSG:4979 -lco GEOMETRY=AS_XYZ output input.shp This will create a directory containing one or more CSV-files containing attributes and X/Y/Z coordinates (use ogr2ogr -f "CSV" -t_srs EPSG:4326 -lco GEOMETRY=AS_XY if you need only latitude and longitude). Add SEPARATOR=TAB if you prefer a tab-separated file rather than commas.

However this won't export coordinates if your shapefile contains polygons. In that case you can export the entire geometry using: ogr2ogr -f "CSV" -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT output input.shp

Unfortunately if you just want latitude and longitude of the polygon centroids, you'll need to compute them yourself. The polygons will be in stored in a WKT-representation. If they are small, you can approximate the centroid by simply averaging the longitude and latitude coordinates. Rather than parsing everything yourself, you can for instance use Python and the shapely module, e.g.:

from shapely.wkt import loads
polygon = loads('POLYGON ((-8.254538773542281 54.511112320794687,-8.25368883970682 54.5107852540
30836,-8.25331620402474 54.510641858248889))')
centroid = polygon.centroid
lat = centroid.y
lon = centroid.x

For Perl take a look at the Geo-Point modules. If your polygons are big or accuracy is important, it's probably best to use R and the maptools package (see below), or load the data into a spatialite database and use the CENTROID SQL-function. I'll try to put together an example at some point.

An alternative way to convert shapefiles to text is by using the shp2text utility.

Working with shapefiles in Python

Rather than first exporting to CSV, you can easily load shapefiles into Python directly using the ogr module, and then process them with shapely or other modules. For example:

from osgeo import ogr
from shapely.wkb import loads
source = ogr.Open(“testpoly.shp”)
couche = source.GetLayerByName(“testpoly”)
for element in couche:
    geom = loads(element.GetGeometryRef().ExportToWkb())

Working with shapefiles in Perl

Of course you can also read shapefiles directly in Perl, have a look at the Geo::ShapeFile module.

Working with shapefiles in R

The readShapePoly function from the maptools package will read a shapefile and create a spatialPolygonsDataFrame object.

The rest I still need to figure out...

Some potentially useful resources:

http://science.nature.nps.gov/im/monitor/stats/R/
http://:cran.r-project.org/web/packages/spatstat/vignettes/shapefiles.pdf
https://github.com/hadley/ggplot2/wiki/plotting-polygon-shapefiles
http://help.nceas.ucsb.edu/R:_Spatial
http://spatialanalysis.co.uk/files/2011/01/intro_to_spatial_data.txt
http://www.statistik.uni-muenchen.de/~hoehle/teaching/stepi2010/visualiz...
http://pvanb.wordpress.com/2010/02/27/a-map-of-deforestation-in-africa-u...
http://stackoverflow.com/questions/5130860/access-spatial-objects-in-spa...
http://www.people.fas.harvard.edu/~zhukov/spatial.R
http://www.nceas.ucsb.edu/files/scicomp/GISSeminar/UseCases/CreateKMLFil...
http://www.est.ufpr.br/aRT/docs/aRTsp.pdf
http://www.nceas.ucsb.edu/scicomp/usecases/computepolygoncentroids