Wednesday, March 13, 2019

Geographical data migration



A lot has happened in the GIS (geographic information system) landscape since the last post on the GeoFS blog. This may be my excuse for the long delay.

GeoFS consumes a massive amount of geographical data: digital elevation models (the mountains and landscape), satellite and aerial images, map tiles, etc.

Most of the major data providers that used to have very generous (sometimes free) offerings have turned commercial. Those that were already commercial, like Google Maps, have increased their fees dramatically.

The cost of running GeoFS on these data sources had become impossible to sustain. So I made the decision, last year, to purchase, process and host this data myself. This is a biggie. We are talking about a huge amount of data (several TB) that has to be paid for, downloaded, processed and munched, remapped, calibrated, rendered, organised in tiles... and finally hosted on servers that can deliver it quickly and reliably to thousands of users each day.

This has been an idea of mine for a long time, but I always considered it completely impossible to achieve: the kind of stuff reserved for Google and the like. Faced with the choice between dropping GeoFS altogether and going the hard route, I just went for it.

Elevation data comes from NASA's Shuttle Radar Topography Mission (SRTM). It is public domain but had to be processed into a format that GeoFS can use. That took a couple of months' work in all. The result still has some issues and I will need to do it all over again soon.

Low-resolution satellite images are Sentinel-2, the same imagery used by Cesium Ion. I purchased the whole planet at 10m resolution and organised the tiles in a file structure that is easy to host. This, strangely enough, was quite straightforward. The only issue here is the sheer size of the data, which makes it difficult to move around.
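For the curious, here is a rough sketch of what a host-friendly tile file structure can look like. It assumes the common Web Mercator "slippy map" scheme with a zoom/x/y folder layout; the function names and the jpg extension are made up for illustration and are not necessarily what GeoFS uses.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point.

    Assumes the usual slippy map convention: x grows eastwards from
    longitude -180, y grows southwards from the northern edge of the map.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_path(zoom, x, y, ext="jpg"):
    """Build a relative file path for a tile (hypothetical layout)."""
    return f"{zoom}/{x}/{y}.{ext}"


if __name__ == "__main__":
    x, y = lonlat_to_tile(2.35, 48.85, 13)  # a point near Paris
    print(tile_path(13, x, y))
```

With a layout like this, every tile is just a file at a predictable path, so a plain web server can deliver it without any application code in between.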



Finally, Google Maps was replaced by OpenStreetMap data, pre-rendered in tiles with a special GeoFS style sheet in order to achieve a sort of aeronautical chart look and feel. This was probably the most difficult part of the whole migration. OSM data is difficult to work with. A lot of tools exist, but the processes are very complex and documentation is lacking. Rendering is painfully slow. The last deployed zoom level (13) took about 40 days. Zoom 14 has now been running for a month and is only 40% done.
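To give an idea of why each extra zoom level hurts, here is a back-of-the-envelope sketch. It assumes a standard tile pyramid where every zoom level has four times as many tiles as the previous one, and it takes "a month" to mean 30 days. The function names are made up for illustration. The tile counts are an upper bound: a real rendering job may skip empty areas such as open ocean.

```python
def tiles_at_zoom(zoom):
    """Number of tiles in a full world pyramid at a given zoom level."""
    return 4 ** zoom


def estimated_total_days(elapsed_days, fraction_done):
    """Naive linear estimate of total rendering time from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_days / fraction_done


if __name__ == "__main__":
    for zoom in (13, 14):
        print(zoom, tiles_at_zoom(zoom))
    # Assumption: "a month" is taken as 30 days, 40% done.
    print(estimated_total_days(30, 0.4))
```

On those assumptions, zoom 14 would need about 75 days in total, so roughly a month and a half more to go.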

Hosting is handled by two dedicated servers for redundancy and load balancing. Nginx does an excellent job of delivering static files at lightning speed. This is why all the data was pre-rendered: static files are very easy to serve, and the result is an infrastructure that is robust and easy to maintain and scale.

Have a nice flight!