The OpenStreetMap data model

The OpenStreetMap data model is a powerful yet simple way to represent geographic information. Understanding it lets you work with OpenStreetMap data in its raw form and convert it into formats better suited to your task. You'll want to understand the data model if you plan to write a map editor, need to convert raw OpenStreetMap data for use in an application, or find that the existing software tools don't provide the functionality you want.

In traditional GIS, map data is represented by three kinds of geometry: a point is a single location in space defined by coordinates, a linestring is a linear feature such as a road or a border, and a polygon is an enclosed area. Data describing these geographical features is usually attached as an afterthought in a secondary database. In OpenStreetMap, these three concepts are distilled down into nodes, ways, and relations, with tags attached to each element to describe it. This chapter discusses these core OpenStreetMap concepts in more detail.

This is an advanced topic that will help you if you decide to delve deeper into working with OpenStreetMap data, perhaps to implement something interesting that does not already exist, or to improve a current tool. It will help you if the task requires low-level control of map data, whether you are working with static datasets or interacting with an OpenStreetMap API such as the live REST API used to edit OSM data. To learn more about the REST API, visit the wiki page at http://wiki.openstreetmap.org/wiki/API_v0.6.

Nodes

Points on Earth are called nodes. Each node is represented by a latitude, a longitude, and as many tags as are appropriate. For example, nodes are used to represent shops, bus stops, benches, and post boxes. A node without any tags is normally a subelement of another element, typically a way.
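As an illustration only, a node can be sketched in Python as a small record holding an id, a position, and a dictionary of tags. The class name, field names, and range checks below are assumptions made for this example and do not come from any OpenStreetMap library; the coordinates and ids are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single point: an id, a position, and zero or more tags."""
    id: int
    lat: float  # degrees north, expected between -90 and 90
    lon: float  # degrees east, expected between -180 and 180
    tags: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject positions that cannot exist on Earth.
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")


# A tagged node that is a map feature in its own right (made-up values).
post_box = Node(id=1, lat=51.5074, lon=-0.1278, tags={"amenity": "post_box"})

# An untagged node, which would normally exist only as part of a way.
corner = Node(id=2, lat=51.5075, lon=-0.1279)
```

Keeping tags in a plain dictionary mirrors the free-form key and value pairs that OpenStreetMap attaches to every element.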

Ways

An ordered list of nodes is called a way. A way can contain at most 2000 nodes, which ensures that tools and users are not overwhelmed by very large structures that are difficult to manipulate. Ways are used to represent linear features like footpaths, roads, rail lines, and power lines.

Areas do not have a specific data type, and are simply a kind of closed way where the first node is the same as the last node. They are used to represent building outlines, parks, and landuse.
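The two rules above, the node limit and the closed-way convention, can be sketched as simple checks on a list of node ids. The function names and the sample ids are hypothetical and chosen only for this example.

```python
MAX_WAY_NODES = 2000  # per-way limit quoted in the text


def check_way(node_refs):
    """Raise ValueError if the way references more nodes than allowed."""
    if len(node_refs) > MAX_WAY_NODES:
        raise ValueError(
            f"way has {len(node_refs)} nodes, limit is {MAX_WAY_NODES}"
        )


def is_closed(node_refs):
    """Return True if the way starts and ends on the same node."""
    # A triangle is the smallest possible area: three distinct nodes plus
    # the first node repeated at the end, so four references in total.
    return len(node_refs) >= 4 and node_refs[0] == node_refs[-1]


footpath = [101, 102, 103]            # open way: a linear feature
building = [201, 202, 203, 204, 201]  # closed way: first id equals last id

check_way(footpath)
check_way(building)
```

Here `is_closed(building)` is true and `is_closed(footpath)` is false. Whether a closed way is actually treated as an area also depends on its tags, which this sketch does not inspect.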

Relations

A relation is an ordered list of nodes, ways, or other relations. Each member of a relation has an optional role that gives an additional piece of information about that subelement. Like tag values, roles are strings up to 255 characters long. Depending on the set of tags attached, relations can represent road or bicycle routes, turn restrictions, and administrative boundaries.
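A relation can be sketched in the same spirit as an ordered list of members, each carrying an element type, an id, and an optional role. The class names and the helper method are assumptions for illustration; the example data models a turn restriction with invented ids, using the from, via, and to roles commonly seen on such relations.

```python
from dataclasses import dataclass, field

MEMBER_TYPES = {"node", "way", "relation"}
MAX_ROLE_LENGTH = 255  # role length limit quoted in the text


@dataclass(frozen=True)
class Member:
    """One entry in a relation: which element, and in what role."""
    type: str       # "node", "way" or "relation"
    ref: int        # id of the referenced element
    role: str = ""  # optional, so it defaults to an empty string

    def __post_init__(self):
        if self.type not in MEMBER_TYPES:
            raise ValueError(f"unknown member type: {self.type!r}")
        if len(self.role) > MAX_ROLE_LENGTH:
            raise ValueError("role is longer than 255 characters")


@dataclass
class Relation:
    id: int
    members: list = field(default_factory=list)  # order is significant
    tags: dict = field(default_factory=dict)

    def members_with_role(self, role):
        """Return the members holding the given role, in relation order."""
        return [m for m in self.members if m.role == role]


# A turn restriction: no left turn from way 100 onto way 101 at node 50.
no_left_turn = Relation(
    id=1,
    members=[
        Member("way", 100, "from"),
        Member("node", 50, "via"),
        Member("way", 101, "to"),
    ],
    tags={"type": "restriction", "restriction": "no_left_turn"},
)
```

Members are stored in a list rather than a set because their order is part of the relation's meaning.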

You should remember that relations are not categories and should not be used solely to group things together. You should probably use a tag to group things together instead. For more information about this topic, visit http://wiki.osm.org/wiki/Relations/Relations_are_not_Categories.

Identifiers

Every element in an OpenStreetMap dataset, of any of the three types above, is identified by a numerical id that is unique within its type, so a node and a way may carry the same number. These numbers exist only to allow individual features to be referenced and have no other meaning. A relation or a way uses these identifiers to reference its subelements. Two ways are said to meet only if they reference the same node id; two separate nodes with identical coordinates do not connect them. A closed way representing an area references the same node id twice, as its first and last node.
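The rule that ways meet by shared node id, not by shared coordinates, can be illustrated with a few lines of Python. The ids, coordinates, and function name here are invented for the example.

```python
def shared_node_ids(way_a_refs, way_b_refs):
    """Return the set of node ids referenced by both ways."""
    return set(way_a_refs) & set(way_b_refs)


# Node id -> (lat, lon). Nodes 11 and 13 sit at exactly the same position
# but are distinct elements because their ids differ.
node_positions = {
    10: (52.0000, 13.0000),
    11: (52.0000, 13.0010),
    12: (52.0010, 13.0010),
    13: (52.0000, 13.0010),
    14: (51.9990, 13.0010),
}

high_street = [10, 11]
side_street = [11, 12]  # reuses node 11, so it meets high_street
detached = [13, 14]     # starts at the same position, but on node 13

connected = shared_node_ids(high_street, side_street)
not_connected = shared_node_ids(high_street, detached)
```

In this sketch `connected` holds node 11, while `not_connected` is empty even though nodes 11 and 13 share a position. A routing tool reading this data would therefore treat `detached` as unreachable from `high_street`.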

OpenStreetMap file formats

OpenStreetMap data files are traditionally distributed in an XML format that represents the node, way, and relation concepts using a simple schema. Uncompressed, files in this format can be extremely large, so they are usually distributed compressed with gzip or bzip2. Most of the tools designed to work with the OSM XML data format can also handle the compressed XML.
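To show how the three element types appear in OSM XML, the following sketch reads a tiny hand-written sample with Python's standard library. The sample data, the function names, and the tuple layout of the results are assumptions for illustration; real extracts carry extra attributes such as version and timestamp, which this sketch ignores.

```python
import bz2
import gzip
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="51.5000" lon="-0.1000"/>
  <node id="2" lat="51.5010" lon="-0.1000">
    <tag k="amenity" v="post_box"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="footway"/>
  </way>
  <relation id="20">
    <member type="way" ref="10" role=""/>
    <tag k="type" v="route"/>
  </relation>
</osm>
"""


def open_osm(path):
    """Open a plain, gzip, or bzip2 OSM XML file, chosen by file extension."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def read_osm(source):
    """Stream an OSM XML file object into three dictionaries keyed by id."""
    nodes, ways, relations = {}, {}, {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in ("node", "way", "relation"):
            continue  # child elements are read when their parent ends
        elem_id = int(elem.get("id"))
        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
        if elem.tag == "node":
            nodes[elem_id] = (float(elem.get("lat")), float(elem.get("lon")), tags)
        elif elem.tag == "way":
            refs = [int(nd.get("ref")) for nd in elem.findall("nd")]
            ways[elem_id] = (refs, tags)
        else:
            members = [
                (m.get("type"), int(m.get("ref")), m.get("role", ""))
                for m in elem.findall("member")
            ]
            relations[elem_id] = (members, tags)
        elem.clear()  # drop the children once the element has been copied
    return nodes, ways, relations


nodes, ways, relations = read_osm(io.BytesIO(SAMPLE))
```

For a file on disk, the same function could be called as `read_osm(open_osm(path))`. Parsing incrementally with `iterparse` avoids building the whole document tree at once, which matters given how large these files can be.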

To address the file size and slow parsing of XML, a separate binary format based on Google's Protocol Buffers was created to pack OSM data as densely as possible. PBF files are smaller than the same data in gzip- or bzip2-compressed XML and can be processed much faster. Libraries for reading this format are available for several languages.