Thematic extracts of OpenStreetMap data in cloud-native file formats
OpenStreetMap’s native file format is OSM PBF, but the 80 GB ‘planet file’ is unwieldy and not supported by all GIS software. Layercake is OSM data extracted into thematic layers (buildings, transportation, etc.) and converted to cloud-native file formats that are easy to use with software ranging from DuckDB to QGIS.
Layercake data is available from data.openstreetmap.us. Generally, you’ll put the URL for the layer you’d like to use into DuckDB or other software that supports GeoParquet files.
Schema
All Layercake layers are available as GeoParquet files. Every layer has the following columns:
type (string): the OSM element type (node, way, or relation)
id (int64): the OSM element ID
bbox (struct): the xmin, ymin, xmax, and ymax of the element’s geometry
geometry (binary): a WKB-encoded Geometry
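To make the geometry column more concrete, here is a minimal Python sketch of how a WKB-encoded point can be decoded with only the standard library. It assumes a plain 2D Point; the function name decode_wkb_point and the sample coordinates are illustrative and not part of Layercake. Real Layercake geometries also include lines and polygons, which are better handled by a spatial library or DuckDB's spatial extension.

```python
import struct


def decode_wkb_point(wkb: bytes) -> tuple[float, float]:
    """Decode a 2D WKB Point into an (x, y) tuple."""
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type code; 1 means Point.
    (geom_type,) = struct.unpack(endian + "I", wkb[1:5])
    if geom_type != 1:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    # Bytes 5-20 hold the x and y coordinates as two 64-bit floats.
    x, y = struct.unpack(endian + "2d", wkb[5:21])
    return x, y


# Build a sample little-endian Point to decode (coordinates are illustrative).
sample = struct.pack("<BI2d", 1, 1, -104.99, 39.74)
print(decode_wkb_point(sample))
```

For a node, the bbox struct would simply repeat the point's coordinates (xmin equal to xmax, ymin equal to ymax).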
Each layer also has additional columns corresponding to OSM tags. These vary by layer and are documented below. Most are strings, but some columns have been parsed into richer types (lists, maps, or integers).
The following layers are currently available:
buildings
URL: https://data.openstreetmap.us/layercake/buildings.parquet
Columns: building, building:levels, building:flats, building:material, building:colour, building:part, building:use, name, addr:housenumber, addr:street, addr:city, addr:postcode, website, wikipedia, wikidata, height, roof:shape, roof:levels, roof:colour, roof:material, roof:orientation, roof:height, start_date, access, wheelchair
highways
URL: https://data.openstreetmap.us/layercake/highways.parquet
Columns: highway, service, crossing, cycleway, cycleway:left, cycleway:right, footway, construction, name, ref, bridge, covered, lanes, layer, lit, sidewalk, smoothness, surface, tracktype, tunnel, wheelchair, width, access, bicycle, bus, foot, hgv, maxspeed, motor_vehicle, motorcycle, oneway, toll
boundaries
URL: https://data.openstreetmap.us/layercake/boundaries.parquet
Columns: boundary, admin_level, name (list), names (map), official_name (list), official_names (map), int_name (list), alt_name (list), alt_names (map), place, border_type, ISO3166-2, ISO3166-1:alpha2, ISO3166-1:alpha3, wikidata, wikipedia, disputed_by (list), claimed_by (list), controlled_by (list), recognized_by (list)
settlements
URL: https://data.openstreetmap.us/layercake/settlements.parquet
Columns: place, name, names (map), alt_name, alt_names (map), official_name, official_names (map), wikidata, wikipedia, population (int64)
parks
URL: https://data.openstreetmap.us/layercake/parks.parquet
Columns: boundary, protected_area, leisure, name (list), names (map), short_name (list), short_names (map), official_name (list), official_names (map), protect_class, protection_title, protected, iucn_level, access, operator, operator:type, owner, ownership, start_date, related_law, website, wikidata, wikipedia
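The layer URLs above follow a single pattern, and many column names contain a colon, so they have to be double-quoted in SQL. The Python sketch below shows one way to build a query string for a layer. The helper names (layer_url, quote_ident, select_sql) are hypothetical, and the sketch only assembles text; it does not run the query.

```python
# Layer names as listed in this document.
LAYERS = ("buildings", "highways", "boundaries", "settlements", "parks")
BASE_URL = "https://data.openstreetmap.us/layercake"


def layer_url(layer: str) -> str:
    """Return the GeoParquet URL for a known layer name."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"{BASE_URL}/{layer}.parquet"


def quote_ident(name: str) -> str:
    """Double-quote a column name so names like building:levels are valid SQL."""
    return '"' + name.replace('"', '""') + '"'


def select_sql(layer: str, columns: list[str]) -> str:
    """Assemble a select statement that reads the given columns from a layer."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"select {cols} from '{layer_url(layer)}'"


print(select_sql("buildings", ["building", "building:levels", "height"]))
```

The resulting string can be pasted into DuckDB or passed to any client that reads GeoParquet over HTTPS.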
Examples
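One use for Layercake is to download only the subset of data that matters for your use case. For example, you could select buildings in Colorado with more than 5 levels and write the results to a GeoJSON file for further processing. The query below approximates Colorado with a bounding box and relies on DuckDB's spatial extension for the GDAL writer (run install spatial; load spatial; first if it is not already loaded).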
D copy (
from 'https://data.openstreetmap.us/layercake/buildings.parquet'
select type as osm_type, id as osm_id,
building, "building:levels", name, height, geometry
where try_cast("building:levels" as int) > 5
and bbox.xmin > -109.05
and bbox.ymin > 36.99
and bbox.xmax < -102.04
and bbox.ymax < 41.00
) to 'colorado_tall_buildings.geojson' with (format GDAL, driver 'GeoJSON');
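The bounding-box conditions in the query above keep only elements whose bbox lies strictly inside a rectangle around Colorado. As a rough illustration of that predicate, here is a small Python sketch; the BBox type, the bbox_within function, and the sample coordinates are assumptions made for this example, not part of Layercake.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Mirrors the fields of the bbox struct column."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


# The rectangle used in the query above (an approximation of Colorado).
COLORADO = BBox(xmin=-109.05, ymin=36.99, xmax=-102.04, ymax=41.00)


def bbox_within(inner: BBox, outer: BBox) -> bool:
    """True when inner lies strictly inside outer, matching the query's four comparisons."""
    return (
        inner.xmin > outer.xmin
        and inner.ymin > outer.ymin
        and inner.xmax < outer.xmax
        and inner.ymax < outer.ymax
    )


# Illustrative boxes: one in Denver, one straddling the western edge.
denver_building = BBox(-104.990, 39.730, -104.989, 39.731)
straddling = BBox(-109.10, 38.00, -109.00, 38.10)

print(bbox_within(denver_building, COLORADO))
print(bbox_within(straddling, COLORADO))
```

Because all four comparisons must hold, an element that crosses the edge of the rectangle is excluded rather than clipped.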
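You can also join two Layercake layers together. The query below uses a spatial join to find the most populous settlements in California by joining the settlements layer against the boundaries layer. Because name is a list in the boundaries layer, the query compares its first element (DuckDB lists are 1-indexed).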
D select settlements.type, settlements.id, settlements.name, settlements.population
from 'https://data.openstreetmap.us/layercake/settlements.parquet'
join 'https://data.openstreetmap.us/layercake/boundaries.parquet'
on st_within(settlements.geometry, boundaries.geometry)
where boundaries.name[1] = 'California'
order by settlements.population desc
limit 10;
┌─────────┬────────────┬───────────────┬────────────┐
│ type │ id │ name │ population │
│ varchar │ int64 │ varchar │ uint64 │
├─────────┼────────────┼───────────────┼────────────┤
│ node │ 1738808199 │ Los Angeles │ 3898747 │
│ node │ 1824135555 │ San Diego │ 1386932 │
│ node │ 1690212988 │ San Jose │ 1013240 │
│ node │ 26819236 │ San Francisco │ 873965 │
│ node │ 1956099531 │ Fresno │ 520052 │
│ node │ 150959789 │ Sacramento │ 490712 │
│ node │ 6474240715 │ Long Beach │ 469450 │
│ node │ 150980683 │ Oakland │ 433031 │
│ node │ 1979182884 │ Bakersfield │ 373640 │
│ node │ 1837296118 │ Anaheim │ 350742 │
└─────────┴────────────┴───────────────┴────────────┘
Analytics queries work too. The example below finds all of the values of surface used on highways in OSM, sorted by how common they are.
$ duckdb
D from 'https://data.openstreetmap.us/layercake/highways.parquet'
select surface, count(*) as count
where type = 'way'
group by surface
order by count desc;
┌──────────────────────┬───────────┐
│ surface │ count │
│ varchar │ int64 │
├──────────────────────┼───────────┤
│ NULL │ 179401445 │
│ asphalt │ 29916184 │
│ unpaved │ 12156065 │
│ paved │ 4095349 │
│ concrete │ 3923954 │
│ paving_stones │ 3771049 │
│ ground │ 3387599 │
│ gravel │ 2139921 │
│ dirt │ 1688384 │
│ compacted │ 1192208 │
│ grass │ 851898 │
│ sett │ 485161 │
│ fine_gravel │ 444866 │
│ sand │ 295180 │
│ wood │ 217383 │
│ concrete:plates │ 193862 │
│ earth │ 146823 │
│ cobblestone │ 139089 │
│ pebblestone │ 130414 │
│ metal │ 45100 │
│ · │ · │
│ · │ · │
│ · │ · │
│ metl │ 1 │
│ curved │ 1 │
│ Via de Joaquim Gomis │ 1 │
│ 0 │ 1 │
│ earth_grass │ 1 │
│ unkno │ 1 │
│ driving_plates │ 1 │
│ 砕石舗装w │ 1 │
│ trawaw │ 1 │
│ azaq │ 1 │
│ surface=asphalt │ 1 │
│ dirt/sand;paved │ 1 │
│ murrum │ 1 │
│ rubber car tires │ 1 │
│ آهنگ_۳ │ 1 │
│ ground,_gravel,_sand │ 1 │
│ ail │ 1 │
│ bewachsener_boden │ 1 │
│ pu │ 1 │
│ dirt4 │ 1 │
├──────────────────────┴───────────┤
│ 5410 rows (40 shown) 2 columns │
└──────────────────────────────────┘
