<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Vincent Sarago on Medium]]></title>
        <description><![CDATA[Stories by Vincent Sarago on Medium]]></description>
        <link>https://medium.com/@_VincentS_?source=rss-754c34eee3ad------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/2*eX6iuOw3JUjVQysF56ucRA.jpeg</url>
            <title>Stories by Vincent Sarago on Medium</title>
            <link>https://medium.com/@_VincentS_?source=rss-754c34eee3ad------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 18 Apr 2026 16:57:30 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@_VincentS_/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[COG Talk 4ter — Distributed processes]]></title>
            <link>https://medium.com/devseed/cog-talk-4ter-distributed-processes-8ee280f71080?source=rss-754c34eee3ad------2</link>
            <guid isPermaLink="false">https://medium.com/p/8ee280f71080</guid>
            <category><![CDATA[earth-observation]]></category>
            <category><![CDATA[satellite-imagery]]></category>
            <category><![CDATA[serverless]]></category>
            <category><![CDATA[cog]]></category>
            <category><![CDATA[maps]]></category>
            <dc:creator><![CDATA[Vincent Sarago]]></dc:creator>
            <pubDate>Mon, 17 Feb 2020 21:48:49 GMT</pubDate>
            <atom:updated>2020-02-18T20:25:25.317Z</atom:updated>
            <content:encoded><![CDATA[<h3>COG Talk 4ter — Distributed processes</h3><h4>Last week in <a href="https://medium.com/devseed/cog-talk-part-4-enabling-spatio-temporal-data-processing-at-scale-e9cb23e33281">COG Talk 4</a> we discussed large scale processing using Cloud Optimized GeoTIFF and MosaicJSON. Here is another example of how we can use mosaicJSON and a dynamic tiler to create a high resolution, large scale mosaic.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qGZjqD6zPOdIf7RusElqfg.jpeg" /></figure><h3>Divide, Select and Conquer</h3><p>With COGs and dynamic tiling, tiles are created from the raw data at request time. This lets you apply rescaling or color correction to get a better <em>look</em> for web map display. With mosaicJSON and <strong>rio-tiler-mosaic</strong>, introduced in <a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">COG Talk 2</a>, we extended the idea of dynamic tiling by adding the <strong>pixel selection</strong> operation. When you have multiple overlapping datasets, you can tell rio-tiler-mosaic which pixel to keep or which operation to perform on the stack of pixels.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/603/1*xpcizjvrwyWbTBtigomeNg.jpeg" /><figcaption>Pixel selection methods applied to Landsat-8 NDVI values for all 2018 observations over the Montreal area.</figcaption></figure><p>In this post we want to show how we can use the combination of COGs, a dynamic tiler and mosaicJSON to create a <em>large scale, good looking and high resolution</em> mosaic of Landsat 8 data.</p><h3>1. Create mosaicJSON</h3><p>For this demo, we are using our <a href="https://github.com/developmentseed/awspds-mosaic">awspds-mosaic</a> stack. 
The stack has a mosaic/create endpoint that accepts <a href="https://github.com/sat-utils/sat-api">sat-api</a> queries to create a mosaicJSON of Landsat 8 data hosted on <a href="https://landsatonaws.com">AWS PDS</a>.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/cf94177aedff2f17567398a16a6ccfbe/href">https://medium.com/media/cf94177aedff2f17567398a16a6ccfbe/href</a></iframe><p>In the snippet above 👆 we:</p><ul><li>define an area of interest (AOI), a date range and a cloud filter for the <a href="https://github.com/sat-utils/sat-api">SAT API</a> search endpoint (Note: it should work the same with any other STAC API)</li><li>POST the query to the mosaic endpoint with optional parameters (season, tile format…)</li></ul><p>The result of the request is a TileJSON-like <a href="https://github.com/developmentseed/cogeo-mosaic-tiler/blob/master/cogeo_mosaic_tiler/handlers/app.py#L135-L143">object</a>.</p><h3>2. Create list of mercator tiles</h3><p>We are going to distribute our processes using web mercator tiles at zoom 11 (512x512 px tiles at zoom 11 have the same resolution as 256x256 px tiles at zoom 12). For our area of interest, this represents 783 tiles.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ea18a8e9c05551cea5613ef9d448abb8/href">https://medium.com/media/ea18a8e9c05551cea5613ef9d448abb8/href</a></iframe><h3>3. Create tile URL</h3><p>In this step we define the creation options for the tiles. The query_params will be added to the tile URL obtained in <strong>1</strong>.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/808bc7adb9803041d334426d501aa943/href">https://medium.com/media/808bc7adb9803041d334426d501aa943/href</a></iframe><ul><li><strong>bands=&quot;4,3,2&quot;</strong>: We pass the RGB band combination as a comma-separated list. 
Here 4,3,2 corresponds to the Landsat 8 band combination for True Color.</li><li><strong>color_ops=&quot;gamma RGB 3.5, saturation 1.7, sigmoidal RGB 15 0.35&quot;</strong>: The Landsat data is stored as UInt16, so to obtain a good looking result we apply a <a href="https://github.com/mapbox/rio-color">rio-color</a> formula.</li><li><strong>pixel_selection=&quot;median&quot;</strong>: This is where the magic happens. The <a href="https://github.com/cogeotiff/rio-tiler-mosaic/blob/master/rio_tiler_mosaic/methods/defaults.py#L86">median</a> pixel selection option means that for each tile, the dynamic tiler returns the per-pixel median computed over the whole stack of overlapping data.</li></ul><p>Note: In the _worker function you could likely add an inference step to apply ML on the output tile.</p><h3>4. Distribute and collect</h3><p>We can now call the tiler and get the resulting tiles (⚠ it can take up to 1 min). We also use some Rasterio code to merge the tiles into one raster file, which we then translate to a Cloud Optimized GeoTIFF.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a230b3ccd2f6f5dd72090e2791cdba91/href">https://medium.com/media/a230b3ccd2f6f5dd72090e2791cdba91/href</a></iframe><h3>5. The result</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*JS_TszDIIlQVUdqm" /><figcaption>High resolution mosaic over Brittany using median pixel selection for all spring and summer 2019 Landsat 8 scenes with less than 5% cloud cover.</figcaption></figure><p>Doing this over Brittany is great, but can we scale it to a country-sized area? 
👇</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*XWtOdvF68zmbuozRrvdYbA.jpeg" /><figcaption>High resolution mosaic over France using median pixel selection for all summer Landsat 8 scenes (2013 → 2019) with less than 10% cloud cover.</figcaption></figure><p>Check out the high resolution image <a href="https://cogeo.xyz/?url=https://s3.amazonaws.com/opendata.remotepixel.ca/cogs/L8_France_MedianPixel_2013_2019.tif">here</a>.</p><p>Doing this over an entire country takes a bit more time. However, using AWS Lambda and our cogeo/mosaic-* tools it takes less than 10 minutes 😱. Here are some numbers to put this in context:</p><ul><li><strong>~7 min</strong> for the tile creation step (merging the tiles into one COG takes almost as long)</li><li><strong>7802</strong> zoom 11 tiles to fetch (== AWS Lambda calls)</li><li><strong>522</strong> different Landsat 8 scenes (x3 bands)</li><li><strong>&gt;76 GB</strong> of data fetched (522 x 3 x ~50 MB per file)</li><li><strong>48128 x 42496</strong> output raster size</li></ul><p>The whole notebook can be found here: <a href="https://github.com/developmentseed/awspds-mosaic/blob/master/notebooks/LargeScaleMosaic.ipynb">https://github.com/developmentseed/awspds-mosaic/blob/master/notebooks/LargeScaleMosaic.ipynb</a></p><h3>Got Data?</h3><p>We’re always looking for interesting problems to tackle using COGs. If you have a raster dataset and want to learn how COG, STAC or mosaicJSON could help, please feel free to ping me on <a href="https://twitter.com/_VincentS_"><strong>Twitter</strong></a> or <a href="https://www.linkedin.com/in/vincentsarago/"><strong>LinkedIn</strong></a>! 
And if you are interested in joining Development Seed to help us build technology that helps solve global challenges, take a look at our <a href="https://developmentseed.org/careers/jobs/">open positions</a>!</p><hr><p><a href="https://medium.com/devseed/cog-talk-4ter-distributed-processes-8ee280f71080">COG Talk 4ter — Distributed processes</a> was originally published in <a href="https://medium.com/devseed">Development Seed</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Facebook’s population dataset]]></title>
            <link>https://medium.com/@_VincentS_/facebooks-population-dataset-47cfd1780af8?source=rss-754c34eee3ad------2</link>
            <guid isPermaLink="false">https://medium.com/p/47cfd1780af8</guid>
            <category><![CDATA[cog]]></category>
            <category><![CDATA[satellite-imagery]]></category>
            <category><![CDATA[facebook]]></category>
            <category><![CDATA[maps]]></category>
            <category><![CDATA[earth-observation]]></category>
            <dc:creator><![CDATA[Vincent Sarago]]></dc:creator>
            <pubDate>Fri, 14 Feb 2020 16:34:28 GMT</pubDate>
            <atom:updated>2020-02-14T16:34:28.799Z</atom:updated>
            <content:encoded><![CDATA[<h4>Earlier this week in <a href="https://medium.com/devseed/cog-talk-part-4-enabling-spatio-temporal-data-processing-at-scale-e9cb23e33281">COG Talk 4</a> we talked about doing large scale processing using Cloud Optimized GeoTIFF and mosaicJSON. Here is another example of how to create simple visualization tools when you store the data as Cloud Optimized GeoTIFF.</h4><h3>Data For Good: High Resolution Population Density Maps</h3><p>Ref: <a href="https://registry.opendata.aws/dataforgood-fb-hrsl/">https://registry.opendata.aws/dataforgood-fb-hrsl/</a></p><p>Format: .TIFF (Cloud Optimized GeoTIFF)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*2XTPhCjDfIKC3noR" /><figcaption>Dataset&#39;s coverage.</figcaption></figure><p>The dataset consists of <strong>294</strong> Cloud Optimized GeoTIFFs, representing six different variables stored in separate files.</p><pre>$ aws s3 ls dataforgood-fb-data/tif/month=2019-06/country=ZWE/ --recursive | grep &quot;.tif$&quot;</pre><pre>type=children_under_five/ZWE_children_under_five.tif<br>type=elderly_60_plus/ZWE_elderly_60_plus.tif<br>type=men/ZWE_men.tif<br>type=women/ZWE_women.tif<br>type=women_of_reproductive_age_15_49/ZWE_women_of_reproductive_age_15_49.tif<br>type=youth_15_24/ZWE_youth_15_24.tif</pre><p>See it live: <a href="https://cogeo.xyz/projects/Facebook/index.html">https://cogeo.xyz/projects/Facebook/index.html</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*AfvT2DfjKdfLLElW" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/0*IJtVdgjKHunj5j7J" /><figcaption>We can visualize the mosaicJSON using the cogeo-mosaic-tiler stack.</figcaption></figure><h3>COG → MosaicJSON</h3><pre><strong>1. Download the list of files</strong></pre><pre>$ aws s3 cp s3://dataforgood-fb-data/index.txt .</pre><pre><strong>2. 
Split the list by variable</strong></pre><pre># type=elderly_60_plus</pre><pre>$ cat index.txt | grep &quot;.tif$&quot; | grep &quot;type=elderly_60_plus/&quot; | awk &#39;{print &quot;s3://dataforgood-fb-data/&quot;$NF}&#39; &gt; list_elderly_60_plus.txt</pre><pre># type=men</pre><pre>$ cat index.txt | grep &quot;.tif$&quot; | grep &quot;type=men/&quot; | awk &#39;{print &quot;s3://dataforgood-fb-data/&quot;$NF}&#39; &gt; list_men.txt</pre><pre># type=women_of_reproductive_age_15_49</pre><pre>$ cat index.txt | grep &quot;.tif$&quot; | grep &quot;type=women_of_reproductive_age_15_49/&quot; | awk &#39;{print &quot;s3://dataforgood-fb-data/&quot;$NF}&#39; &gt; list_women_of_reproductive_age_15_49.txt</pre><pre># type=women</pre><pre>$ cat index.txt | grep &quot;.tif$&quot; | grep &quot;type=women/&quot; | awk &#39;{print &quot;s3://dataforgood-fb-data/&quot;$NF}&#39; &gt; list_women.txt</pre><pre># type=youth_15_24</pre><pre>$ cat index.txt | grep &quot;.tif$&quot; | grep &quot;type=youth_15_24/&quot; | awk &#39;{print &quot;s3://dataforgood-fb-data/&quot;$NF}&#39; &gt; list_youth_15_24.txt</pre><pre># type=children_under_five</pre><pre>$ cat index.txt | grep &quot;.tif$&quot; | grep &quot;type=children_under_five/&quot; | awk &#39;{print &quot;s3://dataforgood-fb-data/&quot;$NF}&#39; &gt; list_children_under_five.txt</pre><pre><strong>3. 
Create a MosaicJSON for each variable (using </strong><a href="https://github.com/developmentseed/cogeo-mosaic"><strong>cogeo-mosaic</strong></a><strong>).</strong></pre><pre>$ cat list_elderly_60_plus.txt | cogeo-mosaic create - -o elderly_60_plus.json</pre><pre>$ cat list_men.txt | cogeo-mosaic create - -o men.json</pre><pre>$ cat list_women_of_reproductive_age_15_49.txt | cogeo-mosaic create - -o women_of_reproductive_age_15_49.json</pre><pre>$ cat list_women.txt | cogeo-mosaic create - -o women.json</pre><pre>$ cat list_youth_15_24.txt | cogeo-mosaic create - -o youth_15_24.json</pre><pre>$ cat list_children_under_five.txt | cogeo-mosaic create - -o children_under_five.json</pre><pre><strong>4. Upload to cogeo.xyz</strong></pre><pre>$ curl -X POST -d @elderly_60_plus.json https://mosaic.cogeo.xyz/add</pre><pre>$ curl -X POST -d @men.json https://mosaic.cogeo.xyz/add</pre><pre>$ curl -X POST -d @women_of_reproductive_age_15_49.json https://mosaic.cogeo.xyz/add</pre><pre>$ curl -X POST -d @women.json https://mosaic.cogeo.xyz/add</pre><pre>$ curl -X POST -d @youth_15_24.json https://mosaic.cogeo.xyz/add</pre><pre>$ curl -X POST -d @children_under_five.json https://mosaic.cogeo.xyz/add</pre><h3>Got Data?</h3><p>We’re always looking for interesting problems to tackle using COGs. If you have a raster dataset and want to learn how COG, STAC or mosaicJSON could help, please feel free to ping me on <a href="https://twitter.com/_VincentS_"><strong>Twitter</strong></a> or <a href="https://www.linkedin.com/in/vincentsarago/"><strong>LinkedIn</strong></a>! And if you are interested in joining Development Seed to help us build technology that helps solve global challenges, take a look at our <a href="https://developmentseed.org/careers/jobs/">open positions</a>!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[COG Talk 4bis — Montreal LIDAR dataset]]></title>
            <link>https://medium.com/devseed/cog-talk-4bis-montreal-lidar-dataset-8e8dc24e6617?source=rss-754c34eee3ad------2</link>
            <guid isPermaLink="false">https://medium.com/p/8e8dc24e6617</guid>
            <category><![CDATA[satellite-imagery]]></category>
            <category><![CDATA[maps]]></category>
            <category><![CDATA[cog]]></category>
            <category><![CDATA[earth-observation]]></category>
            <category><![CDATA[raster]]></category>
            <dc:creator><![CDATA[Vincent Sarago]]></dc:creator>
            <pubDate>Wed, 12 Feb 2020 16:58:41 GMT</pubDate>
            <atom:updated>2020-02-12T16:58:41.124Z</atom:updated>
            <content:encoded><![CDATA[<h3>COG Talk 4bis — Montreal LIDAR dataset</h3><h4>Another example of large scale mosaicJSON</h4><p>Earlier this week in <a href="https://medium.com/devseed/cog-talk-part-4-enabling-spatio-temporal-data-processing-at-scale-e9cb23e33281">COG Talk 4</a> we shared how to do large scale processing using Cloud Optimized GeoTIFF and mosaicJSON. Here is another example of how to create simple visualization tools when you store the data as Cloud Optimized GeoTIFF.</p><h3>Montreal LIDAR dataset</h3><p>Ref: <a href="http://donnees.ville.montreal.qc.ca/dataset/lidar-aerien-2015">http://donnees.ville.montreal.qc.ca/dataset/lidar-aerien-2015</a></p><p>Format: .LAZ (Point Cloud)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*MFQXn0kiou_l5KKM" /><figcaption>Coverage of the Montreal open data LIDAR dataset.</figcaption></figure><p>The dataset consists of <strong>684</strong> COGs created from .LAZ files using a modified version of our <a href="https://github.com/developmentseed/pointcloud-to-cog">cogeo-watchbot-light</a> stack. Each COG has a 25 cm pixel resolution and two bands (Min and Max, see <a href="https://pdal.io/stages/writers.gdal.html">PDAL docs</a>).</p><p>See it live: <a href="https://cogeo.xyz/projects/MTLidar/index.html">https://cogeo.xyz/projects/MTLidar/index.html</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*hR-P8-ozZNNCAnyJirL4-g.gif" /><figcaption>We can visualize the mosaicJSON using the cogeo-mosaic-tiler stack.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*uepR8j84sXxx2JintT4GQw.gif" /><figcaption>Using rio-tiler-mvt, introduced in COG Talk part 3, we can also create Vector Tiles from the COGs directly.</figcaption></figure><h3>.LAZ → COG → MosaicJSON</h3><p>Here are the steps we took to translate the .LAZ files to COGs and create the mosaicJSON:</p><pre><strong>1. 
Create list of files to translate</strong></pre><pre>$ curl <a href="http://donnees.ville.montreal.qc.ca/dataset/9ae61fa2-c852-464b-af7f-82b169b970d7/resource/ec35760c-5cbe-44a0-8ad1-30c037174b0a/download/indexlidar2015.csv">http://donnees.ville.montreal.qc.ca/dataset/9ae61fa2-c852-464b-af7f-82b169b970d7/resource/ec35760c-5cbe-44a0-8ad1-30c037174b0a/download/indexlidar2015.csv</a> | tail -n +2 | cut -d&quot;,&quot; -f3 &gt; list_of_files.txt</pre><pre><strong>2. Deploy </strong><a href="https://github.com/developmentseed/pointcloud-to-cog"><strong>pointcloud-to-cog</strong></a><strong> (a serverless stack based on AWS Lambda to run rio-cogeo at scale).</strong></pre><pre>$ git clone <a href="https://github.com/developmentseed/pointcloud-to-cog">https://github.com/developmentseed/pointcloud-to-cog</a><br>$ cd pointcloud-to-cog<br>$ make build &amp;&amp; sls deploy --stage production --bucket my-bucket --region us-east-1</pre><pre><strong>3. Send jobs to queue.</strong></pre><pre>$ pip install rio-cogeo rio-tiler cogeo-mosaic<br>$ cd scripts/<br>$ cat ~/list_of_files.txt | python -m create_jobs - \<br>    -p webp \<br>    --co blockxsize=256 \<br>    --co blockysize=256 \<br>    --op overview_level=6 \<br>    --op dtype=float32 \<br>    --op web_optimized=True \<br>    --prefix cogs/MTLLidar \<br>    --topic arn:aws:sns:us-east-1:{AWS_ACCOUNT_ID}:pdal-watchbot-production-WatchbotTopic</pre><pre><strong>4. Create a MosaicJSON (using </strong><a href="https://github.com/developmentseed/cogeo-mosaic"><strong>cogeo-mosaic</strong></a><strong>).</strong></pre><pre>$ aws s3 ls s3://my-bucket/cogs/MTLLidar/ | awk &#39;{print &quot;s3://my-bucket/cogs/MTLLidar/&quot;$NF}&#39; | cogeo-mosaic create - -o mosaic.json</pre><pre><strong>5. 
Use </strong><a href="https://cogeo.xyz/"><strong>cogeo.xyz</strong></a><strong> to visualize the mosaic</strong></pre><pre>$ curl -X POST -d @mosaic.json https://mosaic.cogeo.xyz/add | jq -r &quot;.id&quot;</pre><pre>&gt; <strong>d4c05a130c8a336c..........2cbc5c34aed85feffdaafd01ef</strong></pre><pre>$ open https://cogeo.xyz/mosaic.html?mosaicid=<strong>d4c05a130c8a336c..........2cbc5c34aed85feffdaafd01ef</strong></pre><h3>Got Data?</h3><p>We’re always looking for interesting problems to tackle using COGs. If you have a raster dataset and want to learn how COG, STAC or mosaicJSON could help, please feel free to ping me on <a href="https://twitter.com/_VincentS_"><strong>Twitter</strong></a> or <a href="https://www.linkedin.com/in/vincentsarago/"><strong>LinkedIn</strong></a>! And if you are interested in joining Development Seed to help us build technology that helps solve global challenges, take a look at our <a href="https://developmentseed.org/careers/jobs/">open positions</a>!</p><hr><p><a href="https://medium.com/devseed/cog-talk-4bis-montreal-lidar-dataset-8e8dc24e6617">COG Talk 4bis — Montreal LIDAR dataset</a> was originally published in <a href="https://medium.com/devseed">Development Seed</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[COG Talk — Part 4: Enabling Spatio-temporal data processing at scale]]></title>
            <link>https://medium.com/devseed/cog-talk-part-4-enabling-spatio-temporal-data-processing-at-scale-e9cb23e33281?source=rss-754c34eee3ad------2</link>
            <guid isPermaLink="false">https://medium.com/p/e9cb23e33281</guid>
            <category><![CDATA[cog]]></category>
            <category><![CDATA[earth-observation]]></category>
            <category><![CDATA[maps]]></category>
            <category><![CDATA[raster]]></category>
            <category><![CDATA[satellite-imagery]]></category>
            <dc:creator><![CDATA[Vincent Sarago]]></dc:creator>
            <pubDate>Mon, 10 Feb 2020 16:55:06 GMT</pubDate>
            <atom:updated>2020-02-10T16:55:06.191Z</atom:updated>
            <content:encoded><![CDATA[<h3>COG Talk — Part 4: Enabling Spatio-temporal data processing at scale</h3><h3>This post is the fourth in a series called COG Talk, which looks at ways to use Cloud Optimized GeoTIFFs to efficiently render and analyze planetary data at massive scale.</h3><p>After a refresher on what COGs are in <a href="https://medium.com/devseed/cog-talk-part-1-whats-new-941facbcd3d1">Part 1</a>, the introduction of mosaics in <a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">Part 2</a>, and a fun experiment in <a href="https://medium.com/devseed/cog-talk-part-3-vector-tiles-4a37bf6d865f">Part 3</a>, today we are going to see how COGs can be useful for large scale spatio-temporal datasets.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*50LUm0R9ThnAyhcx" /><figcaption>Introduction slide from a talk given at the GéoMTL conference (<a href="https://github.com/vincentsarago/conferences/blob/master/GeoMTL_nov2019.pdf">slides</a>).</figcaption></figure><h3>Cloud Optimized GeoTIFFs (COGs)</h3><p>First, the basics. As of today, the Cloud Optimized GeoTIFF specification can be summarized as a short list of requirements:</p><ul><li>the data has to be tiled (internally split into chunks of regular size)</li><li>the file has a header with the location of each tile</li><li>the file can have internal overviews</li></ul><p>Basically, you take a well known open format (<a href="https://en.wikipedia.org/wiki/TIFF">created in the 80&#39;s</a>), enforce good usage and internal architecture, and then have a binary file optimized for remote access. 
Because the header has a map to the internal tiles and the geographic information, libraries like GDAL can easily work out which tiles to fetch (using HTTP GET range requests) for a given area of interest (AOI), minimizing data transfer and HTTP requests.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/400/0*AzzkPJvnymfeYEOx" /><figcaption>Like this cute raccoon, GDAL is able to take just what it needs from a COG and runs really fast.</figcaption></figure><p><strong>Web map tile</strong> is a common format for distributed processing (see <a href="https://github.com/developmentseed/chip-n-scale-queue-arranger">chip-n-scale</a>) or for simple raster dataset visualization. Because we can read only the parts of a COG we need in an optimized way and, if present, obtain a preview of the high resolution data from internal overviews, we can dynamically generate the tiles from COGs at request time.</p><p>Web maps are often based on static raster tiles stored as <strong>jpeg</strong> or <strong>png</strong>. A full set of tiles is created for <strong>each zoom level</strong> and stored in a tree-based file structure that allows users to zoom and pan a map. This approach requires you to pre-generate a tile tree consisting of millions of files. With a COG you can use internal overviews to create multiple zooms and internal tiles to stand in for map tiles, and thus only have one file to manage for a large area. We call this process &quot;<strong>dynamic tiling</strong>&quot; because we access the raw data (e.g. 
surface reflectance or elevation) and then apply algorithms to it before creating the tile to display in the browser.</p><h4>How to create valid COGs</h4><p>Common Geographic Information System <strong>(GIS)</strong> software like QGIS supports exporting raster data to COG natively, but if you want to do it programmatically you can use GDAL commands:</p><pre># First add internal overviews <br>$ gdaladdo my-file.tif</pre><pre># Then translate the geotiff to a COG (`TILED=YES`) and keep overviews <br>$ gdal_translate -of GTiff -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE my-file.tif my-cog.tif</pre><p>or use <a href="https://github.com/cogeotiff/rio-cogeo">rio-cogeo</a> (see <a href="https://medium.com/devseed/cog-talk-part-1-whats-new-941facbcd3d1">COG Talk — Part 1</a>):</p><pre>$ rio cogeo create my-file.tif my-cog.tif --cog-profile deflate</pre><h3>COGs everywhere</h3><p>More organizations are storing their data as COGs (<a href="https://www.usgs.gov/land-resources/nli/landsat/landsat-collection-2?utm_source=twitter&amp;utm_medium=social&amp;utm_term=c8cf39f8-923f-4944-bf52-4f6035df8112&amp;utm_content=&amp;utm_campaign=&amp;qt-science_support_page_related_con=1#qt-science_support_page_related_con">Landsat level 2</a>, <a href="https://www.usgs.gov/news/usgs-digital-elevation-models-dem-switching-new-distribution-format">USGS DEM</a>, <a href="https://registry.opendata.aws/modis-astraea/">MODIS on AWS</a>), and while it’s not the most storage-efficient format (in comparison to JPEG2000), a COG is a more user-friendly format that enables <strong>fast and cheap access</strong> to the data (see this <a href="https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f">blog</a> for a comparison).</p><p>Having access to more data creates another kind of problem (a good one): due to the increase in data availability, we need to implement easier ways to 
<em>access/process/share</em> them at scale. Development Seed regularly advocates for open datasets, but what we love even more is to enable people to access and use the data. Take, for instance, the opening up of Landsat data. It was a big win for the open data community, but using the data through Earth Explorer posed several challenges. Libra was born from the need for a better tool and process for using this data, and to this day it still has 3000 visitors per month 😱 !</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*qrYf-TxJXWvZAxUG" /></figure><h4>One scene or a million!</h4><p>The combination of distributed cloud services and an increase in datasets being stored as COGs means users are now pivoting from single scene workflows to large scale processing (e.g. state or country wide). To support this, we created the <a href="https://github.com/developmentseed/mosaicjson-spec"><strong>mosaic-json</strong></a> specification, an open standard for representing metadata about sets of Cloud Optimized GeoTIFFs (see <a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">COG Talk</a> — <a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">Part 2</a>). 
With simplicity and performance in mind, the specification uses a simple <strong>quadkey</strong> based spatial index and allows overlapping scenes to create <strong>spatio-temporal mosaics</strong>.</p><p>In a dynamic tiling workflow, the <strong>mosaicJSON</strong> is a simple JSON file that acts as a proxy between Web Map tile requests (using Z-X-Y <a href="https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames">Slippy map tilenames</a>) and the list of files intersecting with the requested tile.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*vykrD4rbATy4zO1q.jpg" /><figcaption>Spatio-temporal mosaic-json.</figcaption></figure><h3>Real World example</h3><p>ABoVE: Landsat-derived Annual Dominant Land Cover Across ABoVE Core Domain, 1984–2014</p><ul><li>Ref: <a href="https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1691">https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1691</a></li><li>Format: GeoTIFF + internal overview</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*4Jdvj0wLoYIXyJ4W" /><figcaption>ABoVE dataset footprint.</figcaption></figure><p>The ABoVE dataset consists of 175 GeoTIFFs at 30m resolution derived from Landsat (5 and 7) surface reflectance values. While it covers a pretty large area, the other interesting part of this dataset is the temporal aspect: each file has 31 bands, one for each year from 1984 to 2014.</p><p>The dataset is distributed as GeoTIFFs with internal overviews. 
Sadly, the files are not aligned with the Cloud Optimized GeoTIFF specification because they are not internally tiled and the overviews are located at the end of the files.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d87777b7b6fd5c3c1380b5f9a7d57404/href">https://medium.com/media/d87777b7b6fd5c3c1380b5f9a7d57404/href</a></iframe><h4>GeoTIFF → COG → mosaicJSON</h4><p>To create user interfaces (UI) and tools that are as responsive as possible, we need the files stored as proper Cloud Optimized GeoTIFFs.</p><p>Here are the steps we took to convert the files:</p><pre><strong>1. Download the whole dataset and upload the GeoTIFFs to S3.<br></strong><br># head over to https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1691 and log in<br>$ wget {url of the zip}<br>$ unzip Annual_Landcover_ABoVE_1691.zip<br>$ aws s3 sync . s3://my-bucket/raw/ABoVE/<br><br><br><strong>2. Deploy </strong><a href="https://github.com/developmentseed/cogeo-watchbot-light"><strong>cogeo-watchbot-light</strong></a><strong> (a serverless stack based on AWS Lambda to run rio-cogeo at scale).<br></strong><br>$ git clone https://github.com/developmentseed/cogeo-watchbot-light<br>$ cd cogeo-watchbot-light<br>$ sls deploy --stage production --bucket my-bucket --region us-east-1<br><br><strong>3. Create COGs.<br></strong><br># create list of raw files to translate<br>$ aws s3 ls s3://my-bucket/raw/ABoVE/ | grep &quot;Simplified&quot; | awk &#39;{print &quot;s3://my-bucket/raw/ABoVE/&quot;$NF}&#39; &gt; list_raw_files.txt<br><br># Send jobs to the stack<br>$ pip install rio-cogeo rio-tiler<br>$ cd scripts/<br>$ cat ../list_raw_files.txt | python -m create_jobs - \<br>   -p webp \<br>   --co blockxsize=256 \<br>   --co blockysize=256 \<br>   --op overview_level=6 \<br>   --op overview_resampling=bilinear \<br>   --prefix cogs/ABoVE \<br>   --topic arn:aws:sns:us-east-1:{AWS_ACCOUNT_ID}:cogeo-watchbot-light-production-WatchbotTopic<br><br><strong>4. 
Create a MosaicJSON (using </strong><a href="https://github.com/developmentseed/cogeo-mosaic"><strong>cogeo-mosaic</strong></a><strong>).<br></strong><br>$ aws s3 ls s3://my-bucket/cogs/ABoVE/ | awk &#39;{print &quot;s3://my-bucket/cogs/ABoVE/&quot;$NF}&#39; | cogeo-mosaic create - -o mosaic.json</pre><h4><strong>Explore</strong></h4><p>When the COGs and the mosaicJSON are formatted correctly we can use the <a href="https://github.com/developmentseed/cogeo-mosaic-tiler">cogeo-mosaic-tiler</a> stack to create web map tiles dynamically and visualize the data in a web map.</p><p>See it live: <a href="https://cogeo.xyz/projects/ABoVE/index.html">https://cogeo.xyz/projects/ABoVE/index.html</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*i6KL3VazV21XX1fUIEml1w.gif" /></figure><h4><strong>The temporal side</strong></h4><p>With the introduction of the mosaicJSON specification (see <a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">COG Talk </a>— <a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">Part 2</a>), we released an open source python module <a href="https://github.com/cogeotiff/rio-tiler-mosaic">rio-tiler-mosaic</a> to handle the creation of tiles using multiple files. One important feature of this plugin is its ability to do pixel selection dynamically, meaning the user can choose how each pixel value is created (e.g. take the pixel value from the first image or from the last one). 
It also enables custom pixel selection methods:</p><pre>&quot;&quot;&quot;Custom stddev pixel selection method.&quot;&quot;&quot;</pre><pre>import numpy<br>from rio_tiler_mosaic.methods.base import MosaicMethodBase</pre><pre>class bidx_stddev(MosaicMethodBase):<br>    &quot;&quot;&quot;Return bands stddev.&quot;&quot;&quot;</pre><pre>    def __init__(self):<br>        &quot;&quot;&quot;Overwrite base and init bands stddev method.&quot;&quot;&quot;<br>        super(bidx_stddev, self).__init__()<br>        self.exit_when_filled = True</pre><pre>    def feed(self, tile):<br>        &quot;&quot;&quot;Add data to tile.&quot;&quot;&quot;<br><strong>        tile = numpy.ma.std(tile, axis=0, keepdims=True)<br></strong>        if self.tile is None:<br>            self.tile = tile</pre><pre>        pidex = self.tile.mask &amp; ~tile.mask<br>        mask = numpy.where(pidex, tile.mask, self.tile.mask)<br>        self.tile = numpy.ma.where(pidex, tile, self.tile)<br>        self.tile.mask = mask</pre><p>The <a href="https://github.com/developmentseed/cogeo-mosaic-tiler/blob/master/cogeo_mosaic_tiler/custom_methods.py">pixel selection</a> method above was specifically designed for the ABoVE dataset. 
On each tile request, the dynamic tiler using this method returns the standard deviation value for the stack of bands, enabling us to find where the land cover classification values have changed over the 31-year span.</p><p>See it live: <a href="https://cogeo.xyz/projects/ABoVE/stddev.html">https://cogeo.xyz/projects/ABoVE/stddev.html</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*J8QEFMAAR_e23DHq" /><figcaption>Standard deviation values for the full stack of 31 values (years).</figcaption></figure><h3>❤️<strong> STAC + COG</strong></h3><p>Development Seed is advancing the adoption of the new <a href="https://github.com/radiantearth/stac-spec">STAC</a> metadata specification (check out our latest <a href="https://medium.com/devseed/sat-api-pg-a-postgres-stac-api-af605cafd88d">sat-api-pg</a> project). Standardized and well-formatted metadata is an important step toward the democratization of remotely sensed data, and it also helps us create nice visualization tools.</p><h4><strong>Introducing </strong><a href="https://landsatlive.live"><strong>landsatlive.live</strong></a></h4><p>By combining both STAC and mosaicJSONs we built a small demo called <a href="https://landsatlive.live/#8/25.993/-80.123"><strong>landsatlive.live</strong></a> (shoutout to the great <a href="http://landsat.live"><strong>landsat.live</strong></a> by our friends at Mapbox). This demo lets you visualize mosaics created dynamically using <a href="https://github.com/sat-utils/sat-api">sat-api</a> (our STAC search API) for the Landsat 8 dataset hosted on AWS.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*t5kTe6-sK6IJp1Sm" /><figcaption>A mosaic of Landsat 8 scenes. Web map tiles created dynamically using sat-api + mosaic-json + rio-tiler-mosaic.</figcaption></figure><h3>COG for the best</h3><p><strong>Cloud Optimized GeoTIFF</strong> is one of the go-to data formats for organizations looking to store their data in the cloud. 
Using this approach lowers the barriers to accessing data. We’ve put together an open and user-friendly suite of tools that allows users to process data by combining STAC, mosaicJSON, and the <a href="https://github.com/developmentseed?utf8=%26%2310003%3B&amp;q=cogeo&amp;type=&amp;language=">cogeo-*</a> tools, allowing anyone to access and analyze planetary data at scale.</p><p>For more information on how COGs can solve many of your data-related problems feel free to ping me on <a href="https://twitter.com/_VincentS_"><strong>Twitter</strong></a> or <a href="https://www.linkedin.com/in/vincentsarago/"><strong>LinkedIn</strong></a><strong>! </strong>If you are interested in joining Development Seed to help us build further and faster take a look at our <a href="https://developmentseed.org/careers/jobs/">open positions</a>!</p><hr><p><a href="https://medium.com/devseed/cog-talk-part-4-enabling-spatio-temporal-data-processing-at-scale-e9cb23e33281">COG Talk — Part 4: Enabling Spatio-temporal data processing at scale</a> was originally published in <a href="https://medium.com/devseed">Development Seed</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[COG Talk — Part 3: Translate COG to Mapbox Vector Tiles]]></title>
            <link>https://medium.com/devseed/cog-talk-part-3-vector-tiles-4a37bf6d865f?source=rss-754c34eee3ad------2</link>
            <guid isPermaLink="false">https://medium.com/p/4a37bf6d865f</guid>
            <category><![CDATA[visualization]]></category>
            <category><![CDATA[earth-observation]]></category>
            <category><![CDATA[cog]]></category>
            <category><![CDATA[maps]]></category>
            <category><![CDATA[vector-tiles]]></category>
            <dc:creator><![CDATA[Vincent Sarago]]></dc:creator>
            <pubDate>Mon, 24 Jun 2019 15:35:05 GMT</pubDate>
            <atom:updated>2019-06-24T19:50:01.302Z</atom:updated>
            <content:encoded><![CDATA[<h3>COG Talk — Part 3: Translate COG to Mapbox Vector Tiles</h3><h4>Today we’re releasing <strong>rio-tiler-mvt, </strong>a rio-tiler plugin to create Mapbox Vector Tiles from Cloud-Optimized GeoTIFFs (COGs). It enables better dynamic web map visualizations, especially for sparse datasets. This is the result of recent work where we had the need to visualize LiDAR data in-browser. We experimented with generating our visualizations on-the-fly by generating vector data directly from the source Cloud-Optimized GeoTIFFs (COGs). While the initial approach felt clumsy at the time, we’ve since polished it up into a proper plugin with impressive performance.</h4><p>This is the third post of our COG Talk series (check out the introduction in <a href="https://medium.com/devseed/cog-talk-part-1-whats-new-941facbcd3d1">Part 1</a> and use of COG mosaics in <a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">Part 2</a>).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/772/0*dCO1zaCKxS7LimhB" /><figcaption>Lidar dataset displayed as vector tiles (top) or raster (bottom). Data from <a href="http://donnees.ville.montreal.qc.ca/dataset/lidar-aerien-2015">Montreal Open Data</a>.</figcaption></figure><p>Cloud Optimized GeoTIFF is an excellent format for storing remote sensing data because the file structure provides a convenient method for data access and visualization. When we want to access a smaller raster — either as an array for analysis or a PNG/JPEG for visualization — we can easily read just that portion of the data. Most tools stop at this point and return a raster value, which is exactly what we want in most cases. 
But with sparse datasets, this isn’t always the best approach.</p><p>The <a href="https://data.humdata.org/dataset/highresolutionpopulationdensitymaps">population dataset</a> mentioned in our <a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">second post</a> is a great example for showing how COG mosaics work on a technical level because we need to combine multiple large COGs into one seamless product. But when visualizing this as a raster, you’ll notice that the output isn’t ideal because the sparse data makes it difficult to see individual pixels.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*6jPnQ_79T-mFcW8_" /><figcaption>High-resolution population density data from Facebook AI (<a href="https://ai.facebook.com/blog/mapping-the-world-to-help-aid-workers-with-weakly-semi-supervised-learning">link</a>) displayed as raster tiles.</figcaption></figure><p>To solve this problem, we want more control over the rendering of the data. We would ideally have a simple way to query data from COGs but return data in another non-raster format. For point visualization, Mapbox Vector Tile (<a href="https://docs.mapbox.com/vector-tiles/reference/">MVT</a>) format seems to be a better fit and it also enables client-side rendering and user interaction (e.g. changing colors dynamically and clicking on the data).</p><p><strong>🎉 rio-tiler-mvt 🎉</strong></p><p>Today we’re releasing <a href="http://github.com/cogeotiff/rio-tiler-mvt"><strong>rio-tiler-mvt</strong></a><strong>, </strong>a rio-tiler plugin to encode tile arrays as Mapbox Vector Tiles. We started this as an experiment to see how far we could push Vector Tiles encoding from COG tiles. 
With some help from <a href="https://github.com/yohanboniface">Yohan Boniface</a>, we updated the <a href="https://github.com/tilery/python-vtzero">python-vtzero</a> library which enables fast MVT encoding in Python (wrapping Mapbox’s <a href="https://github.com/mapbox/vtzero">vtzero</a> C++ library) and then created this small <a href="https://github.com/cogeotiff/rio-tiler-mvt/">rio-tiler plugin</a> to convert raster tile values to vector features on the fly.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/457e552d8abe0c3e291e970efde08a03/href">https://medium.com/media/457e552d8abe0c3e291e970efde08a03/href</a></iframe><p>The resulting vector tiles make visualizing sparse data much easier, and the approach is surprisingly fast for dense datasets as well.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/0*Lbc_yt8J8LaVRIUv" /><figcaption>Lidar dataset stored as Cloud Optimized GeoTIFF and served as raster or vector tiles. Data from <a href="http://donnees.ville.montreal.qc.ca/dataset/lidar-aerien-2015">Montreal Open Data</a>.</figcaption></figure><p>And it also works well with COG mosaics, like the Facebook population data from above (<a href="http://bl.ocks.org/vincentsarago/raw/122985dd22722750ff6802f8f46d3c77/">link</a>).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/0*I1nUcECO0Mn0U0XA" /><figcaption>COG mosaic from high-resolution population density data from Facebook AI (<a href="https://ai.facebook.com/blog/mapping-the-world-to-help-aid-workers-with-weakly-semi-supervised-learning">link</a>) displayed as vector tiles + extrusion (3d rendering).</figcaption></figure><h3>⚠️ Important notes:</h3><ul><li>This is an experiment and there is still some work to be done on python-vtzero (<a href="https://github.com/tilery/python-vtzero/issues">issues</a>) to enable better data encoding.</li><li>A COG is still a COG. 
When creating tiles at a lower zoom level than the raster’s native resolution, rio-tiler fetches overviews (a <strong>downsampled </strong>version of the raw data) so the displayed vector value is not always equal to the raw value.</li><li>Does it work with LiDAR data? Yes and no. LiDAR datasets are usually very dense, meaning each tile (256x256px) will create 65,536 point values, which might be too much for the web client to handle.</li><li>Looking forward to VT3. The next iteration of the Mapbox Vector Tiles specification (3) will add better 3D (X, Y, Z) data support (<a href="https://github.com/mapbox/vector-tile-spec/issues/111">link</a>).</li></ul><h3>Pushing to the limits</h3><p>If you are not scared of <a href="https://github.com/mapbox/mapbox-gl-js">mapbox-gl-js</a> burning your laptop, you can try this <a href="http://bl.ocks.org/vincentsarago/raw/baa945af223caad62d088e4dde261d9f/#9/37.715/-119.5591">demo</a> in which we use <a href="https://www.mapbox.com/maps/satellite/">Mapbox&#39;s awesome satellite base map</a> and <a href="https://registry.opendata.aws/terrain-tiles/">terrain data hosted on AWS PDS</a> to create <strong>RGB + Elevation</strong> vector tiles and display them as extruded colored polygons (with x2.5 vertical exaggeration).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*m9p73YlcrXS3X84U" /><figcaption><a href="http://bl.ocks.org/vincentsarago/raw/baa945af223caad62d088e4dde261d9f/#10.48/37.8021/15.0563/0/60">Mount Etna</a> volcano, Italy.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*M2ru9-lAjch_UNCI" /><figcaption><a href="http://bl.ocks.org/vincentsarago/raw/baa945af223caad62d088e4dde261d9f/#13.39/-25.33761/131.04674/44.1/53">Uluru Inselberg</a> (also known as Ayers Rock), Australia.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/0*Q5efAt4zjbKXyRYi" /><figcaption><a 
href="http://bl.ocks.org/vincentsarago/raw/baa945af223caad62d088e4dde261d9f/#10.93/37.3009/-110.72/9/60">Glen Canyon National Recreation Area</a>, United States.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*yjGdEPs1xnxcbXtV" /><figcaption><a href="http://bl.ocks.org/vincentsarago/raw/baa945af223caad62d088e4dde261d9f/#10.12/37.7742/-122.389/17.6/59">San Francisco area</a>, United States.</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*iGSN0j-aQSWeTyRzl5irGg.gif" /><figcaption><a href="http://bl.ocks.org/vincentsarago/raw/baa945af223caad62d088e4dde261d9f/#11.07/-39.2819/174.0796/0.7/50">Mount Taranaki</a>, New Zealand.</figcaption></figure><p>Please feel free to ping me <a href="https://twitter.com/_VincentS_">@_VincentS_</a> if you have questions or want to hear more about the work we are doing to make open data more accessible and easier to use.</p><hr><p><a href="https://medium.com/devseed/cog-talk-part-3-vector-tiles-4a37bf6d865f">COG Talk — Part 3: Translate COG to Mapbox Vector Tiles</a> was originally published in <a href="https://medium.com/devseed">Development Seed</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[COG Talk — Part 2: Mosaics]]></title>
            <link>https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df?source=rss-754c34eee3ad------2</link>
            <guid isPermaLink="false">https://medium.com/p/bbbf474e66df</guid>
            <category><![CDATA[queryearth]]></category>
            <category><![CDATA[remote-sensing]]></category>
            <category><![CDATA[earth-observation]]></category>
            <category><![CDATA[maps]]></category>
            <category><![CDATA[cog]]></category>
            <dc:creator><![CDATA[Vincent Sarago]]></dc:creator>
            <pubDate>Thu, 23 May 2019 19:30:31 GMT</pubDate>
            <atom:updated>2019-05-23T19:30:31.432Z</atom:updated>
            <content:encoded><![CDATA[<h3>COG Talk — Part 2: Mosaics</h3><h4>This blog is the second in a series called COG Talk, which looks at ways to use Cloud Optimized GeoTIFF, and <em>why</em> we use them.</h4><p>The<strong> </strong><a href="https://medium.com/devseed/cog-talk-part-1-whats-new-941facbcd3d1"><strong>first post</strong></a> is a refresh on the COG format and announces the release of version 1.0.0 of rio-tiler and rio-cogeo. Here, we’ll see how we can use them to build mosaics for web maps.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*slf42UkOdCur8UKL" /><figcaption>Multiple high resolution Cloud Optimized GeoTIFFs hosted on OpenAerialMap (<a href="https://map.openaerialmap.org/#/-84.45327758789061,40.55763465737646,9/user/5ca3a1d0e5635f000690af69?_k=71blxk">link</a>)</figcaption></figure><h3>COG vs Map Tiles</h3><p>Cloud Optimized GeoTIFF files, as the name implies, are specifically designed for easily accessing remote raster data. Because of the internal tiling and internal overviews, people often ask: <strong>can</strong> <strong>COGs replace map tiles</strong>? The usual response is: <strong>yes, but</strong>…</p><p>Cloud Optimized GeoTIFF can replace .mbtiles or statically generated map tiles by using a proxy to render tiles dynamically (e.g. <a href="https://github.com/vincentsarago/lambda-tiler"><strong>lambda tiler</strong></a>). But when it comes to large datasets, the GeoTIFF files become too big and lose the advantage of fast remote reads.</p><p>There is no size limit for GeoTIFF and when working with a country or worldwide dataset, COGs can get quite large. 
Even using compression to create a reasonably sized file, it’s likely that the resulting GeoTIFF <a href="https://medium.com/devseed/cog-talk-part-1-whats-new-941facbcd3d1"><strong>header</strong> </a>— used to look up internal data tile locations — will be very large and will slow down the dynamic tiling processes.</p><p>If we instead decide to read a large collection of files and create a mosaic, we have a new set of issues: how can we decide which pixel to display given overlapping tiles? How can we update this pixel choice decision if we have daily updated data?</p><p>To help solve those problems, we are releasing <a href="https://github.com/cogeotiff/rio-tiler-mosaic"><strong>rio-tiler-mosaic</strong></a><strong>,</strong> a rio-tiler plugin allowing Mercator tile creation from multiple observations, and the associated <a href="https://github.com/developmentseed/mosaicjson-spec"><strong>mosaicJSON</strong></a> specification.</p><h3>rio-tiler-mosaic</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/349/0*3eD-O2fwLVz7kV8x" /></figure><p>Creating a mosaic, in its simplest form, involves choosing pixels from multiple images to create a single image. To provide a dynamic map tile endpoint, we’ll need to repeat this process for each individual Web-Mercator tile. rio-tiler-mosaic provides two important methods to make this simple: <strong>pixel selection</strong> for merging the tile arrays together and <strong>smart</strong> <strong>multi-threading</strong> for quickly managing a large number of images.</p><h4><strong>Pixel selection</strong></h4><p>Creating map tiles using COGs means we are dynamically generating the image array in response to a tile request. When working with mosaics, it also means we need to choose which pixel we display when we have an overlapping dataset. For a given tile, we iterate over all intersecting input images to decide which pixel to choose from. 
By default, rio-tiler-mosaic provides four different pixel selection rules:</p><ul><li><strong>First</strong>: take the pixel in the first matching image and return when the tile is full</li><li><strong>Brightest</strong>: loop through all images and return the highest pixel value</li><li><strong>Darkest</strong>: loop through all images and return the lowest pixel value</li><li><strong>Last</strong>: take the pixel in the last matching image and return when the tile is full</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/603/0*LbdWWF5Tr9-IfROV" /><figcaption>Pixel selection methods applied on Landsat-8 NDVI values for all 2018 observations over Montreal area.</figcaption></figure><h4><strong>Smart multi-threading</strong></h4><p>For each pixel selection method, we need to either process the whole stack of images or just the first few until the array is full. We are using <a href="https://docs.python.org/3/library/concurrent.futures.html">multi-threading</a> to fetch and read data in parallel to speed up the process. Because the First and Last pixel-selection methods should return as soon as the tile is totally filled, we implemented a <strong>partial multi-threading approach</strong> by processing chunks of assets in parallel instead of the full list (see <a href="https://github.com/cogeotiff/rio-tiler-mosaic/blob/master/rio_tiler_mosaic/mosaic.py#L80-L82">code</a>). This is particularly handy when the list of assets is long.</p><h4>Usage</h4><p>The mosaic tile handler is designed to roughly match <a href="https://github.com/cogeotiff/rio-tiler#usage">rio-tiler</a>’s tile handlers and returns tile and mask arrays. 
Here is an example of generating a mosaic tile.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/c70056e15c5718d4606b3721368d7164/href">https://medium.com/media/c70056e15c5718d4606b3721368d7164/href</a></iframe><p>We have published a first beta version of <strong>rio-tiler-mosaic</strong> on <a href="https://pypi.org/project/rio-tiler-mosaic/">PyPI</a> and the source code is available on <a href="https://github.com/cogeotiff/rio-tiler-mosaic">GitHub</a>.</p><h3>mosaicJSON</h3><p>In addition to rio-tiler-mosaic, today we are releasing a <a href="https://github.com/developmentseed/mosaicjson-spec">specification</a> for representing Web Mercator mosaics constructed from multiple observations. The <strong>mosaicJSON</strong> specification can be seen as similar to the GDAL Virtual Raster (<a href="https://www.gdal.org/gdal_vrttut.html">VRT</a>), but for indexing files by Web Mercator <a href="https://developer.here.com/documentation/traffic/common/map_tile/topics/quadkeys.html">quadkeys</a>. <em>Quadkeys</em> (a contraction of <strong>quadtree</strong> and <strong>key</strong>) are &quot;one-dimensional strings&quot; representing unique Z-X-Y (level-column-row) Web Mercator map tiles.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*EVVzzBFv86JSAc7j" /><figcaption>COG footprint and Quadkey index</figcaption></figure><p>The goal of the metadata specification is to provide a simple spatial index linking the COGs to the XYZ Web Mercator map tile to render. Note that in the rio-tiler-mosaic examples above, we assume that the list of input image assets for a given tile is already &quot;known&quot;: the mosaicJSON file can provide that input.</p><p>The most important requirement is that each <strong>quadkey</strong> has a zoom level equal to the mosaicJSON <strong>minzoom</strong>. 
We use this to calculate the parent tile for a given input tile between the mosaic’s <strong>minzoom</strong> and <strong>maxzoom</strong>.</p><p>While this is still a <strong>Work in Progress</strong> the main features are:</p><ul><li><strong>quadkey</strong> based file index</li><li>simple JSON format (enabling high ratio compression)</li></ul><h4>Specification</h4><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6ea6f190acd4494393a867debbdb9a38/href">https://medium.com/media/6ea6f190acd4494393a867debbdb9a38/href</a></iframe><p>A complete example of a mosaic definition based on the mosaicJSON specification can be found <a href="https://github.com/developmentseed/mosaicjson-spec/blob/master/0.0.1/example/dg_post_idai.json"><strong>here</strong></a>.</p><h4><strong>Example of implementation</strong></h4><p>Here is a simple implementation of mosaic tiling using the specification. On each tiler call, our handler function (my_mosaic_handler) fetches the mosaic file and looks for the assets indexed by the Web Mercator parent tile (at <strong>minzoom</strong> level) for the input XYZ tile. We then use rio-tiler-mosaic to combine the different image assets into one tile.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/44d3be61c41cb35d40cf7bc7b32e8cfe/href">https://medium.com/media/44d3be61c41cb35d40cf7bc7b32e8cfe/href</a></iframe><h4>Creating a mosaicJSON definition</h4><p>Now that we know how to use it, let’s create a new mosaicJSON definition. Let’s say we have 28 files (e.g. 
<a href="https://data.humdata.org/dataset/highresolutionpopulationdensitymaps">Facebook population density</a>) covering almost the whole continent of Africa.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*c8PSRyUNhQdeMbz9" /><figcaption>High-resolution population density maps COGs from Facebook AI (<a href="https://ai.facebook.com/blog/mapping-the-world-to-help-aid-workers-with-weakly-semi-supervised-learning">link</a>)</figcaption></figure><h4><strong>Quadkey base zoom (or mosaic min-zoom):</strong></h4><p>To construct the spatial index we need to define a minimal set of <strong>quadkeys</strong>. By definition, all COGs intersect with the tile 0-0-0 (zoom 0) but we can do a little math to minimize the set further. Inspecting the input TIFs, we see that the native resolution is around <strong>0.00027 degrees </strong>with <strong>8 levels of overviews.</strong> The native resolution corresponds to Web Mercator <strong>zoom</strong> <strong>level</strong> <strong>12 </strong>so we can guess that we should start tiling at minzoom = 4 (12 - 8 = 4). 
We&#39;ll use this as the base zoom for <strong>quadkey</strong> keys.</p><pre># Create Footprint</pre><pre>$ parallel -j4 rio bounds ::: $(aws s3 ls opendata.remotepixel.ca/facebook/ --recursive | grep &quot;.tif&quot; | awk &#39;{print &quot;s3://opendata.remotepixel.ca/&quot;$NF}&#39;) | jq -c &#39;.features[0]&#39; | fio collect &gt; facebook.geojson</pre><pre># Find our zoom 4 quadkeys (there are 13)</pre><pre>$ cat facebook.geojson | supermercado burn 4 | mercantile quadkey | paste -s -d&quot;,&quot; -</pre><pre>0330,0331,1220,1221,0333,1222,1223,3000,3001,3010,3002,3003,3012</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*n0pQlnQZVQLGp6sM" /><figcaption>Zoom 4 Mercator tiles intersecting with the Facebook population dataset.</figcaption></figure><p>The mosaic definition is then built by finding the COGs intersecting with each of the 13 zoom 4 tiles:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/2d2d68d35182d3a12f81fa186727f5c8/href">https://medium.com/media/2d2d68d35182d3a12f81fa186727f5c8/href</a></iframe><h3>cogeo-mosaic: a CLI and a Serverless stack to create and use mosaicJSON</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/548/0*TEQJ9--0Z9yfa8nP" /><figcaption><a href="https://github.com/developmentseed/cogeo-mosaic">https://github.com/developmentseed/cogeo-mosaic</a></figcaption></figure><p>Wrapping up this new specification and the new rio-tiler-mosaic plugin we are also releasing <a href="https://github.com/developmentseed/cogeo-mosaic"><strong>cogeo-mosaic</strong></a><strong>,</strong> a Serverless stack (based on AWS Lambda) to create and use the mosaicJSON specification. 
You can also install <strong>cogeo-mosaic</strong> locally and use the built-in CLI to create mosaicJSON from a set of files.</p><pre>$ pip install git+https://github.com/developmentseed/cogeo-mosaic</pre><pre>$ cogeo-mosaic create my_list_of_files.txt -o mosaic.json</pre><h3>Is mosaicJSON appropriate for all mosaic map tile problems?</h3><p>Obviously not, but we have been testing this solution on different projects and we find it simple and fast in most cases. That said, here are the pros and cons:</p><p><strong>Pros</strong></p><ul><li>Fewer files to manage: In the case of the Facebook dataset, we ended up having 29 files stored in the cloud instead of <strong>~600 000</strong> if we created all the Mercator tiles image files.</li></ul><pre># Get number of tiles from zoom 4 to zoom 12 using <a href="https://github.com/mapbox/supermercado/pull/26">https://github.com/mapbox/supermercado/pull/26</a></pre><pre>$ cat facebook.geojson | supermercado burn 4..12 | sort | uniq | wc -l</pre><pre>  613 456</pre><ul><li>Flexibility: When we create the tile, we have the ability to decide which pixel should have priority and be rendered on top.</li><li>mosaicJSON files can be relatively small (especially if using compression) and can be cached to increase performance.</li></ul><p><strong>Cons</strong></p><ul><li>Because we need a dynamic tiler, creating a tile on the fly will always be slightly slower than having the tile already ready to be served to the client.</li><li>Tile creation can take several seconds when using <strong>darkest/brightest </strong>methods (because it has to read all the COGs).</li><li>Only supports the Web Mercator projection (as for rio-tiler).</li><li>Generating mosaicJSON files can be slow for large areas with many images.</li></ul><h3>Community</h3><p>Both the <a href="https://github.com/developmentseed/mosaicjson-spec"><strong>mosaicJSON</strong></a><strong> </strong>specification and <a 
href="https://github.com/cogeotiff/rio-tiler-mosaic"><strong>rio-tiler-mosaic</strong></a><strong> </strong>are published on GitHub and we welcome any feedback and/or contributors. Please feel free to ping me on Twitter <a href="https://twitter.com/_VincentS_">@_VincentS_</a> if you have questions or want to hear more about the work we are doing to make open data more open and easier to use.</p><h3>Demo</h3><p>We built a simple demo page where you can explore some mosaicJSON examples and even share your own: <a href="https://bl.ocks.org/vincentsarago/raw/815884188c243b636ab8d927d8942a4d/">https://bl.ocks.org/vincentsarago/raw/815884188c243b636ab8d927d8942a4d/</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*nk7MbFvO-wveUQcjNp_Puw.gif" /><figcaption>Hundreds of mercator tiles created dynamically based on mosaicJSON definition.</figcaption></figure><hr><p><a href="https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df">COG Talk — Part 2: Mosaics</a> was originally published in <a href="https://medium.com/devseed">Development Seed</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[COG Talk — Part 1: What’s new?]]></title>
            <link>https://medium.com/devseed/cog-talk-part-1-whats-new-941facbcd3d1?source=rss-754c34eee3ad------2</link>
            <guid isPermaLink="false">https://medium.com/p/941facbcd3d1</guid>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[cog]]></category>
            <category><![CDATA[earth-observation]]></category>
            <category><![CDATA[queryearth]]></category>
            <category><![CDATA[remote-sensing]]></category>
            <dc:creator><![CDATA[Vincent Sarago]]></dc:creator>
            <pubDate>Fri, 03 May 2019 18:28:56 GMT</pubDate>
            <atom:updated>2019-05-03T18:33:19.203Z</atom:updated>
            <content:encoded><![CDATA[<h3>COG Talk — Part 1: What’s new?</h3><h4>This blog is the first in a series called <strong>COG Talk,</strong> which looks at ways to use Cloud Optimized GeoTIFF, and <em>why</em> we use them.</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*OdP7uZcuYIVeP6Ir" /><figcaption><a href="https://github.com/RemotePixel/remotepixel-tiler">remotepixel-tiler </a>uses rio-tiler to dynamically create Web Map tiles from Landsat-8 data hosted on AWS.</figcaption></figure><p>For more than a year, we’ve been working on building out a suite of tools to make Cloud Optimized GeoTIFFs (COGs) easy to work with. Today we are excited to announce we are releasing version 1 of <a href="https://github.com/cogeotiff/rio-tiler">rio-tiler</a> and <a href="https://github.com/cogeotiff/rio-cogeo">rio-cogeo</a> 🎂!</p><p>Both modules are:</p><ul><li>well tested</li><li>actively maintained</li><li>compatible with Python 2 and Python 3</li><li>easy to install (thanks to rasterio <a href="https://github.com/mapbox/rasterio#binary-distributions">wheels</a>)</li></ul><h3>COGs — The Basics</h3><p>Let’s start with a quick refresher on the <a href="https://github.com/cogeotiff/cog-spec/blob/master/spec.md">COG specification</a>:</p><p>COGs are powerful because of how the data is structured internally. If done properly, the data can be accessed via HTTP range requests, meaning you can read only a small portion of a file instead of downloading the whole thing. This matters because the size of an individual block of data within the image can be small and easy to download with a simple GET request. To enforce this, COGs bigger than 1024 pixels by 1024 pixels have to be internally tiled.</p><p>The metadata header has a specific structure (by construction) and holds the <a href="https://www.awaresystems.be/imaging/tiff/faq.html#q3">Image File Directory</a> (IFD) of each data block (internal tile). 
The IFD is critical to a COG, because it holds information (TileOffsets and TileByteCounts) about each internal tile. This means that by fetching only the first few bytes of the data we can then construct an internal map of the data.</p><p>The other (optional) feature is the <strong>overview</strong>. By adding <strong>internal</strong> overviews (reduced resolution versions of the raw data), we can now preview the data using fewer range requests.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/512/0*Nu3Qy2DueYg10Zs4" /></figure><p>Refs: <a href="https://github.com/cogeotiff/cog-spec/blob/master/spec.md">https://github.com/cogeotiff/cog-spec/blob/master/spec.md</a></p><h3>Rio-cogeo</h3><pre>$ pip install rio-cogeo~=1.0</pre><p>While Cloud Optimized GeoTIFFs are beginning to see wider use, the creation of such files can still be a tricky process, and when we started working on rio-cogeo there wasn’t an easy standalone solution. The goal was to build a simple yet powerful CLI to create and validate COGs.</p><h4><strong>COG creation</strong></h4><pre><strong>BEFORE</strong></pre><pre># Add overviews <br>$ gdaladdo in.tif </pre><pre># Enforce internal tiling, add compression and re-organize internal structures <br>$ gdal_translate in.tif cog.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE</pre><pre><strong>NOW</strong> - with rio-cogeo</pre><pre>$ rio cogeo create in.tif cog.tif</pre><p>rio-cogeo does the exact same thing as the GDAL commands (creating overviews, tiling and compressing) but it also provides seven different <a href="https://github.com/cogeotiff/rio-cogeo#default-cogeo-profiles"><strong>profiles</strong></a> to help the user choose the best configuration for their needs. 
Each profile can be extended using the --co options.</p><h4><strong>Web Optimized COG (WOG?!)</strong></h4><p>One important feature we found valuable to add was the <a href="https://github.com/cogeotiff/rio-cogeo#web-optimized-cog">--web-optimized</a> option, which enables the creation of a web-tiling friendly COG. This aligns the internal tiles with the Web Mercator grid and makes the overview levels match the standard slippy map zoom levels. This is similar in concept to mbtiles, with the advantage of allowing fast partial reads of remote data.</p><h4><strong>Interpretation of the specification</strong></h4><p>While rio-cogeo respects the <a href="https://github.com/cogeotiff/cog-spec/blob/master/spec.md">COG specification</a>, by default this plugin enforces features like:</p><ul><li><strong>Internal overviews</strong> (users can remove overviews with the option <a href="https://github.com/cogeotiff/rio-cogeo#web-optimized-cog">--overview-level 0</a>)</li><li><strong>512x512 px internal tiles </strong>(can be overridden with --co options)</li></ul><h4><strong>Example</strong></h4><p>rio-cogeo has a nice CLI but it can also be used directly inside your own scripts. Check out <a href="https://github.com/developmentseed/sentinel-2-cog">sentinel-2-cog</a> to see how you could convert the whole Sentinel-2 Catalog for $90K (<a href="https://gist.github.com/vincentsarago/0e8c3fba19a4b2b6855bca77f18b88fb">link</a>).</p><h4><strong>COG validation</strong></h4><p>The other feature we wanted to add was a validation option. Until now, people have had to rely on downloading the standalone script <a href="https://raw.githubusercontent.com/vincentsarago/gdal/master/gdal/swig/python/samples/validate_cloud_optimized_geotiff.py">validate_cloud_optimized_geotiff.py</a> or using Radiant Earth’s <a href="http://cog-validate.radiant.earth/html">hosted version</a>. 
Now you can easily validate with a single command:</p><pre>$ rio cogeo validate cog.tif </pre><h3>Rio-tiler</h3><pre>$ pip install rio-tiler~=1.2</pre><p>Before creating <a href="https://github.com/cogeotiff/rio-cogeo"><strong>rio-cogeo</strong></a> we started working on <a href="https://github.com/cogeotiff/rio-tiler">rio-tiler,</a> a library to improve the ability to visualize COGs available on <a href="https://registry.opendata.aws">AWS Public Datasets</a>. rio-tiler is generally used as part of a web map server to dynamically generate a <a href="https://wiki.openstreetmap.org/wiki/Tiles">map tile</a> from an underlying COG source file (rather than generating them beforehand). Initially the library was built for specific satellites ( <a href="https://blog.mapbox.com/combining-the-power-of-aws-lambda-and-rasterio-8ffd3648c348">Landsat-8</a>, then Sentinel-2 and CBERS-4), but it can now be used with any COGs.</p><h4><strong>Get Mercator tile from a cloud hosted file</strong></h4><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b90d29e4b8ca16e812ed45a5a8894e30/href">https://medium.com/media/b90d29e4b8ca16e812ed45a5a8894e30/href</a></iframe><h4><strong>Features in rio-tiler~=1.0</strong></h4><ul><li>Rasterio 1.0</li><li>Support for Landsat-8, CBERS-4 and Sentinel-2* AWS Public dataset</li><li>Better image encoding using GDAL (previously done using Pillow)</li><li>Colormap for output tile image (rio-tiler can apply pre-defined or custom colormap on the output tile image)</li><li><strong>Expression</strong> support for band ratios (e.g request expr=((b1-b2)/(b1+b2)))</li><li><strong>Statistical</strong> functions (get min/max/histogram)</li></ul><p>(see <a href="https://github.com/cogeotiff/rio-tiler/blob/master/CHANGES.txt#L1-L204">Changelog</a>)</p><p>*sentinel-2 data on AWS is stored as JPEG2000 and in a requester-pays bucket. 
User will need to assume cost for each tile request (<a href="https://github.com/cogeotiff/rio-tiler#partial-reading-on-cloud-hosted-dataset">link</a>).</p><p><strong>rio-tiler</strong> plays a major role in most of our dynamic tiler related projects. With the release of 1.0, we have a solid baseline to move forward with other features, so stay tuned and subscribe to the <a href="https://github.com/cogeotiff/rio-tiler">rio-tiler</a> repo to follow our progress.</p><h4>Community</h4><p>Thanks to our friends at Mapbox, we agreed to move <a href="https://github.com/cogeotiff/rio-tiler"><strong>rio-tiler</strong></a><strong> </strong>and <a href="https://github.com/cogeotiff/rio-cogeo"><strong>rio-cogeo</strong></a> to a new organization: <a href="https://github.com/cogeotiff"><strong>cogeotiff</strong></a>. We’re partnering with Chris Holmes to build out an ecosystem of<strong> open source</strong> tools around COGs and we welcome anyone who’s interested in contributing to reach out on <a href="https://twitter.com/_VincentS_">Twitter</a> or comment on our <a href="https://github.com/cogeotiff">repos</a>.</p><h3>Further reading</h3><ul><li>Cloud Optimized GeoTIFF Webpage: <a href="http://www.cogeo.org/">www.cogeo.org</a></li><li>Chris Holmes series on COG: <a href="https://medium.com/tag/cloud-native-geospatial/latest">https://medium.com/tag/cloud-native-geospatial/latest</a></li><li>Why COG is a great format for open dataset: <a href="https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f">https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f</a></li><li>Peter Becker AWS re:Invent talk about COG and MRF: <a href="https://www.youtube.com/watch?v=U486YxlDoeM&amp;index=4&amp;list=PL5PrbYlLsiiM5PUtGOR56GXmRF1wjU1JJ&amp;t=3021s">https://www.youtube.com/watch?v=U486YxlDoeM&amp;index=4&amp;list=PL5PrbYlLsiiM5PUtGOR56GXmRF1wjU1JJ&amp;t=3021s</a></li></ul>
<hr><p><a href="https://medium.com/devseed/cog-talk-part-1-whats-new-941facbcd3d1">COG Talk — Part 1: What’s new?</a> was originally published in <a href="https://medium.com/devseed">Development Seed</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Do you really want people using your data ?]]></title>
            <link>https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f?source=rss-754c34eee3ad------2</link>
            <guid isPermaLink="false">https://medium.com/p/ec94cd94dc3f</guid>
            <category><![CDATA[open-data]]></category>
            <category><![CDATA[cogeo]]></category>
            <category><![CDATA[data-access]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[raster]]></category>
            <dc:creator><![CDATA[Vincent Sarago]]></dc:creator>
            <pubDate>Mon, 19 Nov 2018 20:29:02 GMT</pubDate>
            <atom:updated>2019-02-05T16:20:33.449Z</atom:updated>
            <content:encoded><![CDATA[<h3>Do you really want people using your data ?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wlWVEaSSp4STlY_8oaYFhg.jpeg" /></figure><p><em>Note: This post was originally called: `The Ultimate data format`</em></p><p>In this post we will focus on Cloud Optimized GeoTIFF and other formats used by public datasets (AWS PDS, DigitalGlobe Open Data, …). This post is mostly a brain dump of some thoughts and knowledge I have wanted to share since RemotePixel&#39;s huge AWS bill last August. I hope this will give some clues, or at least some ideas, to people who want to open/share raster datasets.</p><p>First, can you guess the difference between these two images 👇</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oWE0Vf9p4cMaa_Ef4pYf3Q.jpeg" /></figure><p>Both show the same file: on the left is the raw data from the <a href="https://www.digitalglobe.com/opendata/all-events">DigitalGlobe Open Data Program</a>, and on the right is the same file converted to a COG using <a href="https://github.com/mapbox/rio-cogeo">rio-cogeo</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/990/1*KQgfrCJLbbypUH6KPJ0Mrg.jpeg" /></figure><p><em>PS: I ❤️ DigitalGlobe and the goal of this whole introduction is not to blame them for the format; we can’t blame them for giving us free data 😃, especially for disaster response (</em><a href="https://www.digitalglobe.com/opendata/all-events">DigitalGlobe Open Data Program</a>)<em>.</em></p><p>Well, the files are almost the same, except that the COG has internal overviews and internal tiling. The biggest difference is the storage size: <strong>1.5 GB vs 69 MB</strong> 😱</p><p>So how do you produce a file that is 22x lighter? Well, the answer is <strong>compression</strong>! 
I won&#39;t go too deep into compression itself, but you should check out this awesome article by <strong>Koko Alberti: </strong><a href="https://kokoalberti.com/articles/geotiff-compression-optimization-guide/">https://kokoalberti.com/articles/geotiff-compression-optimization-guide/</a>.</p><p>For the file above we used <strong>WEBP</strong> compression, which has just been added to GDAL libtiff by <strong>Norman Barker</strong> and <strong>Even Rouault</strong> in <a href="https://github.com/OSGeo/gdal/pull/704">#704</a>. &quot;<em>WebP is a modern </em><strong><em>image format</em></strong><em> that provides superior </em><strong><em>lossless and lossy</em></strong><em> compression for images on the web</em>&quot; (<a href="https://developers.google.com/speed/webp/">source</a>), developed by Google. This compression scheme claims to be better than JPEG (lossy) and PNG (lossless):</p><blockquote>WebP lossless images are <a href="https://developers.google.com/speed/webp/docs/webp_lossless_alpha_study#results">26% smaller</a> in size compared to PNGs. WebP lossy images are <a href="https://developers.google.com/speed/webp/docs/webp_study">25–34% smaller</a> than comparable JPEG images at equivalent <a href="https://en.wikipedia.org/wiki/Structural_similarity">SSIM</a> quality index.</blockquote><p>The WEBP format is supported by most browsers (except Safari) and image software… and now inside GeoTIFF 🎉 (supported in QGIS if built against GDAL 2.4.0 or HEAD).</p><p>Can you spot the difference 👇?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*p4WL03dKbvOOG9a87YpbLQ.gif" /><figcaption>WebP vs Raw (it&#39;s a GIF)</figcaption></figure><p>👆 Look closely: this is a GIF that shows the difference between Raw and WEBP. It&#39;s really hard to spot, but WEBP compression introduces artifacts (when using default parameters), which should be acceptable at least for visualisation.</p><p>Alright, enough with WebP. 
(Note: JPEG compression would have saved a lot of space too.)</p><h3>AWS Public Dataset: PDS</h3><p>Let&#39;s see which formats are used by three major AWS Public Datasets: CBERS-4, Landsat-8 and Sentinel-2.</p><p><em>Note: Most of the following numbers come from </em><a href="https://github.com/vincentsarago/awspds-benchmark"><em>https://github.com/vincentsarago/awspds-benchmark</em></a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hW0i8kTWKS6VqR0Js3R7tg.jpeg" /><figcaption>CBERS-4, Landsat-8 and Sentinel-2 data formats.</figcaption></figure><p>Those three datasets each have their own 👍/👎 (e.g. Landsat and CBERS are both GeoTIFF, but Landsat uses external overviews). The biggest difference is Sentinel-2, which uses JPEG2000 compression.</p><h4>Format matters</h4><p>For years I&#39;ve heard that users shouldn&#39;t need to download the data but should use/create services to access it via the cloud. While this is a good idea (who wants to store gigabytes of data on their own laptop?), the data format has a huge impact on processing/access cost, which can result in a bill of thousands of dollars.</p><p><strong>RemotePixel use case</strong></p><p>If you are reading this post, it might be because you also know my side project <a href="http://remotepixel.ca"><strong>RemotePixel.ca</strong></a>, and maybe you remember my last post:</p><blockquote>Remote Pixel on Twitter: Dear Friends, https://t.co/aqc4TNnJoF is finally back online. Here is what happened https://t.co/r4vzl5P3Gy</blockquote><p>Let&#39;s see how my August AWS bill is related to the Sentinel-2 data format. 
The bill was mostly due to GET/LIST requests, which are billed to the AWS user since the <em>sentinel-2-l1c</em> bucket is in `requester-pays` mode.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yMHvqRRnQ1o4SIEQwt5p0w.png" /><figcaption>Remotepixel&#39;s AWS cost in August 2018.</figcaption></figure><p>The LIST requests (2,600 $) were due to RemotePixel&#39;s <a href="https://remotepixel.ca/blog/searchapi_20171211.html">simple search API</a>, which now seems not simple but totally dumb.</p><p>The other part of the bill was due to GET requests (nearly 1 billion GET requests 😱).</p><p>I believe most of the Sentinel-2 data requests came from the <a href="http://viewer.remotepixel.ca"><strong>Remotepixel viewer</strong></a><strong> </strong>which was at the time a really simple AWS PDS viewer (now only Landsat and CBERS-4 data are available on the viewer). So basically, users were able to visualize Sentinel-2 data using a <a href="https://github.com/RemotePixel/remotepixel-tiler">tile server based on AWS Lambda</a>. The idea behind the tile server is 1 tile = 1 Lambda call, but when checking the number of AWS Lambda calls there was something odd. There were only 1 million calls … responsible for 1 billion GET calls 🤔. How is this possible? 
Let&#39;s check how many GET requests GDAL makes when reading a file over the internet 👇</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cZNyH7c6C_Ru5hX9EZM0Ew.jpeg" /><figcaption>AWS PDS benchmark: <a href="https://github.com/vincentsarago/awspds-benchmark">https://github.com/vincentsarago/awspds-benchmark</a></figcaption></figure><p>We have our answer 🎉 😱 😢: getting a mercator tile for Sentinel-2 data needs <strong>&gt; 100 HTTP calls (GET) per band</strong>… (😱 again), while for Landsat it&#39;s around 5 calls.</p><p><strong>A better data format?</strong></p><p>Well again, there is no ultimate data format, but let&#39;s see how those three PDS would behave if we translated them to proper COGs (512x512 internal tiling, internal overviews, high-level Deflate compression) using <a href="https://github.com/vincentsarago/awspds-benchmark#what-about-cogs">rio-cogeo</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/948/1*XthCMgjGgvmJx3EMCk42YA.png" /></figure><p>Fewer HTTP calls and less data transfer 🎉 (the Landsat and CBERS datasets are also lighter).</p><h4><strong>Size / computing time / access cost</strong></h4><p>At the end of the day, people mention size as the key criterion for choosing a data format. This is (I think) why we have a Sentinel-2 archive in JPEG2000 format, and when I see my August AWS bill, this makes me sad. JPEG2000 is not a cloud-friendly format: even with the most advanced driver (KDU) you need to transfer more than 1.5 times as much data (800 KB vs 1.3 MB) and make almost 25 times more GET requests (3 vs 74) to do partial reads over the internet. But yes, the JPEG2000 weighs only 95 MB while the proper COG version is around 180 MB.</p><p><strong>What about processing time?</strong></p><p>COGs are made to be accessed partially over the internet, so you don&#39;t need to download the whole file (just get what you need). 
Basically, you download less data, so your process is faster.</p><p>On the other hand, JPEG2000 files are lighter, so you can download and process the whole file… Fortunately we now have OpenJPEG (a free and open source driver to read JPEG2000 shipped in GDAL by default), which is performant enough to extract the data locally, so the processing time should be acceptable, but again you&#39;ll need to download the whole file.</p><p>If you choose to read the JPEG2000 over the internet (as we saw earlier), this will result in a lot of GET calls and a lot of useless data transfer.</p><p><strong>$ facts</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/936/1*IWUU_KWQE72ivU0e0NTT_A.png" /><figcaption>AWS S3 pricing: <a href="https://aws.amazon.com/s3/pricing/">https://aws.amazon.com/s3/pricing/</a></figcaption></figure><p>Based on ☝️, let&#39;s write a scenario for a web viewer using AWS Lambda.</p><h4><strong>JPEG2000</strong></h4><ul><li>Size: 25 TB</li><li>Storage: 25 TB * 1000 * 0.023 = <strong>575 $</strong> / month</li><li>1M tile requests / month</li><li>Data access: (1M * 110 (GET requests) / 1000) * 0.004 = <strong>440 $ *</strong></li><li>Processing time (1536 MB AWS Lambda): (3 seconds * 1M * 1536 / 1000) * 0.00001667 $ =<strong> 76.81 $</strong> <strong>**</strong></li></ul><p>*Using the Kakadu driver you might reduce this by half (~60 GET requests), but you have to pay a couple thousand dollars for the license.</p><p><em>**AWS Lambda costs 0.00001667 $ per GB-second | assuming 3 sec per tile is quite optimistic</em></p><p><strong>COST: 575 + 440 + 76.81 = 1091.81 $ </strong>(440 + 76.81 = 516.81 $ for processing)</p><h4><strong>COG (deflate)</strong></h4><ul><li>Size: 50 TB</li><li>Storage: 50 TB * 1000 * 0.023 = 1150 $ / month</li><li>Data access: (1M * 5 (GET requests) / 1000) * 0.004 = <strong>20 $</strong></li><li>Processing time (1536 MB AWS Lambda): (1 second * 1M * 1536 / 1000) * 0.00001667 $ =<strong> 25.60 $</strong> 
<strong>*</strong></li></ul><p>*Reading a tile from a COG is at least 3 times faster than from JPEG2000</p><p><strong>COST: 1150 + 20 + 25.60 = 1195.60 $ </strong>(20 + 25.60 = 45.60 $ for processing)</p><p>Those numbers are based on assumptions, but I believe they are close to what&#39;s going on in the real world between JPEG2000 and COG. Basically, if you just care about storage cost, JPEG2000 is your best option, but in the end someone will have to pay $$$ to access/process the data. I believe that if you store the data and provide services around it, COG should be a better long-term solution.</p><h3>The Ultimate data format?</h3><p>As we saw in the intro, image formats (compression) can have a huge impact on data accessibility and thus usage (it is easier to download a 70 MB file than a 1.5 GB one).</p><p>Short answer to the question: there is no such thing as an <strong>Ultimate</strong> data format; in the real world there are plenty of good data formats. At the end of the day it depends on what you want the user to do.</p><p>Here are the questions you should answer before choosing a format.</p><ul><li>Do you want users to visualise the data online?</li><li>Do you want users to download the data to run processes?</li><li>Do you want users to create services on the cloud?</li><li>Do you care about compression artefacts?</li><li>What is your data type (Byte|Float|Int)?</li><li>Do you provide processing services?</li></ul><p><strong>Unsolicited 2 cents of advice:</strong></p><ul><li>Use <strong>WEBP compression </strong>for RGB or RGBA datasets (there is a lossless option). This is the best option if you are looking for space savings, but sadly it is only compatible with GDAL 2.4.0 and later. 
<strong>JPEG compression </strong>might be a safer choice.</li><li>Use <strong>Deflate compression</strong> with <strong>PREDICTOR=2</strong> and<strong> ZLEVEL=9 </strong>options for non-Byte or non-RGB datasets.</li><li>Use internal overviews <strong>all the time</strong>.</li><li>Use a 256 or 512 internal block size (256 for Deflate and 512 for WEBP/JPEG compressed datasets?)</li><li>Prefer an internal bitmask over a nodata value. And maybe give $ to someone to fix the small `bug` in GDAL which puts the bitmask at the end of COGs.</li></ul><p><strong>More reads:</strong></p><ul><li>2018 SatSummit workshop about COGs (<a href="http://bit.ly/satsummit_cogeokeynote">link</a>)</li><li>What’s wrong with open infrastructure for Remote Sensing geodata? by <a href="https://medium.com/@vrielink/whats-wrong-with-open-infrastructure-for-remote-sensing-geodata-af55c91e0f03">@vrielink</a> (<a href="https://medium.com/@vrielink/whats-wrong-with-open-infrastructure-for-remote-sensing-geodata-af55c91e0f03">link</a>)</li><li><a href="http://www.cogeo.org">http://www.cogeo.org</a></li><li><a href="https://github.com/cogeotiff/cog-spec/blob/master/spec.md">COG specification</a></li><li>GDAL GeoTIFF format options (<a href="https://www.gdal.org/frmt_gtiff.html">link</a>)</li><li><a href="https://blog.hexagongeospatial.com/jpeg2000-quirks/">https://blog.hexagongeospatial.com/jpeg2000-quirks/</a></li></ul>]]></content:encoded>
        </item>
    </channel>
</rss>