Apache Sedona

Technology, Information and Internet

San Francisco, California 4,266 followers

Apache Sedona is a cluster computing system for processing large-scale spatial data (https://github.com/apache/sedona)

About us

Apache Sedona™ is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark, Apache Flink, and Snowflake, with a set of out-of-the-box distributed Spatial Datasets. GitHub: https://github.com/apache/sedona

Website
https://github.com/apache/sedona
Industry
Technology, Information and Internet
Company size
51-200 employees
Headquarters
San Francisco, California
Type
Nonprofit
Founded
2018
Specialties
big data, geospatial, gis, and cluster computing

Updates

  • Apache Sedona reposted this

    Today I'm releasing the first experimental build of CereusDB, a WebAssembly version of SedonaDB that runs in browsers: https://lnkd.in/daCDdAMv

    Recently I dug a bit deeper into Apache Sedona when I saw that they also offer SedonaDB, "an open-source single-node analytical database engine with geospatial as a first-class citizen". SedonaDB is built in Rust upon DataFusion, which theoretically enables Wasm builds via Emscripten. It relies heavily on the usual library suspects like GEOS, PROJ, S2 and GDAL, which makes compilation for Wasm targets a little more complicated. I used Codex with GPT-5.4 high/xhigh to iterate on an implementation, and after hitting some dead ends, it finally succeeded.

    Including all features wouldn't make for a small library, so I decided to offer different builds that contain functional subsets (uncompressed / gzipped / Brotli size in MB in brackets):
    - minimal: Core + geo + GEOS + spatial joins / ST_KNN (21.3 / 6.2 / 4.0)
    - standard: minimal + PROJ / ST_Transform (40.6 / 10.8 / 6.0)
    - global: standard + opt-in S2 geography kernels (42.9 / 11.8 / 6.7)
    - full: global + GDAL-backed raster ingestion and the full RS_* functions (49.5 / 14.4 / 8.5)

    With the right headers, it can easily be cached in the browser, so it's only downloaded once and not on every page load. Let me know what you think, happy for any feedback!
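
The post does not say which headers are meant. As a rough sketch of one common approach, a server could mark Wasm files as long-lived and immutable so browsers reuse the cached copy. The helper name `cache_headers`, the one-year lifetime, and the example file name are all hypothetical and not taken from CereusDB. The sketch also assumes file names change whenever the content changes.

```python
from pathlib import PurePosixPath

# Assumption: one year is a typical lifetime for versioned static assets.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_headers(url_path: str) -> dict[str, str]:
    """Return response headers for a static asset path (hypothetical helper)."""
    suffix = PurePosixPath(url_path).suffix
    if suffix == ".wasm":
        return {
            # application/wasm is the registered media type for Wasm modules.
            "Content-Type": "application/wasm",
            # "immutable" tells the browser not to revalidate within max-age;
            # this is only safe when a new build gets a new file name.
            "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable",
        }
    # Everything else is revalidated on each load.
    return {"Cache-Control": "no-cache"}


# Hypothetical file name with a content hash in it.
headers = cache_headers("/pkg/cereusdb-minimal.3f9a1c.wasm")
print(headers["Cache-Control"])
```

Serving the gzip or Brotli variant is a separate concern handled through `Content-Encoding` negotiation, which this sketch leaves out.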

  • Apache Sedona reposted this

    We're chuffed to announce Apache SedonaDB 0.3.0! This release features a rewritten join that supports larger-than-memory spatial/KNN joins courtesy of Kristin Cowalcijk, 36 new functions, item-level CRSes, parameterized SQL queries, GDAL/pyogrio-based writes for FlatGeobuf, Shapefile, and GeoPackage, GDAL/sf-based reads in R, and the beginnings of an R DataFrame API. Find us in Python or R to get started, or check out the release post (in the comments).

  • Apache Sedona

    We’re back with our monthly office hour session, and there’s a lot to share this time 👇

    What’s new in Apache Sedona 1.9.0
    • Cloud Optimized GeoTIFF writer support
    • New Bing tile function
    • More comprehensive CRS format support in ST_Transform
    • Support for Apache Spark 4.1

    SedonaDB 0.3 release
    • Out-of-core spatial join
    • Item-level CRS support

    If you’re building or scaling geospatial workloads, join us to get the latest updates and bring your questions.

    📆 Tuesday, March 3 at 8AM PT
    🔗 Save the date: https://lnkd.in/gygDqVb8

  • Apache Sedona reposted this

    The Wherobots team, in collaboration with the Apache Parquet, Apache Iceberg, and Apache Sedona communities at The Apache Software Foundation, led the inclusion of spatial types (specifically GEOMETRY and GEOGRAPHY) as first-class logical types within the Apache Parquet columnar storage format. Spatial data types are essential for representing data collected from the physical world, such as vehicle and robot movement, weather forecasts, farmland, building details, and generally anything happening on the surface of the earth.

    By moving away from treating spatial data as opaque binary blobs, Parquet now enables query engines to perform efficient data skipping through the use of spatial statistics, such as bounding boxes stored at the row-group level. This integration supports standardized Well-Known Binary (WKB) encoding and includes built-in metadata for Coordinate Reference Systems (CRS), ensuring interoperability across different data processing engines and lakehouse architectures like Apache Iceberg. Ultimately, this update eliminates the need for specialized spatial sidecar files, allowing spatial workloads to benefit from the same performance optimizations and scalability as standard tabular data in the lakehouse architecture.

    Check out the article on the official Apache Parquet blog: https://lnkd.in/g-edwsKu
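
To illustrate the data-skipping idea in the post, here is a minimal pure-Python sketch of pruning row groups by bounding box. It does not read real Parquet metadata. The `BBox` class, the `row_groups_to_read` helper, and the three row-group boxes are made-up values for illustration, and the overlap test ignores boxes that wrap the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in lon/lat order (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap only if their ranges overlap on both axes.
        return (
            self.xmin <= other.xmax
            and other.xmin <= self.xmax
            and self.ymin <= other.ymax
            and other.ymin <= self.ymax
        )


# Made-up per-row-group statistics; a real engine would read these
# from the file's metadata instead of hard-coding them.
ROW_GROUP_BBOXES = {
    0: BBox(-124.5, 32.5, -114.1, 42.0),
    1: BBox(-91.5, 36.9, -87.0, 42.5),
    2: BBox(-79.8, 40.5, -71.8, 45.0),
}


def row_groups_to_read(stats: dict[int, BBox], query: BBox) -> list[int]:
    """Keep only row groups whose bounding box can contain matching rows."""
    return [rg for rg, bbox in stats.items() if bbox.intersects(query)]


# Query window roughly around Chicago (illustrative coordinates).
query = BBox(-88.3, 41.4, -87.5, 42.2)
print(row_groups_to_read(ROW_GROUP_BBOXES, query))  # → [1]
```

Row groups 0 and 2 are skipped without decoding any geometry, which is the benefit that opaque binary columns could not offer. The surviving row group still needs an exact geometry test, since a bounding-box overlap only shows that a match is possible.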

  • Apache Sedona reposted this

    How Apache Sedona Solved Big Data's Hardest Problem

    Large Language Models can write poetry and debug code, but they still don't understand the fundamental physics of the real world. Ask an AI to find the "nearest restaurant" to a specific coordinate, and it struggles because it lacks Spatial Intelligence. In this episode, we sit down with Jia Yu, the co-creator of Apache Sedona and co-founder of Wherobots, to discuss why geospatial data breaks standard big data engines and how he built the solution that now sees over 2 million downloads a month. We trace the 10-year journey from a PhD research paper to a top-level Apache project, diving into the deep technical challenges of distributed computing. Jia explains why spatial data requires a completely different architecture than standard text or numbers and how the industry is finally moving toward a "Spatial Lakehouse" to break down data silos.

  • Apache Sedona reposted this

    Cloud-Native Geospatial Analytics with Apache Sedona is officially complete! This in-depth guide shows how to process and analyze large-scale geospatial data within the modern data ecosystem.

    Inside the book:
    • Practical solutions for working with massive geospatial datasets
    • How to build scalable spatial pipelines with Sedona and Spatial SQL
    • Integrating Python tools, applying machine learning, and using modern formats like GeoParquet

    Thanks to O'Reilly for partnering with us to bring this resource to the geospatial community, whether you are just getting started or already deep into spatial analytics.

    👀 Scroll through for a sneak peek of the chapters, and grab your copy here: https://bit.ly/4gkm4AU
    📧 If you downloaded earlier chapters, check your email for the latest version.

  • Apache Sedona reposted this

    I'm a bit late to the #ModernGeospatial party, but I recently got around to kicking the tires of some relative newcomers to the geospatial scene, namely DuckDB, SedonaDB, and GeoParquet. I wrote two scripts, available at the link below, that do the same thing: download Cook County, IL building footprints as a GeoPackage from Overture Maps. One uses DuckDB, the other uses SedonaDB. What excites me about these tools is their flexibility and speed. We used to have to download massive files, then unzip them, then transform them in a desktop GIS program. No more. Write some SQL and get exactly what you want in one step, quickly too. Shoutout to Matt Forrest, Qiusheng Wu, and many others for so consistently putting out such great content on all things modern geospatial. https://lnkd.in/gK7B_eT4
