NAIP

Overview

pcxarray is a small Python package that provides a simple interface to interact with the Microsoft Planetary Computer Data Catalog. With this package, you can create queries using shapely geometries and retrieve results in GeoDataFrames such that you can examine item metadata and select assets for download, as well as automatically load and preprocess data into xarray DataArray.

This example demonstrates how to use pcxarray to query NAIP imagery and load it into an xarray DataArray. The state of Mississippi is used as an example, but you can modify the code to query other states or regions.

[1]:
from pcxarray import pc_query, prepare_data, query_and_prepare
from pcxarray.utils import create_grid, load_census_shapefile

Creating polygon grids

You can use the pcxarray package to create polygon grids to subdivide a larger area into smaller sections. This is useful for processing large datasets in manageable chunks. The following code creates a grid of polygons over the state of Mississippi:

[2]:
states_gdf = load_census_shapefile(verify=False) # using verify=False due to local SSL issue, do not use in production
ms_gdf = states_gdf[states_gdf['STUSPS'] == 'MS']
ms_gdf = ms_gdf.to_crs(3814) # Mississippi Transveres Mercator (https://epsg.io/3814)
display(ms_gdf)
display(ms_gdf.iloc[0].geometry)
REGION DIVISION STATEFP STATENS GEOID GEOIDFQ STUSPS NAME LSAD MTFCC FUNCSTAT ALAND AWATER INTPTLAT INTPTLON geometry
44 3 6 28 01779790 28 0400000US28 MS Mississippi 00 G4000 A 121534116434 3914302814 +32.6864714 -089.6561377 POLYGON ((624275.848 1270567.752, 624241.051 1...
../_images/examples_naip_3_1.svg
[3]:
grid_gdf = create_grid(
    ms_gdf.iloc[0].geometry,
    crs=ms_gdf.crs,
    cell_size=1000,  # each cell is 1km x 1km
    clip_to_polygon=False, # set to True to clip cells on the polygon boundary to the polygon itself
)
display(grid_gdf)

selected_geom = grid_gdf.iloc[10000].geometry
display(selected_geom)
geometry
0 POLYGON ((575558.025 1038674.182, 576558.025 1...
1 POLYGON ((576558.025 1038674.182, 577558.025 1...
2 POLYGON ((577558.025 1038674.182, 578558.025 1...
3 POLYGON ((578558.025 1038674.182, 579558.025 1...
4 POLYGON ((579558.025 1038674.182, 580558.025 1...
... ...
126807 POLYGON ((636558.025 1577674.182, 637558.025 1...
126808 POLYGON ((637558.025 1577674.182, 638558.025 1...
126809 POLYGON ((638558.025 1577674.182, 639558.025 1...
126810 POLYGON ((639558.025 1577674.182, 640558.025 1...
126811 POLYGON ((640558.025 1577674.182, 641558.025 1...

126812 rows × 1 columns

../_images/examples_naip_4_1.svg

Querying Planetary Computer NAIP data

Now, we can use the pcxarray package to query NAIP data for the state of Mississippi. The GeoDataFrame created in the previous step is used to define the area of interest. For the purposes of this notebook, we will only query NAIP data for one polygon in the grid.

[4]:
# now, query files
items_gdf = pc_query(
    collections='naip',
    geometry=selected_geom,
    crs=grid_gdf.crs,
    datetime='2023'
)

# Display the queried items to get a broad overview of what is contained
# NOTE: STAC items contain a lot of metadata, and most of it is not relevant for
# downstream processing. The end user is expected to filter the items based on their
# needs.
display(items_gdf.columns)
display(items_gdf)
Index(['index', 'type', 'stac_version', 'stac_extensions', 'id', 'bbox',
       'properties.gsd', 'properties.datetime', 'properties.naip:year',
       'properties.proj:bbox', 'properties.providers', 'properties.naip:state',
       'properties.proj:shape', 'properties.proj:centroid.lat',
       'properties.proj:centroid.lon', 'properties.proj:transform',
       'properties.proj:code', 'links', 'assets.image.href',
       'assets.image.type', 'assets.image.title', 'assets.image.eo:bands',
       'assets.image.roles', 'assets.thumbnail.href', 'assets.thumbnail.type',
       'assets.thumbnail.title', 'assets.thumbnail.roles',
       'assets.tilejson.href', 'assets.tilejson.type', 'assets.tilejson.title',
       'assets.tilejson.roles', 'assets.rendered_preview.href',
       'assets.rendered_preview.type', 'assets.rendered_preview.title',
       'assets.rendered_preview.rel', 'assets.rendered_preview.roles',
       'collection', 'geometry'],
      dtype='object')
index type stac_version stac_extensions id bbox properties.gsd properties.datetime properties.naip:year properties.proj:bbox ... assets.tilejson.type assets.tilejson.title assets.tilejson.roles assets.rendered_preview.href assets.rendered_preview.type assets.rendered_preview.title assets.rendered_preview.rel assets.rendered_preview.roles collection geometry
0 0 Feature 1.1.0 [https://stac-extensions.github.io/eo/v1.1.0/s... ms_m_3008915_nw_16_060_20230701_20240103 [-89.253589, 30.809596, -89.183951, 30.877901] 0.6 2023-07-01 16:00:00 2023 [284556.0, 3410670.0, 291072.0, 3418116.0] ... application/json TileJSON with default rendering [tiles] https://planetarycomputer.microsoft.com/api/da... image/png Rendered preview preview [overview] naip POLYGON ((554157.957 1112863.988, 553974.736 1...
1 1 Feature 1.1.0 [https://stac-extensions.github.io/eo/v1.1.0/s... ms_m_3008915_ne_16_060_20230701_20240103 [-89.191032, 30.809639, -89.121497, 30.877913] 0.6 2023-07-01 16:00:00 2023 [290538.0, 3410556.0, 297048.0, 3418002.0] ... application/json TileJSON with default rendering [tiles] https://planetarycomputer.microsoft.com/api/da... image/png Rendered preview preview [overview] naip POLYGON ((560133.48 1112897.003, 559950.208 11...
2 2 Feature 1.1.0 [https://stac-extensions.github.io/eo/v1.1.0/s... ms_m_3008907_se_16_060_20230701_20240103 [-89.191074, 30.872095, -89.121492, 30.940371] 0.6 2023-07-01 16:00:00 2023 [290670.0, 3417480.0, 297180.0, 3424926.0] ... application/json TileJSON with default rendering [tiles] https://planetarycomputer.microsoft.com/api/da... image/png Rendered preview preview [overview] naip POLYGON ((560095.021 1119820.568, 559911.477 1...

3 rows × 38 columns

Downloading and loading NAIP data

Once we have the query results, we can download the NAIP data and load it into an xarray dataarray using the prepare_data function. The returned imagery is in the form of an xarray DataArray, which contains the NAIP imagery data for the specified geometry in the given coordinate reference system. If the geometry is not completely contained by a single NAIP tile, a mosaic will automatically be created. target_resolution can be used to specify the resolution of the output data in the units of the CRS.

[5]:
imagery = prepare_data(
    geometry=selected_geom,
    crs=grid_gdf.crs,
    items_gdf=items_gdf,
    target_resolution=1.0,
    bands=[4, 1, 2], # NIR, Red, Green
    resampling_method='bilinear',
) / 255.0 # Normalize the pixel values to [0, 1] range

imagery.plot.imshow(size=5, aspect=1)
[5]:
<matplotlib.image.AxesImage at 0x134b48ce0>
../_images/examples_naip_8_1.png

Putting it all together

In most cases, you can combine the query/load steps into a single function call.

[6]:
imagery = query_and_prepare(
    collections='naip',
    geometry=selected_geom,
    crs=grid_gdf.crs,
    datetime='2023',
    target_resolution=1.0,
    bands=[4, 1, 2],
    resampling_method='bilinear',
) / 255.0

imagery.plot.imshow(size=5, aspect=1)
[6]:
<matplotlib.image.AxesImage at 0x134bc8170>
../_images/examples_naip_10_1.png