pcxarray.processing module
- pcxarray.processing.lazy_merge_arrays(arrays: List[DataArray], method: Literal['last', 'first', 'min', 'max', 'mean', 'sum', 'median'] = 'last', geometry: BaseGeometry | None = None, crs: CRS | str | int | None = None, resolution: float | int | None = None, resampling_method: Resampling | str = 'nearest', nodata: float | None = None) DataArray
Merge multiple xarray DataArrays lazily.
Reprojects all input arrays to a common geobox and then merges them using the specified method. If geometry, CRS, or resolution are not provided, they are automatically determined from the input arrays using the union of geometries, first CRS found, and minimum resolution respectively. Unlike rioxarray.rio.reproject, this function does not trigger a computation of the dask graph.
- Parameters:
arrays (list of xarray.DataArray) – List of georeferenced DataArrays to merge, such as those opened with rioxarray.
method ({'last', 'first', 'min', 'max', 'mean', 'sum', 'median'}, default='last') – Method for merging overlapping pixels.
geometry (shapely.geometry.base.BaseGeometry, optional) – Target geometry for the merged array. If None, computed as the unary union of all input array bounds.
crs (pyproj.CRS, str, or int, optional) – Target coordinate reference system. If None, uses the CRS from the first input array.
resolution (float or int, optional) – Target pixel resolution in CRS units. If None, uses the minimum resolution from all input arrays.
resampling_method (rasterio.enums.Resampling or str, default='nearest') – Resampling method for reprojection. Can be a Resampling enum member or its string name (e.g., 'nearest', 'bilinear', 'cubic').
nodata (float, optional) – NoData value to use for masking. If None, uses the NoData value from the first input array.
- Returns:
Merged DataArray reprojected to the common geobox with spatial coordinates and CRS information preserved.
- Return type:
xarray.DataArray
- Raises:
ValueError – If an unknown merge method is specified or if geometry and CRS do not produce a valid geobox.
UserWarning – Emitted as a warning (not raised as an exception) if reprojection fails for any input array; that array is skipped and processing continues.
Notes
- TODO: Document the edge cases around NoData handling.
This appears to be the most fragile part of the package, and the full set of edge cases is hard to pin down.
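To make the merge methods concrete, the following is a minimal per-pixel sketch in plain Python. It is not pcxarray's implementation: the helper name `merge_pixel` is hypothetical, NaN is assumed to stand in for NoData, and NoData values are assumed to be ignored when combining overlapping pixels.

```python
import math
from statistics import median


def merge_pixel(values, method="last"):
    """Combine the values that overlapping arrays contribute to one pixel.

    `values` follows the order of the input arrays; NaN marks NoData
    (an assumption made for this sketch).
    """
    reducers = {
        "first": lambda vs: vs[0],
        "last": lambda vs: vs[-1],
        "min": min,
        "max": max,
        "sum": sum,
        "mean": lambda vs: sum(vs) / len(vs),
        "median": median,
    }
    if method not in reducers:
        raise ValueError(f"Unknown merge method: {method!r}")

    # Drop NoData before reducing so gaps in one array do not mask another.
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return math.nan  # no array has data at this pixel
    return reducers[method](valid)


# One pixel seen by four arrays; the second array has NoData there.
stack = [3.0, math.nan, 7.0, 5.0]
results = {m: merge_pixel(stack, m)
           for m in ("first", "last", "min", "max", "mean", "sum", "median")}
```

With 'first' and 'last', the order of the arrays in the input list decides which value wins, so sorting the list beforehand is one way to control the outcome.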
- pcxarray.processing.prepare_data(items_gdf: GeoDataFrame, geometry: BaseGeometry | None = None, crs: CRS | str | int | None = None, bands: List[str | int] | None = None, target_resolution: float | int | None = None, all_touched: bool = False, merge_method: Literal['last', 'first', 'min', 'max', 'mean', 'sum', 'median'] = 'last', resampling_method: Resampling | str = 'nearest', chunks: str | Dict[str, int] | None = None, enable_time_dim: bool = False, time_col: str | None = 'properties.datetime', time_format_str: str | None = None, max_workers: int = 1, enable_progress_bar: bool = False, **rioxarray_kwargs: Dict[str, Any] | None) DataArray
Prepare and merge raster data from Planetary Computer query results.
Selects the minimum set of STAC items needed to cover a given geometry, reads and mosaics raster tiles, and handles reprojection, resampling, and merging. Items are selected using a greedy algorithm to minimize the number of tiles while ensuring complete coverage. When a single item fully covers the geometry, no merging is performed for efficiency.
- Parameters:
items_gdf (geopandas.GeoDataFrame) – GeoDataFrame of STAC items to process.
geometry (shapely.geometry.base.BaseGeometry, optional) – Area of interest geometry in the target CRS. If None, uses the union of all geometries in items_gdf.
crs (pyproj.CRS, str or int, optional) – Coordinate reference system for the output. If None, uses the CRS from items_gdf.
bands (list of str or int, optional) – List of band names or indices to select; if None, all valid bands are loaded.
target_resolution (float or int, optional) – Target pixel size for the output raster in units of the CRS. If None, uses the native resolution of the first item.
all_touched (bool, default=False) – Whether to include all pixels touched by the geometry during final clipping.
merge_method ({'last', 'first', 'min', 'max', 'mean', 'sum', 'median'}, default='last') – Method to use when merging overlapping arrays.
resampling_method (rasterio.enums.Resampling or str, default='nearest') – Resampling method to use for reprojection.
chunks (str or dict, optional) – Chunking options for dask/xarray.
enable_time_dim (bool, default=False) – If True, add a time dimension to the output. All selected items must have the same datetime value.
time_col (str or None, default='properties.datetime') – Column name for datetime in items_gdf.
time_format_str (str, optional) – Format string for parsing datetime values.
max_workers (int, default=1) – Number of parallel workers to use (-1 uses all available CPUs).
enable_progress_bar (bool, default=False) – Whether to display a progress bar during tile merging.
**rioxarray_kwargs (dict, optional) – Additional keyword arguments to pass to rioxarray.open_rasterio.
- Returns:
The prepared raster data as an xarray DataArray, optionally with a time dimension.
- Return type:
xarray.DataArray
- Raises:
ValueError – If enable_time_dim is True but time_col is not found in items_gdf, or if selected items have different datetime values when enable_time_dim is True.
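The greedy item selection described above can be illustrated with a small set-cover sketch. This is an assumption-laden simplification, not the package's code: the area of interest and each tile footprint are modeled as sets of grid-cell indices rather than geometries, and the name `greedy_cover` is hypothetical.

```python
def greedy_cover(target, tiles):
    """Pick tile ids greedily until `target` is covered.

    target: set of cells that must be covered.
    tiles:  dict mapping tile id -> set of cells that tile covers.
    Returns (selected tile ids, cells left uncovered).
    """
    remaining = set(target)
    selected = []
    while remaining and tiles:
        # Choose the tile that covers the most still-uncovered cells.
        best_id = max(tiles, key=lambda tid: len(tiles[tid] & remaining))
        gain = tiles[best_id] & remaining
        if not gain:
            break  # no tile can improve coverage any further
        selected.append(best_id)
        remaining -= gain
    return selected, remaining


aoi = set(range(6))
footprints = {
    "a": {0, 1, 2, 3},
    "b": {2, 3},
    "c": {4, 5},
    "d": {3, 4},
}
chosen, uncovered = greedy_cover(aoi, footprints)

# A tile that covers everything is selected alone, so no merge is needed.
single, _ = greedy_cover(aoi, {"full": set(range(6)), **footprints})
```

A greedy choice does not always find the smallest possible set of tiles, but it is simple and usually close.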
- pcxarray.processing.prepare_timeseries(items_gdf: GeoDataFrame, geometry: BaseGeometry, crs: CRS | str | int = 4326, bands: List[str | int] | None = None, target_resolution: float | None = None, all_touched: bool = False, merge_method: Literal['last', 'first', 'min', 'max', 'mean', 'sum', 'median'] = 'last', resampling_method: Resampling | str = 'nearest', chunks: Dict[str, int] | None = None, time_col: str = 'properties.datetime', time_format_str: str | None = None, ignore_time_component: bool = True, max_workers: int = 1, enable_progress_bar: bool = True, **rioxarray_kwargs: Dict[str, Any] | None) DataArray
Prepare a time series of raster data from a GeoDataFrame of STAC items.
Groups items by time, reads and merges rasters for each timestep, and concatenates them into a single DataArray along the time dimension. Supports parallel processing and chunking for large datasets.
- Parameters:
items_gdf (geopandas.GeoDataFrame) – GeoDataFrame of STAC items to process.
geometry (shapely.geometry.base.BaseGeometry) – Area of interest geometry in the target CRS.
crs (pyproj.CRS, str or int, default=4326) – Coordinate reference system for the output.
bands (list of str or int, optional) – List of band names or indices to select; if None, all valid bands are loaded.
target_resolution (float or int, optional) – Target pixel size for the output raster in units of the CRS.
all_touched (bool, default=False) – Whether to include all pixels touched by the geometry.
merge_method ({'last', 'first', 'min', 'max', 'mean', 'sum', 'median'}, default='last') – Method to use when merging arrays.
resampling_method (rasterio.enums.Resampling or str, default='nearest') – Resampling method to use for reprojection.
chunks (dict, optional) – Chunking options for dask/xarray.
time_col (str, default='properties.datetime') – Column name for datetime in items_gdf.
time_format_str (str, optional) – Format string for parsing datetime values.
ignore_time_component (bool, default=True) – If True, ignore the time component and only use the date.
max_workers (int, default=1) – Number of parallel workers to use (-1 uses all available CPUs).
enable_progress_bar (bool, default=True) – Whether to display a progress bar during processing.
**rioxarray_kwargs (dict, optional) – Additional keyword arguments to pass to rioxarray.open_rasterio.
- Returns:
The prepared time series raster data as an xarray DataArray with a time dimension.
- Return type:
xarray.DataArray
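The effect of ignore_time_component on how items are grouped into timesteps can be sketched as follows. The function name `group_by_time` and the (item id, ISO 8601 timestamp) record layout are assumptions for illustration; the package itself works on the time_col column of items_gdf.

```python
from collections import defaultdict
from datetime import datetime


def group_by_time(records, ignore_time_component=True):
    """Group item ids by timestep.

    records: iterable of (item_id, ISO 8601 timestamp string).
    With ignore_time_component=True, items from the same calendar date
    share a timestep; otherwise the full timestamp must match.
    """
    groups = defaultdict(list)
    for item_id, stamp in records:
        moment = datetime.fromisoformat(stamp)
        key = moment.date() if ignore_time_component else moment
        groups[key].append(item_id)
    # Sort by key so timesteps come out in chronological order.
    return dict(sorted(groups.items()))


records = [
    ("a", "2021-06-01T10:15:00"),
    ("b", "2021-06-01T10:15:12"),
    ("c", "2021-06-02T10:14:00"),
]
by_date = group_by_time(records)                               # two timesteps
by_exact = group_by_time(records, ignore_time_component=False)  # three timesteps
```

Adjacent tiles from one satellite pass often differ by a few seconds, which is why grouping by date tends to produce one mosaic per acquisition day.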
- pcxarray.processing.query_and_prepare(collections: str | List[str], geometry: BaseGeometry, crs: CRS | str | int = 4326, datetime: str = '2000-01-01/2025-01-01', bands: List[str | int] | None = None, target_resolution: float | None = None, all_touched: bool = False, merge_method: Literal['last', 'first', 'min', 'max', 'mean', 'sum', 'median'] = 'last', resampling_method: Resampling | str = 'nearest', chunks: str | Dict[str, int] | None = None, enable_time_dim: bool = False, time_col: str | None = 'properties.datetime', time_format_str: str | None = None, max_workers: int = 1, enable_progress_bar: bool = False, return_items: bool = False, query_kwargs: Dict[str, Any] | None = None, rioxarray_kwargs: Dict[str, Any] | None = None) DataArray | tuple
Query the Planetary Computer and prepare raster data in a single step.
Combines a STAC API query and raster preparation pipeline. It queries the Planetary Computer for items matching the given geometry, date range, and collections, then reads, merges, and processes the raster data. Optionally returns the items GeoDataFrame.
- Parameters:
collections (str or list of str) – Collection(s) to search within the Planetary Computer catalog.
geometry (shapely.geometry.base.BaseGeometry) – Area of interest geometry.
crs (pyproj.CRS, str or int, default=4326) – Coordinate reference system of the input geometry and of the output raster.
datetime (str, default='2000-01-01/2025-01-01') – Datetime or datetime range for the query as an ISO 8601 string; a range is written as 'start/end'.
bands (list of str or int, optional) – List of band names or indices to select; if None, all valid bands are loaded.
target_resolution (float or int, optional) – Target pixel size for the output raster in units of the CRS.
all_touched (bool, default=False) – Whether to include all pixels touched by the geometry during clipping.
merge_method ({'last', 'first', 'min', 'max', 'mean', 'sum', 'median'}, default='last') – Method to use when merging overlapping arrays.
resampling_method (rasterio.enums.Resampling or str, default='nearest') – Resampling method to use for reprojection.
chunks (str or dict, optional) – Chunking options for dask/xarray.
enable_time_dim (bool, default=False) – If True, add a time dimension to the output.
time_col (str or None, default='properties.datetime') – Name of the datetime column in the items GeoDataFrame returned by the query.
time_format_str (str, optional) – Format string for parsing datetime values.
max_workers (int, default=1) – Number of parallel workers to use (-1 uses all available CPUs).
enable_progress_bar (bool, default=False) – Whether to display a progress bar during merging.
return_items (bool, default=False) – If True, also return the items GeoDataFrame.
query_kwargs (dict, optional) – Additional query parameters to pass to the STAC search.
rioxarray_kwargs (dict, optional) – Additional keyword arguments to pass to rioxarray.open_rasterio.
- Returns:
The prepared raster data. If return_items is True, returns a tuple of (DataArray, GeoDataFrame).
- Return type:
xarray.DataArray or tuple
Notes
This is a convenience function that combines pc_query() and prepare_data(). For more control over the process, use those functions separately.
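A hedged usage sketch based only on the signature documented above. The collection id 'naip', the bounding box, and the date range are illustrative assumptions, and the wrapper name `load_example_mosaic` is hypothetical. Calling the function requires network access to the Planetary Computer.

```python
def load_example_mosaic():
    """Query one collection and return (DataArray, items GeoDataFrame)."""
    # Imported inside the function so the sketch can be defined without
    # the packages installed; calling it needs them and network access.
    from shapely.geometry import box
    from pcxarray.processing import query_and_prepare

    # Illustrative lon/lat bounds (minx, miny, maxx, maxy) in EPSG:4326.
    aoi = box(-105.30, 40.00, -105.25, 40.05)

    data, items_gdf = query_and_prepare(
        collections="naip",                 # assumed collection id
        geometry=aoi,
        crs=4326,
        datetime="2021-01-01/2021-12-31",   # ISO 8601 interval
        merge_method="first",
        return_items=True,                  # makes the call return a tuple
    )
    return data, items_gdf
```

target_resolution is left unset here because it is expressed in CRS units, which would be degrees for EPSG:4326.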