raster_tools.zonal.zonal_stats

raster_tools.zonal.zonal_stats(features, data_raster, stats, features_field=None, wide_format=True, handle_overlap=False)

Apply stat functions to a raster based on a set of features.

Parameters
  • features (str, Vector, Raster) – A Vector, a path string pointing to a vector file, or a categorical Raster. Vector features are used like cookie cutters to pull data from the data_raster bands. If features is a Raster, it must have an integer dtype and only one band.

  • data_raster (Raster, str) – A Raster or path string pointing to a raster file. This is the raster from which data is pulled and to which the stat functions are applied.

  • stats (str, list of str) –

    A single string or list of strings corresponding to stat functions. These functions will be applied to the raster data for each of the features in features. Valid string values:

    'asm'

    Angular second moment. Applies sum(P(g)**2) where P(g) gives the probability of value g within the zone.

    'count'

    Count valid cells.

    'entropy'

    Calculate the entropy. Applies -sum(P(g) * log(P(g))), with P(g) as defined for 'asm' above.

    'max'

    Find the maximum value.

    'mean'

    Calculate the mean.

    'median'

    Calculate the median value.

    'min'

    Find the minimum value.

    'mode'

    Compute the statistical mode of the data. In the case of a tie, the lowest value is returned.

    'nunique'

    Count unique values.

    'prod'

    Calculate the product.

    'size'

    Calculate zone size.

    'std'

    Calculate the standard deviation.

    'sum'

    Calculate the sum.

    'var'

    Calculate the variance.

  • features_field (str, optional) – If the features argument is a vector, this determines which field to use when rasterizing the features. It must match one of the fields in features. The default is to use features’ index.

  • wide_format (bool, optional) –

    If True, the resulting dataframe is returned in wide format: the columns are the Cartesian product of the data_raster bands and the specified stats, and the index contains the feature zone IDs.

    pandas.MultiIndex(
      [
        ('band_1', 'stat1'),
        ('band_1', 'stat2'),
        ...
        ('band_2', 'stat1'),
        ('band_2', 'stat2'),
        ...
      ],
    )
    

    If False, the resulting dataframe has columns 'zone', 'band', 'stat1', 'stat2', … and an integer index. In this case, the zone column contains the feature zone IDs and the band column contains the one-based integer band number. The remaining columns correspond to the specified stats.

    The default is wide format.

  • handle_overlap (bool, optional) – Normally, polygon inputs for features are converted to a raster, so each cell can hold only one zone value. Where polygons overlap, one polygon overrides the others, and the resulting statistics for all of the polygons involved may be affected. If True, overlapping polygons are accounted for and zonal statistics are calculated independently of overlap. Currently, this triggers computation of features. The default is False.
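
To make a few of the stat definitions above concrete, here is a minimal pure-Python sketch that groups cells by zone and evaluates 'count', 'nunique', 'mode', and 'entropy' on toy data. It is an illustration only, not the library's implementation: the input lists, the helper names, the use of None for null cells, and the natural logarithm are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Toy inputs (made up for illustration): a flattened zone raster and one
# flattened data band of the same length. None marks a null data cell.
zones = [1, 1, 1, 1, 2, 2, 2, 2]
band = [5, 5, 7, 7, 3, None, 3, 9]


def group_by_zone(zone_cells, data_cells):
    """Collect the valid (non-null) data values that fall in each zone."""
    groups = defaultdict(list)
    for zone, value in zip(zone_cells, data_cells):
        if value is not None:
            groups[zone].append(value)
    return dict(groups)


def entropy(values):
    """-sum(P(g) * log(P(g))) over distinct values g (natural log assumed)."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def mode(values):
    """Most frequent value; ties resolve to the lowest value."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


groups = group_by_zone(zones, band)
stats = {
    zone: {
        "count": len(vals),
        "nunique": len(set(vals)),
        "mode": mode(vals),
        "entropy": entropy(vals),
    }
    for zone, vals in groups.items()
}
```

In zone 1 the values 5 and 7 each occur twice, so the tie-breaking rule selects 5 as the mode and the entropy equals log(2).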

Returns

A delayed dask DataFrame where the specified stats have been applied to the bands in data_raster. See the wide_format option for a description of the dataframe’s structure.
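
The two layouts selected by wide_format can be sketched with plain Python containers. The zone IDs, values, and variable names below are invented for illustration, and the dictionaries only mimic the shape of the returned dataframe; they are not what the function itself returns.

```python
# Made-up results keyed by (zone ID, one-based band number).
results = {
    (10, 1): {"mean": 2.0, "max": 3.0},
    (10, 2): {"mean": 5.5, "max": 8.0},
    (20, 1): {"mean": 1.0, "max": 1.0},
    (20, 2): {"mean": 4.0, "max": 6.0},
}

# Wide layout: one row per zone; columns are (band label, stat) pairs.
wide = {}
for (zone, band), band_stats in results.items():
    row = wide.setdefault(zone, {})
    for stat, value in band_stats.items():
        row[(f"band_{band}", stat)] = value

# Long layout: one row per (zone, band) pair under a plain integer index.
long_rows = [
    {"zone": zone, "band": band, **band_stats}
    for (zone, band), band_stats in sorted(results.items())
]
```

The wide form has one row per zone with four (band, stat) columns here, while the long form has four rows, one for each zone and band combination.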

Return type

dask.dataframe.DataFrame
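
A hedged usage sketch follows. The wrapper name summarize_zones, the file paths, and the field name are hypothetical placeholders; only the zonal_stats signature and the delayed dask return value come from the description above.

```python
def summarize_zones(features_path, raster_path, field=None):
    """Return per-zone mean, std, and count as an in-memory dataframe."""
    # Imported lazily so this module loads even where raster_tools is absent.
    from raster_tools.zonal import zonal_stats

    lazy_df = zonal_stats(
        features_path,
        raster_path,
        ["mean", "std", "count"],
        features_field=field,
        wide_format=False,  # long layout: 'zone', 'band', then one column per stat
    )
    # zonal_stats returns a delayed dask DataFrame; compute() runs the work.
    return lazy_df.compute()


# Hypothetical call (paths and field name are placeholders):
# table = summarize_zones("watersheds.shp", "elevation.tif", field="WS_ID")
```

Because the result is delayed, nothing is read or computed until compute() is called, so further dask operations can be chained before materializing the table.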