RSGISLib Image Segmentation Module

The segmentation module contains the segmentation functionality for RSGISLib.

A number of steps are required for the segmentation. For most users, it is recommended to use the runShepherdSegmentation helper function, which runs all the required steps to generate a segmentation:

Example:

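from rsgislib.segmentation import segutils

# Placeholder paths; replace with your own input image and output clumps file.
inImage = 'input_image.kea'
outputClumps = 'output_clumps.kea'

segutils.runShepherdSegmentation(inImage,
                                 outputClumps,
                                 tmpath='./',
                                 numClusters=60,
                                 minPxls=100,
                                 distThres=100,
                                 sampling=100, kmMaxIter=200)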

Where ‘inImage’ is the input image (optionally masked and stretched) and ‘outputClumps’ is the output clumps file.

More information about the segmentation method is available in the following paper:

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

For the wider system of data analysis using segments see the following paper:

Clewley, D., Bunting, P., Shepherd, J., Gillingham, S., Flood, N., Dymond, J., Lucas, R., Armston, J., & Moghaddam, M. (2014). A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing, 6(7), 6111-6135. http://www.mdpi.com/2072-4292/6/7/6111

Utilities

rsgislib.segmentation.segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200, processInMem=False, saveProcessStats=False, imgStretchStats='', kMeansCentres='', imgStatsJSONFile='')

Utility function to call the segmentation algorithm of Shepherd et al. (2019).

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

Where:

Parameters
  • inputImg – is a string containing the name of the input file.

  • outputClumps – is a string containing the name of the output clump file.

  • outputMeanImg – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.

  • tmpath – is a file path for intermediate files (default is current directory).

  • gdalformat – is a string containing the GDAL format for the output file (default = KEA).

  • noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).

  • noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).

  • noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).

  • numClusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).

  • minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).

  • distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).

  • bands – is an array providing a subset of image bands to use (default is None to use all bands).

  • sampling – specifies the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).

  • kmMaxIter – is an int specifying the maximum number of iterations for the KMeans (default = 200).

  • processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk (default = False).

  • saveProcessStats – is a bool which specifies that the image stretch stats and the KMeans centre stats should be saved along with a header (default = False).

  • imgStretchStats – is a string providing the file name and path for the image stretch stats (Output).

  • kMeansCentres – is a string providing the file name and path for the KMeans clusters centres (don’t include file extension; .gmtxt will be added to the end) (Output).

  • imgStatsJSONFile – is a string providing the name and path of a JSON file storing the image spatial extent and imgStretchStats and kMeansCentres file paths for use by other commands (Output).
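
The KMeans step controlled by numClusters, sampling and kmMaxIter can be illustrated with the NumPy sketch below. It assumes that sampling=n means every n-th pixel is used to train the clustering and uses a plain Lloyd's k-means; it is not the RSGISLib implementation, and the helper names subsample_pixels and kmeans are hypothetical.

```python
import numpy as np


def subsample_pixels(img, sampling=100):
    """Flatten a (bands, rows, cols) array to (n_pixels, bands) and keep
    every `sampling`-th pixel (sampling=1 keeps all pixels)."""
    pxls = img.reshape(img.shape[0], -1).T
    return pxls[::sampling]


def kmeans(data, num_clusters=60, max_iter=200, seed=0):
    """Plain Lloyd's k-means returning centres of shape (num_clusters, bands)."""
    rng = np.random.default_rng(seed)
    # Initialise the centres from randomly chosen samples.
    idx = rng.choice(len(data), num_clusters, replace=False)
    centres = data[idx].astype(float)
    for _ in range(max_iter):
        # Squared Euclidean distance from every sample to every centre.
        dists = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centres = centres.copy()
        for k in range(num_clusters):
            members = data[labels == k]
            if len(members) > 0:  # keep the old centre if a cluster empties
                new_centres[k] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break  # converged before reaching max_iter
        centres = new_centres
    return centres


rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(3, 200, 200))  # synthetic 3-band image
sample = subsample_pixels(img, sampling=100)    # 40000 pixels -> 400 samples
centres = kmeans(sample, num_clusters=10, max_iter=200)
print(sample.shape, centres.shape)  # (400, 3) (10, 3)
```

Subsampling keeps the cost of each iteration proportional to the number of samples rather than the number of image pixels, which is why a larger sampling value speeds up the clustering on large scenes.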

Example:

from rsgislib.segmentation import segutils

inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'

segutils.runShepherdSegmentation(inputImg, outputClumps, outputMeanImg, minPxls=100)

rsgislib.segmentation.tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='segtmp', tileWidth=2000, tileHeight=2000, validDataThreshold=0.3, numClusters=60, minPxls=100, distThres=100, bands=None, sampling=100, kmMaxIter=200)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) using the tiled process outlined in Clewley et al. (2015).

Parameters
  • inputImage – is a string containing the name of the input file.

  • clumpsImage – is a string containing the name of the output clump file.

  • tmpDIR – is a file path for intermediate files (default is to create a directory ‘segtmp’). If the path does not currently exist, it will be created and deleted afterwards.

  • tileWidth – is an int specifying the width of the tiles used for processing (default = 2000).

  • tileHeight – is an int specifying the height of the tiles used for processing (default = 2000).

  • validDataThreshold – is a float (value between 0 and 1) specifying the minimum fraction of valid image pixels (i.e., pixels which are not the no data value of zero) required within a tile. Tiles failing to meet this threshold are merged with tiles which do (default = 0.3).

  • numClusters – is an int which specifies the number of clusters within the KMeans clustering (default = 60).

  • minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).

  • distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).

  • bands – is an array providing a subset of image bands to use (default is None to use all bands).

  • sampling – specify the subsampling of the image for the data used within the KMeans (default = 100; 1 == no subsampling).

  • kmMaxIter – maximum iterations for KMeans (Default 200).

Example:

from rsgislib.segmentation import tiledsegsingle

inputImage = 'LS5TM_20110428_sref_submask_osgb.kea'
clumpsImage = 'LS5TM_20110428_sref_submask_osgb_clumps.kea'

tiledsegsingle.performTiledSegmentation(inputImage, clumpsImage, tmpDIR='./rsgislibsegtmp', tileWidth=2000, tileHeight=2000, validDataThreshold=0.3, numClusters=60, minPxls=100, distThres=100, bands=[4,5,3], sampling=100, kmMaxIter=200)
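
The sketch below illustrates how tileWidth, tileHeight and validDataThreshold interact. It assumes tiles are laid out without overlap from the top-left corner and that a pixel is valid if any band differs from the no data value of zero; the real tiling (including any overlap and the merging of tiles) may differ, and the function names are hypothetical.

```python
import numpy as np


def tile_extents(n_rows, n_cols, tile_height=2000, tile_width=2000):
    """Yield (row_start, row_end, col_start, col_end) for each tile, clipping
    the tiles on the right and bottom edges to the image size."""
    for r0 in range(0, n_rows, tile_height):
        for c0 in range(0, n_cols, tile_width):
            yield r0, min(r0 + tile_height, n_rows), c0, min(c0 + tile_width, n_cols)


def tiles_below_threshold(img, tile_height, tile_width, valid_data_threshold=0.3, no_data=0):
    """Return the extents of tiles whose fraction of valid pixels is below
    valid_data_threshold, i.e., the tiles that would need merging."""
    # Assumption: a pixel is valid if any band differs from the no data value.
    valid = (img != no_data).any(axis=0)
    flagged = []
    for r0, r1, c0, c1 in tile_extents(valid.shape[0], valid.shape[1], tile_height, tile_width):
        if valid[r0:r1, c0:c1].mean() < valid_data_threshold:
            flagged.append((r0, r1, c0, c1))
    return flagged


img = np.zeros((1, 100, 100), dtype=np.uint8)
img[:, :, :60] = 1  # only the left 60 columns hold valid data
print(tiles_below_threshold(img, 50, 50, valid_data_threshold=0.3))
# [(0, 50, 50, 100), (50, 100, 50, 100)]: only 10 of 50 columns are valid
```

Under these assumptions, a 4500 x 4500 pixel image with the default 2000 x 2000 tiles would be split into nine tiles, with the right-hand and bottom tiles only 500 pixels wide or high.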

rsgislib.segmentation.segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, minPxls=100, distThres=100, bands=None, processInMem=False)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) using pre-calculated stretch stats and KMeans cluster centres.

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

Where:

Parameters
  • inputImg – is a string containing the name of the input file.

  • outputClumps – is a string containing the name of the output clump file.

  • kMeansCentres – is a string providing the file name and path for the KMeans clusters centres (Input)

  • imgStretchStats – is a string providing the file name and path for the image stretch stats (Input - not required if noStretch=True)

  • outputMeanImg – is the output mean image file (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.

  • tmpath – is a file path for intermediate files (default is current directory).

  • gdalformat – is a string containing the GDAL format for the output file (default = KEA).

  • noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).

  • noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).

  • noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).

  • minPxls – is an int which specifies the minimum number of pixels within a segment (default = 100).

  • distThres – specifies the distance threshold for joining the segments (default = 100, set to large number to turn off this option).

  • bands – is an array providing a subset of image bands to use (default is None to use all bands).

  • processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk (default = False).

Example:

from rsgislib.segmentation import segutils

inputImg = 'jers1palsar_stack.kea'
outputClumps = 'jers1palsar_stack_clumps_elim_final.kea'
outputMeanImg = 'jers1palsar_stack_clumps_elim_final_mean.kea'
kMeansCentres = 'jers1palsar_stack_kcentres.gmtxt'
imgStretchStats = 'jers1palsar_stack_stchstats.txt'

segutils.runShepherdSegmentationPreCalcdStats(inputImg, outputClumps, kMeansCentres, imgStretchStats, outputMeanImg, minPxls=100)
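
Pre-calculated stretch statistics allow the same 8-bit mapping to be applied to several images. The sketch below assumes a linear stretch between the band mean minus and plus two standard deviations; the statistic RSGISLib actually uses and the format of the imgStretchStats file are not shown, and calc_stretch_stats and apply_stretch are hypothetical names.

```python
import numpy as np


def calc_stretch_stats(img, n_std=2.0):
    """Per-band (low, high) range of mean -/+ n_std standard deviations.
    img: (bands, rows, cols)."""
    stats = []
    for band in img:
        mean, std = float(band.mean()), float(band.std())
        stats.append((mean - n_std * std, mean + n_std * std))
    return stats


def apply_stretch(img, stats):
    """Linearly rescale each band to 0-255 using the supplied (low, high)
    values, clipping anything outside that range."""
    out = np.empty(img.shape, dtype=np.uint8)
    for i, (low, high) in enumerate(stats):
        span = high - low if high > low else 1.0  # guard against constant bands
        scaled = (img[i].astype(float) - low) / span * 255.0
        out[i] = np.clip(np.round(scaled), 0, 255).astype(np.uint8)
    return out


rng = np.random.default_rng(0)
ref = rng.normal(100, 20, size=(2, 50, 50))    # image the stats are computed on
other = rng.normal(110, 20, size=(2, 50, 50))  # second image re-using the stats
stats = calc_stretch_stats(ref)                # calculated once, e.g. saved to a file
ref_8bit = apply_stretch(ref, stats)
other_8bit = apply_stretch(other, stats)       # same mapping, so values are comparable
```

Because both images share one mapping, the brighter second image keeps higher stretched values instead of being rescaled to its own range.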

rsgislib.segmentation.segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClusters=100, minPxlsStart=10, minPxlsStep=5, numOfMinPxlsSteps=20, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) and to test a range of minimum object sizes (minimum number of pixels per segment).

Where:

Parameters
  • inputImg – is a string containing the name of the input file

  • outputClumpsBase – is a string containing the base name and path of the output clump files

  • outStatsFile – is a string containing the name of the output CSV file with the image segmentation stats

  • outputMeanImgBase – is the base name and path of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.

  • tmpath – is a file path for intermediate files (default is current directory).

  • gdalformat – is a string containing the GDAL format for the output file (default is KEA)

  • noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).

  • noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).

  • noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).

  • numClusters – is an int which specifies the number of clusters within the KMeans clustering process (default = 100).

  • minPxlsStart – is an int which specifies the minimum number of pixels within a segment for the first test (default = 10).

  • minPxlsStep – is an int which specifies the increment added to the minimum number of pixels at each step (default = 5).

  • numOfMinPxlsSteps – is an int which specifies the number of steps (i.e., tests) which are performed (default = 20).

  • distThres – specifies the distance threshold for joining the segments (default = 1000000, a very large value which turns off this option).

  • bands – is an array providing a subset of image bands to use (default is None to use all bands).

  • sampling – specifies the subsampling of the image for the data used within the KMeans (1 == no subsampling; default = 100).

  • kmMaxIter – is an int specifying the maximum number of iterations for the KMeans (default = 200).

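  • minNormV – is an optional float (default = None).

  • maxNormV – is an optional float (default = None).

  • minNormMI – is an optional float (default = None).

  • maxNormMI – is an optional float (default = None).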

Example:

from rsgislib.segmentation import segutils

inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_MinPxl'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_MinPxlMean'
tmpath='./OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsMinPxl.csv'

# Will test minimum numbers of pixels within an object from 5 to 100 in steps of 5.
segutils.runShepherdSegmentationTestMinObjSize(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=outputMeanImgBase, tmpath=tmpath, noStretch=True, numClusters=100, minPxlsStart=5, minPxlsStep=5, numOfMinPxlsSteps=20, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

rsgislib.segmentation.segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=None, tmpath='.', gdalformat='KEA', noStats=False, noStretch=False, noDelete=False, numClustersStart=10, numClustersStep=10, numOfClustersSteps=10, minPxls=10, distThres=1000000, bands=None, sampling=100, kmMaxIter=200, processInMem=False, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Utility function to call the segmentation algorithm of Shepherd et al. (2019) and to test a range of ‘k’ (the number of clusters) within the KMeans.

Shepherd, J. D., Bunting, P., & Dymond, J. R. (2019). Operational Large-Scale Segmentation of Imagery Based on Iterative Elimination. Remote Sensing, 11(6), 658. http://doi.org/10.3390/rs11060658

Where:

Parameters
  • inputImg – is a string containing the name of the input file

  • outputClumpsBase – is a string containing the base name and path of the output clump files

  • outStatsFile – is a string containing the name of the output CSV file with the image segmentation stats

  • outputMeanImgBase – is the base name and path of the output mean image files (clumps attributed with pixel mean from input image) - pass ‘None’ to skip creating.

  • tmpath – is a file path for intermediate files (default is current directory).

  • gdalformat – is a string containing the GDAL format for the output file (default is KEA)

  • noStats – is a bool which specifies that no image statistics and pyramids should be built for the output images (default = False).

  • noStretch – is a bool which specifies that the input image bands should not be stretched (default = False).

  • noDelete – is a bool which specifies that the temporary images created during processing should not be deleted once processing has been completed (default = False).

  • numClustersStart – is an int which specifies the number of clusters within the KMeans clustering for the first test (default = 10).

  • numClustersStep – is an int which specifies the number of clusters added at each step (default = 10).

  • numOfClustersSteps – is an int which specifies the number of steps (i.e., tests) which are performed (default = 10).

  • minPxls – is an int which specifies the minimum number of pixels within a segment (default = 10).

  • distThres – specifies the distance threshold for joining the segments (default = 1000000, a very large value which turns off this option).

  • bands – is an array providing a subset of image bands to use (default is None to use all bands).

  • sampling – specifies the subsampling of the image for the data used within the KMeans (1 == no subsampling; default = 100).

  • kmMaxIter – is an int specifying the maximum number of iterations for the KMeans (default = 200).

  • processInMem – is a bool which specifies that, where functions allow it, processing should be performed in memory rather than on disk (default = False).

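  • minNormV – is an optional float (default = None).

  • maxNormV – is an optional float (default = None).

  • minNormMI – is an optional float (default = None).

  • maxNormMI – is an optional float (default = None).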

Example:

from rsgislib.segmentation import segutils

inputImg = './WV2_525N040W_20110727_TOARefl_b762_stch.kea'
outputClumpsBase = './OptimalTests/WV2_525N040W_20110727_Clumps'
outputMeanImgBase = './OptimalTests/WV2_525N040W_20110727_ClumpsMean'
tmpath='./OptimalTests/tmp/'
outStatsFile = './OptimalTests/StatsClumps.csv'

# Will test numbers of clusters (k) from 10 to 200 in steps of 10.
segutils.runShepherdSegmentationTestNumClumps(inputImg, outputClumpsBase, outStatsFile, outputMeanImgBase=outputMeanImgBase, tmpath=tmpath, noStretch=True, numClustersStart=10, numClustersStep=10, numOfClustersSteps=20, minPxls=50, minNormV=None, maxNormV=None, minNormMI=None, maxNormMI=None)

Clump

rsgislib.segmentation.clump(inputimage, outputimage, gdalformat, processinmemory, nodata, addPxlVal2Rat)

A function which clumps an input image (of int pixel data type) to identify connected independent sets of pixels.

Where:

Parameters
  • inputimage – is a string containing the name of the input file

  • outputimage – is a string containing the name of the output file

  • gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’

  • processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).

  • nodata – is None or a float specifying the no data value

  • addPxlVal2Rat – is a boolean specifying whether the pixel value (from inputimage) should be added as a RAT.
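
Conceptually, clumping is connected-component labelling: each connected region of equal pixel value receives its own ID. The pure NumPy sketch below assumes 4-connectivity and uses 0 for no data; it is not RSGISLib's implementation, the connectivity RSGISLib uses is not stated here, and the function name clump is used only for illustration.

```python
from collections import deque

import numpy as np


def clump(img, nodata=None):
    """Label 4-connected regions of equal pixel value with IDs 1..n.
    Pixels equal to `nodata` keep the ID 0."""
    rows, cols = img.shape
    clumps = np.zeros((rows, cols), dtype=np.uint32)
    next_id = 1
    for r in range(rows):
        for c in range(cols):
            if clumps[r, c] != 0 or (nodata is not None and img[r, c] == nodata):
                continue
            # Breadth-first flood fill from this unlabelled pixel.
            value = img[r, c]
            clumps[r, c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and clumps[ny, nx] == 0 and img[ny, nx] == value):
                        clumps[ny, nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return clumps


img = np.array([[5, 5, 7],
                [0, 5, 7],
                [5, 0, 7]])
print(clump(img, nodata=0).tolist())  # [[1, 1, 2], [0, 1, 2], [3, 0, 2]]
```

The bottom-left pixel has the same value (5) as clump 1 but is not connected to it, so it becomes a separate clump (3).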

rsgislib.segmentation.tiledclump.performClumpingSingleThread(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters
  • inputImage – the input image to be clumped.

  • clumpsImage – the output clumped image.

  • tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and deleted afterwards.

  • width – int for width of the image tiles used for processing (Default = 2000).

  • height – int for height of the image tiles used for processing (Default = 2000).

  • gdalformat – string with the GDAL image format for the output image (Default = KEA). Note: KEA is used as the intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.performClumpingMultiProcess(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters
  • inputImage – the input image to be clumped.

  • clumpsImage – the output clumped image.

  • tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and deleted afterwards.

  • width – int for width of the image tiles used for processing (Default = 2000).

  • height – int for height of the image tiles used for processing (Default = 2000).

  • gdalformat – string with the GDAL image format for the output image (Default = KEA). Note: KEA is used as the intermediate format internally and therefore needs to be available.

  • nCores – is an int specifying the number of cores to be used for the clumping (Default = -1).

rsgislib.segmentation.tiledclump.performUnionClumpingSingleThread(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')

Clump the input image and union it with the reference image, using a tiled processing chain allowing large images to be clumped more quickly.

Parameters
  • inputImage – the input image to be clumped.

  • refImg – the reference image which the union is undertaken with (typically an existing classification)

  • clumpsImage – the output clumped image.

  • tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and deleted afterwards.

  • width – int for width of the image tiles used for processing (Default = 2000).

  • height – int for height of the image tiles used for processing (Default = 2000).

  • gdalformat – string with the GDAL image format for the output image (Default = KEA). Note: KEA is used as the intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.performUnionClumpingMultiProcess(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)

Clump the input image and union it with the reference image, using a tiled processing chain allowing large images to be clumped more quickly.

Parameters
  • inputImage – the input image to be clumped.

  • refImg – the reference image which the union is undertaken with (typically an existing classification)

  • clumpsImage – the output clumped image.

  • tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and deleted afterwards.

  • width – int for width of the image tiles used for processing (Default = 2000).

  • height – int for height of the image tiles used for processing (Default = 2000).

  • gdalformat – string with the GDAL image format for the output image (Default = KEA). Note: KEA is used as the intermediate format internally and therefore needs to be available.

  • nCores – is an int specifying the number of cores to be used for the clumping (Default = -1).

Label

rsgislib.segmentation.labelPixelsFromClusterCentres(inputimage, outputimage, clustercenters, ignorezeros, gdalformat)

Labels image pixels with the ID of the nearest cluster centre.

Where:

Parameters
  • inputimage – is a string containing the name of the input file

  • outputimage – is a string containing the name of the output file

  • clustercenters – is a string containing the name of the cluster centre file

  • ignorezeros – is a bool specifying whether pixels with a value of zero should be ignored

  • gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’
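
Nearest-centre labelling can be sketched in NumPy as below. The sketch assumes Euclidean distance, 1-based cluster IDs and that ignorezeros treats pixels that are zero in every band as no data (label 0); the cluster centre file format is not modelled, and label_from_centres is a hypothetical name.

```python
import numpy as np


def label_from_centres(img, centres, ignore_zeros=True):
    """Assign each pixel the 1-based ID of its nearest centre.
    img: (bands, rows, cols); centres: (k, bands)."""
    bands, rows, cols = img.shape
    pxls = img.reshape(bands, -1).T.astype(float)
    # Squared Euclidean distance from every pixel to every centre.
    dists = ((pxls[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1) + 1
    if ignore_zeros:
        # Pixels that are zero in every band are treated as no data (label 0).
        labels[(pxls == 0).all(axis=1)] = 0
    return labels.reshape(rows, cols)


img = np.array([[[0, 10, 200]],
                [[0, 12, 190]]])  # 2 bands, 1 row, 3 columns
centres = np.array([[10.0, 10.0], [200.0, 200.0]])
print(label_from_centres(img, centres).tolist())  # [[0, 1, 2]]
```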

rsgislib.segmentation.relabelClumps(inputimage, outputimage, gdalformat, processinmemory)

Relabel clumps

Where:

Parameters
  • inputimage – is a string containing the name of the input file

  • outputimage – is a string containing the name of the output file

  • gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’

  • processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).
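
Assuming relabelling means renumbering the clumps so that the IDs are consecutive (with 0 kept as no data), the operation can be sketched with numpy.unique; relabel_sequential is a hypothetical name, not an RSGISLib function.

```python
import numpy as np


def relabel_sequential(clumps):
    """Renumber clump IDs so they run 1..n without gaps (0 stays 0)."""
    ids, inverse = np.unique(clumps, return_inverse=True)
    # If 0 (no data) is present it is the first unique value, so the other
    # IDs already map to 1..n; otherwise shift by one so IDs start at 1.
    offset = 0 if ids[0] == 0 else 1
    return (inverse + offset).reshape(clumps.shape)


clumps = np.array([[0, 4, 4],
                   [9, 9, 0]])
print(relabel_sequential(clumps).tolist())  # [[0, 1, 1], [2, 2, 0]]
```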

Elimination

rsgislib.segmentation.eliminateSinglePixels(inputimage, clumpsimage, outputimage, tempfile, gdalformat, processinmemory, ignorezeros)

Eliminates single pixels

Where:

Parameters
  • inputimage – is a string containing the name of the input file

  • clumpsimage – is a string containing the name of the clump file

  • outputimage – is a string containing the name of the output file

  • tempfile – is a string containing the name of the temporary file to use

  • gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’

  • processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).

  • ignorezeros – is a bool specifying whether pixels with a value of zero should be ignored

rsgislib.segmentation.rmSmallClumps(clumpsImage, outputImage, threshold, gdalformat)

A function to remove small clumps by setting them to a value of 0 (i.e., no data).

Where:

Parameters
  • clumpsImage – is a string containing the name of the input clumps file; note: a column called ‘Histogram’ is required.

  • outputImage – is a string containing the name of the output clumps file

  • threshold – is a float containing the area threshold (in pixels)

  • gdalformat – is a string defining the format of the output image.
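
The behaviour described above can be sketched with NumPy, where numpy.bincount plays the role of the ‘Histogram’ column. The sketch assumes clumps with fewer than threshold pixels are removed (whether RSGISLib uses < or <= is not stated here), and rm_small_clumps is a hypothetical name.

```python
import numpy as np


def rm_small_clumps(clumps, threshold):
    """Set clumps with fewer than `threshold` pixels to 0 (no data)."""
    # Pixel count per clump ID, equivalent to a 'Histogram' RAT column.
    histogram = np.bincount(clumps.ravel())
    small = histogram < threshold
    small[0] = False  # 0 is already no data
    out = clumps.copy()
    out[small[clumps]] = 0
    return out


clumps = np.array([[1, 1, 2],
                   [1, 1, 3],
                   [3, 3, 3]])
print(rm_small_clumps(clumps, threshold=2).tolist())
# [[1, 1, 0], [1, 1, 3], [3, 3, 3]]: clump 2 has a single pixel
```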

rsgislib.segmentation.rmSmallClumpsStepwise(inputimage, clumpsimage, outputimage, gdalformat, stretchstatsavail, stretchstatsfile, storemean, processinmemory, minclumpsize, specThreshold)

Eliminates clumps smaller than a given size from the scene. Small clumps are combined with their spectrally closest neighbouring clump in a stepwise fashion, unless the spectral distance to that neighbour is over the threshold (specThreshold).

Where:

Parameters
  • inputimage – is a string containing the name of the input file

  • clumpsimage – is a string containing the name of the clump file

  • outputimage – is a string containing the name of the output file

  • gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’

  • stretchstatsavail – is a bool specifying whether a stretch stats file (stretchstatsfile) is available

  • stretchstatsfile – is a string containing the name of the stretch stats file

  • storemean – is a bool

  • processinmemory – is a bool specifying if processing should be carried out in memory (faster if sufficient RAM is available, set to False if unsure).

  • minclumpsize – is an unsigned integer providing the minimum size for clumps.

  • specThreshold – is a float providing the maximum (Euclidean distance) spectral separation for which to merge clumps. Set to a large value to ignore spectral separation and always merge.
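
The stepwise merging rule can be sketched at the level of clump statistics (size, mean and neighbours) rather than pixels. This is a simplified illustration, not RSGISLib's algorithm: it assumes the smallest clump is processed first, that the merged mean is size-weighted and that distances are Euclidean between clump means; eliminate_stepwise and its inputs are hypothetical.

```python
import math


def eliminate_stepwise(sizes, means, neighbours, min_size, spec_threshold):
    """Merge clumps smaller than min_size into their spectrally closest
    neighbour unless that distance exceeds spec_threshold.

    sizes: {id: pixel count}; means: {id: tuple of band means};
    neighbours: {id: set of adjacent ids}. Returns {original_id: final_id}.
    """
    sizes = dict(sizes)
    means = {k: tuple(v) for k, v in means.items()}
    neighbours = {k: set(v) for k, v in neighbours.items()}
    mapping = {k: k for k in sizes}
    changed = True
    while changed:
        changed = False
        for cid in sorted(sizes, key=sizes.get):  # smallest clumps first
            if sizes[cid] >= min_size or not neighbours[cid]:
                continue
            best = min(neighbours[cid], key=lambda n: math.dist(means[cid], means[n]))
            if math.dist(means[cid], means[best]) > spec_threshold:
                continue  # too different spectrally; leave it unmerged
            # Merge cid into best: size-weighted mean, combined size and adjacency.
            total = sizes[cid] + sizes[best]
            means[best] = tuple((sizes[cid] * a + sizes[best] * b) / total
                                for a, b in zip(means[cid], means[best]))
            sizes[best] = total
            for n in neighbours[cid] - {best}:
                neighbours[n].discard(cid)
                neighbours[n].add(best)
                neighbours[best].add(n)
            neighbours[best].discard(cid)
            for old, new in mapping.items():
                if new == cid:
                    mapping[old] = best
            del sizes[cid], means[cid], neighbours[cid]
            changed = True
            break  # sizes changed, so restart from the new smallest clump
    return mapping


sizes = {1: 50, 2: 3, 3: 40, 4: 2}
means = {1: (10.0,), 2: (12.0,), 3: (100.0,), 4: (500.0,)}
neighbours = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(eliminate_stepwise(sizes, means, neighbours, min_size=5, spec_threshold=100.0))
# {1: 1, 2: 1, 3: 3, 4: 4}
```

Clump 2 joins clump 1 (distance 2), while clump 4 stays separate because its closest neighbour is 400 units away, above spec_threshold; a very large threshold would merge it as well.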

Join / Union

rsgislib.segmentation.unionOfClumps(outputimage, gdalformat, inputimagepaths, nodata, addPxlVals2Rat)

The function takes the union of clumps images, combining them so that all boundaries from all clumps are preserved in the new output clumps image.

Where:

Parameters
  • outputimage – is a string containing the name of the output file

  • gdalformat – is a string containing the GDAL format for the output file - eg ‘KEA’

  • inputimagepaths – is a list of input image paths

  • nodata – is None or a float specifying the no data value

  • addPxlVals2Rat – is a boolean specifying whether the pixel values (from inputimagepaths) should be added as a RAT; column names have prefix ‘ClumpVal_’ with index starting at 1 for each variable.
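
A simplified sketch of the union: each unique combination of input clump IDs becomes one output clump, and the per-input values are returned as ‘ClumpVal_’ columns. Unlike a full implementation, it does not split combinations that are spatially disconnected, and union_of_clumps is a hypothetical name.

```python
import numpy as np


def union_of_clumps(clump_imgs, nodata=0):
    """Give each unique combination of input clump IDs its own output ID.
    Pixels that are no data in every input get 0."""
    stack = np.stack([c.ravel() for c in clump_imgs], axis=1)  # (n_pixels, n_inputs)
    valid = ~(stack == nodata).all(axis=1)
    combos, inverse = np.unique(stack[valid], axis=0, return_inverse=True)
    out = np.zeros(stack.shape[0], dtype=np.uint32)
    out[valid] = inverse.ravel() + 1
    # Attribute table: original value from each input for every output ID.
    rat = {f"ClumpVal_{i + 1}": combos[:, i].tolist() for i in range(stack.shape[1])}
    return out.reshape(clump_imgs[0].shape), rat


a = np.array([[1, 1, 2],
              [1, 1, 2]])
b = np.array([[5, 6, 6],
              [5, 6, 6]])
out, rat = union_of_clumps([a, b])
print(out.tolist())  # [[1, 2, 3], [1, 2, 3]]
print(rat)           # {'ClumpVal_1': [1, 1, 2], 'ClumpVal_2': [5, 6, 6]}
```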

Visualisation

rsgislib.segmentation.meanImage(inputImage, inputClumps, outputImage, gdalformat, datatype)

A function to generate an image in which each clump has the mean value of its pixels. Primarily for visualisation and evaluating segmentation.

Where:

Parameters
  • inputImage – is a string containing the name of the input image file from which the mean is taken.

  • inputClumps – is a string containing the name of the input clumps file

  • outputImage – is a string containing the name of the output image.

  • gdalformat – is a string defining the format of the output image.

  • datatype – is an int containing one of the values from rsgislib.TYPE_*
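
The mean image can be sketched in NumPy with numpy.bincount. The sketch assumes clump ID 0 is no data and returns floating point values rather than honouring datatype; mean_image is a hypothetical name.

```python
import numpy as np


def mean_image(img, clumps):
    """Replace every pixel with the mean of its clump, band by band.
    img: (bands, rows, cols); clumps: (rows, cols) non-negative int IDs."""
    ids = clumps.ravel()
    counts = np.bincount(ids)
    out = np.zeros(img.shape, dtype=float)
    for b in range(img.shape[0]):
        sums = np.bincount(ids, weights=img[b].ravel().astype(float))
        # Avoid dividing by zero for IDs that do not occur in the image.
        means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
        means[0] = 0.0  # assumption: clump 0 is no data
        out[b] = means[clumps]
    return out


img = np.array([[[1, 3, 10],
                 [2, 2, 20]]])
clumps = np.array([[1, 1, 2],
                   [1, 1, 2]])
print(mean_image(img, clumps).tolist())  # [[[2.0, 2.0, 15.0], [2.0, 2.0, 15.0]]]
```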

Tiles

rsgislib.segmentation.mergeSegmentationTiles(outputimage, bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)

Merges body clumps from tile segmentations into the output file.

Where:

Parameters
  • outputimage – is a string containing the name of the output file

  • bordermaskimage – is a string containing the name of the border mask file

  • tileboundary – is an unsigned integer containing the tile boundary pixel value

  • tileoverlap – is an unsigned integer containing the tile overlap pixel value

  • tilebody – is an unsigned integer containing the tile body pixel value

  • colsname – is a string containing the name of the object id column

  • inputimagepaths – is a list of input image paths

rsgislib.segmentation.tiledclump.clumpImgFunc(imgs)

Clumps an image, with the values provided as an array, for use within a multiprocessing Pool.

rsgislib.segmentation.tiledclump.performClumpingMultiProcess(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters
  • inputImage – the input image to be clumped.

  • clumpsImage – the output clumped image.

  • tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). Directory will be created and deleted if does not exist.

  • width – int for width of the image tiles used for processing (Default = 2000).

  • height – int for height of the image tiles used for processing (Default = 2000).

  • gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.

  • nCores – is an int specifying the number of cores to be used for clumping processing.

rsgislib.segmentation.tiledclump.performClumpingSingleThread(inputImage, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')

Clump the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters
  • inputImage – the input image to be clumped.

  • clumpsImage – the output clumped image.

  • tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). Directory will be created and deleted if does not exist.

  • width – int for width of the image tiles used for processing (Default = 2000).

  • height – int for height of the image tiles used for processing (Default = 2000).

  • gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.

rsgislib.segmentation.tiledclump.performUnionClumpingMultiProcess(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA', nCores=-1)

Clump and union with the reference image the input image using a tiled processing chain allowing large images to be clumped more quickly.

Parameters
  • inputImage – the input image to be clumped.

  • refImg – the reference image which the union is undertaken with (typically an existing classification)

  • clumpsImage – the output clumped image.

  • tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). Directory will be created and deleted if does not exist.

  • width – int for width of the image tiles used for processing (Default = 2000).

  • height – int for height of the image tiles used for processing (Default = 2000).

  • gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE. KEA is used as intermediate format internally and therefore needs to be available.

  • nCores – is an int specifying the number of cores to be used for clumping processing.

rsgislib.segmentation.tiledclump.performUnionClumpingSingleThread(inputImage, refImg, clumpsImage, tmpDIR='tmp', width=2000, height=2000, gdalformat='KEA')

Clump the input image and union it with the reference image using a tiled processing chain, allowing large images to be clumped more quickly.

Parameters
  • inputImage – the input image to be clumped.

  • refImg – the reference image with which the union is undertaken (typically an existing classification).

  • clumpsImage – the output clumped image.

  • tmpDIR – the temporary directory where intermediate files will be written (default is ‘tmp’). If the directory does not exist, it will be created and then deleted afterwards.

  • width – int for width of the image tiles used for processing (Default = 2000).

  • height – int for height of the image tiles used for processing (Default = 2000).

  • gdalformat – string with the GDAL image format for the output image (Default = KEA). NOTE: KEA is used as the intermediate format internally and therefore needs to be available.
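To illustrate what "clump and union" means, here is a single-tile sketch. It assumes that two pixels end up in the same clump only when they are connected and share both the input value and the reference value. The 4-connectivity and the helper union_clump are assumptions for illustration, not RSGISLib's implementation.

```python
from collections import deque


def union_clump(values, ref):
    """Label 4-connected regions of pixels sharing the same (value, reference) pair.

    values and ref are equal-sized 2D lists; returns a 2D list of clump IDs from 1.
    """
    n_rows, n_cols = len(values), len(values[0])
    clumps = [[0] * n_cols for _ in range(n_rows)]
    next_id = 1
    for r in range(n_rows):
        for c in range(n_cols):
            if clumps[r][c]:
                continue  # pixel already belongs to a clump
            key = (values[r][c], ref[r][c])
            clumps[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the current clump
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols and not clumps[ny][nx]
                            and (values[ny][nx], ref[ny][nx]) == key):
                        clumps[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return clumps


# One input value split by two reference classes gives two clumps.
print(union_clump([[1, 1, 1], [1, 1, 1]], [[5, 5, 7], [5, 7, 7]]))  # [[1, 1, 2], [1, 2, 2]]
```

With a constant reference image the same function reduces to plain clumping of the input values.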

rsgislib.segmentation.tiledclump.unionClumpImgFunc(imgs)

Union clump an image, with the input values provided as a single array (list) for use within a multiprocessing Pool.
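The inputs are packed into one array because multiprocessing.Pool.map passes exactly one argument to each worker call. The sketch below shows that pattern. The worker union_clump_tile, the helpers build_jobs and run_jobs, and the three-item [input, reference, output] layout are hypothetical and do not reproduce RSGISLib's actual arguments.

```python
from multiprocessing import Pool


def union_clump_tile(imgs):
    """Hypothetical worker: all per-tile arguments arrive packed in one list."""
    in_tile, ref_tile, out_tile = imgs
    # A real worker would union clump in_tile with ref_tile and write out_tile;
    # this sketch only reports which files it would have processed.
    return f"{in_tile} + {ref_tile} -> {out_tile}"


def build_jobs(n_tiles, tmp_dir="tmp"):
    """Build one [input, reference, output] list per tile (illustrative file names)."""
    return [[f"{tmp_dir}/in_{i}.kea", f"{tmp_dir}/ref_{i}.kea", f"{tmp_dir}/clumps_{i}.kea"]
            for i in range(n_tiles)]


def run_jobs(jobs, n_cores):
    """Process the tiles in parallel with n_cores worker processes."""
    with Pool(processes=n_cores) as pool:
        return pool.map(union_clump_tile, jobs)
```

Call run_jobs(build_jobs(4), n_cores=2) from inside an if __name__ == "__main__": guard. The guard is required on platforms that start worker processes with the spawn method (e.g., Windows and macOS).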

scikit-image

rsgislib.segmentation.skimgseg.performFelsenszwalbSegmentation(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, scale=1, sigma=0.8, min_size=20)

A function to perform the Felzenszwalb segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters
  • inputImg – input image file.

  • outputImg – output image file.

  • gdalformat – output image file format.

  • tmpDIR – temporary directory used to output the PCA files.

  • calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.

  • usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.

  • nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.

  • scale – scikit-image Felzenszwalb parameter: ‘Free parameter. Higher means larger clusters.’

  • sigma – scikit-image Felzenszwalb parameter: ‘Width of Gaussian kernel used in preprocessing.’

  • min_size – scikit-image Felzenszwalb parameter: ‘Minimum component size. Enforced using postprocessing.’
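The usePCA and nPCABands options reduce a multi-band image to 1 or 3 bands before it is passed to scikit-image. Below is a minimal NumPy sketch of that kind of reduction. It assumes pcaPxlSample is a sampling step, so that one pixel in every pcaPxlSample is used to fit the PCA. The helper pca_reduce is hypothetical and is not RSGISLib's own code.

```python
import numpy as np


def pca_reduce(img, n_pca_bands=3, pxl_sample=100):
    """Project a (bands, rows, cols) image onto its first n_pca_bands principal components.

    The components are fitted on every pxl_sample-th pixel (an assumed meaning of
    pcaPxlSample) and then applied to all pixels.
    """
    if n_pca_bands not in (1, 3):
        raise ValueError("n_pca_bands must be either 1 or 3")
    n_bands, n_rows, n_cols = img.shape
    pxls = img.reshape(n_bands, -1).T.astype(np.float64)  # (n_pixels, n_bands)
    sample = pxls[::pxl_sample]
    mean = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)  # (n_bands, n_bands) covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eig_vals)[::-1][:n_pca_bands]  # largest variance first
    projected = (pxls - mean) @ eig_vecs[:, order]  # (n_pixels, n_pca_bands)
    return projected.T.reshape(n_pca_bands, n_rows, n_cols)


rng = np.random.default_rng(0)
image = rng.random((6, 50, 40))  # a synthetic 6-band image
print(pca_reduce(image).shape)  # (3, 50, 40)
```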

rsgislib.segmentation.skimgseg.performQuickshiftSegmentation(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, pcaPxlSample=100, ratio=1.0, kernel_size=5, max_dist=10, sigma=0, convert2lab=True, random_seed=42)

A function to perform the quickshift segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters
  • inputImg – input image file.

  • outputImg – output image file.

  • gdalformat – output image file format.

  • tmpDIR – temporary directory used to output the PCA files.

  • calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.

  • usePCA – if there are not 3 image bands in the input file then you can use PCA to reduce the number of image bands.

  • ratio – scikit-image Quickshift parameter: ‘Balances color-space proximity and image-space proximity. Higher values give more weight to color-space. (between 0 and 1)’

  • kernel_size – scikit-image Quickshift parameter: ‘Width of Gaussian kernel used in smoothing the sample density. Higher means fewer clusters.’

  • max_dist – scikit-image Quickshift parameter: ‘Cut-off point for data distances. Higher means fewer clusters.’

  • sigma – scikit-image Quickshift parameter: ‘Width for Gaussian smoothing as preprocessing. Zero means no smoothing.’

  • convert2lab – scikit-image Quickshift parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. For this purpose, the input is assumed to be RGB.’

  • random_seed – scikit-image Quickshift parameter: ‘Random seed used for breaking ties.’
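With convert2lab enabled, Quickshift assumes an RGB input, which is why 3 bands are needed. Raster libraries such as GDAL typically return (bands, rows, cols) arrays, while scikit-image expects the colour channels on the last axis by default. The sketch below shows the band check and the conversion. The helper to_channels_last is hypothetical.

```python
import numpy as np


def to_channels_last(img, required_bands=(3,)):
    """Convert a (bands, rows, cols) array to the (rows, cols, bands) layout
    that scikit-image colour segmentation functions expect by default."""
    if img.shape[0] not in required_bands:
        raise ValueError(f"expected {required_bands} bands, got {img.shape[0]}")
    return np.moveaxis(img, 0, -1)


rgb = np.zeros((3, 100, 200), dtype=np.uint8)
print(to_channels_last(rgb).shape)  # (100, 200, 3)
```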

rsgislib.segmentation.skimgseg.performRandomWalkerSegmentation(inputImg, markersImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, beta=130, mode='bf', tol=0.001, spacing=None)

A function to perform the random walker segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters
  • inputImg – input image file.

  • markersImg – input markers image file - markers must be uniquely numbered.

  • outputImg – output image file.

  • gdalformat – output image file format.

  • tmpDIR – temporary directory used to output the PCA files.

  • calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.

  • usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.

  • nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.

  • beta – scikit-image random_walker parameter: ‘Penalization coefficient for the random walker motion (the greater beta, the more difficult the diffusion).’

  • mode – scikit-image random_walker parameter: ‘Mode for solving the linear system in the random walker algorithm.’ Available options are ‘cg_mg’, ‘cg’ and ‘bf’:

      ◦ ‘bf’ (brute force): an LU factorization of the Laplacian is computed. This is fast for small images (<1024x1024), but very slow and memory-intensive for large images (e.g., 3-D volumes).

      ◦ ‘cg’ (conjugate gradient): the linear system is solved iteratively using the Conjugate Gradient method from scipy.sparse.linalg. This is less memory-consuming than the brute force method for large images, but it is quite slow.

      ◦ ‘cg_mg’ (conjugate gradient with multigrid preconditioner): a preconditioner is computed using a multigrid solver, then the solution is computed with the Conjugate Gradient method. This mode requires that the pyamg module (http://pyamg.org/) is installed. For images of size > 512x512, this is the recommended (fastest) mode.

  • tol – scikit-image random_walker parameter: ‘Tolerance to achieve when solving the linear system, in "cg" and "cg_mg" modes.’

  • spacing – scikit-image random_walker parameter: ‘Spacing between voxels in each spatial dimension. If None, then the spacing between pixels/voxels in each dimension is assumed 1.’
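The scikit-image guidance on mode can be turned into a simple selection rule. This sketch assumes that "<1024x1024" means both dimensions are below 1024 and that "> 512x512" means more than 512 * 512 pixels. The helper choose_random_walker_mode is hypothetical, and its return value would be passed as mode.

```python
import importlib.util


def choose_random_walker_mode(n_rows, n_cols, has_pyamg=None):
    """Pick a random_walker mode following the scikit-image guidance.

    has_pyamg can be forced (e.g., for testing); by default the pyamg module is looked up.
    """
    if has_pyamg is None:
        has_pyamg = importlib.util.find_spec("pyamg") is not None
    if has_pyamg and n_rows * n_cols > 512 * 512:
        return "cg_mg"  # recommended (fastest) for images larger than 512x512
    if n_rows < 1024 and n_cols < 1024:
        return "bf"  # LU factorization: fast for small images
    return "cg"  # less memory than 'bf' for large images, but slow
```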

rsgislib.segmentation.skimgseg.performSlicSegmentation(inputImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, n_segments=100, compactness=10.0, max_iter=10, sigma=0, spacing=None, convert2lab=None, enforce_connectivity=True, min_size_factor=0.5, max_size_factor=3, slic_zero=False)

A function to perform the slic segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters
  • inputImg – input image file.

  • outputImg – output image file.

  • gdalformat – output image file format.

  • tmpDIR – temporary directory used to output the PCA files.

  • calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.

  • usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.

  • nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.

  • n_segments – scikit-image Slic parameter: ‘The (approximate) number of labels in the segmented output image.’

  • compactness – scikit-image Slic parameter: ‘Balances color proximity and space proximity. Higher values give more weight to space proximity, making superpixel shapes more square/cubic. In SLICO mode, this is the initial compactness. This parameter depends strongly on image contrast and on the shapes of objects in the image. We recommend exploring possible values on a log scale, e.g., 0.01, 0.1, 1, 10, 100, before refining around a chosen value.’

  • max_iter – scikit-image Slic parameter: ‘Maximum number of iterations of k-means.’

  • sigma – scikit-image Slic parameter: ‘Width of Gaussian smoothing kernel for pre-processing for each dimension of the image. The same sigma is applied to each dimension in case of a scalar value. Zero means no smoothing. Note that sigma is automatically scaled if it is scalar and a manual voxel spacing is provided.’ (See the Notes section of the scikit-image slic documentation.)

  • spacing – scikit-image Slic parameter: ‘The voxel spacing along each image dimension. By default, slic assumes uniform spacing (same voxel resolution along z, y and x). This parameter controls the weights of the distances along z, y, and x during k-means clustering.’

  • convert2lab – scikit-image Slic parameter: ‘Whether the input should be converted to Lab colorspace prior to segmentation. The input image must be RGB. Highly recommended.’

  • enforce_connectivity – scikit-image Slic parameter: ‘Whether the generated segments are connected or not’

  • min_size_factor – scikit-image Slic parameter: ‘Proportion of the minimum segment size to be removed with respect to the supposed segment size depth*width*height/n_segments.’

  • max_size_factor – scikit-image Slic parameter: ‘Proportion of the maximum connected segment size. A value of 3 works in most of the cases.’

  • slic_zero – scikit-image Slic parameter: ‘Run SLIC-zero, the zero-parameter mode of SLIC.’
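Both min_size_factor and max_size_factor are proportions of the supposed segment size depth*width*height/n_segments, where depth is 1 for a 2D image. The worked calculation below uses a small hypothetical helper, slic_size_limits, that applies this formula.

```python
def slic_size_limits(n_rows, n_cols, n_segments=100, min_size_factor=0.5,
                     max_size_factor=3, depth=1):
    """Return the (supposed, minimum, maximum) SLIC segment sizes in pixels.

    The supposed size is depth*width*height/n_segments; depth is 1 for 2D images.
    """
    supposed = depth * n_rows * n_cols / n_segments
    return supposed, min_size_factor * supposed, max_size_factor * supposed


# A 1000 x 1000 pixel image with the defaults used by performSlicSegmentation.
print(slic_size_limits(1000, 1000))  # (10000.0, 5000.0, 30000.0)
```

Doubling n_segments halves all three sizes, so the limits scale with the requested number of segments.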

rsgislib.segmentation.skimgseg.performWatershedSegmentation(inputImg, markersImg, outputImg, gdalformat='KEA', noDataVal=0, tmpDIR='./tmp', calcStats=True, usePCA=False, nPCABands=3, pcaPxlSample=100, compactness=0, watershed_line=False)

A function to perform the watershed segmentation algorithm from the scikit-image library (http://scikit-image.org/docs/stable/api/skimage.segmentation.html).

Parameters
  • inputImg – input image file.

  • markersImg – input markers image file.

  • outputImg – output image file.

  • gdalformat – output image file format.

  • tmpDIR – temporary directory used to output the PCA files.

  • calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.

  • usePCA – if there are not 1 or 3 image bands in the input file then you can use PCA to reduce the number of image bands.

  • nPCABands – the number of principal components output from the PCA - needs to be either 1 or 3.

  • compactness – scikit-image Watershed parameter: ‘Use compact watershed with given compactness parameter. Higher values result in more regularly-shaped watershed basins; Peer Neubert & Peter Protzel (2014). Compact Watershed and Preemptive SLIC: On Improving Trade-offs of Superpixel Segmentation Algorithms. ICPR 2014’

  • watershed_line – scikit-image Watershed parameter: ‘If watershed_line is True, a one-pixel wide line separates the regions obtained by the watershed algorithm. The line has the label 0.’

Other

rsgislib.segmentation.generateRegularGrid(inputImage, outputClumps, gdalformat, numXPxls, numYPxls, offset)

A function to generate a clumps image containing a regular grid of cells, each numXPxls by numYPxls pixels in size, with the same dimensions as the input image.

Parameters
  • inputImage – is a string containing the name of the input image file specifying the dimensions of the output image.

  • outputClumps – is a string containing the name and path of the output clumps image

  • gdalformat – is a string defining the format of the output image.

  • numXPxls – is the size of the grid cells in the X axis in pixel units.

  • numYPxls – is the size of the grid cells in the Y axis in pixel units.

  • offset – is a boolean specifying whether the grid should be offset, i.e., whether it starts at the halfway point of numXPxls and numYPxls (Default is false; optional).
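The layout of such a grid can be illustrated with a small pure-Python sketch that assigns a clump ID to every pixel. The helper regular_grid is hypothetical. Three details are assumptions for illustration and are not the RSGISLib implementation: cells are numbered row by row from 1, offset is read as a shift of half a cell, and the result is a nested list rather than an image.

```python
def regular_grid(n_rows, n_cols, num_x_pxls, num_y_pxls, offset=False):
    """Return a 2D list of clump IDs (from 1) for a regular grid of cells.

    With offset=True the grid is shifted by half a cell (integer division),
    so the first row and column of cells are roughly half-sized.
    """
    shift_x = num_x_pxls // 2 if offset else 0
    shift_y = num_y_pxls // 2 if offset else 0
    n_cells_x = (n_cols - 1 + shift_x) // num_x_pxls + 1  # cells per grid row
    grid = []
    for r in range(n_rows):
        cell_y = (r + shift_y) // num_y_pxls
        grid.append([cell_y * n_cells_x + (c + shift_x) // num_x_pxls + 1
                     for c in range(n_cols)])
    return grid


for row in regular_grid(4, 4, 2, 2, offset=True):
    print(row)
```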

rsgislib.segmentation.dropSelectedClumps(clumpsImage, outputClumps, gdalformat, selectClumpsCol)

A function to drop the selected clumps from the segmentation.

Parameters
  • clumpsImage – is a string containing the filepath for the input clumps image.

  • outputClumps – is a string containing the name and path of the output clumps image

  • gdalformat – is a string defining the format of the output image.

  • selectClumpsCol – is a string defining the binary column for defining the segments to be dropped (1 == selected clumps).
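The selection column follows the usual raster attribute table convention, in which the row index is the clump ID. The pure-Python sketch below assumes that dropped clumps are set to 0 (no data) and that row 0 corresponds to the no-data clump. The helper drop_selected is hypothetical, and whether RSGISLib also renumbers the remaining clumps is not stated here.

```python
def drop_selected(clumps, select_col):
    """Set every pixel of a selected clump to 0 (no data).

    clumps: 2D list of clump IDs.
    select_col: list indexed by clump ID (attribute table row), 1 == selected.
    """
    return [[0 if select_col[cid] == 1 else cid for cid in row] for row in clumps]


clumps = [[1, 1, 2], [3, 2, 2]]
select_col = [0, 0, 1, 0]  # row 0 is the no-data clump; clump 2 is selected
print(drop_selected(clumps, select_col))  # [[1, 1, 0], [3, 0, 0]]
```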

rsgislib.segmentation.findTileBordersMask(bordermaskimage, tileboundary, tileoverlap, tilebody, colsname, inputimagepaths)

Mask tile borders

Parameters
  • bordermaskimage – is a string containing the name of the border mask file

  • tileboundary – is an unsigned integer containing the tile boundary pixel value

  • tileoverlap – is an unsigned integer containing the tile overlap pixel value

  • tilebody – is an unsigned integer containing the tile body pixel value

  • colsname – is a string containing the name of the object id column

  • inputimagepaths – is a list of input clump image paths

rsgislib.segmentation.includeRegionsInClumps(clumpsImage, regionsImage, outputClumps, gdalformat)

A function to include a set of clumped regions within an existing clumps (i.e., segmentation) image. NOTE: you should run the relabelClumps function on the output of this command before using it further.

Parameters
  • clumpsImage – is a string containing the filepath for the input clumps image.

  • regionsImage – is a string containing the filepath for the input regions image.

  • outputClumps – is a string containing the name and path of the output clumps image

  • gdalformat – is a string defining the format of the output image.

rsgislib.segmentation.mergeClumpImages(inputimagepaths, outputimage, mergeRATs)

Merge all the clumps from the tile segmentations into the output file.

Parameters
  • inputimagepaths – is a list of input image paths

  • outputimage – is a string containing the name of the output file

  • mergeRATs – is a boolean specifying whether the image RATs are to be merged (Default: false; Optional)

rsgislib.segmentation.mergeEquivClumps(clumpsImage, outputClumps, gdalformat, valClumpsCols)

A function to merge neighbouring clumps which have the same value - for example when merging across tile boundaries.

Parameters
  • clumpsImage – is a string containing the filepath for the input clumps image.

  • outputClumps – is a string containing the name and path of the output clumps image

  • gdalformat – is a string defining the format of the output image.

  • valClumpsCols – is a list of strings defining the value(s) used to define equivalence (typically the original pixel values when clumping through tiling).
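Merging equivalent clumps across tile boundaries is a classic union-find (disjoint-set) problem. This sketch assumes clumps are 4-connected. It also assumes each clump's equivalence value is available in a dictionary, which stands in for the valClumpsCols columns. The helper merge_equivalent_clumps is hypothetical and is not RSGISLib's implementation.

```python
def merge_equivalent_clumps(clumps, clump_vals):
    """Merge 4-connected neighbouring clumps that have identical values.

    clumps: 2D list of clump IDs (0 = no data).
    clump_vals: dict mapping each clump ID to its value (e.g., a tuple of columns).
    Returns a new 2D list in which merged clumps share the smallest ID of the group.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps the trees shallow
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root to the smaller one so the smallest ID survives.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    n_rows, n_cols = len(clumps), len(clumps[0])
    for r in range(n_rows):
        for c in range(n_cols):
            a = clumps[r][c]
            # Checking right and lower neighbours covers each pair of 4-adjacent pixels once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < n_rows and nc < n_cols:
                    b = clumps[nr][nc]
                    if a and b and a != b and clump_vals[a] == clump_vals[b]:
                        union(a, b)
    return [[find(cid) if cid else 0 for cid in row] for row in clumps]


# Clumps 1 and 2 (e.g., split by a tile edge) share the value 7 and are merged.
print(merge_equivalent_clumps([[1, 1, 2], [3, 3, 2]], {1: 7, 2: 7, 3: 9}))
```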

rsgislib.segmentation.mergeSegments2Neighbours(clumpsImage, spectralImage, outputClumps, gdalformat, selectedClumpsCol, noDataClumpsCol)

A function to merge selected clumps with their neighbours based on colour (spectral) distance, where clumps identified as no data are ignored.

Parameters
  • clumpsImage – is a string containing the filepath for the input clumps image.

  • spectralImage – is a string containing the filepath for the input image used to define ‘distance’.

  • outputClumps – is a string containing the name and path of the output clumps image

  • gdalformat – is a string defining the format of the output image.

  • selectedClumpsCol – is a string defining the binary column for defining the segments to be merged (1 == selected clumps).

  • noDataClumpsCol – is a string defining the binary column for defining the segments to be ignored as no data (1 == no-data clumps).
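The sketch below shows how a neighbour could be chosen for one selected clump. It assumes the "colour (spectral) distance" is the Euclidean distance between clump mean values; RSGISLib's exact metric is not stated here. The helper best_neighbour and its dictionary inputs are hypothetical.

```python
import math


def best_neighbour(clump_id, neighbours, means, no_data):
    """Return the neighbour of clump_id whose mean values are closest (Euclidean),
    ignoring no-data clumps; returns None if no candidate remains.

    neighbours: dict of clump ID -> set of adjacent clump IDs
    means: dict of clump ID -> tuple of per-band mean values
    no_data: set of clump IDs flagged as no data
    """
    candidates = sorted(n for n in neighbours[clump_id] if n not in no_data)
    if not candidates:
        return None
    # min() keeps the first (lowest ID) candidate when distances tie.
    return min(candidates, key=lambda n: math.dist(means[clump_id], means[n]))


neighbours = {1: {2, 3, 4}}
means = {1: (10.0, 10.0), 2: (12.0, 10.0), 3: (10.0, 11.0), 4: (10.0, 10.0)}
print(best_neighbour(1, neighbours, means, no_data={4}))  # 3
```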

rsgislib.segmentation.pxlGrowRegions(clumpsImage, valsImage, outputImage, gdalformat, muParseCriteria, varNameBandPairs)

A function to grow regions (clumps) pixel by pixel into neighbouring pixels that meet the criteria defined by muParseCriteria, evaluated on the bands of valsImage.

Parameters
  • clumpsImage – is a string containing the filepath for the input clumps image.

  • valsImage – is a string containing the file path for the values (criteria) image.

  • outputImage – is a string containing the name and path of the output image.

  • gdalformat – is a string defining the format of the output image.

  • muParseCriteria – is a string with a muparser expression defining the criteria (e.g., b1 < 20?1:0). The expression output must be 0 or 1 (1 for True).

  • varNameBandPairs – is a list of pairs specifying the variable name (in the muparser expression) and the band number to which it refers in valsImage (note band numbers start at 1).

Example:

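import collections
import rsgislib.segmentation

# Each pair maps a muparser variable name to a band number (starting at 1) in valsImage.
VarBandPair = collections.namedtuple('VarBandPair', ['varName', 'bandIndex'])
varBandPairSeq = [VarBandPair(varName='b1', bandIndex=1)]
# Criteria: true (1) where band 1 of the values image is greater than 1000.
muParseCriteria = 'b1 > 1000?1:0'
# tmpInitClearSkyRegionsFinal (clumps), tmpCloudsImgDist2CloudsNoData (values) and
# tmpClearSkyRegionsGrow (output) are file paths defined by the calling script.
rsgislib.segmentation.pxlGrowRegions(tmpInitClearSkyRegionsFinal, tmpCloudsImgDist2CloudsNoData,
                                     tmpClearSkyRegionsGrow, 'KEA', muParseCriteria, varBandPairSeq)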