RSGISLib Classification Module

The classification module provides classification functionality within RSGISLib.

rsgislib.classification.generateTransectAccuracyPts(inputImage, inputLinesShp, outputPtsShp, classImgCol, classImgVecCol, classRefVecCol, lineStep, force=False)

A tool for converting a set of lines into point transects and populating them with the information needed to undertake an accuracy assessment.

Parameters
  • inputImage – is a string specifying the input image file with classification.

  • inputLinesShp – is a string specifying the input lines shapefile path.

  • outputPtsShp – is a string specifying the output points shapefile path.

  • classImgCol – is a string specifying the name of the column in the image file containing the class names.

  • classImgVecCol – is a string specifying the output column in the shapefile for the classified class names.

  • classRefVecCol – is an optional string specifying an output column in the shapefile which can be used in the accuracy assessment for the reference data.

  • lineStep – is a double specifying the step along the lines between the points.

  • force – is an optional boolean specifying whether the output shapefile should be deleted if it already exists (if True it will be deleted; Default is False)
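
As a rough illustration of what lineStep controls, the sketch below places points at a fixed distance along a polyline. The helper points_along_line is hypothetical and is not part of RSGISLib; details such as whether the start point is included are assumptions of this sketch, and the library's own behaviour may differ.

```python
import math


def points_along_line(vertices, line_step):
    """Return points spaced line_step apart along a polyline (hypothetical helper)."""
    if line_step <= 0:
        raise ValueError("line_step must be positive")
    points = [vertices[0]]  # assumption: the first vertex is always a point
    dist_to_next = line_step  # distance remaining until the next point is due
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already travelled along this segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            frac = pos / seg_len
            points.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
            dist_to_next = line_step
        # Carry the unused remainder of this segment over to the next one.
        dist_to_next -= seg_len - pos
    return points


straight = points_along_line([(0.0, 0.0), (10.0, 0.0)], 2.5)
bent = points_along_line([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)], 2.0)
```

The step is measured along the line itself, so in the second call the spacing carries over the corner rather than restarting at each vertex.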

Image Pixel Classification

class rsgislib.classification.classimgutils.ClassInfoObj(id=None, fileH5=None, red=None, green=None, blue=None)

This is a class to store the information associated with a class within the classification.

Parameters
  • id – Output pixel value for this class

  • fileH5 – hdf5 file (from rsgislib.imageutils.extractZoneImageBandValues2HDF) with the training data for the class

  • red – Red colour for visualisation (0-255)

  • green – Green colour for visualisation (0-255)

  • blue – Blue colour for visualisation (0-255)

class rsgislib.classification.classimgutils.SamplesInfoObj(className=None, classID=None, maskImg=None, maskPxlVal=None, outSampImgFile=None, numSamps=None, samplesH5File=None, red=None, green=None, blue=None)

This is a class to store the information associated with the training samples for a class within the classification.

Parameters
  • className – The name of the class

  • classID – Is the classification numeric ID (i.e., output pixel value)

  • maskImg – The input image mask from which samples are taken

  • maskPxlVal – The pixel value within the mask for the class

  • outSampImgFile – Temporary file which will store the sampled pixels.

  • numSamps – The number of samples required.

  • samplesH5File – File location for the HDF5 file with the input image values for training.

  • red – for visualisation red value.

  • green – for visualisation green value.

  • blue – for visualisation blue value.

rsgislib.classification.classimgutils.applyClassifer(classTrainInfo, skClassifier, imgMask, imgMaskVal, imgFileInfo, outputImg, gdalformat, classClrNames=True)

This function uses a trained classifier and applies it to the provided input image.

Parameters
  • classTrainInfo – dict (where the key is the class name) of ClassInfoObj objects, as used to train the classifier (i.e., trainClassifier()), which provide the pixel value id and RGB colour values for each class.

  • skClassifier – a trained instance of a scikit-learn classifier (e.g., use trainClassifier or findClassifierParametersAndTrain)

  • imgMask – is an image file providing a mask which specifies where the classification should be applied. The simplest mask is all the valid data regions (rsgislib.imageutils.genValidMask).

  • imgMaskVal – the pixel value within the imgMask to limit the region to which the classification is applied. Can be used to create a hierarchical classification.

  • imgFileInfo – a list of rsgislib.imageutils.ImageBandInfo objects (also used within rsgislib.imageutils.extractZoneImageBandValues2HDF) to identify which images and bands are to be used for the classification so it adheres to the training data.

  • outputImg – output image file with the classification. Note: by default a colour table and class names column are added to the image. If an error is produced, use the HFA or KEA formats.

  • gdalformat – is the output image format; any format supported by GDAL can be used.

  • classClrNames – default is True, in which case a colour table with the colours specified in classTrainInfo and a ClassName column (from imgFileInfo) will be added to the output file.

rsgislib.classification.classimgutils.findClassifierParametersAndTrain(classTrainInfo, paramSearchSampNum=0, gridSearch=GridSearchCV(cv='warn', error_score='raise-deprecating', estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False), iid='warn', n_jobs=None, param_grid={}, pre_dispatch='2*n_jobs', refit=True, return_train_score=False, scoring=None, verbose=0))

A function to find the optimal parameters for classification using a Grid Search (http://scikit-learn.org/stable/modules/grid_search.html). The returned classifier instance will be trained using the input data.

Parameters
  • classTrainInfo – list of ClassInfoObj objects which will be used to train the classifier.

  • paramSearchSampNum – the number of samples randomly drawn from the training data for each class for use in the grid search (a small sample is typically used because the search can take a long time). A value of 500 would use 500 samples per class.

  • gridSearch – is an instance of sklearn.model_selection.GridSearchCV with an instance of the chosen classifier and the parameters to be searched.
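
To give a feel for why the search can be slow, the sketch below enumerates every parameter combination that an exhaustive grid search would have to evaluate. It uses only the standard library; expand_param_grid and the example parameter values are hypothetical, and GridSearchCV additionally cross-validates a classifier for each combination.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Yield one dict per parameter combination in the grid (hypothetical helper)."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


# Illustrative search space; the values are assumptions, not recommendations.
param_grid = {"n_estimators": [10, 50, 100], "max_depth": [2, 4]}
candidates = list(expand_param_grid(param_grid))
```

The number of candidates is the product of the list lengths (here 3 x 2 = 6), which is why a small paramSearchSampNum keeps the search tractable.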

rsgislib.classification.classimgutils.performPerPxlMLClassShpTrain(imageBandInfo=[], classInfo={}, outputImg='classImg.kea', gdalformat='KEA', tmpPath='./tmp', skClassifier=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False), gridSearch=None, paramSearchSampNum=100)

A function which performs a per-pixel based classification of a scene using a machine learning classifier from the scikit-learn library where a single polygon shapefile per class is required to represent the training data.

Parameters
  • imageBandInfo – is a list of rsgislib.imageutils.ImageBandInfo objects specifying the images which should be used.

  • classInfo – is a dict of rsgislib.classification.classimgutils.ClassInfoObj objects where the key is the class name. The fileH5 field is used to define the file path to the shapefile with the training data.

  • outputImg – is the name and path to the output image file.

  • gdalformat – is the output image file format (e.g., KEA).

  • tmpPath – is a temporary file path which can be used during processing.

  • skClassifier – is an instance of a scikit-learn classifier appropriately parameterised. If None then the gridSearch object must not be None.

  • gridSearch – is an instance of a scikit-learn sklearn.model_selection.GridSearchCV object with the classifier and parameter search space specified. (If None then skClassifier will be used; if both are not None then skClassifier will be used in preference to gridSearch)

Example:

from rsgislib.classification import classimgutils
from rsgislib import imageutils

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

imageBandInfo=[imageutils.ImageBandInfo('./LS2MSS_19750620_lat10lon6493_r67p250_rad_srefdem_30m.kea', 'Landsat', [1,2,3,4])]
classInfo=dict()
classInfo['Forest'] = classimgutils.ClassInfoObj(id=1, fileH5='./ForestRegions.shp', red=0, green=255, blue=0)
classInfo['Non-Forest'] = classimgutils.ClassInfoObj(id=2, fileH5='./NonForestRegions.shp', red=100, green=100, blue=100)


skClassifier=ExtraTreesClassifier(n_estimators=20)
classimgutils.performPerPxlMLClassShpTrain(imageBandInfo, classInfo, outputImg='classImg.kea', gdalformat='KEA', tmpPath='./tmp', skClassifier=skClassifier)

rsgislib.classification.classimgutils.performPxlClustering(inputImg, outputImg, gdalformat='KEA', noDataVal=0, imgSamp=100, clusterer=MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=60, n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0, verbose=0), calcStats=True, useMeanShiftEstBandWidth=False)

A function which allows a clustering to be performed using the algorithms available within the scikit-learn library. The clusterer is trained on a sample of the input image and then applied to the whole image using the predict function (therefore this function is only compatible with clusterers which implement the predict function).

Parameters
  • inputImg – input image file.

  • outputImg – output image file.

  • gdalformat – output image file format.

  • noDataVal – no data value associated with the input image.

  • imgSamp – the input image sampling. (e.g., 100 is every 100th pixel)

  • clusterer – clusterer from scikit-learn which must have a predict function.

  • calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.

  • useMeanShiftEstBandWidth – use the mean-shift algorithm as the clusterer (pass None as the clusterer) where the bandwidth is calculated from the data itself.
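
The train-on-a-sample, predict-on-everything pattern described above can be sketched with a toy one-dimensional k-means written in plain Python. This is not the scikit-learn clusterer the function uses; fit_centres, predict_one and the pixel values are hypothetical and only illustrate the role of imgSamp.

```python
def predict_one(value, centres):
    """Return the index of the nearest cluster centre."""
    return min(range(len(centres)), key=lambda i: abs(value - centres[i]))


def fit_centres(samples, centres, n_iter=10):
    """Tiny 1-D k-means: refine the initial centres using the samples only."""
    centres = list(centres)
    for _ in range(n_iter):
        groups = [[] for _ in centres]
        for value in samples:
            groups[predict_one(value, centres)].append(value)
        # Keep the previous centre when a cluster receives no samples.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres


pixels = [10, 12, 11, 13, 90, 92, 91, 93, 12, 11, 94, 10]  # toy single-band image
img_samp = 3                      # analogous to imgSamp: use every 3rd pixel
samples = pixels[::img_samp]      # subset used for training
centres = fit_centres(samples, centres=[0.0, 100.0])
labels = [predict_one(value, centres) for value in pixels]  # applied to every pixel
```

Only 4 of the 12 pixels are used to fit the centres, but every pixel receives a label, which is why a separate predict step is required.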

rsgislib.classification.classimgutils.performPxlTiledClustering(inputImg, outputImg, gdalformat='KEA', noDataVal=0, clusterer=MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=60, n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0, verbose=0), calcStats=True, useMeanShiftEstBandWidth=False, tileXSize=200, tileYSize=200)

A function which allows a clustering to be performed using the algorithms available within the scikit-learn library. The clusterer is applied to a single tile at a time and therefore produces tile boundaries in the result. However, memory is controlled such that usage isn’t excessive which it could be when processing a whole image.

Parameters
  • inputImg – input image file.

  • outputImg – output image file.

  • gdalformat – output image file format.

  • noDataVal – no data value associated with the input image.

  • clusterer – clusterer from scikit-learn which must have a predict function.

  • calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.

  • useMeanShiftEstBandWidth – use the mean-shift algorithm as the clusterer (pass None as the clusterer) where the bandwidth is calculated from the data itself.

  • tileXSize – tile size in the x-axis in pixels.

  • tileYSize – tile size in the y-axis in pixels.
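
The tile boundaries mentioned above arise because each tile is clustered independently. A minimal sketch of how an image might be divided into windows of tileXSize by tileYSize pixels follows; tile_windows is a hypothetical helper and the library's internal tiling may differ.

```python
def tile_windows(x_size, y_size, tile_x_size=200, tile_y_size=200):
    """Yield (x_off, y_off, width, height) windows covering an image tile by tile."""
    for y_off in range(0, y_size, tile_y_size):
        for x_off in range(0, x_size, tile_x_size):
            # Edge tiles are clipped so they do not run past the image extent.
            width = min(tile_x_size, x_size - x_off)
            height = min(tile_y_size, y_size - y_off)
            yield x_off, y_off, width, height


windows = list(tile_windows(500, 300))
```

Only one window needs to be held in memory at a time, at the cost of cluster labels that are not consistent from one tile to the next.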

rsgislib.classification.classimgutils.performPxlWholeImgClustering(inputImg, outputImg, gdalformat='KEA', noDataVal=0, clusterer=MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=60, n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0, verbose=0), calcStats=True, useMeanShiftEstBandWidth=False)

A function which allows a clustering to be performed using the algorithms available within the scikit-learn library. The clusterer is applied to the whole image in one operation and therefore requires the whole image to be loaded into memory. However, if there is sufficient memory all the clustering algorithms within scikit-learn can be applied without boundary artifacts.

Parameters
  • inputImg – input image file.

  • outputImg – output image file.

  • gdalformat – output image file format.

  • noDataVal – no data value associated with the input image.

  • clusterer – clusterer from scikit-learn which must have a predict function.

  • calcStats – calculate image pixel statistics, histogram and image pyramids - note if you are not using a KEA file then the format needs to support RATs for this option as histogram and colour table are written to RAT.

  • useMeanShiftEstBandWidth – use the mean-shift algorithm as the clusterer (pass None as the clusterer) where the bandwidth is calculated from the data itself.

rsgislib.classification.classimgutils.performVotingClassification(skClassifiers, trainSamplesInfo, imgFileInfo, classAreaMask, classMaskPxlVal, tmpDIR, tmpImgBase, outClassImg, gdalformat='KEA', numCores=-1)

A function which performs a number of classifications and combines them into a single classification by a simple vote. The classifier parameters can be varied, as a list of classifiers is provided (the length of the list is equal to the number of votes), and the training data is resampled for each classifier. The analysis can be performed using multiple processing cores.

Parameters
  • skClassifiers – a list of classifiers (from scikit-learn), the number of classifiers defined will be equal to the number of votes.

  • trainSamplesInfo – a list of rsgislib.classification.classimgutils.SamplesInfoObj objects used to parameterise the classifier and extract training data.

  • imgFileInfo – a list of rsgislib.imageutils.ImageBandInfo objects (also used within rsgislib.imageutils.extractZoneImageBandValues2HDF) to identify which images and bands are to be used for the classification so it adheres to the training data.

  • classAreaMask – a mask image which is used to specify the areas of the scene which are to be classified.

  • classMaskPxlVal – is the pixel value within the classAreaMask image for the areas of the image which are to be classified.

  • tmpDIR – a temporary file location which will be created and removed during processing.

  • tmpImgBase – the base name of the files written to the tmpDIR.

  • outClassImg – the final output image file.

  • gdalformat – the output file format for outClassImg

  • numCores – is the number of processing cores to be used for the analysis (if -1 then all cores on the machine will be used).
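
The "simple vote" can be sketched as a per-pixel majority over the outputs of the individual classifiers. The helper majority_vote and the toy class IDs are hypothetical, and the tie-breaking rule is an assumption of this sketch rather than documented library behaviour.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common class ID among one pixel's votes.

    Ties go to the class that was voted for first (an assumption of this sketch).
    """
    return Counter(votes).most_common(1)[0][0]


# One list of per-pixel class IDs for each of three hypothetical classifiers.
classifier_outputs = [
    [1, 2, 3, 1],
    [1, 2, 2, 3],
    [2, 2, 3, 3],
]
combined = [majority_vote(pixel_votes) for pixel_votes in zip(*classifier_outputs)]
```

Using an odd number of classifiers reduces the chance of ties when there are only two classes.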

Example:

import os

import rsgislib.imageutils
from rsgislib.classification import classimgutils
from sklearn.ensemble import ExtraTreesClassifier

# imgTmp, img2010dB, imgSRTM, classTrainRegionsMask and classWithinMask are
# file paths which must be defined by the user before running this example.
classVoteTemp = os.path.join(imgTmp, 'ClassVoteTemp')

imgFileInfo = [rsgislib.imageutils.ImageBandInfo(img2010dB, 'sardb', [1,2]), rsgislib.imageutils.ImageBandInfo(imgSRTM, 'srtm', [1])]
trainSamplesInfo = []
trainSamplesInfo.append(classimgutils.SamplesInfoObj(className='Water', classID=1, maskImg=classTrainRegionsMask, maskPxlVal=1, outSampImgFile='WaterSamples.kea', numSamps=500, samplesH5File='WaterSamples_pxlvals.h5', red=0, green=0, blue=255))
trainSamplesInfo.append(classimgutils.SamplesInfoObj(className='Land', classID=2, maskImg=classTrainRegionsMask, maskPxlVal=2, outSampImgFile='LandSamples.kea', numSamps=500, samplesH5File='LandSamples_pxlvals.h5', red=150, green=150, blue=150))
trainSamplesInfo.append(classimgutils.SamplesInfoObj(className='Mangroves', classID=3, maskImg=classTrainRegionsMask, maskPxlVal=3, outSampImgFile='MangroveSamples.kea', numSamps=500, samplesH5File='MangroveSamples_pxlvals.h5', red=0, green=153, blue=0))

# Build 20 classifiers (i.e., 20 votes) with four different parameterisations.
skClassifiers = []
for i in range(5):
    skClassifiers.append(ExtraTreesClassifier(n_estimators=50))

for i in range(5):
    skClassifiers.append(ExtraTreesClassifier(n_estimators=100))

for i in range(5):
    skClassifiers.append(ExtraTreesClassifier(n_estimators=50, max_depth=2))

for i in range(5):
    skClassifiers.append(ExtraTreesClassifier(n_estimators=100, max_depth=2))

mangroveRegionClassImg = 'MangroveRegionClass.kea'
classimgutils.performVotingClassification(skClassifiers, trainSamplesInfo, imgFileInfo, classWithinMask, 1, classVoteTemp, 'ClassImgSample', mangroveRegionClassImg, gdalformat='KEA', numCores=-1)

rsgislib.classification.classimgutils.trainClassifier(classTrainInfo, skClassifier)

This function trains the classifier.

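Parameters
  • classTrainInfo – dict (where the key is the class name) of ClassInfoObj objects which will be used to train the classifier.

  • skClassifier – an instance of a scikit-learn classifier which is to be trained.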

Raster GIS

rsgislib.classification.classratutils.balanceSampleTrainingRandom(clumpsImg, trainCol, outTrainCol, minNoSamples, maxNoSamples)

A function to balance the number of training samples for classification so the number is above a minimum threshold (minNoSamples) and all equal to the class with the smallest number of samples unless that is above a set maximum (maxNoSamples).

Parameters
  • clumpsImg – is a string with the file path to the input image with RAT

  • trainCol – is a string for the name of the input column specifying the training samples (zero is no data)

  • outTrainCol – is a string with the name of the output column for the balanced training samples.

  • minNoSamples – is an int specifying the minimum number of training samples for a class (if below threshold class is removed).

  • maxNoSamples – is an int specifying the maximum number of training samples per class.
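
The balancing rule described above can be sketched as follows: classes below minNoSamples are dropped, and the remaining classes are randomly thinned to the size of the smallest class, capped at maxNoSamples. The helper balance_samples and the in-memory dict of sample IDs are hypothetical; the library operates on RAT columns instead.

```python
import random


def balance_samples(samples_by_class, min_no_samples, max_no_samples, seed=0):
    """Drop under-sampled classes and randomly thin the rest to a common size."""
    kept = {cls: ids for cls, ids in samples_by_class.items()
            if len(ids) >= min_no_samples}
    if not kept:
        return {}
    # Common size: the smallest remaining class, but never above the maximum.
    target = min(min(len(ids) for ids in kept.values()), max_no_samples)
    rng = random.Random(seed)
    return {cls: sorted(rng.sample(ids, target)) for cls, ids in kept.items()}


training = {
    "Forest": list(range(0, 500)),
    "Water": list(range(500, 620)),
    "Urban": list(range(620, 625)),  # too few samples, so the class is removed
}
balanced = balance_samples(training, min_no_samples=50, max_no_samples=100)
```

Here the smallest retained class has 120 samples, which exceeds the maximum, so both retained classes end up with 100 samples.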

rsgislib.classification.classratutils.classifyWithinRAT(clumpsImg, classesIntCol, classesNameCol, variables, classifier=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features=3, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1, oob_score=True, random_state=None, verbose=0, warm_start=False), outColInt='OutClass', outColStr='OutClassName', roiCol=None, roiVal=1, classColours=None, preProcessor=None, justFit=False)

A function which will perform a classification within the RAT using a classifier from scikit-learn.

Parameters
  • clumpsImg – is the clumps image on which the classification is to be performed

  • classesIntCol – is the column with the training data as int values

  • classesNameCol – is the column with the training data as string class names

  • variables – is an array of column names which are to be used for the classification

  • classifier – is an instance of a scikit-learn classifier (e.g., RandomForests which is Default)

  • outColInt – is the output column name for the int class representation (Default: ‘OutClass’)

  • outColStr – is the output column name for the class names column (Default: ‘OutClassName’)

  • roiCol – is a column name for a column which specifies the region to be classified. If None ignored (Default: None)

  • roiVal – is an int value used within the roiCol to select a region to be classified (Default: 1)

  • classColours – is a python dict using the class name as the key along with arrays of length 3 specifying the RGB colours for the class.

  • preProcessor – is a scikit-learn preprocessor such as sklearn.preprocessing.MaxAbsScaler() which can rescale the input variables independently as they are read in (Default: None; i.e., not in use).

  • justFit – is a boolean specifying that the classifier should just be fitted to the data and not applied (Default: False; i.e., apply classification)

Example:

from sklearn.ensemble import ExtraTreesClassifier
from rsgislib.classification import classratutils

classifier = ExtraTreesClassifier(n_estimators=100, max_features=3, n_jobs=-1, verbose=0)

classColours = dict()
classColours['Forest'] = [0,138,0]
classColours['NonForest'] = [200,200,200]

variables = ['GreenAvg', 'RedAvg', 'NIR1Avg', 'NIR2Avg', 'NDVI']
# clumpsImg, classesIntCol and classesNameCol must be defined by the user.
classratutils.classifyWithinRAT(clumpsImg, classesIntCol, classesNameCol, variables, classifier=classifier, classColours=classColours)

from sklearn.preprocessing import MaxAbsScaler

# With pre-processor
classratutils.classifyWithinRAT(clumpsImg, classesIntCol, classesNameCol, variables, classifier=classifier, classColours=classColours, preProcessor=MaxAbsScaler())

rsgislib.classification.classratutils.classifyWithinRATTiled(clumpsImg, classesIntCol, classesNameCol, variables, classifier=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features=3, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1, oob_score=True, random_state=None, verbose=0, warm_start=False), outColInt='OutClass', outColStr='OutClassName', roiCol=None, roiVal=1, classColours=None, scaleVarsRange=False, justFit=False)

A function which will perform a classification within the RAT using a classifier from scikit-learn using the rios ratapplier interface allowing very large RATs to be processed.

Parameters
  • clumpsImg – is the clumps image on which the classification is to be performed

  • classesIntCol – is the column with the training data as int values

  • classesNameCol – is the column with the training data as string class names

  • variables – is an array of column names which are to be used for the classification

  • classifier – is an instance of a scikit-learn classifier (e.g., RandomForests which is Default)

  • outColInt – is the output column name for the int class representation (Default: ‘OutClass’)

  • outColStr – is the output column name for the class names column (Default: ‘OutClassName’)

  • roiCol – is a column name for a column which specifies the region to be classified. If None ignored (Default: None)

  • roiVal – is an int value used within the roiCol to select a region to be classified (Default: 1)

  • classColours – is a python dict using the class name as the key along with arrays of length 3 specifying the RGB colours for the class.

  • scaleVarsRange – will rescale each variable independently to a range of 0-1 (default: False).

  • justFit – is a boolean specifying that the classifier should just be fitted to the data and not applied (Default: False; i.e., apply classification)
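
The effect of scaleVarsRange can be sketched as an independent min-max rescaling of each variable (column) to the range 0-1. The helper scale_columns_to_unit_range is hypothetical and works on a small in-memory table; how the library handles constant columns or tiled reads is not covered here.

```python
def scale_columns_to_unit_range(rows):
    """Rescale each column of a row-major table to 0-1 independently."""
    columns = list(zip(*rows))
    scaled_columns = []
    for column in columns:
        low, high = min(column), max(column)
        span = high - low
        # A constant column has no range, so map it to 0.0 (an assumption of this sketch).
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


rows = [[0.0, 100.0], [5.0, 300.0], [10.0, 200.0]]
scaled = scale_columns_to_unit_range(rows)
```

Scaling each variable separately stops variables with large numeric ranges from dominating classifiers which are sensitive to feature magnitude.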

Example:

from sklearn.ensemble import ExtraTreesClassifier
from rsgislib.classification import classratutils

classifier = ExtraTreesClassifier(n_estimators=100, max_features=3, n_jobs=-1, verbose=0)

classColours = dict()
classColours['Forest'] = [0,138,0]
classColours['NonForest'] = [200,200,200]

variables = ['GreenAvg', 'RedAvg', 'NIR1Avg', 'NIR2Avg', 'NDVI']
# clumpsImg, classesIntCol and classesNameCol must be defined by the user.
classratutils.classifyWithinRATTiled(clumpsImg, classesIntCol, classesNameCol, variables, classifier=classifier, classColours=classColours)

# With range scaling
classratutils.classifyWithinRATTiled(clumpsImg, classesIntCol, classesNameCol, variables, classifier=classifier, classColours=classColours, scaleVarsRange=True)

rsgislib.classification.classratutils.clusterWithinRAT(clumpsImg, variables, clusterer=MiniBatchKMeans(batch_size=100, compute_labels=True, init='k-means++', init_size=None, max_iter=100, max_no_improvement=10, n_clusters=8, n_init=3, random_state=None, reassignment_ratio=0.01, tol=0.0, verbose=0), outColInt='OutCluster', roiCol=None, roiVal=1, clrClusters=True, clrSeed=10, addConnectivity=False, preProcessor=None)

A function which will perform a clustering within the RAT using a clustering algorithm from scikit-learn.

Parameters
  • clumpsImg – is the clumps image on which the classification is to be performed.

  • variables – is an array of column names which are to be used for the clustering.

  • clusterer – is an instance of a scikit-learn clusterer (e.g., MiniBatchKMeans which is Default; Note with 8 clusters).

  • outColInt – is the output column name identifying the clusters (Default: ‘OutCluster’).

  • roiCol – is a column name for a column which specifies the region to be clustered. If None ignored (Default: None).

  • roiVal – is an int value used within the roiCol to select a region to be clustered (Default: 1).

  • clrClusters – is a boolean specifying whether the colour table should be updated to correspond to the clusters (Default: True).

  • clrSeed – is an integer seeding the random generator used to generate the colours (Default=10; if None provided system time used).

  • addConnectivity – is a boolean which adds a kneighbors_graph to the clusterer (just an option for the AgglomerativeClustering algorithm).

  • preProcessor – is a scikit-learn preprocessor such as sklearn.preprocessing.MaxAbsScaler() which can rescale the input variables independently as they are read in (Default: None; i.e., not in use).

Example:

from rsgislib.classification import classratutils
from sklearn.cluster import DBSCAN

sklearnClusterer = DBSCAN(eps=1, min_samples=50)
classratutils.clusterWithinRAT('MangroveClumps.kea', ['MinX', 'MinY'], clusterer=sklearnClusterer, outColInt="OutCluster", roiCol=None, roiVal=1, clrClusters=True, clrSeed=10, addConnectivity=False)

# With pre-processor
from sklearn.preprocessing import MaxAbsScaler
classratutils.clusterWithinRAT('MangroveClumps.kea', ['MinX', 'MinY'], clusterer=sklearnClusterer, outColInt="OutCluster", roiCol=None, roiVal=1, clrClusters=True, clrSeed=10, addConnectivity=False, preProcessor=MaxAbsScaler())

rsgislib.classification.classratutils.findClassifierParameters(clumpsImg, classesIntCol, variables, preProcessor=None, gridSearch=GridSearchCV(cv='warn', error_score='raise-deprecating', estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False), iid='warn', n_jobs=None, param_grid={}, pre_dispatch='2*n_jobs', refit=True, return_train_score=False, scoring=None, verbose=0))

Find the optimal parameters for a classifier using a grid search and return a classifier instance with those optimal parameters.

Parameters
  • clumpsImg – is the clumps image on which the classification is to be performed

  • classesIntCol – is the column with the training data as int values

  • variables – is an array of column names which are to be used for the classification

  • preProcessor – is a scikit-learn preprocessor such as sklearn.preprocessing.MaxAbsScaler() which can rescale the input variables independently as they are read in (Default: None; i.e., not in use).

  • gridSearch – is an instance of GridSearchCV parameterised with a classifier and parameters to be searched.

Returns

Instance of the classifier with optimal parameters defined.

Example:

from rsgislib.classification import classratutils
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MaxAbsScaler

clumpsImg = "./LS8_20150621_lat10lon652_r67p233_clumps.kea"
classesIntCol = 'ClassInt'

classParameters = {'kernel':['linear', 'rbf',  'poly', 'sigmoid'], 'C':[1, 2, 3, 4, 5, 10, 100, 400, 500, 1e3, 5e3, 1e4, 5e4, 1e5], 'gamma':[0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1, 'auto'], 'degree':[2, 3, 4, 5, 6, 7, 8], 'class_weight':[None, 'balanced'], 'decision_function_shape':['ovo', 'ovr']}
variables = ['BlueRefl', 'GreenRefl', 'RedRefl', 'NIRRefl', 'SWIR1Refl', 'SWIR2Refl']

gSearch = GridSearchCV(SVC(), classParameters)
classifier = classratutils.findClassifierParameters(clumpsImg, classesIntCol, variables, preProcessor=MaxAbsScaler(), gridSearch=gSearch)

rsgislib.classification.collapseClasses(inputimage, outputimage, gdalformat, classColumn, classIntCol)

Collapses an attribute table with a large number of classified clumps (segments) to an attribute table with a single row per class (i.e., a classification rather than a segmentation).

Parameters
  • inputImage – is a string containing the name and path of the input file with attribute table.

  • outputImage – is a string containing the name and path of the output file.

  • gdalformat – is a string with the output image format for the GDAL driver.

  • classColumn – is a string with the name of the column with the class names - internally this will be treated as a string column even if a numerical column is specified.

  • classIntCol – is a string specifying the name of a column with the integer class representation. This is an optional parameter but if specified then the int representation of the classes will be preserved.
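
Conceptually, the collapse groups the per-clump rows by class name so that each class is left with a single row. The sketch below illustrates this on an in-memory list; collapse_classes, the row layout and the pixel counts are hypothetical and do not reflect how the attribute table is actually stored.

```python
def collapse_classes(clump_rows):
    """Collapse per-clump rows into one row per class name.

    Each input row is (class_name, class_int, pixel_count); this layout is an
    assumption of the sketch rather than the structure of a real RAT.
    """
    collapsed = {}
    for class_name, class_int, pixel_count in clump_rows:
        # The class column is treated as a string, as noted for classColumn.
        row = collapsed.setdefault(str(class_name), {"class_int": class_int, "pixels": 0})
        row["pixels"] += pixel_count
    return collapsed


clumps = [("Forest", 1, 120), ("Water", 2, 40), ("Forest", 1, 30), ("Water", 2, 10)]
classes = collapse_classes(clumps)
```

Four clump rows become two class rows, with the integer class ID carried through unchanged.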

rsgislib.classification.colour3bands(inputimage, outputimage, gdalformat)

Generates a 3 band colour image from the colour table in the input file.

Parameters
  • inputImage – is a string containing the name and path of the input file with attribute table.

  • outputImage – is a string containing the name and path of the output file.

  • gdalformat – is a string with the output image format for the GDAL driver.
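
The idea of turning a colour table into a three band image can be sketched as a lookup of each pixel value in the table, writing the red, green and blue components to separate bands. The helper colour_3_bands and the colour values are hypothetical, and the treatment of values missing from the table is an assumption of this sketch.

```python
def colour_3_bands(class_pixels, colour_table):
    """Return (red, green, blue) bands by looking up each pixel value in a colour table."""
    red, green, blue = [], [], []
    for value in class_pixels:
        r, g, b = colour_table.get(value, (0, 0, 0))  # unknown values become black here
        red.append(r)
        green.append(g)
        blue.append(b)
    return red, green, blue


colour_table = {1: (0, 255, 0), 2: (100, 100, 100)}  # hypothetical class colours
red, green, blue = colour_3_bands([1, 2, 2, 0], colour_table)
```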

Accuracy Assessment

rsgislib.classification.generateRandomAccuracyPts(inputImage, outputShp, classImgCol, classImgVecCol, classRefVecCol, numPts, seed, force)

Generates a set of random points for accuracy assessment.

Parameters
  • inputImage – is a string containing the name and path of the input image with attribute table.

  • outputShp – is a string containing the name and path of the output shapefile.

  • classImgCol – is a string specifying the name of the column in the image file containing the class names.

  • classImgVecCol – is a string specifying the output column in the shapefile for the classified class names.

  • classRefVecCol – is a string specifying an output column in the shapefile which can be used in the accuracy assessment for the reference data.

  • numPts – is an int specifying the total number of points which should be created.

  • seed – is an int specifying the seed for the random number generator. (Optional: Default 10)

  • force – is a bool, specifying whether to force removal of the output vector if it exists. (Optional: Default False)

rsgislib.classification.generateStratifiedRandomAccuracyPts(inputImage, outputShp, classImgCol, classImgVecCol, classRefVecCol, numPts, seed, force, usePxlLst)

Generates a set of stratified random points for accuracy assessment.

Parameters
  • inputImage – is a string containing the name and path of the input image with attribute table.

  • outputShp – is a string containing the name and path of the output shapefile.

  • classImgCol – is a string specifying the name of the column in the image file containing the class names.

  • classImgVecCol – is a string specifying the output column in the shapefile for the classified class names.

  • classRefVecCol – is a string specifying an output column in the shapefile which can be used in the accuracy assessment for the reference data.

  • numPts – is an int specifying the number of points for each class which should be created.

  • seed – is an int specifying the seed for the random number generator. (Optional: Default 10)

  • force – is a bool, specifying whether to force removal of the output vector if it exists. (Optional: Default False)

  • usePxlLst – is a bool, if there are only a small number of pixels then creating a list of all the pixel locations will speed up processing. (Optional: Default False)
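
The difference from purely random sampling is that numPts is drawn for each class separately. The sketch below builds a list of pixel locations per class (in the spirit of usePxlLst) and samples from each list; stratified_random_points and the toy class image are hypothetical, and capping at the number of available pixels is an assumption of this sketch.

```python
import random


def stratified_random_points(class_image, num_pts, seed=10):
    """Pick up to num_pts random (row, col) locations for each class in a 2-D list."""
    locations = {}
    for row_idx, row in enumerate(class_image):
        for col_idx, class_id in enumerate(row):
            locations.setdefault(class_id, []).append((row_idx, col_idx))
    rng = random.Random(seed)  # a fixed seed makes the points reproducible
    return {class_id: rng.sample(pixels, min(num_pts, len(pixels)))
            for class_id, pixels in sorted(locations.items())}


class_image = [
    [1, 1, 1, 2],
    [1, 1, 2, 2],
    [1, 1, 1, 3],
]
points = stratified_random_points(class_image, num_pts=2)
```

Rare classes (class 3 here, with a single pixel) are still represented, which a purely random draw over the whole image would not guarantee.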

rsgislib.classification.popClassInfoAccuracyPts(inputImage, inputShp, classImgCol, classImgVecCol, classRefVecCol)

Populates an existing set of accuracy assessment points with the class information from the input image.

Parameters
  • inputImage – is a string containing the name and path of the input image with attribute table.

  • inputShp – is a string containing the name and path of the input shapefile.

  • classImgCol – is a string specifying the name of the column in the image file containing the class names.

  • classImgVecCol – is a string specifying the output column in the shapefile for the classified class names.

  • classRefVecCol – is an optional string specifying an output column in the shapefile which can be used in the accuracy assessment for the reference data.
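
Once the reference column has been filled in by an interpreter, the two columns can be compared to assess accuracy. The sketch below counts (reference, classified) pairs and derives the overall accuracy from a list of point attribute dicts; the helpers and the column names ClassImg and ClassRef are hypothetical stand-ins for classImgVecCol and classRefVecCol.

```python
from collections import Counter


def confusion_counts(points, class_col="ClassImg", ref_col="ClassRef"):
    """Count (reference, classified) pairs over a list of point attribute dicts."""
    return Counter((pt[ref_col], pt[class_col]) for pt in points)


def overall_accuracy(counts):
    """Fraction of points whose classified class equals the reference class."""
    total = sum(counts.values())
    correct = sum(n for (ref, cls), n in counts.items() if ref == cls)
    return correct / total if total else 0.0


points = [
    {"ClassImg": "Forest", "ClassRef": "Forest"},
    {"ClassImg": "Forest", "ClassRef": "Water"},
    {"ClassImg": "Water", "ClassRef": "Water"},
    {"ClassImg": "Water", "ClassRef": "Water"},
]
counts = confusion_counts(points)
accuracy = overall_accuracy(counts)
```

The off-diagonal entries of counts show which classes are confused with each other, which the single accuracy figure hides.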