RSGISLib Vector Attributes Module

Basic Reading and Writing of Columns

rsgislib.vectorattrs.write_vec_column(out_vec_file: str, out_vec_lyr: str, att_column: str, att_col_datatype: int, att_col_data: List)

A function which will write a column to a vector file

Parameters:
  • out_vec_file – The file / path to the vector data ‘file’.

  • out_vec_lyr – The layer to which the data is to be added.

  • att_column – Name of the output column

  • att_col_datatype – ogr data type for the output column (e.g., ogr.OFTString, ogr.OFTInteger, ogr.OFTReal)

  • att_col_data – A list of the same length as the number of features in vector file.

rsgislib.vectorattrs.write_vec_column_to_layer(out_vec_lyr_obj: Layer, att_column: str, att_col_datatype: int, att_col_data: List)

A function which will write a column to a vector layer.

Parameters:
  • out_vec_lyr_obj – GDAL/OGR vector layer object

  • att_column – Name of the output column

  • att_col_datatype – ogr data type for the output column (e.g., ogr.OFTString, ogr.OFTInteger, ogr.OFTReal)

  • att_col_data – A list of the same length as the number of features in vector file.

rsgislib.vectorattrs.read_vec_column(vec_file: str, vec_lyr: str, att_column: str) → List

A function which reads a column from a vector file.

Parameters:
  • vec_file – The file / path to the vector data ‘file’.

  • vec_lyr – The layer from which the data is to be read.

  • att_column – Name of the input column

Returns:

a list with the column values.

rsgislib.vectorattrs.read_vec_columns(vec_file: str, vec_lyr: str, att_columns: List[str]) → List[Dict]

A function which reads a number of columns from a vector file.

Parameters:
  • vec_file – The file / path to the vector data ‘file’.

  • vec_lyr – The layer from which the data is to be read.

  • att_columns – List of input attribute column names to be read in.

Returns:

list of dicts with the column names as keys

rsgislib.vectorattrs.get_vec_cols_as_array(vec_file: str, vec_lyr: str, cols: List[str], lower_limit: float = None, upper_limit: float = None) → array

A function which returns an n x m numpy array with the values for the columns specified.

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • cols – list of columns to be read and returned.

  • no_data_val – no data value used within the column values. Rows with a no data value will be dropped. If None then ignored (Default: None)

  • lower_limit – Optional lower limit to define valid values. Note the same value is used for all the columns listed. If a value is found to be outside of the threshold the whole row is removed.

  • upper_limit – Optional upper limit to define valid values. Note the same value is used for all the columns listed. If a value is found to be outside of the threshold the whole row is removed.

Returns:

a numpy array with the column values.
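
The row-dropping behaviour described for lower_limit and upper_limit can be illustrated without the library. The sketch below is a plain-Python stand-in, not RSGISLib's implementation: the helper name filter_rows_by_limits is hypothetical, it works on nested lists rather than numpy arrays, and it assumes that values equal to a limit are treated as valid.

```python
from typing import List, Optional


def filter_rows_by_limits(
    rows: List[List[float]],
    lower_limit: Optional[float] = None,
    upper_limit: Optional[float] = None,
) -> List[List[float]]:
    """Keep only the rows in which every value lies within the limits.

    Assumption: a value equal to a limit is valid. The same limits are
    applied to every column, as described for get_vec_cols_as_array.
    """
    kept = []
    for row in rows:
        below = lower_limit is not None and any(v < lower_limit for v in row)
        above = upper_limit is not None and any(v > upper_limit for v in row)
        if not (below or above):
            kept.append(row)
    return kept


# Each inner list is one feature; each position is one column.
rows = [[0.2, 0.4], [0.9, 1.5], [-0.1, 0.3], [0.5, 0.5]]
valid_rows = filter_rows_by_limits(rows, lower_limit=0.0, upper_limit=1.0)
```

One out-of-range value in any column removes the whole feature, so the number of rows returned can be smaller than the number of features in the layer.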

Add Columns

rsgislib.vectorattrs.add_fid_col(vec_file: str, vec_lyr: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', out_col: str = 'fid')

A function which adds a numeric feature ID (FID) column with unique values per feature within the file.

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • out_vec_file – Output vector file

  • out_vec_lyr – output vector layer name.

  • out_format – output file format (default GPKG).

  • out_col – The output FID column name (Default: fid)

rsgislib.vectorattrs.add_numeric_col_lut(vec_file: str, vec_lyr: str, ref_col: str, val_lut: Dict, out_col: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG')

A function which adds a numeric column based on an existing column in the vector file, using a dict LUT to define the values.

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • ref_col – The column within which the unique values will be identified.

  • val_lut – A dict LUT (key should be value in ref_col and value be the value outputted to out_col).

  • out_col – The output numeric column

  • out_vec_file – Output vector file

  • out_vec_lyr – output vector layer name.

  • out_format – output file format (default GPKG).

rsgislib.vectorattrs.add_numeric_col(vec_file: str, vec_lyr: str, out_col: str, out_vec_file: str, out_vec_lyr: str, out_val: float = 1, out_format: str = 'GPKG', out_col_int: bool = False)

A function which adds a numeric column with the same value for all the features.

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • out_col – The output numeric column

  • out_vec_file – Output vector file

  • out_vec_lyr – output vector layer name.

  • out_val – output numeric value

  • out_format – output file format (default GPKG).

  • out_col_int – Specify whether the output column should be an int datatype. If True (default: False) then the output column will be of type int. If False then it will be type float.

rsgislib.vectorattrs.add_string_col(vec_file: str, vec_lyr: str, out_col: str, out_vec_file: str, out_vec_lyr: str, out_val: str = 'str_val', out_format: str = 'GPKG')

A function which adds a string column with the same value for all the features.

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • out_col – The output string column

  • out_vec_file – Output vector file

  • out_vec_lyr – output vector layer name.

  • out_val – output string value

  • out_format – output file format (default GPKG).

rsgislib.vectorattrs.add_string_col_lut(vec_file: str, vec_lyr: str, ref_col: str, val_lut: Dict, out_col: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG')

A function which adds a string (text) column based on an existing column in the vector file, using a dict LUT to define the values.

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • ref_col – The column within which the unique values will be identified.

  • val_lut – A dict LUT (key should be value in ref_col and value be the value outputted to out_col).

  • out_col – The output string column

  • out_vec_file – Output vector file

  • out_vec_lyr – output vector layer name.

  • out_format – output file format (default GPKG).

rsgislib.vectorattrs.add_numeric_col_range_lut(vec_file: str, vec_lyr: str, vec_col: str, out_vec_file: str, out_vec_lyr: str, out_vec_col: str, val_lut: Dict[int, Tuple[float, float]], out_format: str = 'GPKG')

A function which adds a numerical column to the vector layer using a LUT. Each LUT entry defines lower (>=) and upper (<) values which are compared against the input column; the output value is the key of the matching LUT entry.

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • vec_col – The column within which the unique values will be identified.

  • out_vec_file – Output vector file

  • out_vec_lyr – Output vector layer name.

  • out_vec_col – The output numeric column

  • val_lut – the LUT for defining the output values. Features outside of the values defined by the LUT will be set as zero. The LUT should define an int as the key which will be the output value and a tuple specifying the lower (>=) and upper (<) values within the vec_col for setting the key value.

  • out_format – output file format (default GPKG).
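
The range test described for val_lut (lower bound inclusive, upper bound exclusive, zero when nothing matches) can be sketched in plain Python. This is an illustration of the documented rule only, not the library's code: apply_range_lut and height_lut are hypothetical names, and the sketch assumes the first matching range wins if ranges overlap.

```python
from typing import Dict, List, Tuple


def apply_range_lut(
    values: List[float], val_lut: Dict[int, Tuple[float, float]]
) -> List[int]:
    """Return the LUT key whose [lower, upper) range contains each value."""
    out_vals = []
    for val in values:
        out_val = 0  # values outside every range are set to zero
        for key, (lower, upper) in val_lut.items():
            if lower <= val < upper:
                out_val = key
                break  # assumption: the first matching range wins
        out_vals.append(out_val)
    return out_vals


# Keys are the output values; tuples are (lower, upper) bounds.
height_lut = {1: (0.0, 5.0), 2: (5.0, 20.0), 3: (20.0, 100.0)}
classes = apply_range_lut([2.5, 5.0, 19.9, 150.0, -1.0], height_lut)
```

Because the upper bound is exclusive, the value 5.0 falls in class 2 rather than class 1, so adjacent ranges can share a boundary without ambiguity.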

rsgislib.vectorattrs.add_numeric_col_from_lst_lut(vec_file: str, vec_lyr: str, ref_col: str, vals_lut: List[Tuple[str | int, int]], out_col: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG')

A function which adds a numeric column based on an existing column in the vector file, using a list-based LUT to define the values. The LUT should be defined as a list of tuples, with the value to match as the first element and the value to be outputted as the second. For example, (“Hello”, 1) or (“World”, 2)

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • ref_col – The column within which the unique values will be identified.

  • vals_lut – A list LUT which should be a list of tuples (LookUp, OutValue).

  • out_col – The output numeric column

  • out_vec_file – Output vector file

  • out_vec_lyr – output vector layer name.

  • out_format – output file format (default GPKG).
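
A list-based LUT of (LookUp, OutValue) tuples can be applied as shown in the sketch below. It is a plain-Python illustration rather than the library's implementation: apply_list_lut is a hypothetical helper, and returning None for values with no LUT entry is an assumption, since the documentation does not say what the function writes in that case.

```python
from typing import List, Optional, Tuple, Union


def apply_list_lut(
    ref_vals: List[Union[str, int]],
    vals_lut: List[Tuple[Union[str, int], int]],
) -> List[Optional[int]]:
    """Map each reference value to the output value of its LUT tuple."""
    out_vals = []
    for ref_val in ref_vals:
        out_val = None  # assumption: unmatched values are left unset
        for lookup, lut_out in vals_lut:
            if ref_val == lookup:
                out_val = lut_out
                break
        out_vals.append(out_val)
    return out_vals


vals_lut = [("Hello", 1), ("World", 2)]
out_vals = apply_list_lut(["World", "Hello", "Other"], vals_lut)
```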

rsgislib.vectorattrs.create_name_col(vec_file: str, vec_lyr: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', out_col: str = 'names', x_col: str = 'MinX', y_col: str = 'MaxY', prefix: str = '', postfix: str = '', coords_lat_lon: bool = True, int_coords: bool = True, coord_gain: float = 0.0, zero_x_pad: int = 0, zero_y_pad: int = 0, round_n_digts: int = 0, non_neg: bool = False, replace_dec_pt: bool = True, dec_pt_val: str = '')

A function which creates a column in the vector layer which can define a name using coordinates associated with the feature. This is often useful if, for example, a tiling has been created and a set of images is to be generated from it.

Parameters:
  • vec_file – input vector file

  • vec_lyr – input vector layer name

  • out_vec_file – output vector file

  • out_vec_lyr – output vector layer name

  • out_format – The output format of the output file. (Default: GPKG)

  • out_col – The name of the output column

  • x_col – The column with the x coordinate

  • y_col – The column with the y coordinate

  • prefix – A prefix to the name

  • postfix – A postfix to the name

  • coords_lat_lon – A boolean specifying if the coordinates are lat / long

  • int_coords – A boolean specifying whether to integerise the coordinates.

  • coord_gain – Apply a gain to the coordinate before integerising. Default = 0.0 (i.e., no gain)

  • zero_x_pad – If larger than zero then the X coordinate will be zero padded.

  • zero_y_pad – If larger than zero then the Y coordinate will be zero padded.

  • round_n_digts – If larger than zero then the coordinates will be rounded to n significant digits

  • non_neg – boolean specifying whether any negative coordinates should be made positive. (Default: False)

  • replace_dec_pt – replace the decimal point with another string. Default: True

  • dec_pt_val – the value used instead of a decimal point. Default: “” i.e., empty string so decimal point is removed.

Column Utilities

rsgislib.vectorattrs.drop_vec_cols(vec_file: str, vec_lyr: str, drop_cols: List[str], out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', chk_cols_present: bool = True)

A function which allows vector columns to be removed from the layer.

Parameters:
  • vec_file – Input vector file

  • vec_lyr – Input vector layer

  • drop_cols – List of columns to remove from layer

  • out_vec_file – the output vector file

  • out_vec_lyr – the output vector layer

  • out_format – the output vector format (Default: GPKG)

  • chk_cols_present – boolean (default: True) to check that the columns to be removed are present and remove those from the list which are not present.
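
The effect of chk_cols_present can be pictured as a filter on the list of columns to drop. The following is a minimal sketch in plain Python, assuming the check simply discards names that the layer does not contain; filter_drop_cols is a hypothetical helper and not part of RSGISLib.

```python
from typing import List


def filter_drop_cols(lyr_cols: List[str], drop_cols: List[str]) -> List[str]:
    """Return only the requested columns that exist in the layer."""
    return [col for col in drop_cols if col in lyr_cols]


lyr_cols = ["fid", "name", "area"]
# "perimeter" is not in the layer, so it is removed from the drop list.
cols_to_drop = filter_drop_cols(lyr_cols, ["area", "perimeter"])
remaining_cols = [col for col in lyr_cols if col not in cols_to_drop]
```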

rsgislib.vectorattrs.rename_vec_cols(vec_file: str, vec_lyr: str, rname_cols_lut: Dict[str, str], out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG')

A function which allows vector columns to be renamed.

Parameters:
  • vec_file – Input vector file

  • vec_lyr – Input vector layer

  • rname_cols_lut – dict look up for the columns to be renamed. Format: {“orig_name”: “new_name”}

  • out_vec_file – the output vector file

  • out_vec_lyr – the output vector layer

  • out_format – the output vector format (Default: GPKG)
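
The {“orig_name”: “new_name”} format of rname_cols_lut maps directly onto a dictionary lookup. The sketch below shows that mapping on a plain list of column names; rename_cols is a hypothetical helper, and leaving columns without a LUT entry unchanged is an assumption about the intended behaviour.

```python
from typing import Dict, List


def rename_cols(col_names: List[str], rname_cols_lut: Dict[str, str]) -> List[str]:
    """Rename columns found in the LUT; other columns keep their names."""
    return [rname_cols_lut.get(col, col) for col in col_names]


new_names = rename_cols(["id", "lc_class", "area"], {"lc_class": "land_cover"})
```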

Joins

rsgislib.vectorattrs.perform_spatial_join(vec_base_file: str, vec_base_lyr: str, vec_join_file: str, vec_join_lyr: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', join_how: str = 'inner', join_op: str = 'within', vec_base_epsg: int = None, vec_join_epsg: int = None)

A function to perform a spatial join between two vector layers. This function uses geopandas so this needs to be installed. You also need to have the rtree package to generate the index used to perform the intersection.

Note, defining epsg codes for the datasets does not reproject the datasets but just makes sure that the correct projection is being used.

For more information see: http://geopandas.org/mergingdata.html#spatial-joins

Parameters:
  • vec_base_file – the base vector file with the geometries which will be outputted.

  • vec_base_lyr – the layer name for the base vector.

  • vec_join_file – the vector with the attributes which will be joined to the base vector geometries.

  • vec_join_lyr – the layer name for the join vector.

  • out_vec_file – the output vector file.

  • out_vec_lyr – the layer name for the output vector.

  • out_format – The output vector file format (Default GPKG)

  • join_how – Specifies the type of join that will occur and which geometry is retained. The options are [left, right, inner]. The default is ‘inner’

  • join_op – Defines whether or not to join the attributes of one object to another. The options are [intersects, within, contains] and default is ‘within’

  • vec_base_epsg – Optionally provide the epsg code for the base vector layer.

  • vec_join_epsg – Optionally provide the epsg code for the join vector layer.

Calculate Column Values

rsgislib.vectorattrs.pop_bbox_cols(vec_file: str, vec_lyr: str, x_min_col: str = 'xmin', x_max_col: str = 'xmax', y_min_col: str = 'ymin', y_max_col: str = 'ymax')

A function which adds the bounding box (bbox) of each polygon as attributes of the feature.

Parameters:
  • vec_file – vector file.

  • vec_lyr – layer within the vector file.

  • x_min_col – output column name.

  • x_max_col – output column name.

  • y_min_col – output column name.

  • y_max_col – output column name.

rsgislib.vectorattrs.add_geom_bbox_cols(vec_file: str, vec_lyr: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', min_x_col: str = 'MinX', max_x_col: str = 'MaxX', min_y_col: str = 'MinY', max_y_col: str = 'MaxY')

A function which adds columns to the vector layer with the bbox of each geometry.

Parameters:
  • vec_file – input vector file

  • vec_lyr – input vector layer name

  • out_vec_file – output vector file

  • out_vec_lyr – output vector layer name

  • out_format – The output format of the output file. (Default: GPKG)

  • min_x_col – Name of the MinX column (Default: MinX)

  • max_x_col – Name of the MaxX column (Default: MaxX)

  • min_y_col – Name of the MinY column (Default: MinY)

  • max_y_col – Name of the MaxY column (Default: MaxY)
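
The four values written to the bbox columns are the coordinate extremes of each geometry. The sketch below computes them for a single polygon ring given as a list of (x, y) tuples; geom_bbox is a hypothetical helper used for illustration, whereas the library itself works on the geometries stored in the vector layer.

```python
from typing import List, Tuple


def geom_bbox(coords: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (min_x, max_x, min_y, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), max(xs), min(ys), max(ys)


# A closed ring: the first and last vertices are the same.
ring = [(2.0, 1.0), (6.0, 1.5), (5.0, 4.0), (1.0, 3.0), (2.0, 1.0)]
min_x, max_x, min_y, max_y = geom_bbox(ring)
```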

rsgislib.vectorattrs.add_unq_numeric_col(vec_file: str, vec_lyr: str, unq_col: str, out_col: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', lut_json_file: str = None)

A function which adds a numeric column based on the unique values of an existing column in the vector file.

Parameters:
  • vec_file – Input vector file.

  • vec_lyr – Input vector layer within the input file.

  • unq_col – The column within which the unique values will be identified.

  • out_col – The output numeric column

  • out_vec_file – Output vector file

  • out_vec_lyr – output vector layer name.

  • out_format – output file format (default GPKG).

  • lut_json_file – an optional output LUT file.
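
One way to picture this function is as building a lookup from each unique value to a number and then applying it to every feature. The sketch below does this in plain Python. It is not the library's implementation: build_unq_numeric_col is a hypothetical helper, and both the numbering (sorted order, starting at 1) and the JSON layout are assumptions, since neither is specified here.

```python
import json
from typing import Dict, List, Tuple


def build_unq_numeric_col(ref_vals: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Assign a numeric ID to each unique value and map every feature to it."""
    # Assumption: IDs start at 1 and follow the sorted order of the values.
    lut = {val: idx for idx, val in enumerate(sorted(set(ref_vals)), start=1)}
    return [lut[val] for val in ref_vals], lut


out_vals, lut = build_unq_numeric_col(["forest", "water", "forest", "urban"])
# The LUT could be saved as JSON so the numeric IDs can be decoded later.
lut_json = json.dumps(lut, indent=2)
```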

rsgislib.vectorattrs.calc_npts_in_radius(vec_in_file: str, vec_in_lyr: str, radius: float, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', out_col_name: str = 'n_pts_r', n_cores: int = 1)

A function which calculates the number of points within a given radius of each point.

Parameters:
  • vec_in_file – Input vector file path (must be points geometry)

  • vec_in_lyr – Input vector layer (must be points geometry)

  • radius – the search radius

  • out_vec_file – Output vector file path

  • out_vec_lyr – Output vector layer

  • out_format – output vector format (Default: GPKG)

  • out_col_name – output column name (Default: n_pts_r)

  • n_cores – the number of cores to be used for the query. If -1 is passed then all available cores will be used.
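
The quantity written to out_col_name can be illustrated with a brute-force count. The sketch below is a stand-in under stated assumptions and not the library's method: count_pts_in_radius is a hypothetical helper, it compares every pair of points instead of using a spatial index or multiple cores, it excludes the point itself from its own count, and it treats a distance equal to the radius as inside.

```python
import math
from typing import List, Tuple


def count_pts_in_radius(pts: List[Tuple[float, float]], radius: float) -> List[int]:
    """Count, for each point, the other points within the search radius."""
    counts = []
    for i, (x1, y1) in enumerate(pts):
        n_pts = 0
        for j, (x2, y2) in enumerate(pts):
            # Assumption: a point is not counted as its own neighbour.
            if i != j and math.hypot(x2 - x1, y2 - y1) <= radius:
                n_pts += 1
        counts.append(n_pts)
    return counts


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
n_pts_r = count_pts_in_radius(pts, radius=1.5)
```

The radius is in the units of the layer's coordinate system, so a layer in degrees and a layer in metres need very different values for the same search distance.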

rsgislib.vectorattrs.create_angle_sets(vec_file: str, vec_lyr: str, angle_col: str, start_angle: int, angle_set_width: int, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', out_angle_set_col: str = 'angle_set')

A function which creates sets of features based on an angle column. The assumption is that the angle is from a fixed centre point. The angle sets are mirrored so you can look at patterns along an angle.

Parameters:
  • vec_file – Input vector file path

  • vec_lyr – The input vector layer name.

  • angle_col – The name of the column within the vector layer with the angles. The angles must be in degrees (0-360).

  • start_angle – The angle to start the angle sets from.

  • angle_set_width – The width of the angle sets, which must divide exactly into 180.

  • out_vec_file – The output vector file path.

  • out_vec_lyr – The output vector layer name

  • out_format – The output vector file format (Default: GPKG)

  • out_angle_set_col – The column in the output file with the angle sets. The angle sets are specified by an integer ID (1 - n)
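
Mirroring means that an angle and the angle 180 degrees opposite fall into the same set, which is why the set width must divide exactly into 180. One possible way to compute such an ID is sketched below. The formula and the helper name angle_set_id are assumptions made for illustration; the library may number its sets differently.

```python
def angle_set_id(angle: float, start_angle: int, angle_set_width: int) -> int:
    """Return a 1-based set ID, with opposite angles sharing the same set."""
    if 180 % angle_set_width != 0:
        raise ValueError("angle_set_width must divide exactly into 180")
    # Folding onto 0-180 makes an angle and its opposite (angle + 180) equal.
    offset = (angle - start_angle) % 180
    return int(offset // angle_set_width) + 1


# With a 45 degree width there are 180 / 45 = 4 sets.
ids = [angle_set_id(a, 0, 45) for a in (10.0, 190.0, 100.0, 280.0, 179.0)]
```

Here 10 and 190 degrees share set 1, and 100 and 280 degrees share set 3, so features on opposite sides of the centre point are grouped along the same line.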

Get Column Summaries

rsgislib.vectorattrs.get_unq_col_values(vec_file: str, vec_lyr: str, col_name: str) → array

A function which returns the unique values found within a column of a vector layer.

Parameters:
  • vec_file – Input vector file

  • vec_lyr – Input vector layer

  • col_name – The column name for which a list of unique values will be returned.

Returns:

a numpy array of the unique values within the column.

Sort By Attributes

rsgislib.vectorattrs.sort_vec_lyr(vec_file: str, vec_lyr: str, out_vec_file: str, out_vec_lyr: str, sort_by: str | List[str], ascending: bool | List[bool], out_format: str = 'GPKG')

A function which sorts a vector layer based on the attributes of the layer. You can sort by either a single attribute or within multiple attributes if a list is provided. This function is implemented using geopandas.

Parameters:
  • vec_file – the input vector file.

  • vec_lyr – the input vector layer name.

  • out_vec_file – the output vector file.

  • out_vec_lyr – the output vector layer name.

  • sort_by – either a string with the name of a single attribute or a list of strings if multiple attributes are used for the sort.

  • ascending – either a bool (True: ascending; False: descending) or list of bools if a list of attributes was given.

  • out_format – The output vector file format (Default: GPKG)
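
The relationship between sort_by and ascending when lists are given can be shown with a small multi-key sort. The function is documented as using geopandas; the sketch below instead uses plain Python dicts and a stable sort so that it has no dependencies. sort_records is a hypothetical helper, and repeating a single bool for every attribute is an assumption.

```python
from typing import Dict, List, Union


def sort_records(
    records: List[Dict],
    sort_by: Union[str, List[str]],
    ascending: Union[bool, List[bool]],
) -> List[Dict]:
    """Sort attribute records by one or more columns."""
    if isinstance(sort_by, str):
        sort_by = [sort_by]
    if isinstance(ascending, bool):
        # Assumption: a single bool applies to every sort attribute.
        ascending = [ascending] * len(sort_by)
    if len(sort_by) != len(ascending):
        raise ValueError("sort_by and ascending must have the same length")
    out = list(records)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last gives the usual multi-key ordering.
    for col, asc in reversed(list(zip(sort_by, ascending))):
        out.sort(key=lambda rec: rec[col], reverse=not asc)
    return out


records = [
    {"cls": "b", "area": 2.0},
    {"cls": "a", "area": 1.0},
    {"cls": "a", "area": 3.0},
]
sorted_recs = sort_records(records, ["cls", "area"], [True, False])
```

In this example features are ordered by class ascending and, within each class, by area descending.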

Change Attribute Values

rsgislib.vectorattrs.find_replace_str_vec_lyr(vec_file: str, vec_lyr: str, out_vec_file: str, out_vec_lyr: str, cols: List[str], find_replace: Dict[str, str], out_format: str = 'GPKG')

A function which performs a find and replace on a string column(s) within the vector layer. For example, replacing a no data value (e.g., NA) with something more useful. This function is implemented using geopandas.

Parameters:
  • vec_file – the input vector file.

  • vec_lyr – the input vector layer name.

  • out_vec_file – the output vector file.

  • out_vec_lyr – the output vector layer name.

  • cols – a list of strings with the names of the columns to which the find and replace is to be applied.

  • find_replace – the value pairs where the dict keys are the values to be replaced and the value is the replacement value.

  • out_format – The output vector file format (Default: GPKG)
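
The find_replace dict can be read as a set of value substitutions applied to each listed column. The sketch below shows this on plain dicts rather than a geopandas data frame. find_replace_cols is a hypothetical helper, and matching whole cell values rather than substrings is an assumption, because the documentation does not say which applies.

```python
from typing import Dict, List


def find_replace_cols(
    records: List[Dict[str, str]], cols: List[str], find_replace: Dict[str, str]
) -> List[Dict[str, str]]:
    """Replace whole values in the listed columns using the find_replace dict."""
    out = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records are not modified
        for col in cols:
            # Assumption: the whole cell value must equal the dict key.
            if new_rec[col] in find_replace:
                new_rec[col] = find_replace[new_rec[col]]
        out.append(new_rec)
    return out


records = [{"name": "NA", "type": "NA"}, {"name": "Lake", "type": "water"}]
cleaned = find_replace_cols(records, ["name"], {"NA": "unknown"})
```

Only the columns named in cols are changed, so the "NA" in the type column is left as it was.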

rsgislib.vectorattrs.check_str_col(vec_file: str, vec_lyr: str, vec_col: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', rm_non_ascii: bool = True, rm_dashs: bool = False, rm_spaces: bool = False, rm_punc: bool = False)

A function which checks the values in a string column removing non-ascii characters and optionally removing spaces, dashes and punctuation.

Parameters:
  • vec_file – the input vector file.

  • vec_lyr – the input vector layer name.

  • vec_col – the name of the column to be checked.

  • out_vec_file – the output vector file.

  • out_vec_lyr – the output vector layer name.

  • out_format – The output vector file format (Default: GPKG)

  • rm_non_ascii – If True (default True) remove any non-ascii characters from the string

  • rm_dashs – If True (default False) remove any dashes from the string and replace with underscores.

  • rm_spaces – If True (default False) remove any spaces from the string.

  • rm_punc – If True (default False) remove any punctuation (other than ‘_’ or ‘-’) from the string.
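
The four options can be read as independent cleaning steps applied to each string. The sketch below follows the parameter descriptions above using only the standard library. It is an illustration and not the library's code: clean_str is a hypothetical helper, and both the order of the steps and the use of string.punctuation as the definition of punctuation are assumptions.

```python
import string


def clean_str(
    val: str,
    rm_non_ascii: bool = True,
    rm_dashs: bool = False,
    rm_spaces: bool = False,
    rm_punc: bool = False,
) -> str:
    """Clean a string following the documented check_str_col options."""
    if rm_non_ascii:
        # Characters which cannot be encoded as ASCII are dropped.
        val = val.encode("ascii", "ignore").decode("ascii")
    if rm_dashs:
        val = val.replace("-", "_")  # dashes become underscores
    if rm_spaces:
        val = val.replace(" ", "")
    if rm_punc:
        # Underscores and dashes are kept, as stated for rm_punc.
        punc = set(string.punctuation) - {"_", "-"}
        val = "".join(ch for ch in val if ch not in punc)
    return val


raw = "Caf\u00e9 del-Mar, (North)!"
cleaned = clean_str(raw, rm_dashs=True, rm_spaces=True, rm_punc=True)
```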

Geometry Intersections

rsgislib.vectorattrs.count_pt_intersects(vec_in_file: str, vec_in_lyr: str, vec_pts_file: str, vec_pts_lyr: str, out_vec_file: str, out_vec_lyr: str, out_format: str = 'GPKG', out_count_col: str = 'n_points', tmp_col_name: str = 'tmp_join_fid', vec_in_epsg: int = None, vec_pts_epsg: int = None)

A function which counts the number of points intersecting a set of polygons adding the count to each polygon as a new column.

Note, defining epsg codes for the datasets does not reproject the datasets but just makes sure that the correct projection is being used.

Parameters:
  • vec_in_file – the input polygons vector file path.

  • vec_in_lyr – the input polygons vector layer name

  • vec_pts_file – the points vector file path

  • vec_pts_lyr – the points vector layer name

  • out_vec_file – the output vector file path

  • out_vec_lyr – the output vector layer name

  • out_format – the output vector format (e.g., GPKG).

  • out_count_col – the output column name (default: n_points)

  • tmp_col_name – The name of a temporary column added to the input layer used to ensure there are no duplicated features in the output layer. The default name is: “tmp_join_fid”.

  • vec_in_epsg – Optionally provide the epsg code for the input vector layer.

  • vec_pts_epsg – Optionally provide the epsg code for the points vector layer.

rsgislib.vectorattrs.annotate_vec_selection(vec_in_file: str, vec_in_lyr: str, vec_sel_file: str, vec_sel_lyr: str, out_vec_file: str, out_vec_lyr: str, out_col_name: str = 'sel_feats', out_format: str = 'GPKG', tmp_col_name: str = 'tmp_sel_join_fid', vec_in_epsg: int = None, vec_sel_epsg: int = None)

A function which spatially selects the features from the input vector layer which intersect the selection vector layer, populating a column within the output vector layer specifying which features intersect.

Note, defining epsg codes for the datasets does not reproject the datasets but just makes sure that the correct projection is being used.

Parameters:
  • vec_in_file – the input vector file path.

  • vec_in_lyr – the input vector layer name

  • vec_sel_file – the selection vector file path

  • vec_sel_lyr – the selection vector layer name

  • out_vec_file – the output vector file path

  • out_vec_lyr – the output vector layer name

  • out_col_name – the output boolean column specifying those features which intersect with the vec_sel_lyr layer.

  • out_format – the output vector format (e.g., GPKG).

  • tmp_col_name – The name of a temporary column added to the input layer used to ensure there are no duplicated features in the output layer. The default name is: “tmp_sel_join_fid”.

  • vec_in_epsg – Optionally provide the epsg code for the input vector layer.

  • vec_sel_epsg – Optionally provide the epsg code for the selection vector layer.

Export Attribute Table

rsgislib.vectorattrs.export_vec_attrs_to_csv(vec_file: str, vec_lyr: str, output_file: str)

A function which exports the attribute table from a vector layer to a CSV file.

Parameters:
  • vec_file – The input vector file path

  • vec_lyr – The input vector layer name

  • output_file – The output file path.
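
As a rough picture of what an exported attribute table looks like, the sketch below writes a list of attribute records to CSV text using the standard library. It is an illustration only: attrs_to_csv_text is a hypothetical helper, the records are invented sample values, and the exact layout written by the library (for example, whether a geometry column is included) is not specified here.

```python
import csv
import io
from typing import Dict, List


def attrs_to_csv_text(records: List[Dict]) -> str:
    """Write attribute records to CSV text with one row per feature."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()  # column names form the first row
    writer.writerows(records)
    return buf.getvalue()


csv_text = attrs_to_csv_text([{"fid": 1, "name": "a"}, {"fid": 2, "name": "b"}])
```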

rsgislib.vectorattrs.export_vec_attrs_to_excel(vec_file: str, vec_lyr: str, output_file: str, out_sheet_name: str = 'Sheet1')

A function which exports the attribute table from a vector layer to an Excel file (*.xlsx).

Parameters:
  • vec_file – The input vector file path

  • vec_lyr – The input vector layer name

  • output_file – The output file path.

  • out_sheet_name – The name of the sheet within the output Excel file (Default: Sheet1)

rsgislib.vectorattrs.export_vec_attrs_to_parquet(vec_file: str, vec_lyr: str, output_file: str, gzip_output: bool = True)

A function which exports the attribute table from a vector layer to a parquet file.

Parameters:
  • vec_file – The input vector file path

  • vec_lyr – The input vector layer name

  • output_file – The output file path.

  • gzip_output – boolean specifying whether the output is gzip compressed (Default: True)