matclassification.methods.feature.feature_extraction package

Submodules

matclassification.methods.feature.feature_extraction.pois module

MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining

This package offers a tool to support users in analyzing multiple aspect trajectories. It integrates data mining methods for multiple aspect trajectories, and for multidimensional sequences in general, into a single framework. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from source project distribution)

Created on Dec, 2021. Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)

Authors:
  • Tarlis Portela

  • Francisco Vicenzi (adapted)

matclassification.methods.feature.feature_extraction.pois.geoHasTransform(df, geo_precision=8)

Transforms latitude and longitude values into geohash representations.

Parameters:

df : pandas.DataFrame

The DataFrame containing ‘lat’ and ‘lon’ columns.

geo_precision : int, optional

The precision for the geohash transformation. Defaults to 8.

Returns:

list

A list of geohash values corresponding to the latitude and longitude pairs in the DataFrame.
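
To make the transformation concrete, here is a minimal, dependency-free sketch of standard geohash encoding. It is not this module's implementation: `geohash_encode` and `geohash_rows` are hypothetical names, plain dictionaries stand in for DataFrame rows, and treating `geo_precision` as the number of output characters is an assumption.

```python
# Standard geohash base-32 alphabet (digits and letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Encode one latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_rows(rows, geo_precision=8):
    """Hypothetical stand-in for the DataFrame version: one geohash per row."""
    return [geohash_encode(r["lat"], r["lon"], geo_precision) for r in rows]


points = [{"lat": 57.64911, "lon": 10.40744}, {"lat": 0.0, "lon": 0.0}]
hashes = geohash_rows(points, geo_precision=5)
print(hashes)
```

Longer hashes describe smaller cells, so a higher precision separates nearby points that a lower precision would group together.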

matclassification.methods.feature.feature_extraction.pois.loadTrainTest(features, folder, dataset='')

Loads the training and testing datasets from CSV files, applying necessary transformations.

Parameters:

features : list of str

The features to load from the datasets.

folder : str

The folder path where the dataset files are located.

dataset : str, optional

The name of the dataset to process (without extension). If empty, default ‘train’ and ‘test’ files are loaded.

Returns:

tuple

A tuple containing the training DataFrame and the testing DataFrame.

matclassification.methods.feature.feature_extraction.pois.npoi(df_train, df_test, possible_sequences, seq2idx, sequence, feature, result_dir=None, tid_col='tid', class_col='label')

Computes Normalized Point of Interest (NPOI) frequency features for training and testing datasets.

Parameters:

df_train : pandas.DataFrame

The training dataset containing trajectory data.

df_test : pandas.DataFrame

The testing dataset containing trajectory data.

possible_sequences : list of tuple

List of possible sequences to consider for feature extraction.

seq2idx : dict

A dictionary mapping sequences to their corresponding indices.

sequence : int

The length of the sequences to be considered.

feature : str

The name of the feature to be analyzed in the datasets.

result_dir : str, optional

Directory path to save the results. If None, results will not be saved.

tid_col : str, optional

The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.

class_col : str, optional

The name of the column representing the class label in the datasets. Defaults to ‘label’.

Returns:

x_train : numpy.ndarray

A 2D array of shape (number of trajectories, number of possible sequences) containing the normalized POI frequencies for the training set.

x_test : numpy.ndarray

A 2D array of shape (number of trajectories, number of possible sequences) containing the normalized POI frequencies for the testing set.

y_train : numpy.ndarray

A 1D array of class labels for the training dataset.

y_test : numpy.ndarray

A 1D array of class labels for the testing dataset.
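
As an illustration of what a normalized frequency matrix looks like, the sketch below counts subsequences per trajectory and divides each row by its total. It is not the module's code. The helper names and toy data are hypothetical, and two choices are assumptions: the vocabulary is built from the training trajectories only, and each row is normalized by its own sum.

```python
from collections import Counter


def ngrams(values, n):
    """All contiguous length-n subsequences of a trajectory, as tuples."""
    return [tuple(values[i:i + n]) for i in range(len(values) - n + 1)]


def normalized_frequencies(trajectories, seq2idx, n):
    """One row per trajectory; each row is its counts divided by the row total."""
    matrix = []
    for values in trajectories:
        row = [0.0] * len(seq2idx)
        for seq, count in Counter(ngrams(values, n)).items():
            if seq in seq2idx:  # sequences outside the vocabulary are skipped
                row[seq2idx[seq]] = float(count)
        total = sum(row)
        if total > 0:  # leave all-zero rows untouched to avoid dividing by zero
            row = [v / total for v in row]
        matrix.append(row)
    return matrix


# Toy data: each trajectory is the ordered list of values of one feature.
train = [["home", "work", "home"], ["gym", "home"]]
test = [["work", "work", "bar"]]

n = 1  # plays the role of the `sequence` argument
possible_sequences = sorted({seq for t in train for seq in ngrams(t, n)})
seq2idx = {seq: i for i, seq in enumerate(possible_sequences)}

x_train = normalized_frequencies(train, seq2idx, n)
x_test = normalized_frequencies(test, seq2idx, n)
```

Each row has one column per entry of `possible_sequences`, which matches the documented output shape of (number of trajectories, number of possible sequences).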

matclassification.methods.feature.feature_extraction.pois.poi(df_train, df_test, possible_sequences, seq2idx, sequence, feature, result_dir=None, tid_col='tid', class_col='label')

Computes Point of Interest (POI) frequency features for training and testing datasets.

Parameters:

df_train : pandas.DataFrame

The training dataset containing trajectory data.

df_test : pandas.DataFrame

The testing dataset containing trajectory data.

possible_sequences : list of tuple

List of possible sequences to consider for feature extraction.

seq2idx : dict

A dictionary mapping sequences to their corresponding indices.

sequence : int

The length of the sequences to be considered.

feature : str

The name of the feature to be analyzed in the datasets.

result_dir : str, optional

Directory path to save the results. If None, results will not be saved.

tid_col : str, optional

The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.

class_col : str, optional

The name of the column representing the class label in the datasets. Defaults to ‘label’.

Returns:

x_train : numpy.ndarray

A 2D array of shape (number of trajectories, number of possible sequences) containing the POI frequencies for the training set.

x_test : numpy.ndarray

A 2D array of shape (number of trajectories, number of possible sequences) containing the POI frequencies for the testing set.

y_train : numpy.ndarray

A 1D array of class labels for the training dataset.

y_test : numpy.ndarray

A 1D array of class labels for the testing dataset.
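
The relationship between `possible_sequences`, `seq2idx` and the resulting count matrix can be sketched as follows. This is an illustrative sketch, not the module's code: `count_sequences` and the toy trajectories are hypothetical, and a dictionary keyed by trajectory ID stands in for a DataFrame grouped by `tid_col`.

```python
def count_sequences(values, seq2idx, n):
    """Raw occurrence counts of each known length-n sequence in one trajectory."""
    row = [0] * len(seq2idx)
    for i in range(len(values) - n + 1):
        seq = tuple(values[i:i + n])
        if seq in seq2idx:  # ignore sequences that are not in the vocabulary
            row[seq2idx[seq]] += 1
    return row


# Toy data: trajectory ID -> ordered values of one feature.
train = {
    1: ["home", "work", "home", "work"],
    2: ["gym", "home"],
}

n = 2  # plays the role of the `sequence` argument
possible_sequences = sorted(
    {tuple(v[i:i + n]) for v in train.values() for i in range(len(v) - n + 1)}
)
seq2idx = {seq: idx for idx, seq in enumerate(possible_sequences)}

# One row per trajectory, one column per possible sequence.
x_train = [count_sequences(v, seq2idx, n) for v in train.values()]
```

With `n = 2`, the columns correspond to ordered pairs of consecutive values, so the same two places visited in a different order land in different columns.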

matclassification.methods.feature.feature_extraction.pois.poifreq_all(sequence, dataset, feature, folder, result_dir, tid_col='tid', class_col='label')

Extracts Point of Interest (POI) frequency features for a given dataset and saves the results. For command line use.

Parameters:

sequence : int

The length of the sequences to be considered for POI frequency extraction.

dataset : str

The name of the dataset to process (without extension).

feature : str

The name of the feature to analyze in the dataset.

folder : str

The folder path where the dataset files are located.

result_dir : str

The directory path where results will be saved.

tid_col : str, optional

The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.

class_col : str, optional

The name of the column representing the class label in the datasets. Defaults to ‘label’.

Returns:

None

matclassification.methods.feature.feature_extraction.pois.pois(df_train, df_test, sequences, features, method='npoi', result_dir='.', save_all=False, tid_col='tid', class_col='label', verbose=True)

Extracts features from the training and testing datasets based on specified sequences and methods (POI, NPOI, WNPOI) for trajectory classification.

Parameters:

df_train : pandas.DataFrame

The training dataset containing trajectory data, including time and location information.

df_test : pandas.DataFrame

The testing dataset containing trajectory data for evaluation.

sequences : list of int

List of integers specifying the sequence lengths to consider for feature extraction.

features : list of str

List of feature names from the datasets to be used for extraction. If None, the function will automatically determine a feature based on variance.

method : str, optional

The method to use for feature extraction. Options include:
  • ‘poi’: Point of Interest frequency.
  • ‘npoi’: Normalized Point of Interest frequency.
  • ‘wnpoi’: Weighted Normalized Point of Interest frequency.
Defaults to ‘npoi’.

result_dir : str, optional

Directory path where results should be saved. Defaults to the current directory.

save_all : bool, optional

If True, all intermediate results will be saved to the specified directory. Defaults to False.

tid_col : str, optional

Name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.

class_col : str, optional

Name of the column representing the class label in the datasets. Defaults to ‘label’.

verbose : bool, optional

If True, prints detailed information about the processing steps. Defaults to True.

Returns:

agg_x_train : pandas.DataFrame

A DataFrame containing aggregated features for the training dataset.

agg_x_test : pandas.DataFrame

A DataFrame containing aggregated features for the testing dataset.

y_train : numpy.ndarray

A numpy array containing the labels for the training dataset.

y_test : numpy.ndarray

A numpy array containing the labels for the testing dataset.

core_name : str

A string representing the core name for the generated feature files, based on the selected method and features.
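
One plausible reading of "aggregated features" is that the per-sequence-length and per-feature matrices are joined column-wise, so every trajectory keeps a single row. The sketch below illustrates that idea only. The column-wise join is an assumption about this function, and `hstack` and the toy blocks are hypothetical.

```python
def hstack(blocks):
    """Concatenate row-aligned feature blocks column-wise."""
    n_rows = len(blocks[0])
    if any(len(block) != n_rows for block in blocks):
        raise ValueError("all blocks must have one row per trajectory")
    # For each trajectory, chain its row from every block into one long row.
    return [sum((block[i] for block in blocks), []) for i in range(n_rows)]


# Toy blocks for two trajectories: one for sequence length 1, one for length 2.
block_len1 = [[0.5, 0.5], [1.0, 0.0]]
block_len2 = [[1.0], [0.0]]

agg_x_train = hstack([block_len1, block_len2])
```

Under this reading, the aggregated matrix has as many columns as all blocks combined, which is why every block must list the trajectories in the same order.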

matclassification.methods.feature.feature_extraction.pois.pois_read(sequences, features, method='npoi', dataset='specific', folder='./data', result_dir='.', save_all=False, tid_col='tid', class_col='label')

Reads datasets and applies the POI extraction methods to generate features based on specified sequences. This is a wrapper for the ‘pois’ function.

Parameters:

sequences : list of int

A list of sequence lengths to analyze for POI extraction.

features : list of str

The list of features to analyze from the dataset.

method : str, optional

The method to use for POI extraction (‘poi’, ‘npoi’, or ‘wnpoi’). Defaults to ‘npoi’.

dataset : str, optional

The name of the dataset to process. Defaults to ‘specific’.

folder : str, optional

The folder path where the dataset files are located. Defaults to ‘./data’.

result_dir : str, optional

The directory path where results will be saved. Defaults to ‘.’.

save_all : bool, optional

If True, saves all possible sequences to the result directory. Defaults to False.

tid_col : str, optional

The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.

class_col : str, optional

The name of the column representing the class label in the datasets. Defaults to ‘label’.

Returns:

tuple

A tuple containing the aggregated training feature matrix, testing feature matrix, training labels, testing labels, and the core name for the processed data.

matclassification.methods.feature.feature_extraction.pois.to_file(core_name, x_train, x_test, y_train, y_test)

Saves the training and testing feature matrices and labels to CSV files.

Parameters:

core_name : str

The base name for the output files.

x_train : numpy.ndarray

The training feature matrix.

x_test : numpy.ndarray

The testing feature matrix.

y_train : numpy.ndarray

The training labels.

y_test : numpy.ndarray

The testing labels.

Returns:

None
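
For readers who want to see the general pattern of persisting a feature matrix together with its labels, here is a standard-library sketch. It does not reproduce this function's output format: the `write_split` helper, the `-train.csv` / `-test.csv` file names, the `f0, f1, ...` column headers and the trailing `label` column are all assumptions made for illustration.

```python
import csv
import os
import tempfile


def write_split(core_name, x, y, split):
    """Write one split: feature columns followed by a trailing 'label' column."""
    path = f"{core_name}-{split}.csv"  # hypothetical naming scheme
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([f"f{i}" for i in range(len(x[0]))] + ["label"])
        for row, label in zip(x, y):
            writer.writerow(list(row) + [label])
    return path


# Write to a temporary directory so the example leaves the working tree clean.
out_dir = tempfile.mkdtemp()
core = os.path.join(out_dir, "npoi_poi_1")  # hypothetical core name

train_path = write_split(core, [[0.5, 0.5], [1.0, 0.0]], ["A", "B"], "train")
test_path = write_split(core, [[0.0, 1.0]], ["A"], "test")
```

Keeping the label in the same file as the features means each row stays self-describing when the files are loaded later by a classifier.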

matclassification.methods.feature.feature_extraction.pois.wnpoi(df_train, df_test, possible_sequences, seq2idx, sequence, feature, result_dir=None, tid_col='tid', class_col='label')

Computes Weighted Normalized Point of Interest (WNPOI) frequency features for training and testing datasets.

Parameters:

df_train : pandas.DataFrame

The training dataset containing trajectory data.

df_test : pandas.DataFrame

The testing dataset containing trajectory data.

possible_sequences : list of tuple

List of possible sequences to consider for feature extraction.

seq2idx : dict

A dictionary mapping sequences to their corresponding indices.

sequence : int

The length of the sequences to be considered.

feature : str

The name of the feature to be analyzed in the datasets.

result_dir : str, optional

Directory path to save the results. If None, results will not be saved.

tid_col : str, optional

The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.

class_col : str, optional

The name of the column representing the class label in the datasets. Defaults to ‘label’.

Returns:

x_train : numpy.ndarray

A 2D array of shape (number of trajectories, number of possible sequences) containing the weighted normalized POI frequencies for the training set.

x_test : numpy.ndarray

A 2D array of shape (number of trajectories, number of possible sequences) containing the weighted normalized POI frequencies for the testing set.

y_train : numpy.ndarray

A 1D array of class labels for the training dataset.

y_test : numpy.ndarray

A 1D array of class labels for the testing dataset.

Module contents