matclassification.methods.feature.feature_extraction package
Submodules
matclassification.methods.feature.feature_extraction.pois module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
This package offers a tool to support users in analyzing multiple aspect trajectory data. It integrates methods for mining multiple aspect trajectories, and multidimensional sequence data in general, into a single framework. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
Francisco Vicenzi (adapted)
- matclassification.methods.feature.feature_extraction.pois.geoHasTransform(df, geo_precision=8)[source]
Transforms latitude and longitude values into geohash representations.
Parameters:
- df : pandas.DataFrame
The DataFrame containing ‘lat’ and ‘lon’ columns.
- geo_precision : int, optional
The precision for the geohash transformation. Defaults to 8.
Returns:
- list
A list of geohash values corresponding to the latitude and longitude pairs in the DataFrame.
- matclassification.methods.feature.feature_extraction.pois.loadTrainTest(features, folder, dataset='')[source]
Loads the training and testing datasets from CSV files, applying necessary transformations.
Parameters:
- features : list of str
The features to load from the datasets.
- folder : str
The folder path where the dataset files are located.
- dataset : str, optional
The name of the dataset to process (without extension). If empty, default ‘train’ and ‘test’ files are loaded.
Returns:
- tuple
A tuple containing the training DataFrame and the testing DataFrame.
- matclassification.methods.feature.feature_extraction.pois.npoi(df_train, df_test, possible_sequences, seq2idx, sequence, feature, result_dir=None, tid_col='tid', class_col='label')[source]
Computes Normalized Point of Interest (NPOI) frequency features for training and testing datasets.
Parameters:
- df_train : pandas.DataFrame
The training dataset containing trajectory data.
- df_test : pandas.DataFrame
The testing dataset containing trajectory data.
- possible_sequences : list of tuple
List of possible sequences to consider for feature extraction.
- seq2idx : dict
A dictionary mapping sequences to their corresponding indices.
- sequence : int
The length of the sequences to be considered.
- feature : str
The name of the feature to be analyzed in the datasets.
- result_dir : str, optional
Directory path to save the results. If None, results will not be saved.
- tid_col : str, optional
The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.
- class_col : str, optional
The name of the column representing the class label in the datasets. Defaults to ‘label’.
Returns:
- x_train : numpy.ndarray
A 2D array of shape (number of trajectories, number of possible sequences) containing the normalized POI frequencies for the training set.
- x_test : numpy.ndarray
A 2D array of shape (number of trajectories, number of possible sequences) containing the normalized POI frequencies for the testing set.
- y_train : numpy.ndarray
A 1D array of class labels for the training dataset.
- y_test : numpy.ndarray
A 1D array of class labels for the testing dataset.
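As a rough illustration of a normalized frequency row, the sketch below counts the sequences of a single trajectory and divides by the trajectory length. Normalizing by the number of points, the helper name `npoi_row`, and the toy `seq2idx` mapping are assumptions for this example, not the library's confirmed behavior.

```python
def npoi_row(traj, seq2idx, n):
    """Return one normalized frequency row for a single trajectory.

    Hypothetical helper: counts every window of length `n` and divides
    by the number of points in the trajectory (an assumed normalization).
    """
    row = [0.0] * len(seq2idx)
    for i in range(len(traj) - n + 1):
        seq = tuple(traj[i:i + n])
        if seq in seq2idx:  # sequences outside the mapping are ignored
            row[seq2idx[seq]] += 1.0
    length = len(traj)
    return [count / length for count in row] if length else row


# Toy mapping from sequences (tuples) to column indices.
seq2idx = {("home",): 0, ("work",): 1, ("gym",): 2}
row = npoi_row(["home", "work", "home", "home"], seq2idx, 1)
```

With sequences of length 1, the entries of a row sum to 1 whenever every visited value appears in `seq2idx`, which makes trajectories of different lengths comparable.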
- matclassification.methods.feature.feature_extraction.pois.poi(df_train, df_test, possible_sequences, seq2idx, sequence, feature, result_dir=None, tid_col='tid', class_col='label')[source]
Computes Point of Interest (POI) frequency features for training and testing datasets.
Parameters:
- df_train : pandas.DataFrame
The training dataset containing trajectory data.
- df_test : pandas.DataFrame
The testing dataset containing trajectory data.
- possible_sequences : list of tuple
List of possible sequences to consider for feature extraction.
- seq2idx : dict
A dictionary mapping sequences to their corresponding indices.
- sequence : int
The length of the sequences to be considered.
- feature : str
The name of the feature to be analyzed in the datasets.
- result_dir : str, optional
Directory path to save the results. If None, results will not be saved.
- tid_col : str, optional
The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.
- class_col : str, optional
The name of the column representing the class label in the datasets. Defaults to ‘label’.
Returns:
- x_train : numpy.ndarray
A 2D array of shape (number of trajectories, number of possible sequences) containing the POI frequencies for the training set.
- x_test : numpy.ndarray
A 2D array of shape (number of trajectories, number of possible sequences) containing the POI frequencies for the testing set.
- y_train : numpy.ndarray
A 1D array of class labels for the training dataset.
- y_test : numpy.ndarray
A 1D array of class labels for the testing dataset.
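The relationship between `possible_sequences`, `seq2idx`, and the resulting frequency matrix can be sketched as follows. This is a simplified illustration using plain lists rather than DataFrames and NumPy arrays. The helpers `ngrams`, `build_vocab`, and `poi_counts` are hypothetical, and building the vocabulary from the training trajectories only is an assumption.

```python
def ngrams(values, n):
    """All contiguous windows of length `n`, as tuples."""
    return [tuple(values[i:i + n]) for i in range(len(values) - n + 1)]


def build_vocab(trajectories, n):
    """Collect the distinct sequences and map each one to a column index."""
    seqs = sorted({g for traj in trajectories for g in ngrams(traj, n)})
    return seqs, {seq: idx for idx, seq in enumerate(seqs)}


def poi_counts(trajectories, seq2idx, n):
    """Raw frequency matrix: one row per trajectory, one column per sequence."""
    matrix = []
    for traj in trajectories:
        row = [0] * len(seq2idx)
        for g in ngrams(traj, n):
            if g in seq2idx:  # unseen sequences are skipped
                row[seq2idx[g]] += 1
        matrix.append(row)
    return matrix


# Each inner list is the feature column of one trajectory (toy data).
train = [["home", "work", "home"], ["gym", "home"]]
test = [["work", "cinema"]]

possible_sequences, seq2idx = build_vocab(train, 1)
x_train = poi_counts(train, seq2idx, 1)
x_test = poi_counts(test, seq2idx, 1)
```

Here the column order follows the sorted vocabulary, so both matrices share the same shape and column meaning.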
- matclassification.methods.feature.feature_extraction.pois.poifreq_all(sequence, dataset, feature, folder, result_dir, tid_col='tid', class_col='label')[source]
Extracts Point of Interest (POI) frequency features for a given dataset and saves the results. For command line use.
Parameters:
- sequence : int
The length of the sequences to be considered for POI frequency extraction.
- dataset : str
The name of the dataset to process (without extension).
- feature : str
The name of the feature to analyze in the dataset.
- folder : str
The folder path where the dataset files are located.
- result_dir : str
The directory path where results will be saved.
- tid_col : str, optional
The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.
- class_col : str, optional
The name of the column representing the class label in the datasets. Defaults to ‘label’.
Returns:
None
- matclassification.methods.feature.feature_extraction.pois.pois(df_train, df_test, sequences, features, method='npoi', result_dir='.', save_all=False, tid_col='tid', class_col='label', verbose=True)[source]
Extracts features from the training and testing datasets based on specified sequences and methods (POI, NPOI, WNPOI) for trajectory classification.
Parameters:
- df_train : pandas.DataFrame
The training dataset containing trajectory data, including time and location information.
- df_test : pandas.DataFrame
The testing dataset containing trajectory data for evaluation.
- sequences : list of int
List of integers specifying the sequence lengths to consider for feature extraction.
- features : list of str
List of feature names from the datasets to be used for extraction. If None, the function will automatically determine a feature based on variance.
- method : str, optional
The method to use for feature extraction. Defaults to ‘npoi’. Options include:
  - ‘poi’: Point of Interest frequency.
  - ‘npoi’: Normalized Point of Interest frequency.
  - ‘wnpoi’: Weighted Normalized Point of Interest frequency.
- result_dir : str, optional
Directory path where results should be saved. Defaults to the current directory.
- save_all : bool, optional
If True, all intermediate results will be saved to the specified directory. Defaults to False.
- tid_col : str, optional
Name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.
- class_col : str, optional
Name of the column representing the class label in the datasets. Defaults to ‘label’.
- verbose : bool, optional
If True, prints detailed information about the processing steps. Defaults to True.
Returns:
- agg_x_train : pandas.DataFrame
A DataFrame containing aggregated features for the training dataset.
- agg_x_test : pandas.DataFrame
A DataFrame containing aggregated features for the testing dataset.
- y_train : numpy.ndarray
A numpy array containing the labels for the training dataset.
- y_test : numpy.ndarray
A numpy array containing the labels for the testing dataset.
- core_name : str
A string representing the core name for the generated feature files, based on the selected method and features.
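One way to picture the aggregated feature matrices is as a column-wise concatenation of the per-sequence, per-feature matrices. The sketch below assumes that interpretation; the helper `hstack_rows` is hypothetical and works on plain nested lists rather than DataFrames.

```python
def hstack_rows(blocks):
    """Concatenate several row-aligned matrices column-wise.

    Hypothetical helper: each block is a list of rows, and every block
    must describe the same trajectories in the same order.
    """
    if not blocks:
        return []
    n_rows = len(blocks[0])
    if any(len(block) != n_rows for block in blocks):
        raise ValueError("all blocks must have the same number of rows")
    merged = []
    for i in range(n_rows):
        row = []
        for block in blocks:
            row.extend(block[i])  # append this block's columns for trajectory i
        merged.append(row)
    return merged


# Toy feature blocks for two trajectories, e.g. one per sequence length.
block_len1 = [[1, 0], [0, 1]]
block_len2 = [[0.5], [0.25]]
agg = hstack_rows([block_len1, block_len2])
```

Each trajectory keeps a single row, and the number of columns is the sum of the column counts of the individual blocks.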
- matclassification.methods.feature.feature_extraction.pois.pois_read(sequences, features, method='npoi', dataset='specific', folder='./data', result_dir='.', save_all=False, tid_col='tid', class_col='label')[source]
Reads datasets and applies the POI extraction methods to generate features based on specified sequences. (Wrapper for ‘pois’ method)
Parameters:
- sequences : list of int
A list of sequence lengths to analyze for POI extraction.
- features : list of str
The list of features to analyze from the dataset.
- method : str, optional
The method to use for POI extraction (‘poi’, ‘npoi’, or ‘wnpoi’). Defaults to ‘npoi’.
- dataset : str, optional
The name of the dataset to process. Defaults to ‘specific’.
- folder : str, optional
The folder path where the dataset files are located. Defaults to ‘./data’.
- result_dir : str, optional
The directory path where results will be saved. Defaults to ‘.’.
- save_all : bool, optional
If True, saves all possible sequences to the result directory. Defaults to False.
- tid_col : str, optional
The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.
- class_col : str, optional
The name of the column representing the class label in the datasets. Defaults to ‘label’.
Returns:
- tuple
A tuple containing the aggregated training feature matrix, testing feature matrix, training labels, testing labels, and the core name for the processed data.
- matclassification.methods.feature.feature_extraction.pois.to_file(core_name, x_train, x_test, y_train, y_test)[source]
Saves the training and testing feature matrices and labels to CSV files.
Parameters:
- core_name : str
The base name for the output files.
- x_train : numpy.ndarray
The training feature matrix.
- x_test : numpy.ndarray
The testing feature matrix.
- y_train : numpy.ndarray
The training labels.
- y_test : numpy.ndarray
The testing labels.
Returns:
None
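The idea of writing four CSV files that share a base name can be sketched with the standard library as follows. The file suffixes, the helper names `save_matrix` and `save_split`, and the example base name are all assumptions for illustration; the library's actual file naming may differ.

```python
import csv
import os
import tempfile


def save_matrix(path, rows):
    """Write a list of rows to a CSV file."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)


def save_split(core_name, x_train, x_test, y_train, y_test):
    """Save features and labels under a shared base name.

    Hypothetical naming scheme: '<core_name>-<part>.csv'.
    """
    parts = {
        "x_train": x_train,
        "x_test": x_test,
        "y_train": [[label] for label in y_train],  # one label per row
        "y_test": [[label] for label in y_test],
    }
    paths = {}
    for part, rows in parts.items():
        paths[part] = f"{core_name}-{part}.csv"
        save_matrix(paths[part], rows)
    return paths


# Write into a temporary directory so the example leaves no files behind elsewhere.
out_dir = tempfile.mkdtemp()
paths = save_split(
    os.path.join(out_dir, "example_core"),  # hypothetical base name
    x_train=[[0.5, 0.5]],
    x_test=[[1.0, 0.0]],
    y_train=["a"],
    y_test=["b"],
)
```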
- matclassification.methods.feature.feature_extraction.pois.wnpoi(df_train, df_test, possible_sequences, seq2idx, sequence, feature, result_dir=None, tid_col='tid', class_col='label')[source]
Computes Weighted Normalized Point of Interest (WNPOI) frequency features for training and testing datasets.
Parameters:
- df_train : pandas.DataFrame
The training dataset containing trajectory data.
- df_test : pandas.DataFrame
The testing dataset containing trajectory data.
- possible_sequences : list of tuple
List of possible sequences to consider for feature extraction.
- seq2idx : dict
A dictionary mapping sequences to their corresponding indices.
- sequence : int
The length of the sequences to be considered.
- feature : str
The name of the feature to be analyzed in the datasets.
- result_dir : str, optional
Directory path to save the results. If None, results will not be saved.
- tid_col : str, optional
The name of the column representing the trajectory ID in the datasets. Defaults to ‘tid’.
- class_col : str, optional
The name of the column representing the class label in the datasets. Defaults to ‘label’.
Returns:
- x_train : numpy.ndarray
A 2D array of shape (number of trajectories, number of possible sequences) containing the weighted normalized POI frequencies for the training set.
- x_test : numpy.ndarray
A 2D array of shape (number of trajectories, number of possible sequences) containing the weighted normalized POI frequencies for the testing set.
- y_train : numpy.ndarray
A 1D array of class labels for the training dataset.
- y_test : numpy.ndarray
A 1D array of class labels for the testing dataset.
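The documentation above does not state how the weights are derived. As one plausible illustration, the sketch below uses an inverse-class-frequency weight, log2(number of classes / number of classes in which the sequence occurs), so that sequences shared by every class contribute nothing. This weighting, the normalization by trajectory length, and the helper names `class_weights` and `wnpoi_row` are assumptions, not the library's confirmed formula.

```python
import math


def class_weights(trajectories, labels, seq2idx, n):
    """Assumed weighting: log2(n_classes / classes containing the sequence)."""
    classes_with_seq = [set() for _ in seq2idx]
    for traj, label in zip(trajectories, labels):
        for i in range(len(traj) - n + 1):
            seq = tuple(traj[i:i + n])
            if seq in seq2idx:
                classes_with_seq[seq2idx[seq]].add(label)
    n_classes = len(set(labels))
    # Sequences never seen in training get weight 0 to avoid division by zero.
    return [math.log2(n_classes / len(s)) if s else 0.0 for s in classes_with_seq]


def wnpoi_row(traj, seq2idx, n, weights):
    """Normalized frequencies of one trajectory, scaled by the sequence weights."""
    row = [0.0] * len(seq2idx)
    for i in range(len(traj) - n + 1):
        seq = tuple(traj[i:i + n])
        if seq in seq2idx:
            row[seq2idx[seq]] += 1.0
    length = len(traj) or 1  # guard against empty trajectories
    return [count / length * w for count, w in zip(row, weights)]


# Toy data: two trajectories, one per class.
train = [["home", "work"], ["home", "gym"]]
labels = ["a", "b"]
seq2idx = {("home",): 0, ("work",): 1, ("gym",): 2}

weights = class_weights(train, labels, seq2idx, 1)
x_train = [wnpoi_row(traj, seq2idx, 1, weights) for traj in train]
```

In this toy data ‘home’ occurs in both classes and receives weight 0, while ‘work’ and ‘gym’ each occur in a single class and keep their normalized frequency.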