matclassification.methods.feature package
Subpackages
Submodules
matclassification.methods.feature.MoveletDT module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
The present package offers a tool to support the user in the task of data analysis of multiple aspect trajectories. It integrates, into a unique framework, methods for multiple aspect trajectories and, in general, for multidimensional sequence data mining. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from the source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
Carlos Andrés Ferrero (adapted)
- class matclassification.methods.feature.MoveletDT.MDT(n_jobs=-1, verbose=2, random_state=42, filterwarnings='ignore')[source]
Bases:
MHSClassifier
Movelet Decision Tree (MDT) Classifier, extending MHSClassifier, designed for movelet-based classification using decision trees. Provides tools for decision tree visualization and manipulation.
Parameters:
- n_jobs : int, optional (default=-1)
The number of parallel jobs to run for computation. -1 means using all processors.
- verbose : int, optional (default=2)
Verbosity level. Higher values enable more detailed output during training and model creation.
- random_state : int, optional (default=42)
Random seed used for reproducibility.
- filterwarnings : str, optional (default=’ignore’)
Controls the filter for output warnings.
Methods:
- prepare_input(train, test, tid_col=’tid’, class_col=’label’, geo_precision=30, validate=False):
Prepares the input data by extracting the features and labels, and configures the trajectory classification process.
- create():
Initializes and returns a new instance of the decision tree classifier.
- fit(X_train, y_train, X_val, y_val):
Trains the decision tree classifier on the training data and evaluates it on the validation data. Returns a report on the validation performance.
- plot_tree(figsize=(20, 10)):
Visualizes the trained decision tree using matplotlib, showing features and tree structure in a user-defined size.
- graph_tree():
Generates a visual graph of the decision tree using Graphviz, returning the tree structure as a graph.
- prepare_input(train, test, tid_col='tid', class_col='label', geo_precision=30, validate=False)[source]
Prepares the input datasets (training, validation, and test) for the classifier by invoking the xy() method, storing the processed data, and setting the classifier configuration.
Parameters:
- train : pd.DataFrame
The training dataset.
- test : pd.DataFrame
The test dataset.
- tid_col : str, optional
Column name representing the trajectory ID (default: ‘tid’).
- class_col : str, optional
Column name representing the class label (default: ‘label’).
- validate : bool, optional
If True, splits the training data into training and validation sets (default: False). Validation handling is still under development (#TODO).
Returns:
- X_set : list
List containing the feature matrices (training, validation, test).
- y_set : list
List containing the label vectors (training, validation, test).
- num_features : int
The number of features in the dataset, excluding the class label.
- num_classes : int
The number of unique classes in the dataset.
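A minimal usage sketch (illustration only, not generated documentation): train_df and test_df are placeholder movelet-feature DataFrames with 'tid' and 'label' columns, and using the last entries of X_set/y_set as the evaluation split is an assumption based on the Returns description above.
>>> from matclassification.methods.feature.MoveletDT import MDT
>>> mdt = MDT(n_jobs=-1, verbose=2, random_state=42)
>>> # train_df / test_df: placeholder movelet feature tables with 'tid' and 'label' columns
>>> X_set, y_set, num_features, num_classes = mdt.prepare_input(train_df, test_df)
>>> report = mdt.fit(X_set[0], y_set[0], X_set[-1], y_set[-1])  # train, then score on the held-out split
>>> mdt.plot_tree(figsize=(20, 10))                             # matplotlib view of the fitted tree
>>> graph = mdt.graph_tree()                                    # Graphviz representation of the same tree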
matclassification.methods.feature.MoveletMLP module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
The present package offers a tool to support the user in the task of data analysis of multiple aspect trajectories. It integrates, into a unique framework, methods for multiple aspect trajectories and, in general, for multidimensional sequence data mining. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from the source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
Carlos Andrés Ferrero (adapted)
- class matclassification.methods.feature.MoveletMLP.MMLP(num_features=-1, num_classes=-1, par_dropout=0.5, par_batch_size=200, lst_par_epochs_lr=[[80, 0.00095], [50, 0.00075], [50, 0.00055], [30, 0.00025], [20, 0.00015]], n_jobs=-1, verbose=2, random_state=42, filterwarnings='ignore')[source]
Bases:
MHSClassifier
Movelet Multi-layer Perceptron (MMLP) Classifier, extending MHSClassifier, designed for movelet-based classification using a neural network with configurable layers, dropout, and learning rates.
Parameters:
- num_features : int, optional (default=-1)
Number of input features for the neural network.
- num_classes : int, optional (default=-1)
Number of output classes for classification.
- par_dropout : float, optional (default=0.5)
Dropout rate for regularization in the neural network.
- par_batch_size : int, optional (default=200)
Batch size used during training.
- lst_par_epochs_lr : list of [epochs, learning_rate] pairs, optional (default=[[80, 0.00095], [50, 0.00075], [50, 0.00055], [30, 0.00025], [20, 0.00015]])
A list where each element gives the number of epochs and the learning rate for one stage of training.
- n_jobs : int, optional (default=-1)
The number of parallel jobs to run for computation. -1 means using all processors.
- verbose : int, optional (default=2)
Verbosity level. Higher values enable more detailed output during training.
- random_state : int, optional (default=42)
Random seed used for reproducibility.
- filterwarnings : str, optional (default=’ignore’)
Controls the filter for output warnings.
Methods:
- create():
Builds and returns a multi-layer perceptron model with specified input features, output classes, and dropout rate.
- fit(X_train, y_train, X_val, y_val):
Trains the MLP model on the training data with configurable epochs and learning rates, and evaluates it on the validation data. Returns a report on the training history.
- predict(X_test, y_test):
Predicts the output class probabilities for the test set, returning the evaluation summary and predicted probabilities.
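A minimal usage sketch, assuming movelet feature arrays X_train/y_train, X_val/y_val and X_test/y_test (plus num_features and num_classes) were produced by an earlier input-preparation step such as the one documented for MDT; the two-value unpacking of predict() follows the method description above.
>>> from matclassification.methods.feature.MoveletMLP import MMLP
>>> mmlp = MMLP(num_features=num_features, num_classes=num_classes,
...             par_dropout=0.5, par_batch_size=200)
>>> history = mmlp.fit(X_train, y_train, X_val, y_val)   # staged epochs/learning rates from lst_par_epochs_lr
>>> summary, y_proba = mmlp.predict(X_test, y_test)      # evaluation summary and class probabilities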
- class matclassification.methods.feature.MoveletMLP.MMLP1(num_features=-1, num_classes=-1, par_dropout=0.5, par_batch_size=200, par_epochs=80, par_lr=0.00095, n_jobs=-1, verbose=2, random_state=42, filterwarnings='ignore')[source]
Bases:
MHSClassifier
matclassification.methods.feature.MoveletRF module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
The present package offers a tool to support the user in the task of data analysis of multiple aspect trajectories. It integrates, into a unique framework, methods for multiple aspect trajectories and, in general, for multidimensional sequence data mining. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from the source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
Carlos Andrés Ferrero (adapted)
- class matclassification.methods.feature.MoveletRF.MRF(n_estimators=[300], n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]
Bases:
MHSClassifier
Movelet Random Forest (MRF) Classifier, extending MHSClassifier, designed for movelet-based classification using a Random Forest model with multiple configurations for the number of trees.
Parameters:
- n_estimators : list of int, optional (default=[300])
A list specifying the different number of trees (estimators) to be used in the Random Forest.
- n_jobs : int, optional (default=-1)
The number of parallel jobs to run for computation. -1 means using all processors.
- verbose : int, optional (default=0)
Verbosity level for logging and output during the model’s training process.
- random_state : int, optional (default=42)
Random seed for reproducibility of results.
- filterwarnings : str, optional (default=’ignore’)
Controls the filter for output warnings.
Methods:
- create(n_tree=None):
Creates and returns a Random Forest model with a specified number of trees (n_tree). If not provided, defaults to the configuration’s ‘n_tree’.
- fit(X_train, y_train, X_val, y_val):
Fits the Random Forest model on the training data, evaluates it on the validation data, and logs the performance for each configuration of n_estimators. Returns a report on the evaluation metrics.
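A minimal usage sketch, assuming prepared movelet feature matrices and label vectors (X_train/y_train, X_val/y_val); the n_estimators values below are example configurations, and fit() evaluates each of them.
>>> from matclassification.methods.feature.MoveletRF import MRF
>>> mrf = MRF(n_estimators=[300, 500], random_state=42)   # one forest per tree count
>>> report = mrf.fit(X_train, y_train, X_val, y_val)      # evaluates every configuration on the validation split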
matclassification.methods.feature.MoveletRFHP module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
The present package offers a tool to support the user in the task of data analysis of multiple aspect trajectories. It integrates, into a unique framework, methods for multiple aspect trajectories and, in general, for multidimensional sequence data mining. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from the source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
Carlos Andrés Ferrero (adapted)
- class matclassification.methods.feature.MoveletRFHP.MRFHP(n_estimators=[300, 350, 400, 450, 500, 550, 600], max_features=['auto', 'sqrt'], max_depth=[30], min_samples_split=[2, 4, 6], min_samples_leaf=[2, 3, 4], bootstrap=[True, False], criterion=['entropy', 'gini'], n_jobs=-1, verbose=2, random_state=42, filterwarnings='ignore')[source]
Bases:
MHSClassifier
Movelet Random Forest with Hyperparameter Optimization (MRFHP) Classifier, extending MHSClassifier, designed to optimize hyperparameters for movelet-based classification using RandomizedSearchCV.
Parameters:
- n_estimators : list of int, optional (default=[300, 350, 400, 450, 500, 550, 600])
A list of the number of trees (estimators) to be used in the Random Forest.
- max_features : list of str, optional (default=[‘auto’, ‘sqrt’])
Number of features to consider at every split. Options include ‘auto’, ‘sqrt’, etc.
- max_depth : list of int, optional (default=[30])
Maximum number of levels in each decision tree. The default is 30, and None is added to allow full growth.
- min_samples_split : list of int, optional (default=[2, 4, 6])
The minimum number of samples required to split a node.
- min_samples_leaf : list of int, optional (default=[2, 3, 4])
The minimum number of samples required to be at a leaf node.
- bootstrap : list of bool, optional (default=[True, False])
Whether to bootstrap samples when building trees.
- criterion : list of str, optional (default=[‘entropy’, ‘gini’])
The function to measure the quality of a split. Options are ‘gini’ or ‘entropy’.
- n_jobs : int, optional (default=-1)
The number of parallel jobs to run for computation. -1 means using all processors.
- verbose : int, optional (default=2)
Verbosity level for logging and output during model training and hyperparameter tuning.
- random_state : int, optional (default=42)
Seed used by the random number generator to ensure reproducibility.
- filterwarnings : str, optional (default=’ignore’)
Controls the filter for output warnings.
Methods:
- create():
Creates a Random Forest model wrapped with RandomizedSearchCV to search for optimal hyperparameters.
- fit(X_train, y_train, X_val, y_val):
Fits the model using the training data and performs hyperparameter optimization using cross-validation. Returns a report on the search results and sets the model to the best estimator.
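A minimal usage sketch with an example (non-default) search space, again assuming prepared X_train/y_train and X_val/y_val; note that recent scikit-learn releases no longer accept max_features='auto', which is why only 'sqrt' appears below.
>>> from matclassification.methods.feature.MoveletRFHP import MRFHP
>>> mrfhp = MRFHP(n_estimators=[300, 400, 500],                # example search space, not the defaults
...               max_features=['sqrt'],
...               min_samples_split=[2, 4],
...               n_jobs=-1, random_state=42)
>>> search_report = mrfhp.fit(X_train, y_train, X_val, y_val)  # RandomizedSearchCV keeps the best estimator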
matclassification.methods.feature.MoveletSVC module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
The present package offers a tool to support the user in the task of data analysis of multiple aspect trajectories. It integrates, into a unique framework, methods for multiple aspect trajectories and, in general, for multidimensional sequence data mining. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from the source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
Carlos Andrés Ferrero (adapted)
- class matclassification.methods.feature.MoveletSVC.MSVC(kernel='linear', probability=True, n_jobs=-1, verbose=2, random_state=42, filterwarnings='ignore')[source]
Bases:
MHSClassifier
Movelet Support Vector Classifier (MSVC) extending MHSClassifier, designed for movelet-based classification using Support Vector Machines (SVM).
Parameters:
- kernel : str, optional (default=”linear”)
Specifies the kernel type to be used in the algorithm. Options include “linear”, “poly”, “rbf”, “sigmoid”, etc.
- probability : bool, optional (default=True)
Enables probability estimates for classification, which allows predict_proba() to be used.
- n_jobs : int, optional (default=-1)
The number of parallel jobs to run for computation. This parameter is not directly used in SVC but passed for consistency.
- verbose : int, optional (default=2)
Controls the verbosity of logging during model training.
- random_state : int, optional (default=42)
Seed used by the random number generator to ensure reproducibility.
- filterwarnings : str, optional (default=’ignore’)
Controls the filter for output warnings.
Methods:
- create():
Creates an SVM model with the specified kernel and probability configuration.
- fit(X_train, y_train, X_val, y_val):
Fits the model using the training data and returns a report on the classification results. It uses predict_proba() for predictions.
- predict(X_test, y_test):
Predicts the probabilities for the test data and returns a summary of the results, including true and predicted class labels.
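A minimal usage sketch, assuming prepared feature matrices and labels; the single return value of predict() reflects the summary described above.
>>> from matclassification.methods.feature.MoveletSVC import MSVC
>>> msvc = MSVC(kernel='rbf', probability=True, random_state=42)  # probability=True enables predict_proba()
>>> report = msvc.fit(X_train, y_train, X_val, y_val)
>>> summary = msvc.predict(X_test, y_test)   # probabilities plus true/predicted class labels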
matclassification.methods.feature.POIS module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
The present package offers a tool to support the user in the task of data analysis of multiple aspect trajectories. It integrates, into a unique framework, methods for multiple aspect trajectories and, in general, for multidimensional sequence data mining. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from the source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
Francisco Vicenzi (adapted)
- class matclassification.methods.feature.POIS.POIS(method='npoi', sequences=[1, 2, 3], features=None, n_jobs=-1, verbose=True, random_state=42, filterwarnings='ignore')[source]
Bases:
HSClassifier
POIS: Point of Interest Sequence Feature Extractor and Classifier.
This class implements a trajectory classifier based on the POI-F/POIS approach, which considers the frequency of visits to Points of Interest (POIs). It has been extended to concatenate sequences of POIs, which allows classification based on patterns in POI sequences.
POI Frequency types: (i) poi: POI frequency (ii) npoi: Normalized POI frequency (iii) wnpoi: Weighted Normalized POI frequency
Parameters:
- method : str, optional (default=’npoi’)
The method used to compute POI frequencies. Options include ‘poi’, ‘npoi’, and ‘wnpoi’.
- sequences : list of int, optional (default=[1, 2, 3])
Defines the length of the sequences of POIs used for classification.
- features : list, optional
Specifies which features from the dataset to use. If None, the feature with the highest variance is chosen.
- n_jobs : int, optional (default=-1)
The number of parallel jobs to run for computation.
- verbose : bool, optional (default=True)
Controls verbosity of logging during model training.
- random_state : int, optional (default=42)
Seed used by the random number generator to ensure reproducibility.
- filterwarnings : str, optional (default=’ignore’)
Controls the filter for output warnings.
Methods:
- xy(train, test, tid_col, class_col, geo_precision, validate, res_path):
Prepares the data for training and testing by computing the POI sequences and extracting features for classification.
- prepare_input(train, test, tid_col, class_col, geo_precision, validate, res_path):
Prepares and splits the input data into training, validation, and testing sets. Adds the necessary configuration details.
- create():
Initializes a neural network model with two layers: a hidden layer with 100 units and a softmax output layer for classification.
- fit(X_train, y_train, X_val, y_val, save_results, res_path):
Trains the model using the training data. Optionally saves the results.
- predict(X_test, y_test):
Predicts the labels for the test data and returns the classification report.
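A minimal usage sketch (illustration only): train_df and test_df are placeholder raw trajectory DataFrames with 'tid' and 'label' columns, the return shape of prepare_input() is assumed to follow the same convention as MDT.prepare_input() above, and save_results/res_path are assumed to have usable defaults.
>>> from matclassification.methods.feature.POIS import POIS
>>> pois = POIS(method='npoi', sequences=[1, 2, 3])   # normalized POI frequency over 1-, 2- and 3-sequences
>>> X_set, y_set, num_features, num_classes = pois.prepare_input(
...     train_df, test_df, tid_col='tid', class_col='label')    # return shape assumed, see note above
>>> report = pois.fit(X_set[0], y_set[0], X_set[-1], y_set[-1])
>>> result = pois.predict(X_set[-1], y_set[-1])       # classification report on the test split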
- matclassification.methods.feature.POIS.loadData(dir_path)[source]
Loads training and testing datasets from CSV files.
Parameters:
- dir_path : str
The directory path (without file extension) from which to load the datasets. It expects the following CSV files:
- ‘{dir_path}-x_train.csv’: Features for training data.
- ‘{dir_path}-y_train.csv’: Labels for training data.
- ‘{dir_path}-x_test.csv’: Features for testing data.
- ‘{dir_path}-y_test.csv’: Labels for testing data.
Returns:
- x_train : pandas.DataFrame
A DataFrame containing the features for the training dataset.
- x_test : pandas.DataFrame
A DataFrame containing the features for the testing dataset.
- y_train : numpy.ndarray
A numpy array containing the labels for the training dataset.
- y_test : numpy.ndarray
A numpy array containing the labels for the testing dataset.
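A minimal sketch; the prefix path below is hypothetical and only illustrates the ‘{dir_path}-*.csv’ naming convention, with the return order taken from the Returns list above.
>>> from matclassification.methods.feature.POIS import loadData
>>> # hypothetical prefix: loadData() reads '<prefix>-x_train.csv', '<prefix>-y_train.csv',
>>> # '<prefix>-x_test.csv' and '<prefix>-y_test.csv'
>>> x_train, x_test, y_train, y_test = loadData('results/npoi_1')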
- matclassification.methods.feature.POIS.prepareData(x_train, x_test, y_train, y_test, validate=False, random_state=42)[source]
Prepares the dataset for POIS training, testing, and optional validation (validation handling is still #TODO).
Parameters:
- x_train : pandas.DataFrame or numpy.ndarray
Feature set for the training data.
- x_test : pandas.DataFrame or numpy.ndarray
Feature set for the test data.
- y_train : pandas.Series or numpy.ndarray
Labels for the training data.
- y_test : pandas.Series or numpy.ndarray
Labels for the test data.
- validate : bool, optional (default=False)
If True, splits the training data into training and validation sets. Validation handling is currently not implemented.
- random_state : int, optional (default=42)
Random seed used for reproducibility when splitting data.
Returns:
- num_features : int
The number of features in the dataset.
- num_classes : int
The number of unique classes in the target labels.
- labels : numpy.ndarray
An array of the unique class labels.
- X : list
A list containing feature sets. If validate is False, returns [X_train, X_test]. Otherwise, returns [X_train, X_val, X_test].
- y : list
A list containing one-hot encoded target sets. If validate is False, returns [y_train, y_test]. Otherwise, returns [y_train, y_val, y_test].
- one_hot_encoder : sklearn.preprocessing.OneHotEncoder
The fitted OneHotEncoder object used to encode the target labels.
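A minimal sketch chaining loadData() and prepareData(); the prefix is hypothetical and the unpacking follows the documented Returns order with validate=False.
>>> from matclassification.methods.feature.POIS import loadData, prepareData
>>> x_train, x_test, y_train, y_test = loadData('results/npoi_1')   # hypothetical prefix
>>> num_features, num_classes, labels, X, y, encoder = prepareData(
...     x_train, x_test, y_train, y_test, validate=False)
>>> X_train, X_test = X        # validate=False -> [X_train, X_test]
>>> y_train_oh, y_test_oh = y  # one-hot encoded targets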