matclassification.methods.core package

Submodules

matclassification.methods.core.AbstractClassifier module

MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining

This package supports the analysis of multiple aspect trajectory data. It is part of a unified framework for mining multiple aspect trajectories and, more generally, multidimensional sequence data. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from source project distribution)

Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)

Authors:
  • Tarlis Portela

class matclassification.methods.core.AbstractClassifier.AbstractClassifier(name='NN', n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]

Bases: ABC

Simple Abstract Classifier Model.

This abstract class defines the core structure of a machine learning classifier model. It provides common methods, such as model creation and evaluation, while the specific model must be defined in derived classes by overriding the create() method.
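The pattern described above can be sketched as follows. This is a minimal illustration that assumes nothing about the real base class beyond what this page states: `SketchClassifier`, `MajorityModel`, and `MajorityClassifier` are hypothetical stand-ins and are not part of matclassification.

```python
from abc import ABC, abstractmethod
from collections import Counter


class SketchClassifier(ABC):
    """Hypothetical stand-in that mirrors only the documented structure."""

    def __init__(self, name='NN', random_state=42):
        self.name = name
        self.random_state = random_state
        self.model = None  # filled in by fit() through create()

    @abstractmethod
    def create(self):
        """Build and return the underlying model (defined by subclasses)."""

    def fit(self, X_train, y_train):
        # The base class owns the workflow; subclasses only supply the model.
        self.model = self.create()
        self.model.fit(X_train, y_train)
        return self


class MajorityModel:
    """Toy model that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]

    def predict(self, X):
        return [self.label_ for _ in X]


class MajorityClassifier(SketchClassifier):
    def create(self):
        return MajorityModel()


clf = MajorityClassifier(name='Majority').fit([[0], [1], [2]], ['a', 'b', 'a'])
predictions = clf.model.predict([[5], [6]])
```

Because create() is abstract, instantiating `SketchClassifier` directly raises a TypeError; only subclasses that override it can be constructed.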

Attributes:

name : str

Name of the classifier implementation (default: 'NN').

model : object

The actual machine learning model to be defined in the subclass.

le : object

Label encoder (optional).

approach : Enum

The category of approach used (default: Approach.NN).

isverbose : bool

Flag to control verbosity of the model's output (default: based on the verbose parameter).

save_results : bool

Indicates whether to save results (default: False).

validate : bool

Indicates whether validation should be performed (default: False).

config : dict

Dictionary of configuration parameters.

y_test_true : array-like

True labels for the test dataset (available after a call to predict).

y_test_pred : array-like

Predicted labels for the test dataset (available after a call to predict).

Parameters:

name : str, optional

Classifier name (default: 'NN').

n_jobs : int, optional

Number of parallel jobs to run (default: -1 for using all processors).

verbose : int, optional

Verbosity level (default: 0).

random_state : int, optional

Random seed for reproducibility (default: 42).

filterwarnings : str, optional

Warning filter level (default: 'ignore').

Methods:

add_config(**kwargs):

Updates the configuration with additional parameters.

grid_search(*args):

Defines the grid of hyperparameters to search over.

duration():

Returns the duration in milliseconds since the start of model execution.

message(pbar, text):

Logs a message if verbosity is enabled.

labels:

Returns the unique labels in the test data.

create():

Abstract method to be overridden in subclasses to define the model.

clear():

Clears the model from memory.

fit(X_train, y_train, X_val, y_val):

Trains the model on the training data and evaluates it on validation data.

predict(X_test, y_test):

Generates predictions on the test data and returns a performance summary.

score(y_test, y_pred):

Computes various evaluation metrics (accuracy, precision, recall, F1, etc.) from the true and predicted labels.

summary():

Returns a summary of the test performance.

cm():

Plots the confusion matrix for the test results.

save_model(dir_path='.', modelfolder='model', model_name=''):

Saves the trained model to the specified directory.

save(dir_path='.', modelfolder='model'):

Saves the predictions, classification report, and performance metrics to files.
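To make the metrics named under score(y_test, y_pred) concrete, here is a small standard-library sketch of accuracy, precision, recall, and F1. The macro averaging and the function name `macro_scores` are assumptions made for illustration; this page does not state how the library averages per-class values.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        'accuracy': accuracy,
        'precision': sum(precisions) / n,
        'recall': sum(recalls) / n,
        'f1': sum(f1s) / n,
    }


scores = macro_scores(['a', 'a', 'b', 'b'], ['a', 'b', 'b', 'b'])
```

In this toy case three of four predictions are correct, so the accuracy is 0.75; class 'a' has perfect precision but only half its instances recalled, which pulls the macro F1 below the accuracy.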

add_config(**kwargs)[source]
classification_report()[source]
clear()[source]
cm()[source]
abstract create()[source]
duration()[source]
fit(X_train, y_train, X_val, y_val)[source]
property labels
message(pbar, text)[source]
predict(X_test, y_test)[source]
prediction_report()[source]
save(dir_path='.', modelfolder='model')[source]
save_model(dir_path='.', modelfolder='model', model_name='')[source]
score(y_test, y_pred)[source]
summary()[source]
testing_report()[source]
training_report()[source]
class matclassification.methods.core.AbstractClassifier.Approach(value)[source]

Bases: Enum

Enumeration of the approach categories that a classifier can declare in its approach attribute.

DT = 3
NN = 1
RF = 2
SVM = 4
XGB = 5
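As a quick illustration of how such an enumeration behaves, the sketch below re-declares the members locally with the values listed above (rather than importing from matclassification) and looks them up by value and by name.

```python
from enum import Enum


class Approach(Enum):
    # Local re-declaration; values copied from the listing above.
    NN = 1
    RF = 2
    DT = 3
    SVM = 4
    XGB = 5


by_value = Approach(3)      # look up a member by its value
by_name = Approach['XGB']   # look up a member by its name
```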

matclassification.methods.core.HSClassifier module

class matclassification.methods.core.HSClassifier.HSClassifier(name='NN', save_results=False, n_jobs=-1, verbose=False, random_state=42, filterwarnings='ignore')[source]

Bases: AbstractClassifier

Hyperparameter Optimization Classifier for Trajectory Input Data.

This class extends AbstractClassifier to include functionality for training and testing machine learning models with hyperparameter optimization. It is designed for trajectory data inputs and handles multiple configurations to find the best-performing model.

Check: help(AbstractClassifier)

Parameters:

name : str, optional

Classifier name (default: 'NN').

save_results : bool, optional

Flag to enable saving results to disk (default: False).

n_jobs : int, optional

Number of parallel jobs to run (default: -1 for using all processors).

verbose : bool, optional

Flag for verbosity (default: False).

random_state : int, optional

Random seed for reproducibility (default: 42).

filterwarnings : str, optional

Warning filter level (default: 'ignore').

Methods:

train(dir_validation='.'):

Trains the model using all hyperparameter configurations. If validation is enabled, it evaluates on a validation set; otherwise, it evaluates on the test set. Results are optionally saved to a CSV file.

Parameters:

dir_validation : str, optional

Directory where validation results will be saved (default: current directory).

Returns:

pd.DataFrame

A DataFrame containing the training report with evaluation metrics for the model.

test(rounds=1, dir_evaluation='.'):

Tests the model over a specified number of rounds, each with a different random seed, to simulate multiple model evaluations.

Parameters:

rounds : int, optional

The number of evaluation rounds (default: 1).

dir_evaluation : str, optional

Directory where evaluation results will be saved (default: current directory).

Returns:

pd.DataFrame, np.array

A DataFrame containing the evaluation report, and an array of the predicted labels for the test data.

test(rounds=1, dir_evaluation='.')[source]

Tests the single best trained model over a specified number of rounds, each with a different random seed, to simulate multiple model evaluations.

Parameters:

rounds : int, optional

The number of evaluation rounds (default: 1).

dir_evaluation : str, optional

Directory where evaluation results will be saved (default: current directory).

Returns:

pd.DataFrame, np.array

A DataFrame containing the evaluation report, and an array of the predicted labels for the test data.
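The idea of repeating an evaluation over several rounds, each with its own seed, can be sketched as below. Two things here are assumptions and not taken from the library: the seed for each round is derived as the base seed plus the round index, and `evaluate_once` is a hypothetical placeholder that returns a simulated score instead of training a real model.

```python
import random
from statistics import mean


def evaluate_once(seed):
    """Hypothetical placeholder for one evaluation run with a given seed."""
    rng = random.Random(seed)
    return 0.8 + rng.uniform(-0.05, 0.05)  # simulated score, not real data


def run_rounds(rounds=1, random_state=42):
    # Assumption: each round uses the base seed plus the round index.
    seeds = [random_state + r for r in range(rounds)]
    scores = [evaluate_once(seed) for seed in seeds]
    return seeds, scores


seeds, scores = run_rounds(rounds=3)
average = mean(scores)
```

Deriving every seed from a single base value keeps the whole experiment reproducible while still varying the randomness between rounds.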

train(dir_validation='.')[source]

Trains the model using all hyperparameter configurations.

If validation is enabled, it will evaluate on a validation set; otherwise, it evaluates on the test set. Results are optionally saved to a CSV file.

Parameters:

dir_validation : str, optional

Directory where validation results will be saved (default: current directory).

Returns:

pd.DataFrame

A DataFrame containing the training report with evaluation metrics for the model.
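Training over all hyperparameter configurations amounts to enumerating a grid and keeping the best-scoring entry. The sketch below shows that idea with the standard library only; `expand_grid`, `validation_score`, and the grid values are hypothetical and do not reflect the library's actual grid_search() interface or its scoring.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the hyperparameter values in `grid`."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def validation_score(config):
    """Hypothetical stand-in for fitting a model and scoring it on held-out data."""
    return -abs(config['units'] - 64) - 100 * abs(config['dropout'] - 0.2)


grid = {'units': [32, 64, 128], 'dropout': [0.2, 0.5]}  # illustrative values
configs = list(expand_grid(grid))
best = max(configs, key=validation_score)
```

With three values for one hyperparameter and two for the other, the grid expands to six configurations, each of which would be trained and evaluated once.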

matclassification.methods.core.MClassifier module

class matclassification.methods.core.MClassifier.MClassifier(name='NN', n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]

Bases: AbstractClassifier

Generic Classifier for Movelet Input.

This class extends AbstractClassifier to handle trajectory data in the form of movelets.

Check: help(AbstractClassifier)

Parameters:

name : str, optional

Name of the classifier model (default: 'NN').

n_jobs : int, optional

Number of parallel jobs to run (default: -1 for using all processors).

verbose : int, optional

Level of verbosity (default: 0 for no output).

random_state : int, optional

Random seed for reproducibility (default: 42).

filterwarnings : str, optional

Warning filter level (default: 'ignore').

prepare_input(train, test, tid_col='tid', class_col='label', validate=False)[source]

Prepares the input datasets (training, validation, and test) for the classifier by invoking the xy() method, storing the processed data, and setting the classifier configuration.

Parameters:

train : pd.DataFrame

The training dataset.

test : pd.DataFrame

The test dataset.

tid_col : str, optional

Column name representing the trajectory ID (default: 'tid').

class_col : str, optional

Column name representing the class label (default: 'label').

validate : bool, optional

If True, splits the training data into training and validation sets (default: False). This option is still under development.

Returns:

X_set : list

List containing the feature matrices (training, validation, test).

y_set : list

List containing the label vectors (training, validation, test).

num_features : int

The number of features in the dataset, excluding the class label.

num_classes : int

The number of unique classes in the dataset.

xy(train, test, tid_col='tid', class_col='label', validate=False, encode_labels=True)[source]

Prepares the feature and label data for the classifier by splitting the training set, encoding labels, and scaling the features.

Parameters:

train : pd.DataFrame

The training dataset.

test : pd.DataFrame

The test dataset.

tid_col : str, optional

Column name representing the trajectory ID (default: 'tid').

class_col : str, optional

Column name representing the class label (default: 'label').

validate : bool, optional

If True, splits the training data into training and validation sets (default: False). This option is still under development.

encode_labels : bool, optional

If True, encodes the labels using LabelEncoder and one-hot encoding (default: True).

Returns:

num_classes : int

Number of unique class labels.

num_features : int

Number of features in the dataset, excluding the class label.

le : LabelEncoder or None

LabelEncoder instance used to transform the class labels, if encode_labels is True.

X_set : list

List containing the feature matrices (training, validation, test).

y_set : list

List containing the encoded labels (training, validation, test).
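The two preprocessing steps named for xy(), label encoding and feature scaling, can be illustrated with a standard-library sketch. It is not the library's implementation: `encode_labels` here is a stand-in function for a LabelEncoder followed by one-hot encoding, and min-max scaling is an assumption, since this page does not say which scaler is used.

```python
def encode_labels(labels):
    """Map labels to integer codes (sorted class order) and one-hot vectors."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    codes = [index[label] for label in labels]
    one_hot = [[1 if i == code else 0 for i in range(len(classes))]
               for code in codes]
    return classes, codes, one_hot


def min_max_scale(train_rows, test_rows):
    """Scale columns to [0, 1] using statistics from the training rows only."""
    columns = list(zip(*train_rows))
    lows = [min(col) for col in columns]
    # Guard against constant columns, which would otherwise divide by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    def scale(rows):
        return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
                for row in rows]

    return scale(train_rows), scale(test_rows)


classes, codes, one_hot = encode_labels(['walk', 'bus', 'walk', 'car'])
X_train, X_test = min_max_scale([[0.0, 10.0], [2.0, 30.0]], [[1.0, 20.0]])
```

Fitting the scaling statistics on the training rows and reusing them for the test rows avoids leaking information from the test set into preprocessing.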

matclassification.methods.core.MHSClassifier module

class matclassification.methods.core.MHSClassifier.MHSClassifier(name='NN', save_results=False, n_jobs=-1, verbose=False, random_state=42, filterwarnings='ignore')[source]

Bases: HSClassifier, MClassifier

Movelet Hyperparameter Optimization Classifier.

The MHSClassifier is a hybrid classifier that integrates hyperparameter optimization (inherited from HSClassifier) with data preprocessing and feature preparation (inherited from MClassifier) for movelet input. It is specifically designed to handle movelet-based or feature-based trajectory data input and supports hyperparameter tuning across different configurations.

Attributes:

save_results : bool

If True, saves the training and evaluation results (default: False).

Parameters:

name : str, optional

Name of the classifier model (default: 'NN').

save_results : bool, optional

Flag to indicate whether results should be saved (default: False).

n_jobs : int, optional

Number of parallel jobs to run (default: -1 for using all processors).

verbose : bool, optional

Flag to control verbosity of the model output (default: False).

random_state : int, optional

Random seed for reproducibility (default: 42).

filterwarnings : str, optional

Warning filter level (default: 'ignore').

Methods:

Inherits methods from both HSClassifier and MClassifier for training, testing, and handling data.
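Since both bases share a common ancestor, Python resolves inherited methods through its method resolution order (MRO). The sketch below uses hypothetical stand-in classes with the same inheritance shape as listed on this page to show the order in which the bases are consulted; it does not import or reproduce the real classes.

```python
from abc import ABC


class AbstractSketch(ABC):
    def describe(self):
        return ['AbstractSketch']


class HSSketch(AbstractSketch):
    def describe(self):
        return ['HSSketch'] + super().describe()


class MSketch(AbstractSketch):
    def describe(self):
        return ['MSketch'] + super().describe()


class MHSSketch(HSSketch, MSketch):
    # Same base order as "Bases: HSClassifier, MClassifier".
    def describe(self):
        return ['MHSSketch'] + super().describe()


order = [cls.__name__ for cls in MHSSketch.__mro__]
chain = MHSSketch().describe()
```

With this base order, a method defined in both parents is taken from the first one listed, and cooperative super() calls visit the second parent before the shared ancestor.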

matclassification.methods.core.SimilarityClassifier module

class matclassification.methods.core.SimilarityClassifier.SimilarityClassifier(name, save_results=False, n_jobs=-1, verbose=False, random_state=42, filterwarnings='ignore')[source]

Bases: THSClassifier

A similarity-based classifier for trajectory data, leveraging hyperparameter optimization and multiple similarity metrics.

Check: help(AbstractClassifier) and help(THSClassifier)

Parameters:

name : str

Name of the classifier model.

save_results : bool, optional (default=False)

Whether to save the results of the classification.

n_jobs : int, optional (default=-1)

The number of parallel jobs to run for computation. -1 means using all processors.

verbose : bool, optional (default=False)

Verbosity mode. If True, enables detailed output.

random_state : int, optional (default=42)

Random seed used for reproducibility.

filterwarnings : str, optional (default='ignore')

Warning filter setting. Used to control warnings generated by the model.

default_metric(dataset_descriptor)[source]
fit(X_train, y_train, X_val=None, y_val=None, config=None)[source]
predict(X_test, y_test)[source]
prepare_input(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False, metric=None, dataset_descriptor=None, inverse=True)[source]
xy(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False, metric=None, dataset_descriptor=None, inverse=True)[source]
matclassification.methods.core.SimilarityClassifier.similarity_matrix(A, B=None, measure=None, n_jobs=1)[source]
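This page gives only the signature of similarity_matrix, so the following is a generic sketch of what a pairwise similarity matrix is, not a description of the library function. The names `pairwise_similarity` and `jaccard`, the choice of Jaccard similarity as the fallback measure, and the behavior of comparing A with itself when B is None are all assumptions; parallel execution (n_jobs) is left out.

```python
def jaccard(a, b):
    """Jaccard similarity between two sequences, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_similarity(A, B=None, measure=None):
    """Return a len(A) x len(B) matrix of measure(a, b) values.

    Assumptions: B defaults to A, and measure defaults to Jaccard similarity.
    """
    if B is None:
        B = A
    if measure is None:
        measure = jaccard
    return [[measure(a, b) for b in B] for a in A]


# Toy trajectories represented as sequences of visited place labels.
A = [['home', 'work'], ['home', 'gym'], ['work']]
matrix = pairwise_similarity(A)
```

A similarity-based classifier can then use rows of such a matrix as features, or feed them to a nearest-neighbor style decision rule.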

matclassification.methods.core.THSClassifier module

class matclassification.methods.core.THSClassifier.THSClassifier(name='NN', save_results=False, n_jobs=-1, verbose=False, random_state=42, filterwarnings='ignore')[source]

Bases: HSClassifier

A hyperparameter optimization classifier for trajectory data that leverages similarity measures, with support for geospatial data encoding (Geohash or IndexGrid).

TODO: Geohash and IndexGrid encoding and testing.

Check: help(AbstractClassifier) and help(HSClassifier)

Parameters:

name : str, optional (default='NN')

Name of the classifier model.

save_results : bool, optional (default=False)

Whether to save the results of the classification.

n_jobs : int, optional (default=-1)

The number of parallel jobs to run for computation. -1 means using all processors.

verbose : bool, optional (default=False)

Verbosity mode. If True, enables detailed output.

random_state : int, optional (default=42)

Random seed used for reproducibility.

filterwarnings : str, optional (default='ignore')

Warning filter setting to control output warnings.

fit(X_train, y_train, X_val, y_val, config=None)[source]
message(pbar, text)[source]
predict(X_test, y_test)[source]
prepare_input(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False)[source]
read_report(filename, prefix='')[source]
test(rounds=1, dir_evaluation='.')[source]

Tests the single best trained model over a specified number of rounds, each with a different random seed, to simulate multiple model evaluations.

Parameters:

rounds : int, optional

The number of evaluation rounds (default: 1).

dir_evaluation : str, optional

Directory where evaluation results will be saved (default: current directory).

Returns:

pd.DataFrame, np.array

A DataFrame containing the evaluation report, and an array of the predicted labels for the test data.

train(dir_validation='.')[source]

Trains the model using all hyperparameter configurations.

If validation is enabled, it will evaluate on a validation set; otherwise, it evaluates on the test set. Results are optionally saved to a CSV file.

Parameters:

dir_validation : str, optional

Directory where validation results will be saved (default: current directory).

Returns:

pd.DataFrame

A DataFrame containing the training report with evaluation metrics for the model.

xy(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False)[source]

Module contents