matclassification.methods.core package
Submodules
matclassification.methods.core.AbstractClassifier module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
The present package offers a tool to support the user in the data analysis of multiple aspect trajectories. It integrates into a single framework the data mining methods for multiple aspect trajectories and, more generally, for multidimensional sequence data. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
- class matclassification.methods.core.AbstractClassifier.AbstractClassifier(name='NN', n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]
Bases: ABC
Simple Abstract Classifier Model.
This abstract class defines the core structure for a machine learning classifier model. It provides common methods such as model creation and evaluation, while each concrete model must be defined in a derived class by overriding the create() method.
Attributes:
- name : str
Name of the classifier implementation (default: 'NN').
- model : object
The actual machine learning model to be defined in the subclass.
- le : object
Label encoder (optional).
- approach : Enum
The category of approach used (default: Approach.NN).
- isverbose : bool
Flag to control verbosity of model’s output (default: based on verbose parameter).
- save_results : bool
Indicates whether to save results (default: False).
- validate : bool
Indicates whether validation should be performed (default: False).
- config : dict
Dictionary of configuration parameters.
- y_test_true : array-like
True labels for the test dataset (available after call to predict).
- y_test_pred : array-like
Predicted labels for the test dataset (available after call to predict).
Parameters:
- name : str, optional
Classifier name (default: 'NN').
- n_jobs : int, optional
Number of parallel jobs to run (default: -1 for using all processors).
- verbose : int, optional
Verbosity level (default: 0).
- random_state : int, optional
Random seed for reproducibility (default: 42).
- filterwarnings : str, optional
Warning filter level (default: 'ignore').
Methods:
- add_config(**kwargs):
Updates the configuration with additional parameters.
- grid_search(*args):
Defines the grid of hyperparameters to search over.
- duration():
Returns the duration in milliseconds since the start of model execution.
- message(pbar, text):
Logs a message if verbosity is enabled.
- labels:
Returns the unique labels in the test data.
- create():
Abstract method to be overridden in subclasses to define the model.
- clear():
Clears the model from memory.
- fit(X_train, y_train, X_val, y_val):
Trains the model on the training data and evaluates it on validation data.
- predict(X_test, y_test):
Generates predictions on the test data and returns a performance summary.
- score(y_test, y_pred):
Computes various evaluation metrics (accuracy, precision, recall, F1, etc.) from the true and predicted labels.
- summary():
Returns a summary of the test performance.
- cm():
Plots the confusion matrix for the test results.
- save_model(dir_path='.', modelfolder='model', model_name=''):
Saves the trained model to the specified directory.
- save(dir_path='.', modelfolder='model'):
Saves the predictions, classification report, and performance metrics to files.
- property labels
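The subclassing contract described above (a base class that owns the shared workflow, and a derived class that supplies the model through create()) can be sketched with the standard library alone. This is a minimal illustration, not the library's code: `SketchClassifier`, `MajorityModel`, and `MajorityClassifier` are hypothetical names, and the real AbstractClassifier has more methods and a different fit() signature.

```python
from abc import ABC, abstractmethod
from collections import Counter


class SketchClassifier(ABC):
    """Hypothetical stand-in for an abstract classifier base (not the library class)."""

    def __init__(self, name="NN", random_state=42):
        self.name = name
        self.random_state = random_state
        self.model = None  # filled in by fit() through create()

    @abstractmethod
    def create(self):
        """Subclasses build and return the underlying model here."""

    def fit(self, X_train, y_train):
        # Shared workflow: build the model, then train it.
        self.model = self.create()
        self.model.fit(X_train, y_train)
        return self


class MajorityModel:
    """Toy model that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]

    def predict(self, X):
        return [self.label_ for _ in X]


class MajorityClassifier(SketchClassifier):
    def create(self):
        # The only piece a concrete classifier has to provide.
        return MajorityModel()


clf = MajorityClassifier(name="Majority").fit([[0], [1], [2]], ["a", "b", "b"])
predictions = clf.model.predict([[5], [6]])
```

Because create() is marked abstract, instantiating the base class directly raises a TypeError, which is the behaviour one would expect from a class derived from ABC.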
matclassification.methods.core.HSClassifier module
- class matclassification.methods.core.HSClassifier.HSClassifier(name='NN', save_results=False, n_jobs=-1, verbose=False, random_state=42, filterwarnings='ignore')[source]
Bases: AbstractClassifier
Hyperparameter Optimization Classifier for Trajectory Input Data.
This class extends AbstractClassifier to include functionality for training and testing machine learning models with hyperparameter optimization. It is designed for trajectory data inputs and handles multiple configurations to find the best-performing model.
Check: help(AbstractClassifier)
Parameters:
- name : str, optional
Classifier name (default: 'NN').
- save_results : bool, optional
Flag to enable saving results to disk (default: False).
- n_jobs : int, optional
Number of parallel jobs to run (default: -1 for using all processors).
- verbose : bool, optional
Flag for verbosity (default: False).
- random_state : int, optional
Random seed for reproducibility (default: 42).
- filterwarnings : str, optional
Warning filter level (default: 'ignore').
Methods:
- train(dir_validation='.'):
Trains the model using a single hyperparameter configuration. If validation is enabled, it will evaluate on a validation set; otherwise, it evaluates on the test set. Results are optionally saved to a CSV file.
Parameters:
- dir_validation : str, optional
Directory where validation results will be saved (default: current directory).
Returns:
- pd.DataFrame
A DataFrame containing the training report with evaluation metrics for the model.
- test(rounds=1, dir_evaluation='.'):
Tests the model over a specified number of rounds, each with a different random seed, to simulate multiple model evaluations.
Parameters:
- rounds : int, optional
The number of evaluation rounds (default: 1).
- dir_evaluation : str, optional
Directory where evaluation results will be saved (default: current directory).
Returns:
- pd.DataFrame, np.array
A DataFrame containing the evaluation report and the predicted labels for the test data.
- test(rounds=1, dir_evaluation='.')[source]
Tests the single best trained model over a specified number of rounds, each with a different random seed, to simulate multiple model evaluations.
Parameters:
- rounds : int, optional
The number of evaluation rounds (default: 1).
- dir_evaluation : str, optional
Directory where evaluation results will be saved (default: current directory).
Returns:
- pd.DataFrame, np.array
A DataFrame containing the evaluation report and the predicted labels for the test data.
- train(dir_validation='.')[source]
Trains the model using all hyperparameter configurations.
If validation is enabled, it will evaluate on a validation set; otherwise, it evaluates on the test set. Results are optionally saved to a CSV file.
Parameters:
- dir_validation : str, optional
Directory where validation results will be saved (default: current directory).
Returns:
- pd.DataFrame
A DataFrame containing the training report with evaluation metrics for the model.
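The train-then-test flow described above (score every hyperparameter configuration, keep the best, then re-evaluate that one over several seeded rounds) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `expand_grid`, `evaluate`, the grid contents, and the rule "seed = 42 + round index" are all hypothetical, and `evaluate` is a toy scoring function standing in for real model training.

```python
import itertools
import random


def expand_grid(grid):
    """Yield one dict per combination of the hyperparameter values in `grid`."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def evaluate(config, seed):
    """Hypothetical scorer; a real one would train a model and measure accuracy."""
    rng = random.Random(seed)
    noise = rng.uniform(-0.01, 0.01)  # seed-dependent variation between runs
    return 1.0 - abs(config["units"] - 100) / 1000 - config["dropout"] / 10 + noise


# Hypothetical search space: 3 x 2 = 6 configurations.
grid = {"units": [50, 100, 200], "dropout": [0.0, 0.5]}

# "train": score every configuration with the same seed and keep the best one.
report = [(cfg, evaluate(cfg, seed=42)) for cfg in expand_grid(grid)]
best_config, best_score = max(report, key=lambda row: row[1])

# "test": re-evaluate only the best configuration, one seed per round.
rounds = 3
round_scores = [evaluate(best_config, seed=42 + r) for r in range(rounds)]
```

Using one fixed seed during the search keeps the comparison between configurations fair, while varying the seed across test rounds gives a rough sense of how stable the selected configuration is.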
matclassification.methods.core.MClassifier module
- class matclassification.methods.core.MClassifier.MClassifier(name='NN', n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]
Bases: AbstractClassifier
Generic Classifier for Movelet Input.
This class extends AbstractClassifier to handle trajectory data in the form of movelets.
Check: help(AbstractClassifier)
Parameters:
- name : str, optional
Name of the classifier model (default: 'NN').
- n_jobs : int, optional
Number of parallel jobs to run (default: -1 for using all processors).
- verbose : int, optional
Level of verbosity (default: 0 for no output).
- random_state : int, optional
Random seed for reproducibility (default: 42).
- filterwarnings : str, optional
Warning filter level (default: 'ignore').
- prepare_input(train, test, tid_col='tid', class_col='label', validate=False)[source]
Prepares the input datasets (training, validation, and test) for the classifier by invoking the xy() method, storing the processed data, and setting the classifier configuration.
Parameters:
- train : pd.DataFrame
The training dataset.
- test : pd.DataFrame
The test dataset.
- tid_col : str, optional
Column name representing the trajectory ID (default: 'tid').
- class_col : str, optional
Column name representing the class label (default: 'label').
- validate : bool, optional
If True, splits the training data into training and validation sets (default: False). TODO: under development.
Returns:
- X_set : list
List containing the feature matrices (training, validation, test).
- y_set : list
List containing the label vectors (training, validation, test).
- num_features : int
The number of features in the dataset, excluding the class label.
- num_classes : int
The number of unique classes in the dataset.
- xy(train, test, tid_col='tid', class_col='label', validate=False, encode_labels=True)[source]
Prepares the feature and label data for the classifier by splitting the training set, encoding labels, and scaling the features.
Parameters:
- train : pd.DataFrame
The training dataset.
- test : pd.DataFrame
The test dataset.
- tid_col : str, optional
Column name representing the trajectory ID (default: 'tid').
- class_col : str, optional
Column name representing the class label (default: 'label').
- validate : bool, optional
If True, splits the training data into training and validation sets (default: False). TODO: under development.
- encode_labels : bool, optional
If True, encodes the labels using LabelEncoder and one-hot encoding (default: True).
Returns:
- num_classes : int
Number of unique class labels.
- num_features : int
Number of features in the dataset excluding the class label.
- le : LabelEncoder or None
LabelEncoder instance used to transform the class labels, if encode_labels is True.
- X_set : list
List containing the feature matrices (training, validation, test).
- y_set : list
List containing the encoded labels (training, validation, test).
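The label handling that xy() describes (integer-encode the class labels, then one-hot encode them) can be illustrated in plain Python. This is only a sketch of the general technique: `fit_label_encoder` and `one_hot` are hypothetical helpers written for this example, the sample labels are made up, and the library itself is documented as using LabelEncoder rather than this code. Classes are indexed in sorted order here, which matches how scikit-learn's LabelEncoder orders them.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer index, in sorted order."""
    classes = sorted(set(labels))
    return classes, {label: i for i, label in enumerate(classes)}


def one_hot(indices, num_classes):
    """Turn integer class indices into one-hot rows of length num_classes."""
    return [[1 if j == i else 0 for j in range(num_classes)] for i in indices]


# Made-up labels for illustration only.
y_train = ["walk", "bus", "car", "bus"]
y_test = ["car", "walk"]

# Fit the mapping on the training labels, then apply it to both sets.
classes, index_of = fit_label_encoder(y_train)
num_classes = len(classes)

y_train_encoded = one_hot([index_of[label] for label in y_train], num_classes)
y_test_encoded = one_hot([index_of[label] for label in y_test], num_classes)
```

Fitting the mapping on the training labels and reusing it for the test labels keeps the column order of the one-hot vectors identical across both sets.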
matclassification.methods.core.MHSClassifier module
- class matclassification.methods.core.MHSClassifier.MHSClassifier(name='NN', save_results=False, n_jobs=-1, verbose=False, random_state=42, filterwarnings='ignore')[source]
Bases: HSClassifier, MClassifier
Movelet Hyperparameter Optimization Classifier.
The MHSClassifier is a hybrid classifier that integrates hyperparameter optimization (inherited from HSClassifier) with data preprocessing and feature preparation (inherited from MClassifier) for movelet input. It is specifically designed to handle movelet-based or feature-based trajectory data input and supports hyperparameter tuning across different configurations.
Attributes:
- save_results : bool
If True, saves the training and evaluation results (default: False).
Parameters:
- name : str, optional
Name of the classifier model (default: 'NN').
- save_results : bool, optional
Flag to indicate whether results should be saved (default: False).
- n_jobs : int, optional
Number of parallel jobs to run (default: -1 for using all processors).
- verbose : bool, optional
Flag to control verbosity of the model output (default: False).
- random_state : int, optional
Random seed for reproducibility (default: 42).
- filterwarnings : str, optional
Warning filter level (default: 'ignore').
Methods:
Inherits methods from both HSClassifier and MClassifier for training, testing, and handling data.
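Since MHSClassifier lists two bases that share a common ancestor, Python's method resolution order (MRO) decides which parent's method runs first. The sketch below shows that mechanism with stand-in classes; `Base`, `Searcher`, `Preparer`, and `Combined` are hypothetical names chosen for this example and do not reproduce the library's classes or methods.

```python
class Base:
    """Hypothetical common ancestor."""

    def describe(self):
        return ["Base"]


class Searcher(Base):
    """Stand-in for the parent that contributes the search logic."""

    def describe(self):
        return ["Searcher"] + super().describe()


class Preparer(Base):
    """Stand-in for the parent that contributes the input preparation."""

    def describe(self):
        return ["Preparer"] + super().describe()


class Combined(Searcher, Preparer):
    """Inherits from both parents; the base order fixes the lookup order."""


# Linearized lookup order: the class itself, bases left to right, then the shared ancestor.
mro_names = [cls.__name__ for cls in Combined.__mro__]

# super() follows the MRO, so each class in the chain runs exactly once.
call_chain = Combined().describe()
```

With the bases written as `(Searcher, Preparer)`, a method defined in both parents resolves to the first one listed, and cooperative `super()` calls still reach the second parent before the shared ancestor.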
matclassification.methods.core.SimilarityClassifier module
- class matclassification.methods.core.SimilarityClassifier.SimilarityClassifier(name, save_results=False, n_jobs=-1, verbose=False, random_state=42, filterwarnings='ignore')[source]
Bases: THSClassifier
A similarity-based classifier for trajectory data, leveraging hyperparameter optimization and multiple similarity metrics.
Check: help(AbstractClassifier) and help(THSClassifier)
Parameters:
- name : str
Name of the classifier model.
- save_results : bool, optional (default=False)
Whether to save the results of the classification.
- n_jobs : int, optional (default=-1)
The number of parallel jobs to run for computation. -1 means using all processors.
- verbose : bool, optional (default=False)
Verbosity mode. If True, enables detailed output.
- random_state : int, optional (default=42)
Random seed used for reproducibility.
- filterwarnings : str, optional (default='ignore')
Warning filter setting. Used to control warnings generated by the model.
matclassification.methods.core.THSClassifier module
- class matclassification.methods.core.THSClassifier.THSClassifier(name='NN', save_results=False, n_jobs=-1, verbose=False, random_state=42, filterwarnings='ignore')[source]
Bases: HSClassifier
A hyperparameter optimization classifier for trajectory data, leveraging similarity measures and support for geospatial data encoding (Geohash or IndexGrid).
TODO: Geohash and IndexGrid encoding and testing.
Check: help(AbstractClassifier) and help(HSClassifier)
Parameters:
- name : str, optional (default='NN')
Name of the classifier model.
- save_results : bool, optional (default=False)
Whether to save the results of the classification.
- n_jobs : int, optional (default=-1)
The number of parallel jobs to run for computation. -1 means using all processors.
- verbose : bool, optional (default=False)
Verbosity mode. If True, enables detailed output.
- random_state : int, optional (default=42)
Random seed used for reproducibility.
- filterwarnings : str, optional (default='ignore')
Warning filter setting to control output warnings.
- prepare_input(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False)[source]
- test(rounds=1, dir_evaluation='.')[source]
Tests the single best trained model over a specified number of rounds, each with a different random seed, to simulate multiple model evaluations.
Parameters:
- rounds : int, optional
The number of evaluation rounds (default: 1).
- dir_evaluation : str, optional
Directory where evaluation results will be saved (default: current directory).
Returns:
- pd.DataFrame, np.array
A DataFrame containing the evaluation report and the predicted labels for the test data.
- train(dir_validation='.')[source]
Trains the model using all hyperparameter configurations.
If validation is enabled, it will evaluate on a validation set; otherwise, it evaluates on the test set. Results are optionally saved to a CSV file.
Parameters:
- dir_validation : str, optional
Directory where validation results will be saved (default: current directory).
Returns:
- pd.DataFrame
A DataFrame containing the training report with evaluation metrics for the model.