matclassification.methods.mat package
Submodules
matclassification.methods.mat.Bituler module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
The present package offers a tool to support the user in analyzing multiple aspect trajectory data. It brings methods for multiple aspect trajectories, and for multidimensional sequence data mining in general, into a single framework. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from source project distribution)
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (this portion of code is subject to licensing from source project distribution)
- Authors:
Tarlis Portela
- Original source:
Nicksson C. A. de Freitas,
Ticiana L. Coelho da Silva,
Jose António Fernandes de Macêdo,
Leopoldo Melo Junior,
Matheus Gomes Cordeiro
Adapted from: https://github.com/nickssonfreitas/ICAART2021
- class matclassification.methods.mat.Bituler.Bituler(rnn=['bilstm'], units=[100, 200, 250, 300], stack=[1], dropout=[0.5], embedding_size=[100, 200, 300, 400], batch_size=[64], epochs=[1000], patience=[20], monitor=['val_acc'], optimizer=['ada'], learning_rate=[0.001], save_results=False, n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]
Bases:
THSClassifier
Gao et al. (2017) proposed BiTULER, a model that uses word embeddings and a Bidirectional Recurrent Neural Network, but it is limited to the sequence of check-in identifiers, not supporting other dimensions.
Bituler is a trajectory classification model that extends THSClassifier. Within this framework for multiple aspect trajectory data, it uses embeddings and Recurrent Neural Networks (RNNs) for feature extraction and classification. The model can be configured with different RNN architectures, embedding sizes, and optimization settings.
- Parameters:
rnn (list, optional) – List of RNN architectures to use. Currently, only ‘bilstm’ is supported (default: [‘bilstm’]).
units (list, optional) – List of integers specifying the number of hidden units in each layer (default: [100, 200, 250, 300]).
stack (list, optional) – List of integers specifying the number of RNN layers to stack (default: [1]).
dropout (list, optional) – List of floats specifying the dropout rate for regularization (default: [0.5]).
embedding_size (list, optional) – List of integers specifying the size of the embedding layer (default: [100, 200, 300, 400]).
batch_size (list, optional) – List of batch sizes for training (default: [64]).
epochs (list, optional) – List of the number of epochs for training (default: [1000]).
patience (list, optional) – List of integers specifying the number of epochs to wait for early stopping (default: [20], currently unused).
monitor (list, optional) – List of metrics to monitor for early stopping (default: [‘val_acc’]).
optimizer (list, optional) – List of optimizers to use during training (default: [‘ada’]).
learning_rate (list, optional) – List of learning rates for the optimizer (default: [0.001]).
save_results (bool, optional) – If True, saves the results of the training process (default: False).
n_jobs (int, optional) – Number of parallel jobs to run (default: -1).
verbose (int, optional) – Verbosity level of the training process (default: 0).
random_state (int, optional) – Seed for random number generation (default: 42).
filterwarnings (str, optional) – Configures warning filtering (default: ‘ignore’).
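Most Bituler parameters are lists of candidate values rather than single values. The following is a minimal sketch of how such lists can be expanded into individual configurations, assuming a plain Cartesian product as in a conventional grid search. The helper `expand_grid` is hypothetical and is not part of matclassification; the library may build or prune its configurations differently.

```python
from itertools import product

# Candidate values copied from the default Bituler signature (list-valued parameters only).
search_space = {
    "rnn": ["bilstm"],
    "units": [100, 200, 250, 300],
    "stack": [1],
    "dropout": [0.5],
    "embedding_size": [100, 200, 300, 400],
    "batch_size": [64],
    "epochs": [1000],
    "patience": [20],
    "monitor": ["val_acc"],
    "optimizer": ["ada"],
    "learning_rate": [0.001],
}


def expand_grid(space):
    """Hypothetical helper: one dict per combination of candidate values."""
    names = list(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


configs = expand_grid(search_space)
print(len(configs))  # 4 unit sizes x 4 embedding sizes = 16 combinations
print(configs[0])
```

Under this assumption, only `units` and `embedding_size` vary in the defaults, so adding a second value to any other list doubles the number of models to train.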
- xy(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, features=['poi'], validate=False)[source]
Prepares the trajectory data for model training and testing.
- prepare_input(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, features=['poi'], validate=False)[source]
Prepares the input data by configuring the model parameters and splitting the data into training and testing sets.
- create(config)[source]
Creates and returns the RNN model architecture based on the provided configuration.
- fit(X_train, y_train, X_val, y_val, config=None)[source]
Trains the model on the provided training data and evaluates it on the validation data.
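The class description notes that BiTULER works on sequences of check-in identifiers fed to an embedding layer. As an illustration only, the sketch below shows one common way to turn such sequences into fixed-length integer arrays. The functions `build_vocab` and `encode`, the sample place names, and the choice of 0 as the padding index are all assumptions and do not reproduce what `xy` or `prepare_input` actually do.

```python
def build_vocab(trajectories):
    """Map each distinct check-in identifier to an integer; 0 is reserved for padding."""
    vocab = {}
    for traj in trajectories:
        for poi in traj:
            if poi not in vocab:
                vocab[poi] = len(vocab) + 1
    return vocab


def encode(trajectories, vocab, max_len=None):
    """Convert trajectories to equal-length index sequences, right-padded with 0.

    Identifiers missing from the vocabulary are also mapped to 0 in this sketch.
    """
    if max_len is None:
        max_len = max(len(traj) for traj in trajectories)
    encoded = []
    for traj in trajectories:
        ids = [vocab.get(poi, 0) for poi in traj][:max_len]
        encoded.append(ids + [0] * (max_len - len(ids)))
    return encoded


# Hypothetical toy data: each trajectory is a sequence of visited places.
trajectories = [
    ["home", "cafe", "office"],
    ["gym", "home"],
    ["cafe", "office", "gym", "home"],
]
vocab = build_vocab(trajectories)
X = encode(trajectories, vocab)
print(vocab)
print(X)
```

Each integer then selects one row of the embedding matrix, whose width corresponds to the `embedding_size` parameter.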
matclassification.methods.mat.DeepeST module
- Authors:
Tarlis Portela
- Original source:
Nicksson C. A. de Freitas,
Ticiana L. Coelho da Silva,
Jose António Fernandes de Macêdo,
Leopoldo Melo Junior,
Matheus Gomes Cordeiro
Adapted from: https://github.com/nickssonfreitas/ICAART2021
- class matclassification.methods.mat.DeepeST.DeepeST(rnn=['bilstm', 'lstm'], units=[100, 200, 300, 400, 500], merge_type=['concat'], dropout_before_rnn=[0, 0.5], dropout_after_rnn=[0.5], embedding_size=[50, 100, 200, 300, 400], batch_size=[64], epochs=[1000], patience=[20], monitor=['val_acc'], optimizer=['ada'], learning_rate=[0.001], loss=['CCE'], loss_parameters=[{}], y_one_hot_encodding=True, save_results=False, n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]
Bases:
THSClassifier
DeepeST: Deep Learning for Sub-Trajectory classification
The DeepeST class is a deep learning model for trajectory classification that extends THSClassifier. It uses RNN-based architectures, such as LSTM and BiLSTM, to handle spatio-temporal data.
- Parameters:
rnn (list, default=['bilstm', 'lstm']) – Types of recurrent neural networks to use (‘bilstm’ or ‘lstm’).
units (list, default=[100, 200, 300, 400, 500]) – List of number of units for the recurrent layers.
merge_type (list, default=['concat']) – How to merge embedding layers. Options: ‘concat’, ‘add’, ‘avg’.
dropout_before_rnn (list, default=[0, 0.5]) – Dropout rates applied before the recurrent layers.
dropout_after_rnn (list, default=[0.5]) – Dropout rates applied after the recurrent layers.
embedding_size (list, default=[50, 100, 200, 300, 400]) – Sizes for the embedding layers.
batch_size (list, default=[64]) – Batch sizes for training the model.
epochs (list, default=[1000]) – Number of epochs to train the model.
patience (list, default=[20]) – Patience for early stopping based on monitored metric.
monitor (list, default=['val_acc']) – Metric to monitor for early stopping.
optimizer (list, default=['ada']) – Optimizer for training (‘ada’ for Adam, ‘rmsprop’ for RMSProp).
learning_rate (list, default=[0.001]) – Learning rate for the optimizer.
loss (list, default=['CCE']) – Loss function to use (‘CCE’ for categorical cross-entropy).
loss_parameters (list, default=[{}]) – Additional parameters for the loss function.
y_one_hot_encodding (bool, default=True) – Whether to one-hot encode the target labels. The keyword is spelled y_one_hot_encodding, as in the class signature.
save_results (bool, default=False) – Whether to save results after execution.
n_jobs (int, default=-1) – Number of parallel jobs for computations.
verbose (int, default=0) – Verbosity level for output (0: silent, 1: progress).
random_state (int, default=42) – Random seed for reproducibility.
filterwarnings (str, default='ignore') – Filter warnings during execution.
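The `merge_type` options decide how the per-attribute embedding vectors are combined before the recurrent layer. The sketch below illustrates the three documented options on plain Python lists. The function `merge` and the sample vectors are hypothetical; the model itself presumably performs the equivalent operation on tensors.

```python
def merge(embeddings, merge_type="concat"):
    """Combine several embedding vectors for one trajectory point."""
    if merge_type == "concat":
        # Concatenation keeps every dimension; the output width is the sum of widths.
        return [x for vec in embeddings for x in vec]
    if len({len(vec) for vec in embeddings}) != 1:
        raise ValueError("'add' and 'avg' need embeddings of equal size")
    summed = [sum(values) for values in zip(*embeddings)]
    if merge_type == "add":
        return summed
    if merge_type == "avg":
        return [s / len(embeddings) for s in summed]
    raise ValueError(f"unknown merge_type: {merge_type!r}")


# Hypothetical 2-dimensional embeddings of three attributes of one point.
poi, hour, weekday = [1.0, 2.0], [3.0, 4.0], [5.0, 9.0]

print(merge([poi, hour, weekday], "concat"))  # [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]
print(merge([poi, hour, weekday], "add"))     # [9.0, 15.0]
print(merge([poi, hour, weekday], "avg"))     # [3.0, 5.0]
```

Concatenation grows the input width of the recurrent layer with the number of attributes, while addition and averaging keep it fixed but require all embeddings to share one size.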
- xy(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False)[source]
Prepares the trajectory data for training and testing, returning features and labels.
- prepare_input(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False)[source]
Prepares input features and configurations from the data for model training.
- fit(X_train, y_train, X_val, y_val, config=None)[source]
Trains the model on the training data with validation using the specified configuration.
- predict(X_test, y_test)[source]
Generates predictions on the test data and computes performance metrics.
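When `y_one_hot_encodding` is enabled, class labels are represented as one-hot vectors, and predicted probability vectors have to be mapped back to labels before metrics can be computed. The sketch below shows that round trip. The helpers `one_hot`, `decode`, and `accuracy` and the user labels are hypothetical, and the metrics that `predict` actually reports may differ.

```python
def one_hot(labels):
    """Return one-hot rows plus the sorted class list that defines the column order."""
    classes = sorted(set(labels))
    position = {c: i for i, c in enumerate(classes)}
    rows = [
        [1.0 if position[label] == j else 0.0 for j in range(len(classes))]
        for label in labels
    ]
    return rows, classes


def decode(rows, classes):
    """Pick the class with the highest score in each row (argmax)."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in rows]


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y = ["u2", "u1", "u3", "u1"]
Y, classes = one_hot(y)
print(classes)  # ['u1', 'u2', 'u3']
print(Y)

# Hypothetical predicted probabilities for the same four trajectories.
proba = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.5, 0.1, 0.4], [0.8, 0.1, 0.1]]
y_pred = decode(proba, classes)
print(y_pred, accuracy(y, y_pred))
```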
matclassification.methods.mat.MARC module
Created on Dec, 2021 Copyright (C) 2022, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
Lucas May Petry (adapted)
- class matclassification.methods.mat.MARC.EpochLogger(X_train, y_train, X_test, y_test, dataset='', metric='val_acc', baseline=0, patience=30, metrics_file=None, verbose=1)[source]
Bases:
EarlyStopping
- on_epoch_begin(epoch, logs={})[source]
Called at the start of an epoch.
Subclasses should override for any actions to run. This function should only be called during TRAIN mode.
- Parameters:
epoch – Integer, index of epoch.
logs – Dict. Currently no data is passed to this argument for this method but that may change in the future.
- on_epoch_end(epoch, logs={})[source]
Called at the end of an epoch.
Subclasses should override for any actions to run. This function should only be called during TRAIN mode.
- Parameters:
epoch – Integer, index of epoch.
logs – Dict, metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_. For training epoch, the values of the Model’s metrics are returned. Example: {‘loss’: 0.2, ‘accuracy’: 0.7}.
- class matclassification.methods.mat.MARC.MARC(embedder_size=[100, 200, 300], merge_type=['add', 'average', 'concatenate'], rnn_cell=['gru', 'lstm'], class_dropout=0.5, class_hidden_units=100, class_lrate=0.001, class_batch_size=64, class_epochs=1000, early_stopping_patience=30, baseline_metric='acc', baseline_value=0.5, n_jobs=-1, verbose=True, random_state=42, filterwarnings='ignore')[source]
Bases:
THSClassifier
MARC: a robust method for multiple-aspect trajectory classification via space, time, and semantic embeddings
The MARC class is a deep learning classifier for sequential trajectory data using different recurrent neural network cells and various strategies for embedding and merging.
- Parameters:
embedder_size (list, default=[100, 200, 300]) – List of sizes for the embedding layers.
merge_type (list, default=['add', 'average', 'concatenate']) – Merge strategy for combining embedding layers (‘add’, ‘average’, ‘concatenate’).
rnn_cell (list, default=['gru', 'lstm']) – Types of recurrent neural network cells (‘gru’ or ‘lstm’).
class_dropout (float, default=0.5) – Dropout rate to apply after the recurrent layer.
class_hidden_units (int, default=100) – Number of hidden units in the recurrent layer.
class_lrate (float, default=0.001) – Learning rate for the optimizer.
class_batch_size (int, default=64) – Batch size for training.
class_epochs (int, default=1000) – Number of epochs for training the model.
early_stopping_patience (int, default=30) – Patience for early stopping based on validation metrics.
baseline_metric (str, default='acc') – Metric to monitor for early stopping (‘acc’ for accuracy).
baseline_value (float, default=0.5) – Baseline value for early stopping based on the monitored metric.
n_jobs (int, default=-1) – Number of parallel jobs for computations.
verbose (bool, default=True) – Verbosity level for model training (True for detailed output).
random_state (int, default=42) – Random seed for reproducibility.
filterwarnings (str, default='ignore') – Filter warnings during execution.
- xy(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False)[source]
Prepares the trajectory data for training and testing, returning features and labels.
- prepare_input(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, validate=False)[source]
Prepares input features and configurations from the data for model training.
- fit(X_train, y_train, X_val, y_val, config=None)[source]
Trains the model on the training data with validation using the specified configuration.
- predict(X_test, y_test)[source]
Generates predictions on the test data and computes performance metrics.
- clear()
Resets the model and clears the Keras session to free memory.
Notes
This implementation currently sets a default configuration (best_config) and initializes grid search with embedding sizes, merge types, and RNN cells.
The model uses early stopping based on accuracy with a baseline of 0.5.
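The interplay of `early_stopping_patience` and `baseline_value` can be illustrated with a simplified stopping rule. The sketch assumes that an epoch only counts as an improvement when the monitored metric exceeds the best value seen so far, with the baseline as the starting point. The function `stop_epoch` is hypothetical and is not the code of `EpochLogger`, which may apply its rule differently.

```python
def stop_epoch(history, patience=30, baseline=0.5):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    Simplified rule: the best value starts at `baseline`; every epoch that does not
    exceed the best value so far increases a counter, and training stops once the
    counter reaches `patience`.
    """
    best = baseline
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation accuracies per epoch.
improving = [0.40, 0.45, 0.55, 0.54, 0.53, 0.56, 0.55, 0.55, 0.55]
stuck = [0.30, 0.40, 0.45, 0.44]

print(stop_epoch(improving, patience=3, baseline=0.5))  # 9
print(stop_epoch(stuck, patience=3, baseline=0.5))      # 3
```

In the second run the metric never passes the baseline, so training ends after `patience` epochs even though the raw values are still rising.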
matclassification.methods.mat.TRF module
- Authors:
Tarlis Portela
- Original source:
Nicksson C. A. de Freitas,
Ticiana L. Coelho da Silva,
Jose António Fernandes de Macêdo,
Leopoldo Melo Junior,
Matheus Gomes Cordeiro
Adapted from: https://github.com/nickssonfreitas/ICAART2021
- class matclassification.methods.mat.TRF.TRF(n_estimators=[200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], max_depth=[20, 30, 40], min_samples_split=[2, 5, 10], min_samples_leaf=[1, 2, 4], max_features=['sqrt', 'log2'], bootstrap=[True, False], save_results=False, n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]
Bases:
THSClassifier
TRF: Trajectory Random Forest Classifier
The TRF class is a Random Forest classifier for trajectory classification tasks. It exposes tunable hyperparameters and performs grid search to find the best model configuration. It also supports parallel execution (n_jobs) and saving of results (save_results).
- Parameters:
n_estimators (list, default=[200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]) – Number of trees in the forest.
max_depth (list, default=[20, 30, 40]) – Maximum depth of each tree.
min_samples_split (list, default=[2, 5, 10]) – Minimum number of samples required to split an internal node.
min_samples_leaf (list, default=[1, 2, 4]) – Minimum number of samples required to be at a leaf node.
max_features (list, default=['sqrt', 'log2']) – Number of features to consider when looking for the best split.
bootstrap (list, default=[True, False]) – Whether bootstrap samples are used when building each tree; if False, every tree is built on the whole training set.
save_results (bool, default=False) – Option to save the results of the classifier.
n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors.
verbose (int, default=0) – Controls the verbosity during model training.
random_state (int, default=42) – Controls the randomness of the estimator.
filterwarnings (str, default='ignore') – Whether to suppress or display warnings during model training and evaluation.
- create(config)
Initializes the Random Forest classifier with the given configuration.
- fit(X_train, y_train, X_val, y_val, config=None)
Trains the Random Forest classifier on the provided training data.
- predict(X_test, y_test)
Predicts class probabilities and the most likely class for the test data.
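Because TRF is tuned by grid search, the default value lists imply a sizeable number of candidate models. The short sketch below counts them, assuming every combination of the listed values is tried; whether the library evaluates the full grid or only part of it is not stated here.

```python
from math import prod

# Default candidate values copied from the TRF signature.
trf_space = {
    "n_estimators": [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000],
    "max_depth": [20, 30, 40],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
    "bootstrap": [True, False],
}

# A full grid has one configuration per element of the Cartesian product.
n_configs = prod(len(values) for values in trf_space.values())
print(n_configs)  # 10 * 3 * 3 * 3 * 2 * 2 = 1080
```

Shortening a single list, for example passing fewer `n_estimators` values, reduces the count proportionally.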
matclassification.methods.mat.TXGB module
- Authors:
Tarlis Portela
- Original source:
Nicksson C. A. de Freitas,
Ticiana L. Coelho da Silva,
Jose António Fernandes de Macêdo,
Leopoldo Melo Junior,
Matheus Gomes Cordeiro
Adapted from: https://github.com/nickssonfreitas/ICAART2021
- class matclassification.methods.mat.TXGB.TXGB(n_estimators=[2000], max_depth=[3, 5], learning_rate=[0.01], gamma=[0.0, 1, 5], subsample=[0.1, 0.2, 0.5, 0.8], colsample_bytree=[0.5, 0.7], reg_alpha_l1=[1.0], reg_lambda_l2=[100], eval_metric=['merror', 'mlogloss'], tree_method='auto', esr=[20], save_results=False, n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]
Bases:
THSClassifier
TXGB: Trajectory XGBoost Classifier
The TXGB class is an implementation of the XGBoost classifier, tailored specifically for trajectory classification tasks. It utilizes the efficient gradient boosting algorithm provided by XGBoost and supports a wide range of tunable hyperparameters.
The model selection process is driven by grid search, making it adaptable to various data configurations and problem complexities.
- Parameters:
n_estimators (list, default=[2000]) – Number of boosting rounds.
max_depth (list, default=[3, 5]) – Maximum depth of a tree.
learning_rate (list, default=[0.01]) – Step size shrinkage applied at each boosting update to prevent overfitting.
gamma (list, default=[0.0, 1, 5]) – Minimum loss reduction required to make a further partition on a leaf node.
subsample (list, default=[0.1, 0.2, 0.5, 0.8]) – Subsample ratio of the training instances.
colsample_bytree (list, default=[0.5, 0.7]) – Subsample ratio of columns when constructing each tree.
reg_alpha_l1 (list, default=[1.0]) – L1 regularization term on weights.
reg_lambda_l2 (list, default=[100]) – L2 regularization term on weights.
eval_metric (list, default=['merror', 'mlogloss']) – Evaluation metrics used to monitor performance (merror: classification error, mlogloss: log loss).
tree_method (str, default='auto') – The tree construction algorithm used by XGBoost (e.g., ‘auto’, ‘gpu_hist’).
esr (list, default=[20]) – Early stopping rounds (used to stop training early if no improvement is seen).
save_results (bool, default=False) – Whether to save the results of the classifier.
n_jobs (int, default=-1) – Number of parallel threads used by XGBoost.
verbose (int, default=0) – Verbosity of XGBoost training output.
random_state (int, default=42) – Controls the randomness of the model.
filterwarnings (str, default='ignore') – Whether to suppress or display warnings during model training and evaluation.
- create(config)
Initializes the XGBoost classifier with the given configuration.
- fit(X_train, y_train, X_val, y_val, config=None)
Trains the XGBoost classifier on the provided training data, with optional early stopping.
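The two default `eval_metric` values measure different things: `merror` is the fraction of misclassified samples, while `mlogloss` is the mean negative log-probability assigned to the true class. The sketch below computes both from scratch on hypothetical data. It follows the standard definitions of these metrics and does not call XGBoost, so small numerical details such as clipping may differ from the library.

```python
import math


def merror(y_true, proba):
    """Multiclass error rate: wrong predictions divided by all predictions."""
    wrong = 0
    for y, p in zip(y_true, proba):
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        wrong += predicted != y
    return wrong / len(y_true)


def mlogloss(y_true, proba, eps=1e-15):
    """Multiclass log loss: mean of -log(probability of the true class)."""
    total = 0.0
    for y, p in zip(y_true, proba):
        clipped = min(max(p[y], eps), 1 - eps)  # avoid log(0)
        total -= math.log(clipped)
    return total / len(y_true)


# Hypothetical integer class labels and predicted class probabilities.
y_true = [0, 1, 2, 1]
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.5, 0.3, 0.2],
]

print(merror(y_true, proba))    # 0.5
print(mlogloss(y_true, proba))
```

A model can lower `mlogloss` by becoming more confident in correct predictions without changing `merror`, which is one reason to monitor both.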
matclassification.methods.mat.Tulvae module
- Authors:
Tarlis Portela
- Original source:
Nicksson C. A. de Freitas,
Ticiana L. Coelho da Silva,
Jose António Fernandes de Macêdo,
Leopoldo Melo Junior,
Matheus Gomes Cordeiro
Adapted from: https://github.com/nickssonfreitas/ICAART2021
- class matclassification.methods.mat.Tulvae.Tulvae(rnn=['bilstm'], units=[100, 200, 300], stack=[1], dropout=[0.5], embedding_size=[100, 200, 300], z_values=[100, 200, 300], batch_size=[64], epochs=[1000], patience=[20], monitor=['val_acc'], optimizer=['ada'], learning_rate=[0.001], save_results=False, n_jobs=-1, verbose=0, random_state=42, filterwarnings='ignore')[source]
Bases:
THSClassifier
Tulvae: Trajectory-user linking via variational autoencoder
The Tulvae class is an implementation of a deep learning model based on variational autoencoders, designed for trajectory classification tasks. It exposes tunable hyperparameters and neural network structures to encode spatial trajectories and decode them for predictive modeling.
- Parameters:
rnn (list, default=['bilstm']) – Recurrent neural network cell used, e.g., ‘bilstm’ (Bidirectional LSTM).
units (list, default=[100, 200, 300]) – Number of units in the recurrent layers.
stack (list, default=[1]) – Number of stacked recurrent layers.
dropout (list, default=[0.5]) – Fraction of units to drop for the linear transformation of the inputs.
embedding_size (list, default=[100, 200, 300]) – Size of the embedding vectors used to represent trajectory features.
z_values (list, default=[100, 200, 300]) – Dimensionality of the latent variable space.
batch_size (list, default=[64]) – Number of samples per batch of computation.
epochs (list, default=[1000]) – Number of epochs to train the model.
patience (list, default=[20]) – Number of epochs with no improvement after which training will be stopped.
monitor (list, default=['val_acc']) – Metric used for early stopping and performance evaluation.
optimizer (list, default=['ada']) – Optimizer used to minimize the loss function.
learning_rate (list, default=[0.001]) – Learning rate for the optimizer.
- xy(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, features=['poi'], validate=False)
Prepares trajectory data and transforms it into training and testing datasets.
- prepare_input(train, test, tid_col='tid', class_col='label', space_geohash=False, geo_precision=30, features=['poi'], validate=False)
Prepares the input data, generates configuration, and initializes grid search.
- create(config)
Builds the neural network model based on the configuration.
- fit(X_train, y_train, X_val, y_val, config=None)
Trains the model on the provided training data, using early stopping and validation data.
- clear()
Clears the session and resets the model.
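The `z_values` parameter sets the dimensionality of the latent space of the variational autoencoder. As background, the sketch below shows the textbook latent sampling step (the reparameterization trick) and the KL divergence term of a VAE with a standard normal prior. It is a generic illustration under that assumption; the functions `sample_latent` and `kl_to_standard_normal` are hypothetical, and the Tulvae implementation may structure its latent layer and loss differently.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, sigma^2) and N(0, I), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


rng = random.Random(42)

# Hypothetical encoder outputs for one trajectory, with a 3-dimensional latent space.
mu = [0.5, -1.0, 0.0]
log_var = [0.0, -2.0, 0.1]

z = sample_latent(mu, log_var, rng)
print(len(z))                              # same dimensionality as mu
print(kl_to_standard_normal(mu, log_var))  # non-negative penalty term
```

The length of `mu` plays the role of one candidate in `z_values`: larger values give the encoder more capacity at the cost of a larger model.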