Generator

MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining

This application offers a tool to support the user in preprocessing multiple aspect trajectory data. It is part of a unified framework for multiple aspect trajectory mining and, more generally, for multidimensional sequence data mining methods. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from source project distribution)

Created in December 2023. Copyright (C) 2023, License GPL Version 3 or later (see LICENSE file)

Authors:
  • Tarlis Portela

matdata.generator.randomGenerator(N=10, M=50, L=10, C=10, random_seed=1, fileprefix='random', fileposfix='train', attr_desc=None, save_to=False, outformats=['csv'])

Generates trajectories based on random data.

Parameters:

N : int, optional

Number of trajectories (default 10)

M : int, optional

Size of trajectories, in number of points (default 50)

L : int, optional

Number of attributes (default 10)

C : int, optional

Number of classes (default 10)

random_seed : int, optional

Random seed (default 1)

attr_desc : list of dict, optional

Data type intervals used to generate attributes, given as a list of descriptive dicts or as a list of AttributeGenerator instances. Default: None (uses default types)

save_to : str or bool, optional

Destination folder for the output files, or False to skip saving (default False)

fileprefix : str, optional

Output filename prefix (default ‘random’)

fileposfix : str, optional

Output filename postfix (default ‘train’)

outformats : list, optional

Output file formats for saving (default [‘csv’])

Returns:

pandas.DataFrame

The generated dataset.
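A minimal usage sketch for the signature above, assuming the matdata package is importable. The argument values are illustrative and are not taken from the project. The import is guarded, so the snippet skips the call where matdata is missing.

```python
# Guarded import: the call is skipped when matdata is not installed.
try:
    from matdata.generator import randomGenerator
except ImportError:
    randomGenerator = None

# Illustrative values; the keyword names come from the documented signature.
params = {
    "N": 5,            # number of trajectories
    "M": 20,           # points per trajectory
    "L": 4,            # number of attributes
    "C": 2,            # number of classes
    "random_seed": 42,
    "save_to": False,  # keep the result in memory, write no files
}

df = randomGenerator(**params) if randomGenerator is not None else None
if df is not None:
    # Documented return type is pandas.DataFrame.
    print(df.shape)
    print(df.head())
```

With save_to=False nothing is written to disk, so the returned DataFrame is the only output to inspect.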

matdata.generator.samplerGenerator(N=10, M=50, C=1, random_seed=1, fileprefix='sample', fileposfix='train', cols_for_sampling=['space', 'time', 'day', 'rating', 'price', 'weather', 'root_type', 'type'], save_to=False, base_data=None, outformats=['csv'])

Generates trajectories based on real data.

Parameters:

N : int, optional

Number of trajectories (default 10)

M : int, optional

Size of trajectories, in number of points (default 50)

C : int, optional

Number of classes (default 1)

random_seed : int, optional

Random seed (default 1)

cols_for_sampling : list, optional

Columns to include in the generated dataset. Default: [‘space’, ‘time’, ‘day’, ‘rating’, ‘price’, ‘weather’, ‘root_type’, ‘type’].

save_to : str or bool, optional

Destination folder for the output files, or False to skip saving (default False)

fileprefix : str, optional

Output filename prefix (default ‘sample’)

fileposfix : str, optional

Output filename postfix (default ‘train’)

base_data : DataFrame, optional

DataFrame of trajectories to use as a base for sampling data. Default: None (uses example data)

outformats : list, optional

Output file formats for saving (default [‘csv’])

Returns:

pandas.DataFrame

The generated dataset.
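A hedged sketch of sampling only some of the documented columns, assuming matdata is installed and that a subset of the default cols_for_sampling list is accepted. The chosen subset and sizes are illustrative.

```python
# Guarded import: the call is skipped when matdata is not installed.
try:
    from matdata.generator import samplerGenerator
except ImportError:
    samplerGenerator = None

# Subset of the default cols_for_sampling list documented above (an assumption
# that any subset is valid; the reference only states the default).
columns = ["space", "time", "day", "weather"]

params = {
    "N": 5,
    "M": 20,
    "C": 1,
    "random_seed": 7,
    "cols_for_sampling": columns,
    "base_data": None,  # documented: falls back to the example data
    "save_to": False,   # write no files
}

sampled = samplerGenerator(**params) if samplerGenerator is not None else None
if sampled is not None:
    print(list(sampled.columns))
```

To sample from your own trajectories instead of the example data, pass a DataFrame as base_data.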

matdata.generator.scalerRandomGenerator(Ns=[100, 10], Ms=[10, 10], Ls=[8, 10], Cs=[2, 10], random_seed=1, fileprefix='scalability', fileposfix='train', attr_desc=None, save_to=None, save_desc_files=True, outformats=['csv'])

Generates trajectory datasets based on random data.

Parameters:

Ns : list of int, optional

Parameters to scale the number of trajectories. List of 2 values: starting number, number of elements (default [100, 10])

Ms : list of int, optional

Parameters to scale the size of trajectories. List of 2 values: starting number, number of elements (default [10, 10])

Ls : list of int, optional

Parameters to scale the number of attributes (* doubles the columns). List of 2 values: starting number, number of elements (default [8, 10])

Cs : list of int, optional

Parameters to scale the number of classes. List of 2 values: starting number, number of elements (default [2, 10])

random_seed : int, optional

Random seed (default 1)

attr_desc : list, optional

Data type intervals used to generate attributes, given as a list of descriptive dicts. Default: None (uses default types)

save_to : str or bool, optional

Destination folder for the output files, or False to skip saving (default None)

fileprefix : str, optional

Output filename prefix (default ‘scalability’)

fileposfix : str, optional

Output filename postfix (default ‘train’)

save_desc_files : bool, optional

Whether to save the .json description files (default True)

outformats : list, optional

Output file formats for saving (default [‘csv’])

Returns:

None
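Because this function returns None, its results are only visible as files under save_to. The sketch below assumes matdata is installed, writes into a temporary folder, and lists whatever was produced. The small scale values are illustrative, and how each [starting number, number of elements] pair is expanded is not specified here, so the number of files is not predicted.

```python
import os
import tempfile

# Guarded import: the call is skipped when matdata is not installed.
try:
    from matdata.generator import scalerRandomGenerator
except ImportError:
    scalerRandomGenerator = None


def scaling_params(out_dir):
    """Small illustrative arguments; each pair is [starting number, number of elements]."""
    return {
        "Ns": [10, 2],
        "Ms": [5, 2],
        "Ls": [2, 2],
        "Cs": [2, 2],
        "random_seed": 1,
        "save_to": out_dir,
        "save_desc_files": True,
        "outformats": ["csv"],
    }


written = []
result = None
with tempfile.TemporaryDirectory() as out_dir:
    if scalerRandomGenerator is not None:
        result = scalerRandomGenerator(**scaling_params(out_dir))
        # Collect every file the generator left under the output folder.
        for root, _dirs, files in os.walk(out_dir):
            written.extend(os.path.join(root, name) for name in files)

print(result)  # documented return value is None
print(len(written), "files written")
```

The temporary folder is removed when the with block exits, so use a permanent path for save_to to keep the datasets.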

matdata.generator.scalerSamplerGenerator(Ns=[100, 10], Ms=[10, 10], Ls=[8, 10], Cs=[2, 10], random_seed=1, fileprefix='scalability', fileposfix='train', cols_for_sampling=['space', 'time', 'day', 'rating', 'price', 'weather', 'root_type', 'type'], save_to=None, base_data=None, save_desc_files=True, outformats=['csv'])

Generates trajectory datasets based on real data.

Parameters:

Ns : list of int, optional

Parameters to scale the number of trajectories. List of 2 values: starting number, number of elements (default [100, 10])

Ms : list of int, optional

Parameters to scale the size of trajectories. List of 2 values: starting number, number of elements (default [10, 10])

Ls : list of int, optional

Parameters to scale the number of attributes (* doubles the columns). List of 2 values: starting number, number of elements (default [8, 10])

Cs : list of int, optional

Parameters to scale the number of classes. List of 2 values: starting number, number of elements (default [2, 10])

random_seed : int, optional

Random seed (default 1)

fileprefix : str, optional

Output filename prefix (default ‘scalability’)

fileposfix : str, optional

Output filename postfix (default ‘train’)

cols_for_sampling : list or dict, optional

Columns to include in the generated dataset. Default: [‘space’, ‘time’, ‘day’, ‘rating’, ‘price’, ‘weather’, ‘root_type’, ‘type’]. A dictionary in the format {‘aspectName’: ‘type’, ‘aspectName’: ‘type’} may be given instead; it is used when base_data is provided and the output is saved as .MAT.

save_to : str or bool, optional

Destination folder for the output files, or False to skip saving (default None)

base_data : DataFrame, optional

DataFrame of trajectories to use as a base for sampling data. Default: None (uses example data)

save_desc_files : bool, optional

Whether to save the .json description files (default True)

outformats : list, optional

Output file formats for saving (default [‘csv’])

Returns:

None
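A sketch of one way to wrap this call, assuming matdata is installed. The helper name run_sampler_scaling and the scale values are hypothetical. cols_for_sampling and base_data are left at their documented defaults, and the .json description files are switched off through save_desc_files.

```python
import tempfile

# Guarded import: the helper reports False when matdata is not installed.
try:
    from matdata.generator import scalerSamplerGenerator
except ImportError:
    scalerSamplerGenerator = None


def run_sampler_scaling(out_dir):
    """Hypothetical helper: run the generator with small scales; True if it ran."""
    if scalerSamplerGenerator is None:
        return False
    scalerSamplerGenerator(
        Ns=[10, 2],             # [starting number, number of elements]
        Ms=[5, 2],
        Ls=[8, 2],              # starts at the documented default of 8
        Cs=[2, 2],
        random_seed=1,
        save_to=out_dir,        # datasets are written here
        save_desc_files=False,  # skip the .json description files
        outformats=["csv"],
    )
    return True


with tempfile.TemporaryDirectory() as out_dir:
    ran = run_sampler_scaling(out_dir)

print("generator ran:", ran)
```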