matdata.inc package

Submodules

matdata.inc.ts_io module

MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining

This application offers a tool to support the user in preprocessing multiple aspect trajectory data. It is part of a unified framework for mining multiple aspect trajectories and, more generally, multidimensional sequence data. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from source project distribution)

Created on Dec, 2023 Copyright (C) 2023, License GPL Version 3 or superior (see LICENSE file)

Authors:
  • Tarlis Portela

  • sktime package (adapted)

exception matdata.inc.ts_io.LongFormatDataParseException

Bases: Exception

Should be raised when parsing a .csv file with long-formatted data and the format is incorrect.

exception matdata.inc.ts_io.TsFileParseException

Bases: Exception

Should be raised when parsing a .ts file and the format is incorrect.

matdata.inc.ts_io.from_long_to_nested(long_dataframe)
matdata.inc.ts_io.generate_example_long_table(num_cases=50, series_len=20, num_dims=2)

Generates an example DataFrame in long table format.

Parameters:
  • num_cases (int) – Number of cases.

  • series_len (int) – Length of the series.

  • num_dims (int) – Number of dimensions.

Return type:

DataFrame
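The long layout can be pictured with a short standard-library sketch. It assumes, following the sktime code this module is adapted from, that a long table has one row per reading with the columns case_id, dim_id, reading_id and value; the helper name make_long_rows is hypothetical and is not part of matdata.

```python
import random


def make_long_rows(num_cases=50, series_len=20, num_dims=2, seed=0):
    """Build long-format rows: one row per (case, dimension, reading).

    Hypothetical helper for illustration; column names are an assumption
    based on the sktime long-table convention.
    """
    rng = random.Random(seed)
    rows = []
    for case_id in range(num_cases):
        for dim_id in range(num_dims):
            for reading_id in range(series_len):
                rows.append(
                    {
                        "case_id": case_id,
                        "dim_id": dim_id,
                        "reading_id": reading_id,
                        "value": rng.random(),
                    }
                )
    return rows


rows = make_long_rows(num_cases=3, series_len=4, num_dims=2)
print(len(rows))        # 3 cases * 2 dimensions * 4 readings = 24
print(sorted(rows[0]))  # the four column names
```

A list of dicts like this can be handed to a DataFrame constructor if a tabular object is needed.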

matdata.inc.ts_io.load_from_arff_to_dataframe(full_file_path_and_name, has_class_labels=True, return_separate_X_and_y=True, replace_missing_vals_with='NaN')

Loads data from a .arff file into a Pandas DataFrame.

Parameters:
  • full_file_path_and_name (str) – The full pathname of the .arff file to read.

  • has_class_labels (bool) – True if each data line carries a class value after the series values (see return_separate_X_and_y), False otherwise.

  • return_separate_X_and_y (bool) – If True, X and y are returned separately as a DataFrame (X) and a numpy array (y); otherwise a single DataFrame is returned. This is only relevant for data with class values.

  • replace_missing_vals_with (str) – The value that missing values in the text file should be replaced with prior to parsing.

Returns:

  • DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.

  • DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” with the associated class values.
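As a rough illustration of how has_class_labels and replace_missing_vals_with interact, here is a minimal sketch that parses ARFF text with the standard library only. It is not the library's implementation: it assumes a univariate, purely numeric file in which "%" starts a comment, "?" marks a missing value and the class value is the last field. The function name parse_univariate_arff is hypothetical.

```python
def parse_univariate_arff(text, has_class_labels=True,
                          replace_missing_vals_with="NaN"):
    """Parse simple univariate ARFF text into (X, y).

    Illustrative sketch only: header attributes are skipped, not validated.
    """
    X, y = [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        if not in_data:
            # Everything before @data is header; skip it.
            in_data = line.lower().startswith("@data")
            continue
        # Replace the ARFF missing marker before converting to float.
        values = line.replace("?", replace_missing_vals_with).split(",")
        if has_class_labels:
            y.append(values.pop())  # last field is the class value
        X.append([float(v) for v in values])
    return X, (y if has_class_labels else None)


sample = """% toy file
@relation toy
@attribute t0 numeric
@attribute t1 numeric
@attribute target {a,b}
@data
1.0,2.0,a
3.0,?,b
"""
X, y = parse_univariate_arff(sample)
print(X[0], y)
```

The second series contains a NaN where the file had "?", because the marker is replaced before the values are converted.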

matdata.inc.ts_io.load_from_long_to_dataframe(full_file_path_and_name, separator=',')

Loads data from a long format file into a Pandas DataFrame.

Parameters:
  • full_file_path_and_name (str) – The full pathname of the .csv file to read.

  • separator (str) – The character that the csv uses as a delimiter

Returns:

A dataframe with sktime-formatted data

Return type:

DataFrame
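The conversion from long rows to a nested, per-case layout can be sketched as follows. This assumes that "sktime-formatted" means one entry per case and one series per dimension, and that long rows carry case_id, dim_id, reading_id and value fields; long_to_nested is a hypothetical stand-in that uses plain dicts and lists rather than a DataFrame.

```python
from collections import defaultdict


def long_to_nested(rows):
    """Group long-format rows into nested[case_id][dim_id] -> list of values.

    Readings are ordered by reading_id, so input rows may arrive unsorted.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row["case_id"]][row["dim_id"]].append(
            (row["reading_id"], row["value"])
        )
    nested = {}
    for case_id, dims in grouped.items():
        nested[case_id] = {
            dim_id: [value for _, value in sorted(readings)]
            for dim_id, readings in dims.items()
        }
    return nested


rows = [
    {"case_id": 0, "dim_id": 0, "reading_id": 1, "value": 0.2},
    {"case_id": 0, "dim_id": 0, "reading_id": 0, "value": 0.1},
    {"case_id": 0, "dim_id": 1, "reading_id": 0, "value": 5.0},
    {"case_id": 1, "dim_id": 0, "reading_id": 0, "value": 9.0},
]
nested = long_to_nested(rows)
print(nested[0][0])  # readings of case 0, dimension 0, in reading order
```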

matdata.inc.ts_io.load_from_tsfile(file, return_separate_X_and_y=False, replace_missing_vals_with='NaN', opLabel='Processing TS')

Loads data from a .ts file into a Pandas DataFrame.

Parameters:
  • file (str) – The full pathname of the .ts file to read.

  • return_separate_X_and_y (bool) – True if X and y values should be returned separately as a DataFrame (X) and a numpy array (y), False otherwise. This is only relevant for data that has class values.

  • replace_missing_vals_with (str) – The value that missing values in the text file should be replaced with prior to parsing.

Returns:

  • DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.

  • DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” with the associated class values.
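To make the file layout concrete, the following is a simplified sketch of reading .ts text, assuming the sktime conventions: header lines start with "@", comments with "#", dimensions within a case are separated by ":", values by ",", and the class value comes last when @classLabel is true. It ignores timestamps and most header tags, and parse_ts_text is a hypothetical name, not the function documented above.

```python
def parse_ts_text(text, replace_missing_vals_with="NaN"):
    """Parse simplified .ts text into (X, y).

    X is a list of cases, each a list of dimensions, each a list of floats.
    y is a list of class values (empty if the file has no class labels).
    """
    has_labels = False
    in_data = False
    X, y = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # blank line or comment
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@classlabel"):
                tokens = line.split()
                has_labels = len(tokens) > 1 and tokens[1].lower() == "true"
            elif lower.startswith("@data"):
                in_data = True
            continue
        parts = line.replace("?", replace_missing_vals_with).split(":")
        if has_labels:
            y.append(parts.pop())  # the class value is the last field
        X.append([[float(v) for v in dim.split(",")] for dim in parts])
    return X, y


sample = """# toy problem
@problemName toy
@timeStamps false
@univariate false
@classLabel true a b
@data
1,2,3:4,5,6:a
7,8,?:9,10,11:b
"""
X, y = parse_ts_text(sample)
print(X[0], y)
```

Returning X and y separately mirrors the return_separate_X_and_y=True case; the single-table case would attach y as an extra "class_vals" column.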

matdata.inc.ts_io.load_from_tsfile_to_dataframe(full_file_path_and_name, return_separate_X_and_y=False, replace_missing_vals_with='?', opLabel='Processing TS')
matdata.inc.ts_io.load_from_ucr_tsv_to_dataframe(full_file_path_and_name, return_separate_X_and_y=True)

Loads data from a .tsv file into a Pandas DataFrame.

Parameters:
  • full_file_path_and_name (str) – The full pathname of the .tsv file to read.

  • return_separate_X_and_y (bool) – If True, X and y are returned separately as a DataFrame (X) and a numpy array (y); otherwise a single DataFrame is returned. This is only relevant for data with class values.

Returns:

  • DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.

  • DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” with the associated class values.
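A small sketch of the input this loader expects, assuming the UCR archive convention that each line of a .tsv file holds one univariate series, with the class label in the first column and the series values in the remaining tab-separated columns. The name parse_ucr_tsv is hypothetical, and the sketch returns plain lists rather than a DataFrame and a numpy array.

```python
import csv
import io


def parse_ucr_tsv(text):
    """Split UCR-style TSV text into series values (X) and class labels (y)."""
    X, y = [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # skip empty lines
            continue
        y.append(row[0])                       # first column: class label
        X.append([float(v) for v in row[1:]])  # remaining columns: the series
    return X, y


sample = "1\t0.5\t0.7\t0.9\n2\t1.5\t1.7\t1.9\n"
X, y = parse_ucr_tsv(sample)
print(X, y)
```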

matdata.inc.ts_io.write_dataframe_to_tsfile(data, path, problem_name='sample_data', timestamp=False, univariate=True, class_label=None, class_value_list=None, equal_length=False, series_length=-1, missing_values='NaN', comment=None)

Outputs a dataset in DataFrame format to a .ts file.

The DataFrame must have the structure specified in the documentation at https://github.com/whackteachers/sktime/blob/master/examples/loading_data.ipynb:

index | dim_0     | dim_1     | …         | dim_c-1
0     | pd.Series | pd.Series | pd.Series | pd.Series
1     | pd.Series | pd.Series | pd.Series | pd.Series
…     | …         | …         | …         | …
n     | pd.Series | pd.Series | pd.Series | pd.Series

Parameters:
  • data (DataFrame) – The dataset to be written as a .ts file.

  • path (str) – The full path to output the ts file

  • problem_name (str) – The problem name to write in the header of the ts file; it is also used as the name of the file.

  • timestamp ({False, bool}, optional) – Indicates in the header whether the data contains timestamps.

  • univariate ({True, bool}, optional) – Indicates in the header whether the data is univariate or multivariate. If univariate, only the first dimension is written to the file.

  • class_label ({list, None}, optional) – The class labels, written in the header to show the possible class values for classification problems.

  • class_value_list ({list/ndarray, []}, optional) – ndarray containing the class values for each case in classification problems

  • equal_length ({False, bool}, optional) – Indicates whether each series has equal length. It is only written to the file if true.

  • series_length ({-1, int}, optional) – Indicates the length of each series if they are of equal length. It is only written to the file if equal_length is true.

  • missing_values ({NaN, str}, optional) – Representation for missing values; the default is NaN.

  • comment ({None, str}, optional) – Comment text to be inserted before the header in a block.

Return type:

None

Notes

This version currently does not support writing timestamp data.

References

The code for writing series data into file is adapted from https://stackoverflow.com/questions/37877708/how-to-turn-a-pandas-dataframe-row-into-a-comma-separated-string
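The shape of the written file can be illustrated with a small sketch that renders nested lists as .ts text. It is a sketch under assumptions, not the function above: the header tag names follow the sktime .ts convention, timestamps are not handled, the equal-length and series-length tags are left out, and to_ts_text with all its parameters is hypothetical.

```python
def to_ts_text(X, problem_name="sample_data", class_labels=None, y=None,
               missing_values="NaN"):
    """Render nested lists (cases -> dimensions -> values) as .ts text.

    None entries in a series are written as the missing_values marker.
    """
    univariate = all(len(case) == 1 for case in X)
    lines = [
        f"@problemName {problem_name}",
        "@timeStamps false",
        f"@univariate {str(univariate).lower()}",
    ]
    if class_labels:
        lines.append("@classLabel true " + " ".join(class_labels))
    else:
        lines.append("@classLabel false")
    lines.append("@data")
    for i, case in enumerate(X):
        # Each dimension becomes one comma-separated string.
        dims = [
            ",".join(missing_values if v is None else str(v) for v in dim)
            for dim in case
        ]
        if class_labels:
            dims.append(str(y[i]))  # the class value goes last
        lines.append(":".join(dims))  # dimensions are joined with ':'
    return "\n".join(lines) + "\n"


X = [[[1, 2, 3], [4, 5, 6]], [[7, 8, None], [9, 10, 11]]]
text = to_ts_text(X, problem_name="toy", class_labels=["a", "b"], y=["a", "b"])
print(text)
```

Joining each dimension into one comma-separated string and then joining dimensions with ":" keeps every case on a single line, which is what lets a reader split it back apart.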

matdata.inc.ts_io.write_results_to_uea_format(path, strategy_name, dataset_name, y_true, y_pred, split='TEST', resample_seed=0, y_proba=None, second_line='N/A')

Module contents