matdata.inc package
Submodules
matdata.inc.ts_io module
MAT-Tools: Python Framework for Multiple Aspect Trajectory Data Mining
This application offers a tool to support the user in preprocessing multiple aspect trajectory data. It integrates into a unified framework for multiple aspect trajectories and, more generally, for multidimensional sequence data mining methods. Copyright (C) 2022, MIT license (this portion of code is subject to licensing from source project distribution)
Created on Dec, 2023 Copyright (C) 2023, License GPL Version 3 or superior (see LICENSE file)
- Authors:
Tarlis Portela
sktime package (adapted)
- exception matdata.inc.ts_io.LongFormatDataParseException
Bases:
Exception
Should be raised when parsing a .csv file with long-formatted data and the format is incorrect.
- exception matdata.inc.ts_io.TsFileParseException
Bases:
Exception
Should be raised when parsing a .ts file and the format is incorrect.
- matdata.inc.ts_io.generate_example_long_table(num_cases=50, series_len=20, num_dims=2)
Generates an example DataFrame in long table format.
- Parameters:
num_cases (int) – Number of cases.
series_len (int) – Length of the series.
num_dims (int) – Number of dimensions.
- Return type:
DataFrame
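To make the long table layout concrete, the sketch below builds the same kind of table using only the standard library. It does not call generate_example_long_table. The column names case_id, dim_id, reading_id and value follow the sktime long-format convention and are an assumption about this module's output, and make_long_rows is a hypothetical helper.

```python
import random


def make_long_rows(num_cases=50, series_len=20, num_dims=2, seed=0):
    """Build long-format rows: one row per (case, dimension, reading).

    Hypothetical helper; column names are assumed, not taken from matdata.
    """
    rng = random.Random(seed)
    rows = []
    for case_id in range(num_cases):
        for dim_id in range(num_dims):
            for reading_id in range(series_len):
                rows.append(
                    {
                        "case_id": case_id,
                        "dim_id": dim_id,
                        "reading_id": reading_id,
                        "value": rng.random(),
                    }
                )
    return rows


rows = make_long_rows(num_cases=3, series_len=4, num_dims=2)
print(len(rows))  # 3 cases * 2 dims * 4 readings = 24
```

Each observation occupies its own row, so the row count is the product of the three arguments.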
- matdata.inc.ts_io.load_from_arff_to_dataframe(full_file_path_and_name, has_class_labels=True, return_separate_X_and_y=True, replace_missing_vals_with='NaN')
Loads data from an .arff file into a Pandas DataFrame.
- Parameters:
full_file_path_and_name (str) – The full pathname of the .arff file to read.
has_class_labels (bool) – True if each data line ends with a class value in addition to the series values (see return_separate_X_and_y); False otherwise.
return_separate_X_and_y (bool) – True if X and y should be returned separately, as a DataFrame (X) and a NumPy array (y); False otherwise. This is only relevant for data that has class labels.
replace_missing_vals_with (str) – The value that missing values in the text file should be replaced with prior to parsing.
- Returns:
DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.
DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” holding the associated class values.
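The roles of has_class_labels and replace_missing_vals_with can be illustrated with a minimal standard-library sketch. It handles only a simple univariate ARFF layout, returns plain lists instead of a DataFrame, and is not the module's implementation; parse_arff_data and the toy data are hypothetical.

```python
def parse_arff_data(text, has_class_labels=True, replace_missing_vals_with="NaN"):
    """Parse the @data section of a simple univariate ARFF document (sketch)."""
    series, labels = [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            # Header lines are ignored until the @data marker is reached.
            in_data = line.lower().startswith("@data")
            continue
        # Missing values are substituted in the raw text, before parsing.
        fields = line.replace("?", replace_missing_vals_with).split(",")
        if has_class_labels:
            labels.append(fields.pop())  # the class value is the last field
        series.append([float(v) for v in fields])
    return series, labels


ARFF = """% toy dataset
@relation toy
@attribute t0 numeric
@attribute t1 numeric
@attribute t2 numeric
@attribute target {a,b}
@data
1.0,2.0,3.0,a
4.0,?,6.0,b
"""

X, y = parse_arff_data(ARFF)
print(y)  # ['a', 'b']
```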
- matdata.inc.ts_io.load_from_long_to_dataframe(full_file_path_and_name, separator=',')
Loads data from a long format file into a Pandas DataFrame.
- Parameters:
full_file_path_and_name (str) – The full pathname of the .csv file to read.
separator (str) – The character that the .csv file uses as a delimiter.
- Returns:
A dataframe with sktime-formatted data
- Return type:
DataFrame
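The conversion from long format to a nested, per-case layout can be sketched with the standard library. Plain dictionaries stand in for the DataFrame here, the column names case_id, dim_id, reading_id and value are assumed from the sktime convention, and long_to_nested is a hypothetical helper, not the module's code.

```python
import csv
import io
from collections import defaultdict


def long_to_nested(text, separator=","):
    """Group long-format rows into {case_id: {dim_id: [values...]}} (sketch)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=separator)
    # Sort so that readings within each series appear in reading_id order.
    rows = sorted(
        reader,
        key=lambda r: (int(r["case_id"]), int(r["dim_id"]), int(r["reading_id"])),
    )
    nested = defaultdict(lambda: defaultdict(list))
    for row in rows:
        nested[int(row["case_id"])][int(row["dim_id"])].append(float(row["value"]))
    return {case: dict(dims) for case, dims in nested.items()}


LONG_CSV = """case_id,dim_id,reading_id,value
0,0,0,1.0
0,0,1,2.0
0,1,0,10.0
0,1,1,20.0
1,0,0,3.0
1,0,1,4.0
1,1,0,30.0
1,1,1,40.0
"""

nested = long_to_nested(LONG_CSV)
print(nested[1][1])  # [30.0, 40.0]
```

Each outer key corresponds to one row of the nested DataFrame, and each inner key to one dim_* column holding a whole series.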
- matdata.inc.ts_io.load_from_tsfile(file, return_separate_X_and_y=False, replace_missing_vals_with='NaN', opLabel='Processing TS')
Loads data from a .ts file into a Pandas DataFrame.
- Parameters:
file (str) – The full pathname of the .ts file to read.
return_separate_X_and_y (bool) – True if X and y should be returned separately, as a DataFrame (X) and a NumPy array (y); False otherwise. This is only relevant for data that has class labels.
replace_missing_vals_with (str) – The value that missing values in the text file should be replaced with prior to parsing.
- Returns:
DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.
DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” holding the associated class values.
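The two return shapes described above can be illustrated with a minimal sketch of .ts parsing. It assumes the common sktime layout (header lines starting with @, dimensions separated by colons, class value last), supports only a few header tags, and returns plain lists; parse_ts is a hypothetical helper, not the module's implementation.

```python
def parse_ts(text, return_separate_X_and_y=False, replace_missing_vals_with="NaN"):
    """Parse a small .ts document into nested lists (sketch)."""
    has_labels = False
    in_data = False
    cases, labels = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if not in_data:
            lowered = line.lower()
            if lowered.startswith("@classlabel"):
                has_labels = lowered.split()[1] == "true"
            elif lowered.startswith("@data"):
                in_data = True
            continue
        # Missing values are substituted in the raw text, before parsing.
        parts = line.replace("?", replace_missing_vals_with).split(":")
        if has_labels:
            labels.append(parts.pop())  # the class value is the last field
        cases.append([[float(v) for v in dim.split(",")] for dim in parts])
    if has_labels and return_separate_X_and_y:
        return cases, labels
    if has_labels:
        # Single structure: the class value becomes a trailing column.
        return [case + [label] for case, label in zip(cases, labels)]
    return cases


TS = """# toy problem
@problemName toy
@timeStamps false
@univariate false
@classLabel true a b
@data
1,2,3:4,5,6:a
7,?,9:10,11,12:b
"""

X, y = parse_ts(TS, return_separate_X_and_y=True)
print(y)     # ['a', 'b']
print(X[0])  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```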
- matdata.inc.ts_io.load_from_tsfile_to_dataframe(full_file_path_and_name, return_separate_X_and_y=False, replace_missing_vals_with='?', opLabel='Processing TS')
- matdata.inc.ts_io.load_from_ucr_tsv_to_dataframe(full_file_path_and_name, return_separate_X_and_y=True)
Loads data from a .tsv file into a Pandas DataFrame.
- Parameters:
full_file_path_and_name (str) – The full pathname of the .tsv file to read.
return_separate_X_and_y (bool) – True if X and y should be returned separately, as a DataFrame (X) and a NumPy array (y); False otherwise. This is only relevant for data that has class labels.
- Returns:
DataFrame, ndarray – If return_separate_X_and_y then a tuple containing a DataFrame and a numpy array containing the relevant time-series and corresponding class values.
DataFrame – If not return_separate_X_and_y then a single DataFrame containing all time-series and (if relevant) a column “class_vals” holding the associated class values.
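As a rough illustration of the UCR .tsv layout, the sketch below assumes each line holds a class label in the first tab-separated field followed by the series values, which is the usual UCR archive convention. It returns plain Python structures instead of a DataFrame, and parse_ucr_tsv is a hypothetical helper.

```python
def parse_ucr_tsv(text, return_separate_X_and_y=True):
    """Parse UCR-style tab-separated text: label first, then the series (sketch)."""
    X, y = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, *values = line.split("\t")
        y.append(label)
        X.append([float(v) for v in values])
    if return_separate_X_and_y:
        return X, y
    # Single structure: one record per case, with the label alongside the series.
    return [{"dim_0": series, "class_vals": label} for series, label in zip(X, y)]


TSV = "1\t0.5\t0.7\t0.9\n2\t1.5\t1.7\t1.9\n"

X, y = parse_ucr_tsv(TSV)
print(y)     # ['1', '2']
print(X[1])  # [1.5, 1.7, 1.9]
```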
- matdata.inc.ts_io.write_dataframe_to_tsfile(data, path, problem_name='sample_data', timestamp=False, univariate=True, class_label=None, class_value_list=None, equal_length=False, series_length=-1, missing_values='NaN', comment=None)
Outputs a dataset in DataFrame format to a .ts file.
- Parameters:
data (DataFrame) – The dataset to be written as a .ts file. It must have the structure specified in the documentation at https://github.com/whackteachers/sktime/blob/master/examples/loading_data.ipynb:
index | dim_0     | dim_1     | …         | dim_c-1
0     | pd.Series | pd.Series | pd.Series | pd.Series
1     | pd.Series | pd.Series | pd.Series | pd.Series
…     | …         | …         | …         | …
n     | pd.Series | pd.Series | pd.Series | pd.Series
path (str) – The full path to output the ts file
problem_name (str) – The problemName to print in the header of the ts file and also the name of the file.
timestamp (bool, optional, default False) – Indicates in the header whether the data contains timestamps.
univariate (bool, optional, default True) – Indicates in the header whether the data is univariate or multivariate. If univariate, only the first dimension is written to the file.
class_label (list or None, optional) – The possible class values for classification problems, shown in the header.
class_value_list (list or ndarray, optional, default None) – The class values for each case in classification problems.
equal_length (bool, optional, default False) – Indicates whether all series have equal length. It is only written to the file if true.
series_length (int, optional, default -1) – The length of each series, if they are of equal length. It is only written to the file if true.
missing_values (str, optional, default NaN) – Representation for missing values.
comment (str or None, optional) – Comment text to be inserted before the header in a block.
- Return type:
None
Notes
This version currently does not support writing timestamp data.
References
The code for writing series data to file is adopted from https://stackoverflow.com/questions/37877708/how-to-turn-a-pandas-dataframe-row-into-a-comma-separated-string
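For orientation, here is a minimal sketch of the kind of text a .ts writer produces. It assumes the common sktime layout (header tags, then one line per case with colon-separated dimensions and the class value last) and works on nested lists, not a DataFrame. The exact header that write_dataframe_to_tsfile emits is not confirmed here, and format_ts is a hypothetical helper that, unlike the real function, infers the univariate flag from the data.

```python
def format_ts(cases, problem_name="sample_data", class_value_list=None,
              missing_values="NaN"):
    """Render cases (lists of dimensions, each a list of numbers) as .ts text."""
    univariate = all(len(case) == 1 for case in cases)
    lines = [
        f"@problemName {problem_name}",
        "@timeStamps false",  # timestamps are not supported in this sketch
        f"@univariate {str(univariate).lower()}",
    ]
    if class_value_list is not None:
        classes = " ".join(sorted(set(map(str, class_value_list))))
        lines.append(f"@classLabel true {classes}")
    else:
        lines.append("@classLabel false")
    lines.append("@data")
    for i, case in enumerate(cases):
        # Each dimension becomes a comma-separated string; None marks a gap.
        dims = [
            ",".join(missing_values if v is None else str(v) for v in dim)
            for dim in case
        ]
        if class_value_list is not None:
            dims.append(str(class_value_list[i]))
        lines.append(":".join(dims))
    return "\n".join(lines) + "\n"


cases = [[[1, 2, 3], [4, 5, 6]], [[7, None, 9], [10, 11, 12]]]
text = format_ts(cases, problem_name="toy", class_value_list=["a", "b"])
print(text.splitlines()[-1])  # 7,NaN,9:10,11,12:b
```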