API Reference

class bibmon.Autoencoder(hidden_layer_sizes=(2,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)

Bases: GenericModel

Autoencoder using sklearn’s MLPRegressor interface. For details on the input parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html

compute_SPE_contributions(pred, X)

Calculation of SPE contributions for diagnosis based on partial decomposition analysis. Valid for reconstruction models, in which self.has_Y = False.

Parameters:

pred, X: numpy.array

Data windows to compute contributions.

Returns:

SPE_contributions: numpy.array

Contributions to the SPE.
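As a rough illustration of the idea (not BibMon's actual implementation), the sketch below assumes that the SPE of an observation is the sum of squared residuals between X and its reconstruction pred, so that each variable's contribution is its own squared residual. The function name spe_contributions is hypothetical, and plain lists stand in for numpy arrays.

```python
def spe_contributions(pred, X):
    """Per-variable squared residuals; each row sums to that row's SPE."""
    return [[(x - p) ** 2 for x, p in zip(x_row, p_row)]
            for x_row, p_row in zip(X, pred)]


# Two observations of three variables and their reconstructions.
X = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
pred = [[1.0, 1.0, 5.0], [0.5, 0.0, 0.0]]

contributions = spe_contributions(pred, X)
spe = [sum(row) for row in contributions]  # total SPE per observation
```

Under this assumption, the variable with the largest entry in a row is the one that deviates most from its reconstruction at that time step.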

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – If True, the detection limit is redefined using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.

  • tune (boolean, optional) – If True, performs automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter. It must be a list containing the possibilities (when tuning a single parameter), or a list of such lists, one per parameter (when tuning several). Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.
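The ‘functionname__argname’ key format used by a_pp and a_pp_test can be read as "function name, double underscore, argument name". The sketch below shows one way such a dictionary could be grouped per function. The helper split_pp_args is hypothetical and not part of BibMon, and the argument names threshold and limit are placeholders rather than documented arguments.

```python
def split_pp_args(a_pp):
    """Group {'functionname__argname': value} entries by function name."""
    grouped = {}
    for key, value in (a_pp or {}).items():
        # Function names contain single underscores, so split on the
        # first double underscore only.
        func_name, arg_name = key.split("__", 1)
        grouped.setdefault(func_name, {})[arg_name] = value
    return grouped


# Placeholder argument names, for illustration only.
a_pp = {"remove_frozen_variables__threshold": 1e-6, "ffill_nan__limit": 5}
grouped = split_pp_args(a_pp)
```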

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) –

    Contains the possibilities to be tested and the types of parameters. Must be defined as: pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when no cross-validation is performed.

  • n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.
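Mux and SDx are what a restored model needs in order to put new data on the training scale. The sketch below assumes that the ‘normalize’ pre-processing step is a z-score standardization, which is an assumption rather than something stated here. Plain dicts keyed by variable name stand in for pandas.Series, the variable names are made up, and the normalize helper is hypothetical.

```python
def normalize(row, Mux, SDx):
    """Standardize one observation with training means and deviations."""
    return {name: (value - Mux[name]) / SDx[name]
            for name, value in row.items()}


# Statistics that would be stored after training (illustrative values).
Mux = {"temperature": 350.0, "pressure": 10.0}
SDx = {"temperature": 5.0, "pressure": 0.5}

new_observation = {"temperature": 360.0, "pressure": 9.0}
scaled = normalize(new_observation, Mux, SDx)
```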

map_from_X(X)

Receives a data matrix X and returns a matrix of predicted or reconstructed values.

Parameters:

X (numpy.array) – Window X of data for prediction or reconstruction.

Returns:

  • numpy.array

  • Reconstructed X (or predicted Y).

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plotting the temporal evolution of SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – If the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – If the alarmOutlier should be plotted.

plot_SPE_contributions(ax=None, train_or_test='test')

Plotting the temporal evolution of SPE contributions on a heatmap.

Parameters:

ax: matplotlib.axes._subplots.AxesSubplot

Axis where the graph will be plotted.

train_or_test: string

Indicates whether to plot the ‘train’ or ‘test’ graph.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plotting the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}
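One plausible reading of count_window_size and count_limit is that the count alarm at a given instant triggers when at least count_limit of the last count_window_size points exceed the detection limit. The sketch below implements that reading only. The count_alarm function is hypothetical, and BibMon's own rule may differ in detail.

```python
def count_alarm(spe, limit, count_window_size, count_limit=1):
    """Return 1 where enough recent points exceed the limit, else 0."""
    above = [1 if value > limit else 0 for value in spe]
    alarm = []
    for t in range(len(above)):
        # Trailing window ending at t, truncated at the start of the series.
        start = max(0, t - count_window_size + 1)
        exceedances = sum(above[start:t + 1])
        alarm.append(1 if exceedances >= count_limit else 0)
    return alarm


spe = [0.2, 1.5, 0.3, 1.7, 1.9, 0.1, 0.2, 0.1]
alarm = count_alarm(spe, limit=1.0, count_window_size=3, count_limit=2)
```

Requiring several exceedances inside a window makes the alarm less sensitive to isolated spikes than a point-by-point comparison against the limit.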

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – If True, the detection limit is redefined during testing.

set_hyperparameters(params_dict)

Receives a dict with hyperparameters to be assigned in the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data, applying a model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – If True, the detection limit is redefined during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
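A simple way to turn a confidence level such as lim_conf into a detection limit is to take the corresponding empirical quantile of the SPE values observed in training. The sketch below assumes that approach for illustration; it is not confirmed to be the rule BibMon applies. The empirical_limit function is hypothetical and uses linear interpolation between order statistics.

```python
def empirical_limit(spe_train, lim_conf=0.99):
    """Empirical lim_conf quantile of the training SPE values."""
    ordered = sorted(spe_train)
    position = lim_conf * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    # Linear interpolation between the two neighbouring order statistics.
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


spe_train = [float(i) for i in range(101)]  # 0.0, 1.0, ..., 100.0
limit = empirical_limit(spe_train, lim_conf=0.99)
median = empirical_limit(spe_train, lim_conf=0.5)
```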

train_core()

The core of the training algorithm, that is, all the necessary steps between pre_train() and the calculation of the prediction or reconstruction in training by map_from_X().
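The relationship between fit, pre_train, train, train_core and map_from_X can be pictured with a toy class that only records the call order. ToyModel is a hypothetical stand-in, not a BibMon class. Having train call train_core and then map_from_X is one reading of the descriptions above, and the "model" here simply stores column means.

```python
class ToyModel:
    """Hypothetical stand-in that mimics the documented call order only."""

    def __init__(self):
        self.calls = []

    def pre_train(self, X_train):
        self.calls.append("pre_train")
        self.X_train = X_train

    def train_core(self):
        self.calls.append("train_core")
        # Toy "model": remember the column means of the training window.
        n_rows = len(self.X_train)
        self.means = [sum(col) / n_rows for col in zip(*self.X_train)]

    def map_from_X(self, X):
        self.calls.append("map_from_X")
        # Reconstruct every row as the stored means.
        return [list(self.means) for _ in X]

    def train(self):
        self.calls.append("train")
        self.train_core()
        self.X_train_pred = self.map_from_X(self.X_train)

    def fit(self, X_train):
        # Complete pipeline: pre_train followed by train.
        self.pre_train(X_train)
        self.train()
        return self


model = ToyModel().fit([[1.0, 10.0], [3.0, 30.0]])
```

In this picture, a new model type only has to supply train_core and map_from_X, while the surrounding pipeline stays the same.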

class bibmon.ESN(n_reservoir=400, spectral_radius=0.95, sparsity=0.95, noise=0.01, input_shift=None, input_scaling=None, teacher_forcing=False, feedback_scaling=None, teacher_scaling=None, teacher_shift=None, out_activation=<function linear>, inverse_out_activation=<function linear>, random_state=None, silent=True)

Bases: GenericModel

Echo State Networks.

For details on the technique, see the paper by Lemos et al. (2021) - Echo State Network Based Soft Sensor for Monitoring and Fault Detection of Industrial Processes, https://doi.org/10.1016/j.compchemeng.2021.107512

This code has been modified and adapted from the following repository: https://github.com/cknd/pyESN

Parameters:
  • n_reservoir (int, optional) – Number of neurons in the reservoir.

  • spectral_radius (float, optional) – Spectral radius of the recurrent weight matrix.

  • sparsity (float, optional) – Proportion of recurrent weights set to zero.

  • noise (float, optional) – Noise added to each neuron (regularization).

  • input_shift (float or numpy.array) – Scalar or vector of length n_inputs to be added to each input dimension before feeding it to the network.

  • input_scaling (float or numpy.array) – Scalar or vector of length n_inputs to be multiplied with each input dimension before feeding it to the network.

  • teacher_forcing (boolean, optional) – If True, results in an ESN with output layer recursion to the dynamic reservoir.

  • teacher_scaling (float, optional) – Factor applied to the target signal.

  • teacher_shift (float, optional) – Additive term applied to the target signal.

  • out_activation (func, optional) – Output activation function (applied to the readout).

  • inverse_out_activation (func, optional) – Inverse of the output activation function.

  • random_state (int or np.random.RandomState, optional) – Positive integer seed, np.random.RandomState object, or None to use numpy’s built-in RandomState.

  • silent (boolean, optional) – Suppress messages.
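The effect of input_scaling and input_shift can be sketched as an element-wise transformation of each input row. The example assumes that scaling is applied before the shift, as in the pyESN code this class is adapted from. The scale_inputs helper is hypothetical, and plain lists stand in for numpy arrays.

```python
def scale_inputs(inputs, input_scaling=None, input_shift=None):
    """Multiply each input dimension by a factor, then add an offset."""
    n_inputs = len(inputs[0])

    def as_vector(value, default):
        # Accept None, a scalar, or one value per input dimension.
        if value is None:
            return [default] * n_inputs
        if isinstance(value, (int, float)):
            return [float(value)] * n_inputs
        return list(value)

    scaling = as_vector(input_scaling, 1.0)
    shift = as_vector(input_shift, 0.0)
    return [[x * s + b for x, s, b in zip(row, scaling, shift)]
            for row in inputs]


inputs = [[1.0, 2.0], [3.0, 4.0]]
scaled = scale_inputs(inputs, input_scaling=[2.0, 0.5], input_shift=1.0)
```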

compute_SPE_contributions(pred, X)

Calculation of SPE contributions for diagnosis based on partial decomposition analysis. Valid for reconstruction models, in which self.has_Y = False.

Parameters:

pred, X: numpy.array

Data windows to compute contributions.

Returns:

SPE_contributions: numpy.array

Contributions to the SPE.

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – If True, the detection limit is redefined using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.

  • tune (boolean, optional) – If True, performs automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter. It must be a list containing the possibilities (when tuning a single parameter), or a list of such lists, one per parameter (when tuning several). Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) –

    Contains the possibilities to be tested and the types of parameters. Must be defined as: pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when no cross-validation is performed.

  • n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

initweights()

Initializes the weights of the network.

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.

map_from_X(X, continuation=True)

Apply the learned weights to the network’s reactions to new inputs.

Parameters:
  • X (numpy.array) – Inputs of shape (N_test_samples x n_inputs).

  • continuation (boolean, optional) – If True, start the network from the last training state.

Returns:

Output activation matrix.

Return type:

numpy.array

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plotting the temporal evolution of SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – If the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – If the alarmOutlier should be plotted.

plot_SPE_contributions(ax=None, train_or_test='test')

Plotting the temporal evolution of SPE contributions on a heatmap.

Parameters:

ax: matplotlib.axes._subplots.AxesSubplot

Axis where the graph will be plotted.

train_or_test: string

Indicates whether to plot the ‘train’ or ‘test’ graph.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plotting the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

pre_train(*args, **kwargs)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – If True, the detection limit is redefined during testing.

set_hyperparameters(params_dict)

Receives a dict with hyperparameters to be assigned in the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data, applying a model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – If True, the detection limit is redefined during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

train_core()

Harvest the network’s reaction to training data, train the readout weights. The result is the output of the network on the training data, using the trained weights.

class bibmon.PCA(ncomp=0.9)

Bases: GenericModel

Principal Component Analysis.

For details on the technique, see https://doi.org/10.3390/pr12020251

Parameters:

ncomp (int or float) – If float, a number between 0.0 and 1.0 that corresponds to the minimum fraction of accumulated variance for component selection; if int, the number of components.
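The two interpretations of ncomp can be sketched as follows. The helper select_n_components is hypothetical and not part of BibMon. It assumes the per-component variances are already sorted in decreasing order and, for a float, picks the smallest number of components whose accumulated variance fraction reaches ncomp.

```python
def select_n_components(explained_variance, ncomp):
    """Resolve ncomp (int or float) into a number of components."""
    if isinstance(ncomp, int):
        # An int is taken directly as the number of components.
        return ncomp
    total = sum(explained_variance)
    cumulative = 0.0
    for n, variance in enumerate(explained_variance, start=1):
        cumulative += variance
        # Stop at the first component that reaches the requested fraction.
        if cumulative / total >= ncomp:
            return n
    return len(explained_variance)


variances = [5.0, 3.0, 1.5, 0.5]  # sorted in decreasing order
n_float = select_n_components(variances, 0.9)
n_int = select_n_components(variances, 2)
```

With these variances the accumulated fractions are 0.5, 0.8, 0.95 and 1.0, so a threshold of 0.9 keeps three components.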

compute_SPE_contributions(pred, X)

Calculation of SPE contributions for diagnosis based on partial decomposition analysis. Valid for reconstruction models, in which self.has_Y = False.

Parameters:

pred, X: numpy.array

Data windows to compute contributions.

Returns:

SPE_contributions: numpy.array

Contributions to the SPE.

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – If True, the detection limit is redefined using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.

  • tune (boolean, optional) – If True, performs automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter. It must be a list containing the possibilities (when tuning a single parameter), or a list of such lists, one per parameter (when tuning several). Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) –

    Contains the possibilities to be tested and the types of parameters. Must be defined as: pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when no cross-validation is performed.

  • n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, S, V, n)

Receives parameters from a previously trained model to perform predictions and tests without the need for training.

Parameters:
  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • S (numpy.array) – Specific parameter of PCA.

  • V (numpy.array) – Specific parameter of PCA.

  • n (int) – Specific parameter of PCA.

map_from_X(X)

Receives a data matrix X and returns a matrix of predicted or reconstructed values.

Parameters:

X (numpy.array) – Window X of data for prediction or reconstruction.

Returns:

  • numpy.array

  • Reconstructed X (or predicted Y).

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plotting the temporal evolution of SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – If the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – If the alarmOutlier should be plotted.

plot_SPE_contributions(ax=None, train_or_test='test')

Plotting the temporal evolution of SPE contributions on a heatmap.

Parameters:

ax: matplotlib.axes._subplots.AxesSubplot

Axis where the graph will be plotted.

train_or_test: string

Indicates whether to plot the ‘train’ or ‘test’ graph.

plot_cumulative_variance(ax=None)

Plots the cumulative variance.

Parameters:

ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plotting the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
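
The ‘functionname__argname’ key convention used by a_pp and a_pp_test can be illustrated with a short sketch. The helper group_pp_args below is hypothetical and not part of BibMon; it only shows how such keys could be split into keyword arguments per pre-processing function. The argument names mode and threshold are taken from the normalize and remove_frozen_variables signatures in this reference.

```python
def group_pp_args(a_pp):
    """Group {'functionname__argname': value} entries by function name.

    Hypothetical helper for illustration only; not part of BibMon.
    """
    grouped = {}
    for key, value in a_pp.items():
        # Split on the first double underscore only, because function
        # names themselves contain single underscores.
        func_name, arg_name = key.split("__", 1)
        grouped.setdefault(func_name, {})[arg_name] = value
    return grouped


a_pp = {
    "normalize__mode": "robust",
    "remove_frozen_variables__threshold": 1e-4,
}
grouped = group_pp_args(a_pp)
```

Each inner dictionary then holds the keyword arguments intended for one pre-processing function.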

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Indicator of redefinition or not of the detection limit during testing.
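
The interplay between count_window_size and count_limit can be sketched as follows. This is an assumption about the general idea of a count alarm (raise the alarm when at least count_limit samples in the trailing window exceed the detection limit), not BibMon's actual implementation. The function count_alarm is hypothetical and assumes count_window_size >= 1.

```python
def count_alarm(spe, limit, count_window_size, count_limit):
    """Return a 0/1 alarm per sample (illustrative, not BibMon's code)."""
    above = [1 if value > limit else 0 for value in spe]
    alarms = []
    for i in range(len(above)):
        # Trailing window ending at sample i (shorter at the start).
        start = max(0, i - count_window_size + 1)
        window = above[start:i + 1]
        alarms.append(1 if sum(window) >= count_limit else 0)
    return alarms


spe = [0.2, 1.5, 0.3, 1.7, 1.9, 0.1]
alarms = count_alarm(spe, limit=1.0, count_window_size=3, count_limit=2)
```

With these values, a single isolated exceedance does not raise the alarm; two exceedances within three consecutive samples do.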

set_hyperparameters(params_dict)

Receives a dict with hyperparameters to be assigned in the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data, applying a model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Indicator of redefinition or not of the detection limit during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
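
One common way to turn a confidence limit into a detection limit is to take the corresponding empirical quantile of the SPE values observed in training. The sketch below assumes that interpretation of lim_conf; BibMon may compute the limit differently. The names empirical_quantile, spe_train and lim_spe are hypothetical.

```python
def empirical_quantile(values, q):
    """Linearly interpolated quantile of a list, for 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


# Synthetic training SPE values: 0.0, 0.1, ..., 10.0
spe_train = [0.1 * k for k in range(101)]

# With lim_conf = 0.99, about 1% of training samples lie above the limit.
lim_spe = empirical_quantile(spe_train, 0.99)
```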

train_core()

The core of the training algorithm, that is, all the necessary steps between pre_train() and the calculation of the prediction or reconstruction in training by map_from_X().

class bibmon.PreProcess(f_pp=None, a_pp=None, is_Y=False)

Bases: object

Class used to encapsulate data preprocessing methods.

Parameters:
  • f_pp (list, optional) – List containing strings with names of methods to be used in the preprocessing of the train data. The list of methods is shown below.

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform preprocessing of the train data, in the format {‘functionname__argname’: argvalue, …}

  • is_Y (boolean, optional) – If the data being preprocessed is Y (that is, to be predicted).

Methods:

  • Variable selection:

    remove_empty_variables(); remove_frozen_variables()

  • Missing values imputation:

    ffill_nan(); remove_observations_with_nan(); replace_nan_with_values()

  • Normalization:

    back_to_units(); normalize()

  • Adding dynamics:

    apply_lag(); add_moving_average()

  • Noise treatment:

    moving_average_filter()

  • Outlier handling:

    process_outliers_iqr()

add_moving_average(df, train_or_test='train', WS=10)

Adding variables filtered by moving average. Attention! Do not confuse with moving_average_filter, in which the original variables are not kept in the dataset.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • WS (int, optional) – Window size of the filter.

Returns:

  • pandas.DataFrame

  • Processed data.
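
The difference between adding moving-average variables and filtering can be sketched in plain Python. This is an illustration under assumptions, not BibMon's code: the trailing window, the handling of the first samples, and the column suffix "_ma" are all hypothetical choices.

```python
def trailing_mean(values, ws):
    """Mean over the last `ws` samples (fewer at the start of the series)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - ws + 1):i + 1]
        out.append(sum(window) / len(window))
    return out


data = {"x": [1.0, 2.0, 3.0, 4.0]}

# Filter style: the original column is replaced by its smoothed version.
filtered = {name: trailing_mean(col, 2) for name, col in data.items()}

# "Add" style: the original column is kept and a smoothed copy is appended.
augmented = dict(data)
for name, col in data.items():
    augmented[name + "_ma"] = trailing_mean(col, 2)
```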

apply(df, train_or_test='train')

Sequentially applies the preprocessing functions defined during initialization.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

Returns:

Processed data.

Return type:

pandas.DataFrame

apply_lag(df, train_or_test='train', lag=1)

Generation of time-delayed variables.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • lag (int, optional) – Number of delays to be considered.

Returns:

  • pandas.DataFrame

  • Processed data.
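
A minimal sketch of what time-delayed variables look like for a single series. The function add_lags is hypothetical, and dropping the first lag samples (which have no complete history) is an assumption; BibMon may treat them differently.

```python
def add_lags(series, lag=1):
    """Return rows [x(t), x(t-1), ..., x(t-lag)].

    The first `lag` samples are skipped because their history is incomplete.
    """
    rows = []
    for t in range(lag, len(series)):
        rows.append([series[t - k] for k in range(lag + 1)])
    return rows


rows = add_lags([10, 11, 12, 13], lag=2)
```

Each row now carries the current value together with its two previous values, which lets a static model capture process dynamics.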

back_to_units(df)

Returns the variables to the original scale, reverting effects of a normalization.

Parameters:

df (pandas.DataFrame) – Data to be processed.

Returns:

  • pandas.DataFrame

  • Processed data.

ffill_nan(df, train_or_test='train')

Fills missing (NaN) values with the last valid value. Uses the next valid value if there is no last available.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

Returns:

Processed data.

Return type:

pandas.DataFrame
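
The fill rule described above (last valid value first, next valid value when there is no previous one) can be sketched on a single column as follows. The function ffill_then_bfill is a hypothetical stand-in operating on a plain list rather than a DataFrame.

```python
import math


def ffill_then_bfill(values):
    """Forward-fill NaNs, then back-fill any NaNs left at the start."""
    filled = list(values)
    last = math.nan
    for i, value in enumerate(filled):
        if math.isnan(value):
            filled[i] = last
        else:
            last = value
    nxt = math.nan
    for i in range(len(filled) - 1, -1, -1):
        if math.isnan(filled[i]):
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


nan = math.nan
result = ffill_then_bfill([nan, 1.0, nan, nan, 4.0, nan])
```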

moving_average_filter(df, train_or_test='train', WS=10)

Moving average noise filter.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • WS (int, optional) – Window size of the filter.

Returns:

  • pandas.DataFrame

  • Processed data.

normalize(df, train_or_test='train', mode='standard')

Variable normalization.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • mode (string, optional) – Type of normalization (standard, robust, m-robust or s-robust).

Returns:

  • pandas.DataFrame

  • Processed data.
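
For the ‘standard’ mode, normalization is usually understood as subtracting the training mean and dividing by the training standard deviation, with the same statistics reused for test data. The sketch below assumes that reading and uses the sample standard deviation, which is also an assumption. It also shows the inverse operation that a method such as back_to_units would perform.

```python
import statistics

train = [2.0, 4.0, 6.0]
test = [4.0, 8.0]

# Statistics are estimated on the training window only.
mu = statistics.mean(train)
sd = statistics.stdev(train)  # sample standard deviation

train_norm = [(x - mu) / sd for x in train]
# Test data reuse the training statistics instead of recomputing them.
test_norm = [(x - mu) / sd for x in test]

# Reverting the normalization recovers the original units.
test_back = [z * sd + mu for z in test_norm]
```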

process_outliers_iqr(df: DataFrame, train_or_test: bool = 'train', cols: list = None, method: str = 'remove') → DataFrame

Removes or handles univariate outliers in a DataFrame using the IQR (Interquartile Range) method.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • cols (list, optional) – List of columns for which outliers will be removed or handled. Default: None (which results in considering all cols)

  • method (str, optional) – Method for handling outliers. Can be ‘remove’ (removes outliers), ‘median’ (replaces outliers with the median), or ‘winsorize’ (applies winsorization). Default: ‘remove’.

Returns:

DataFrame with outliers removed or handled.

Return type:

pandas.DataFrame
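
The three handling methods can be compared on a single column with a short sketch. It assumes the conventional fences at Q1 - 1.5·IQR and Q3 + 1.5·IQR; the multiplier k, the quantile interpolation, and the function name handle_outliers_iqr are assumptions, not BibMon's implementation. In a DataFrame, ‘remove’ would drop the whole observation rather than a single value.

```python
import statistics


def handle_outliers_iqr(values, method="remove", k=1.5):
    """Apply an IQR rule to one column (illustrative only)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    if method == "remove":
        return [v for v in values if low <= v <= high]
    if method == "median":
        med = statistics.median(values)
        return [v if low <= v <= high else med for v in values]
    if method == "winsorize":
        # Clip values to the nearest fence.
        return [min(max(v, low), high) for v in values]
    raise ValueError(f"unknown method: {method}")


values = [1.0, 2.0, 3.0, 4.0, 100.0]
removed = handle_outliers_iqr(values, "remove")
replaced = handle_outliers_iqr(values, "median")
clipped = handle_outliers_iqr(values, "winsorize")
```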

remove_empty_variables(df, train_or_test='train')

Removes variables with no values.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

Returns:

Processed data.

Return type:

pandas.DataFrame

remove_frozen_variables(df, train_or_test='train', threshold=1e-06)

Removes variables whose variation falls below a given limit.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • threshold (float, optional) – Variance limit to consider a variable as frozen.

Returns:

Processed data.

Return type:

pandas.DataFrame
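
A frozen variable is one that barely changes over the window, for example a stuck sensor. The sketch below assumes the check compares the sample variance of each column against threshold; the function drop_frozen and the column names are hypothetical.

```python
import statistics


def drop_frozen(columns, threshold=1e-6):
    """Keep only columns whose sample variance is at least `threshold`."""
    return {
        name: col
        for name, col in columns.items()
        if statistics.variance(col) >= threshold
    }


columns = {
    "flow": [1.0, 1.2, 0.9, 1.1],
    "stuck_sensor": [5.0, 5.0, 5.0, 5.0],
}
kept = drop_frozen(columns)
```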

remove_observations_with_nan(df, train_or_test='train')

Removes observations with missing data (NaN).

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

Returns:

Processed data.

Return type:

pandas.DataFrame

replace_nan_with_values(df, train_or_test='train', val=0)

Replaces missing data (NaN) with a predefined value.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • val (int or float) – Value to be used in the replacement.

Returns:

  • pandas.DataFrame

  • Processed data.

class bibmon.SBM(p=2.0, functional_form='rbf', gamma=1.0, eta=1e-10, train_method='geometrical_median', tau=1e-10, verbose=False)

Bases: GenericModel

Similarity-based method (SBM).

For details on the technique, see:

http://www.pee.ufrj.br/index.php/pt/producao-academica/teses-de-doutorado/tese-1/2016033299-similarity-based-methods-for-machine-diagnosis/file

Parameters:
  • p (float, optional) – p-value for the definition of the norm.

  • functional_form (string, optional) – Functional form to be used in the similarity calculation. rbf = radial basis function; ies = inverse euclidean similarity; iqk = inverse quadratic kernel; exp_kernel = exponential kernel; cauchy_kernel = cauchy kernel.

  • gamma (float, optional) – Parameter present in the various functional forms of similarity.

  • eta (float, optional) – Minimum value to be returned in similarity calculations.

  • train_method (string, optional) – Training method. Options: ‘all_archetypes’ and ‘geometrical_median’.

  • tau (float, optional) – Similarity threshold.

  • verbose (boolean, optional) – Whether to print information during execution.
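
To give an intuition for how p, gamma and eta could interact, the sketch below computes a radial-basis-function similarity between two observations. The exact functional form used by BibMon is not stated in this reference, so the formula exp(-gamma · d²) with d the p-norm distance, floored at eta, is an assumption. The functions p_norm and rbf_similarity are hypothetical.

```python
import math


def p_norm(a, b, p=2.0):
    """p-norm distance between two observations."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def rbf_similarity(a, b, p=2.0, gamma=1.0, eta=1e-10):
    """Similarity in (0, 1], never smaller than eta (assumed form)."""
    distance = p_norm(a, b, p)
    return max(math.exp(-gamma * distance ** 2), eta)


same = rbf_similarity([1.0, 2.0], [1.0, 2.0])
far = rbf_similarity([0.0, 0.0], [30.0, 40.0])
```

Identical observations have similarity 1, while very distant ones are clipped at eta instead of underflowing to zero.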

compute_SPE_contributions(pred, X)

Calculation of SPE contributions for diagnosis based on partial decomposition analysis. Valid for reconstruction models, in which self.has_Y = False.

Parameters:

pred, X (numpy.array) – Data windows used to compute the contributions.

Returns:

SPE_contributions – Contributions to the SPE.

Return type:

numpy.array
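
As a rough illustration, the SPE (squared prediction error) of an observation is commonly the sum of squared residuals between X and its reconstruction, and each variable's contribution is its own squared residual. The sketch below assumes this simple decomposition; BibMon's partial decomposition may differ in detail. The function spe_and_contributions is hypothetical.

```python
def spe_and_contributions(pred, X):
    """Return per-sample SPE and per-variable squared residuals."""
    contributions = [
        [(x - p) ** 2 for x, p in zip(x_row, p_row)]
        for x_row, p_row in zip(X, pred)
    ]
    # Under this assumption, the contributions of a sample add up to its SPE.
    spe = [sum(row) for row in contributions]
    return spe, contributions


X = [[1.0, 2.0], [3.0, 5.0]]
pred = [[1.0, 1.0], [2.0, 3.0]]
spe, contributions = spe_and_contributions(pred, X)
```

In the second sample, the second variable accounts for most of the error, which is the kind of information a contribution heatmap is meant to expose.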

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – Indicator of redefinition or not of the detection limit using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.

  • tune (boolean, optional) – Indicator of automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter. It must be a list containing the possibilities (in case of only one parameter), or a list containing the lists for each possibility. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) – Contains the possibilities to be tested and the types of parameters. Must be defined as: pd.DataFrame({‘possibilities’: [list_possibilities1, list_possibilities2, …], ‘types’: [str_type1, str_type2, …]}, index = [str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Percentage of the data to be separated for use in internal validation, if no cross-validation is performed.

  • n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window sizes used in count alarms calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.

map_from_X(X)

Receives a data matrix X and returns a matrix of predicted or reconstructed values.

Parameters:

X (numpy.array) – Window X of data for prediction or reconstruction.

Returns:

  • numpy.array

  • Reconstructed X (or predicted Y).

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plotting the temporal evolution of SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – If the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – If the alarmOutlier should be plotted.

plot_SPE_contributions(ax=None, train_or_test='test')

Plotting the temporal evolution of SPE contributions on a heatmap.

Parameters:

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plotting the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Indicator of redefinition or not of the detection limit during testing.

set_hyperparameters(params_dict)

Receives a dict with hyperparameters to be assigned in the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data, applying a model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Indicator of redefinition or not of the detection limit during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

train_core()

The core of the training algorithm, that is, all the necessary steps between pre_train() and the calculation of the prediction or reconstruction in training by map_from_X().

bibmon.align_dfs_by_rows(df1, df2)

Aligns DataFrames by rows.

Parameters:
  • df1 (pandas.DataFrame) – Original data.

  • df2 (pandas.DataFrame) – Original data.

Returns:

new_df1, new_df2 – Processed data.

Return type:

pandas.DataFrame

bibmon.comparative_table(models, X_train, X_validation, X_test, Y_train=None, Y_validation=None, Y_test=None, lim_conf=0.99, f_pp_train=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp_train=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, logy=True, metrics=None, X_pred_to_plot=None, count_limit=1, count_window_size=0, fault_start=None, fault_end=None, mask=None, times=True, plot_SPE=True, plot_predictions=True, fit_model=True)

Performs complete monitoring analysis of multiple models and builds comparative result tables.

Parameters:
  • models (list of BibMon models) – Models to be considered in the analysis.

  • X_train (pandas.DataFrame or numpy.array) – Training data X.

  • X_validation (pandas.DataFrame or numpy.array) – Validation data X.

  • X_test (pandas.DataFrame or numpy.array) – Test data X.

  • Y_train (pandas.DataFrame or numpy.array, optional) – Training data Y.

  • Y_validation (pandas.DataFrame or numpy.array, optional) – Validation data Y.

  • Y_test (pandas.DataFrame or numpy.array, optional) – Test data Y.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • f_pp_train (list, optional) – List containing strings with names of functions to be used in pre-processing the train data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_train (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the train data, in the format {‘functionname__argname’: argvalue, …}.

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the test data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the test data, in the format {‘functionname__argname’: argvalue, …}.

  • logy (boolean, optional) – Whether to use a logarithmic scale in the SPE plots.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • fault_start (string, optional) – Start timestamp of the fault.

  • fault_end (string, optional) – End timestamp of the fault.

  • mask (numpy.array, optional) – Boolean array indicating the indices where the process is in fault.

  • times (boolean, optional) – If execution times should be calculated.

  • plot_SPE (boolean, optional) – If SPE plots should be plotted.

  • plot_predictions (boolean, optional) – If prediction plots should be plotted.

  • fit_model (boolean, optional) – If models should be trained.

Returns:

List with the generated tables (prediction and/or detection).

Return type:

list of pandas.DataFrames

bibmon.complete_analysis(model, X_train, X_validation, X_test, Y_train=None, Y_validation=None, Y_test=None, lim_conf=0.99, f_pp_train=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp_train=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, logy=True, metrics=None, X_pred_to_plot=None, count_limit=1, count_window_size=0, fault_start=None, fault_end=None)

Performs a complete monitoring analysis, with train, validation, and test.

Parameters:
  • model (BibMon model) – Model to be considered in the analysis.

  • X_train (pandas.DataFrame or numpy.array) – Training data X.

  • X_validation (pandas.DataFrame or numpy.array) – Validation data X.

  • X_test (pandas.DataFrame or numpy.array) – Test data X.

  • Y_train (pandas.DataFrame or numpy.array, optional) – Training data Y.

  • Y_validation (pandas.DataFrame or numpy.array, optional) – Validation data Y.

  • Y_test (pandas.DataFrame or numpy.array, optional) – Test data Y.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • f_pp_train (list, optional) – List containing strings with names of functions to be used in pre-processing the train data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_train (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the train data, in the format {‘functionname__argname’: argvalue, …}.

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the test data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the test data, in the format {‘functionname__argname’: argvalue, …}.

  • logy (boolean, optional) – Whether to use a logarithmic scale in the SPE plots.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • fault_start (string, optional) – Start timestamp of the fault.

  • fault_end (string, optional) – End timestamp of the fault.

bibmon.create_df_with_dates(array, start='2020-01-01 00:00:00', freq='1min')

Creates a DataFrame with a datetime index from the original data.

Parameters:
  • array (pandas.DataFrame or numpy.array) – Original data.

  • start (string, optional) – Start timestamp.

  • freq (string, optional) – Sampling interval.

Returns:

df – Processed data.

Return type:

pandas.DataFrame
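
The start and freq arguments describe a regular time grid. A standard-library sketch of such a grid, with one timestamp per row, is shown below. The function make_timestamps is hypothetical, and it takes a timedelta where the BibMon function takes a frequency string such as '1min'.

```python
from datetime import datetime, timedelta


def make_timestamps(n, start="2020-01-01 00:00:00", step=timedelta(minutes=1)):
    """Return n equally spaced timestamps beginning at `start`."""
    first = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    return [first + i * step for i in range(n)]


stamps = make_timestamps(3)
```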

bibmon.create_df_with_noise(array, noise_frac, max_index_for_noise)

Adds artificial measurement noise to data.

Parameters:
  • array (pandas.DataFrame or numpy.array) – Original data.

  • noise_frac (float) – Fraction (between 0 and 1) of the total amplitude of the variable that will be used as the noise standard deviation.

  • max_index_for_noise (int) – Maximum index to consider the amplitude in the standard deviation calculation.

Returns:

df – Processed data.

Return type:

pandas.DataFrame
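
The role of noise_frac and max_index_for_noise can be sketched for a single variable. The sketch assumes that the amplitude is the range (maximum minus minimum) of the samples before max_index_for_noise and that the noise is Gaussian with zero mean; both are assumptions, and add_noise is a hypothetical function.

```python
import random


def add_noise(values, noise_frac, max_index_for_noise, seed=0):
    """Add zero-mean Gaussian noise scaled by the early-window amplitude."""
    rng = random.Random(seed)
    # Only the first samples define the amplitude, so that a later
    # fault or excursion does not inflate the noise level.
    reference = values[:max_index_for_noise]
    amplitude = max(reference) - min(reference)
    sd = noise_frac * amplitude
    return [v + rng.gauss(0.0, sd) for v in values]


clean = [0.0, 10.0, 5.0, 1000.0]
noisy = add_noise(clean, noise_frac=0.1, max_index_for_noise=3)
unchanged = add_noise(clean, noise_frac=0.0, max_index_for_noise=3)
```

Here the large value at the end is excluded from the amplitude, so the noise standard deviation is 0.1 × 10 = 1 rather than 0.1 × 1000.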

bibmon.load_real_data()

Load a sample of real process data. The variables have been anonymized for availability in the library.

Returns:

Process data.

Return type:

pandas.DataFrame

bibmon.load_tennessee_eastman(train_id=0, test_id=0)

Load the ‘Tennessee Eastman Process’ benchmark data.

Parameters:
  • train_id (int, optional) – Identifier of the training data. No fault: 0. With faults: 1 to 20.

  • test_id (int, optional) – Identifier of the test data. No fault: 0. With faults: 1 to 20.

Returns:

  • train_df (pandas.DataFrame) – Training data.

  • test_df (pandas.DataFrame) – Test data.

class bibmon.sklearnManifold(manifold_model)

Bases: GenericModel

Interface for sklearn manifold learning models.

Parameters:

manifold_model (any manifold model that uses the sklearn interface) – For example:

  • sklearn.manifold.MDS,

  • sklearn.manifold.Isomap,

  • sklearn.manifold.TSNE,

  • sklearn.manifold.LocallyLinearEmbedding,

  • etc.

clusters_visualization(X)

Fits the manifold model, transforms the data, and plots the resulting 2D or 3D embedding.

Parameters:

X (array-like or DataFrame) – The data to fit and transform.

compute_SPE_contributions(pred, X)

Calculation of SPE contributions for diagnosis based on partial decomposition analysis. Valid for reconstruction models, in which self.has_Y = False.

Parameters:

pred, X (numpy.array) – Data windows used to compute the contributions.

Returns:

SPE_contributions – Contributions to the SPE.

Return type:

numpy.array

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.

  • tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter. It must be a list containing the possibilities (in case of only one parameter), or a list containing the lists for each possibility. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.
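
The 'functionname__argname' key format used by a_pp and a_pp_test can be read as "function name, double underscore, argument name". The sketch below shows one way such a dictionary could be grouped into keyword arguments per function. It is only an illustration: split_pp_args is a hypothetical helper, and the argument names 'threshold' and 'mode' are made up rather than taken from the PreProcess class.

```python
from collections import defaultdict

def split_pp_args(a_pp):
    """Group {'functionname__argname': value} entries by function name.

    Hypothetical helper for illustration; BibMon's own parsing may differ.
    """
    grouped = defaultdict(dict)
    for key, value in (a_pp or {}).items():
        # Split on the first double underscore only.
        func_name, sep, arg_name = key.partition("__")
        if not sep or not arg_name:
            raise ValueError(f"expected 'functionname__argname', got {key!r}")
        grouped[func_name][arg_name] = value
    return dict(grouped)

# 'threshold' and 'mode' are made-up argument names used only for this example.
a_pp = {
    "remove_frozen_variables__threshold": 1e-6,
    "normalize__mode": "standard",
}
kwargs_by_function = split_pp_args(a_pp)
```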

fit_transform(X)

Fits the manifold model and returns the transformed data.

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) –

    Contains the possibilities to be tested and the types of parameters. Must be defined as:

    pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Percentage of the data to be separated for use in internal validation, if no cross-validation is performed.

  • n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
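
A minimal sketch of the params layout described above. The parameter names n_neighbors and n_components are those of sklearn.manifold.Isomap; the type string 'categorical' is an assumption modeled on Optuna's categorical suggestions and should be checked against the strings that BibMon actually accepts.

```python
import pandas as pd

# One row per hyperparameter: the candidate values and the type of the search space.
params = pd.DataFrame(
    {
        "possibilities": [[5, 10, 15], [2, 3]],
        "types": ["categorical", "categorical"],  # assumed type strings
    },
    index=["n_neighbors", "n_components"],
)

print(params)
```

According to the signature above, a table like this would then be passed as the first argument, for example model.hyperparameter_tuning(params, n_trials=20).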

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.

map_from_X(X_test)

Applies the transformation to a new dataset. Note that some manifold models, like TSNE, may not have a direct transform method.
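
The note about TSNE can be checked directly on the scikit-learn estimators. The short sketch below only inspects which manifold models expose a transform method; it does not show how BibMon handles the missing method.

```python
from sklearn.manifold import TSNE, Isomap

# Isomap can embed unseen data; TSNE only offers fit and fit_transform.
for model in (Isomap(n_components=2), TSNE(n_components=2)):
    has_transform = hasattr(model, "transform")
    print(type(model).__name__, "supports transform:", has_transform)
```

For estimators without transform, new data generally has to be embedded together with the original data through fit_transform.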

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plotting the temporal evolution of SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – If the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – If the alarmOutlier should be plotted.

plot_SPE_contributions(ax=None, train_or_test='test')

Plotting the temporal evolution of SPE contributions on a heatmap.

Parameters:

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

plot_embedding()

Plots the 2D or 3D embedding resulting from the manifold model.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plotting the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}
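
To make count_window_size and count_limit more concrete, the sketch below assumes one plausible rule: the count alarm is raised at a given point when at least count_limit of the last count_window_size points exceed the detection limit. The exact rule used by BibMon may differ, and count_alarm_sketch is a hypothetical helper.

```python
def count_alarm_sketch(spe, lim_spe, count_window_size, count_limit):
    """Return 0/1 flags; 1 when enough recent points exceed the limit.

    Hypothetical helper: the exact rule used by BibMon may differ.
    """
    exceed = [1 if value > lim_spe else 0 for value in spe]
    alarm = []
    for i in range(len(exceed)):
        # Trailing window that ends at the current point.
        start = max(0, i - count_window_size + 1)
        hits = sum(exceed[start:i + 1])
        alarm.append(1 if hits >= count_limit else 0)
    return alarm

spe = [0.2, 1.5, 0.3, 1.7, 1.9, 0.1]
alarm = count_alarm_sketch(spe, lim_spe=1.0, count_window_size=3, count_limit=2)
print(alarm)  # [0, 0, 0, 1, 1, 1]
```

Requiring several exceedances within a window makes the alarm less sensitive to isolated spikes than a point-by-point comparison against the limit.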

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

set_hyperparameters(params_dict)

Sets the hyperparameters for the manifold model.

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data, applying a model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
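
One common way to turn a confidence level such as lim_conf into a detection limit is to take the corresponding empirical quantile of the SPE values observed in training. The sketch below illustrates that idea on synthetic data; it assumes a quantile-based limit and is not a statement of how BibMon computes its limit internally.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the SPE values of a training window.
spe_train = rng.chisquare(df=3, size=1000)

lim_conf = 0.99
# Assumed rule: the limit is the lim_conf quantile of the training SPE.
lim_spe = np.quantile(spe_train, lim_conf)

# Fraction of training points that would be flagged with this limit.
frac_above = np.mean(spe_train > lim_spe)
print(f"limit = {lim_spe:.3f}, fraction above = {frac_above:.3f}")
```

With lim_conf=0.99, roughly 1% of the training points lie above the limit, which gives a feel for the false alarm rate to expect under normal operation.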

train_core()

Fits the manifold model using the training data.

transform(X_test)

Transforms the input data using the trained manifold model by calling map_from_X.

Parameters:

X_test (array-like or DataFrame) – The new data to transform.

Returns:

transformed_data – The transformed data.

Return type:

array-like

class bibmon.sklearnRegressor(regressor, permutation_importance=False)

Bases: GenericModel

Interface for sklearn regressors.

Parameters:
  • regressor (any regressor that uses the sklearn interface.) –

    For example:
    • sklearn.svm.SVR,

    • sklearn.ensemble.RandomForestRegressor,

    • sklearn.neural_network.MLPRegressor,

    • etc.

  • permutation_importance (boolean, optional) – Whether permutation variable importance should be calculated.

compute_SPE_contributions(pred, X)

Calculation of SPE contributions for diagnosis based on partial decomposition analysis. Valid for reconstruction models, in which self.has_Y = False.

Parameters:

pred, X: numpy.array

Data windows to compute contributions.

Returns:

SPE_contributions: numpy.array

Contributions to the SPE.

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.

  • tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter. It must be a list containing the possibilities (in case of only one parameter), or a list containing the lists for each possibility. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) –

    Contains the possibilities to be tested and the types of parameters. Must be defined as:

    pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Percentage of the data to be separated for use in internal validation, if no cross-validation is performed.

  • n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.

map_from_X(X)

Receives a data matrix X and returns a matrix of predicted or reconstructed values.

Parameters:

X (numpy.array) – Window X of data for prediction or reconstruction.

Returns:

Reconstructed X (or predicted Y).

Return type:

numpy.array

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plotting the temporal evolution of SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – If the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – If the alarmOutlier should be plotted.

plot_SPE_contributions(ax=None, train_or_test='test')

Plotting the temporal evolution of SPE contributions on a heatmap.

Parameters:

  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

plot_importances(n=None, permutation_importance=False)

Plots the permutation importances of the variables.

Parameters:
  • n (int, optional) – Maximum number of variables to be plotted.

  • permutation_importance (boolean, optional) – If permutation importances should be prioritized over coefficients in linear models.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plotting the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file)

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

set_hyperparameters(params_dict)

Receives a dict with hyperparameters to be assigned in the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data, applying a model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

train_core()

The core of the training algorithm, that is, all the necessary steps between pre_train() and the calculation of the prediction or reconstruction in training by map_from_X().

update_importances()

Calculates permutation importances of the variables.
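
For readers unfamiliar with the technique, the sketch below computes permutation importances with scikit-learn's own sklearn.inspection.permutation_importance on synthetic data. It illustrates the general method only; whether update_importances calls this function, and with which settings, is not stated in this reference.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Only the first column actually drives the target.
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

model = LinearRegression().fit(X, y)

# Shuffle each column in turn and measure how much the score degrades.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("variables ranked by importance:", ranking)
```

Because the method only needs predictions and a score, it applies to any regressor that follows the sklearn interface, not just linear models.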

bibmon.spearmanr_dendrogram(df, figsize=(18, 8))

Generates a dendrogram of Spearman correlations.

Parameters:
  • df (pandas.DataFrame) – Data to be analyzed.

  • figsize (tuple of ints, optional) – Figure dimensions.
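
The sketch below shows one common way to build a dendrogram from Spearman correlations with SciPy: the correlation matrix is converted into a distance (1 minus the absolute correlation) and passed to hierarchical clustering. The distance definition and the 'average' linkage method are assumptions chosen for illustration and may not match what spearmanr_dendrogram does internally.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({
    "a": base,
    "b": base + 0.1 * rng.normal(size=200),  # strongly correlated with "a"
    "c": rng.normal(size=200),
    "d": rng.normal(size=200),
})

# Spearman correlation matrix (one row and column per variable).
corr = spearmanr(df.to_numpy()).correlation
corr = (corr + corr.T) / 2          # enforce exact symmetry
np.fill_diagonal(corr, 1.0)

# Assumed distance: highly correlated variables are close to each other.
distance = 1.0 - np.abs(corr)
linkage = hierarchy.linkage(squareform(distance, checks=False), method="average")

# no_plot=True returns the tree layout without needing a matplotlib figure.
tree = hierarchy.dendrogram(linkage, labels=list(df.columns), no_plot=True)
print(tree["ivl"])
```

Variables that merge low in the tree, such as "a" and "b" here, carry largely redundant information.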

bibmon.targets_comparative_table(model, data, start_train, end_train, end_validation, end_test, tags, metrics, fault_start=None, fault_end=None, count_window_size=0, count_limit=1, lim_conf=0.99, tags_X=None, f_pp_train=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp_train=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, mask=None)

Evaluates the performance of a regression model for each dependent variable (target) in turn. For each target, it computes fitting metrics, such as the R² score (how well the predictions align with the actual data) and the mean absolute error (the average prediction error), and alarm metrics, such as the false alarm rate (FAR, the frequency of false positives) and the fault detection rate (FDR, the accuracy in identifying true faults).

Parameters:
  • model (BibMon model) – Model to be considered in the analysis.

  • data (pandas.DataFrame) – DataFrame containing the time series data for the variables of interest.

  • start_train (string) – Start timestamp of the training data.

  • end_train (string) – End timestamp of the training data.

  • end_validation (string) – End timestamp of the validation data.

  • end_test (string) – End timestamp of the test data.

  • tags (list of strings) – List of variable names (tags) to be used as targets (Y) in the model.

  • metrics (list of functions) – Functions for calculating metrics to be displayed in the title of the graph.

  • fault_start (string, optional) – Start timestamp of the fault.

  • fault_end (string, optional) – End timestamp of the fault.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • tags_X (list of strings) – Variables to be considered in the X set.

  • f_pp_train (list, optional) – List containing strings with names of functions to be used in pre-processing the train data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_train (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the train data, in the format {‘functionname__argname’: argvalue, …}.

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the test data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the test data, in the format {‘functionname__argname’: argvalue, …}.

  • mask (numpy.array, optional) – Boolean array indicating the indices where the process is in fault.

Returns:

return_tables – A list containing two or more DataFrames, depending on the parameters passed:

  • Prediction table: A DataFrame containing prediction metrics (such as R² score and mean absolute error) for training, validation, and test regression. The table is multi-indexed, with tags (variables) and metrics as the indices. Each metric is calculated for each tag and data split (train, validation, test).

  • Detection table: (Optional) A DataFrame that includes detection metrics, such as Fault Detection Rate (FDR) and False Alarm Rate (FAR), for each tag and fault detection alarm. This table is returned if fault_start is provided, and it includes performance information related to the model’s ability to detect faults during the specified period. It is also multi-indexed, with tags and alarms as the indices.

Return type:

list of pandas.DataFrame
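
The alarm metrics in the detection table can be illustrated with their usual definitions: FAR is the fraction of normal samples on which the alarm is active, and FDR is the fraction of faulty samples on which the alarm is active. The sketch below applies these textbook definitions to a small example; far_fdr_sketch is a hypothetical helper, and the exact computation in BibMon may differ.

```python
def far_fdr_sketch(alarms, fault_mask):
    """Return (FAR, FDR) from 0/1 alarm flags and a boolean fault mask.

    Hypothetical helper based on the usual definitions of both rates.
    """
    normal = [a for a, f in zip(alarms, fault_mask) if not f]
    faulty = [a for a, f in zip(alarms, fault_mask) if f]
    far = sum(normal) / len(normal) if normal else float("nan")
    fdr = sum(faulty) / len(faulty) if faulty else float("nan")
    return far, fdr

# First four samples are normal operation, last four are within the fault period.
alarms = [0, 1, 0, 0, 1, 1, 0, 1]
fault_mask = [False, False, False, False, True, True, True, True]

far, fdr = far_fdr_sketch(alarms, fault_mask)
print(f"FAR = {far:.2f}, FDR = {fdr:.2f}")  # FAR = 0.25, FDR = 0.75
```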

bibmon.train_val_test_split(data, start_train, end_train, end_validation, end_test, tags_X=None, tags_Y=None)

Separates the data into consecutive portions of train, validation, and test, returning 3 DataFrames. It can also separate into predictor variables (X) and predicted variables (Y), which in this case will return 6 DataFrames.

Parameters:
  • data (pandas.DataFrame) – Data to be separated.

  • start_train (string) – Start timestamp of the train portion.

  • end_train (string) – End timestamp of the train portion.

  • end_validation (string) – End timestamp of the validation portion.

  • end_test (string) – End timestamp of the test portion.

  • tags_X (list of strings) – Variables to be considered in the X set.

  • tags_Y (list of strings) – Variables to be considered in the Y set.

Returns:

Separated data.

Return type:

pandas.DataFrames
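
A minimal sketch of what a consecutive split by timestamps looks like in plain pandas. It assumes a DataFrame with a DatetimeIndex and treats each end timestamp as exclusive; how train_val_test_split handles the boundary points is not stated here, so that choice is an assumption. The column names x1 and y1 are made up.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=10, freq="D")
data = pd.DataFrame({"x1": range(10), "y1": range(10, 20)}, index=index)

start_train = pd.Timestamp("2024-01-01")
end_train = pd.Timestamp("2024-01-05")
end_validation = pd.Timestamp("2024-01-08")
end_test = pd.Timestamp("2024-01-11")

ts = data.index
# Consecutive, non-overlapping windows (end timestamps treated as exclusive).
train = data[(ts >= start_train) & (ts < end_train)]
validation = data[(ts >= end_train) & (ts < end_validation)]
test = data[(ts >= end_validation) & (ts < end_test)]

# Optional separation into predictor (X) and predicted (Y) variables.
tags_X, tags_Y = ["x1"], ["y1"]
train_X, train_Y = train[tags_X], train[tags_Y]
```

Splitting by time rather than at random keeps the validation and test windows strictly after the training window, which matters for process monitoring data.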