API Reference

class bibmon.Autoencoder(hidden_layer_sizes=(2,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)

Bases: GenericModel

Autoencoder built on sklearn’s MLPRegressor interface. For details on the input parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
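The class builds on sklearn's MLPRegressor. As a rough illustration of the idea, and not BibMon's internal code, an autoencoder of this kind can be obtained by training an MLPRegressor to reproduce its own input through a narrow hidden layer. With the default `hidden_layer_sizes=(2,)`, that layer acts as a two-neuron bottleneck. The synthetic data and the standardization step below are assumptions made for the sketch:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic training data (made up for this sketch): 4 observed variables
# driven by 2 latent signals plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(500, 4))

# Standardize each variable (assumed to mirror the 'normalize' pre-processing step)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Mirrors the bibmon.Autoencoder defaults: a single hidden layer of 2 neurons
ae = MLPRegressor(hidden_layer_sizes=(2,), activation='relu', solver='adam',
                  max_iter=200, random_state=0)
ae.fit(X, X)  # the target is the input itself, so the network learns to reconstruct X

X_rec = ae.predict(X)  # reconstructed X, same shape as X
print(X_rec.shape)
```

Because the target equals the input, the model is a reconstruction model (no Y), which is the case the `X_pred_to_plot` option of plot_predictions refers to.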

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation (0 < frac_val < 1). Only used if redefine_limit is True.

  • tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter: a list of possibilities when only one parameter is tuned, or a list of such lists (one per parameter) when several are tuned. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of trials in the hyperparameter search optimization.
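The a_pp and a_pp_test dictionaries use keys that join a pre-processing function name and one of its argument names with a double underscore. The sketch below shows one way such keys can be split into per-function keyword arguments. The helper `group_pp_args` and the argument names in the example dictionary (`mode`, `threshold`) are hypothetical; they illustrate the key format only, not the arguments actually accepted by the PreProcess functions:

```python
def group_pp_args(f_pp, a_pp=None):
    """Hypothetical helper: map each function name in f_pp to the keyword
    arguments found in a_pp under 'functionname__argname' keys."""
    grouped = {name: {} for name in f_pp}
    for key, value in (a_pp or {}).items():
        # split at the first double underscore: function name, then argument name
        func_name, sep, arg_name = key.partition('__')
        if not sep or not arg_name:
            raise ValueError(f"key {key!r} is not in 'functionname__argname' format")
        if func_name not in grouped:
            raise ValueError(f"{func_name!r} is not listed in f_pp")
        grouped[func_name][arg_name] = value
    return grouped


f_pp = ['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize']
# Illustrative argument names, not taken from the BibMon source
a_pp = {'normalize__mode': 'standard', 'remove_frozen_variables__threshold': 1e-6}
print(group_pp_args(f_pp, a_pp))
```

Splitting at the first double underscore works because the function names themselves only contain single underscores.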

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) – Contains the possibilities to be tested and the types of parameters. Must be defined as:

    pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when cross-validation is not performed.

  • n_splits (int, optional) – Number of folds used in cross-validation. If not None, percent_validation is ignored.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
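Following the layout required for params, a DataFrame describing two Autoencoder hyperparameters might be built as below. The type strings ('float', 'categorical') and the ranges are assumptions modeled on Optuna's suggest_float and suggest_categorical; check the BibMon source for the exact type strings it accepts:

```python
import pandas as pd

# Hypothetical search space: a float range for the L2 penalty 'alpha'
# and a categorical choice for 'activation'. The type strings are
# assumptions modeled on Optuna's suggest_float / suggest_categorical.
params = pd.DataFrame(
    {'possibilities': [[1e-5, 1e-2],        # [low, high] for a float parameter
                       ['relu', 'tanh']],   # choices for a categorical parameter
     'types': ['float', 'categorical']},
    index=['alpha', 'activation'])

print(params)
```

Each row describes one hyperparameter, so the index holds the parameter names that set_hyperparameters would later receive as dictionary keys.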

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.

map_from_X(X)

Receives a data matrix X and returns a matrix of predicted or reconstructed values.

Parameters:

X (numpy.array) – Window X of data for prediction or reconstruction.

Returns:

  • numpy.array – Reconstructed X (or predicted Y).

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plots the temporal evolution of the SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – Whether the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – Whether the alarmOutlier should be plotted.
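plot_SPE displays the squared prediction error (SPE) over time. A common definition, assumed here rather than taken from the BibMon source, is the sum over variables of the squared difference between each observation and its reconstruction (or prediction). The helper `spe` is hypothetical and the numbers are made up:

```python
import numpy as np

def spe(X, X_rec):
    """Hypothetical helper: squared prediction error per observation (row)."""
    residuals = np.asarray(X) - np.asarray(X_rec)
    return np.sum(residuals ** 2, axis=1)  # sum of squared residuals across variables

X = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
X_rec = np.array([[0.9, 2.1], [0.5, -1.0], [1.0, 0.0]])
print(spe(X, X_rec))  # one SPE value per row
```

SPE values can span several orders of magnitude (here from 0 to 4), which is one reason a logarithmic y-axis (logy=True) can make the plot easier to read.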

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plots the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Number of points in the window that must be in alarm for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.
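The count alarm parameters can be illustrated with a small sliding-window sketch. The semantics assumed here are that an alarm is raised at a sample when at least count_limit of the last count_window_size SPE values exceed the detection limit. This is an interpretation of the parameter descriptions, not a transcription of the BibMon code, and `count_alarm` is a hypothetical helper:

```python
def count_alarm(spe_values, lim, count_window_size, count_limit=1):
    """Hypothetical count alarm: flag sample i when at least `count_limit` of the
    last `count_window_size` values (ending at i) exceed `lim`."""
    exceed = [v > lim for v in spe_values]
    alarms = []
    for i in range(len(exceed)):
        start = max(0, i - count_window_size + 1)  # window truncated at the start
        alarms.append(sum(exceed[start:i + 1]) >= count_limit)
    return alarms

spe_values = [0.1, 2.5, 0.2, 2.8, 3.1, 0.3]
print(count_alarm(spe_values, lim=1.0, count_window_size=3, count_limit=2))
```

Requiring several exceedances within a window filters out isolated spikes: the single exceedance at index 1 does not trigger this alarm, while the cluster at indices 3 and 4 does.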

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Number of points in the window that must be in alarm for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

set_hyperparameters(params_dict)

Receives a dict of hyperparameters to be assigned to the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values.

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data, applying a model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
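How lim_conf becomes a detection limit is not spelled out here. One common approach, assumed for this sketch (BibMon's own computation of limSPE may differ), takes the lim_conf empirical quantile of the SPE values on the training data, so that roughly a fraction 1 - lim_conf of the training points lies above the limit. The chi-square samples only stand in for real training SPE values:

```python
import numpy as np

rng = np.random.default_rng(0)
spe_train = rng.chisquare(df=3, size=10_000)  # stand-in for training SPE values

lim_conf = 0.99
lim_spe = np.quantile(spe_train, lim_conf)  # empirical 99th percentile

# By construction, about 1% of the training points lie above the limit
false_alarm_rate = np.mean(spe_train > lim_spe)
print(lim_spe, false_alarm_rate)
```

Raising lim_conf moves the limit up and lowers the false alarm rate on normal data, at the cost of detecting small faults later.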

train_core()

The core of the training algorithm, that is, all the necessary steps between pre_train() and the calculation of the prediction or reconstruction in training by map_from_X().

class bibmon.ESN(n_reservoir=400, spectral_radius=0.95, sparsity=0.95, noise=0.01, input_shift=None, input_scaling=None, teacher_forcing=False, feedback_scaling=None, teacher_scaling=None, teacher_shift=None, out_activation=<function linear>, inverse_out_activation=<function linear>, random_state=None, silent=True)

Bases: GenericModel

Echo State Networks.

For details on the technique, see the paper by Lemos et al. (2021) - Echo State Network Based Soft Sensor for Monitoring and Fault Detection of Industrial Processes, https://doi.org/10.1016/j.compchemeng.2021.107512

This code has been modified and adapted from the following repository: https://github.com/cknd/pyESN

Parameters:
  • n_reservoir (int, optional) – Number of neurons in the reservoir.

  • spectral_radius (float, optional) – Spectral radius of the recurrent weight matrix.

  • sparsity (float, optional) – Proportion of recurrent weights set to zero.

  • noise (float, optional) – Noise added to each neuron (regularization).

  • input_shift (float or numpy.array, optional) – Scalar or vector of length n_inputs to be added to each input dimension before feeding it to the network.

  • input_scaling (float or numpy.array, optional) – Scalar or vector of length n_inputs by which each input dimension is multiplied before feeding it to the network.

  • teacher_forcing (boolean, optional) – If True, the output layer is fed back into the dynamic reservoir.

  • teacher_scaling (float, optional) – Factor applied to the target signal.

  • teacher_shift (float, optional) – Additive term applied to the target signal.

  • out_activation (func, optional) – Output activation function (applied to the readout).

  • inverse_out_activation (func, optional) – Inverse of the output activation function.

  • random_state (int or np.random.RandomState, optional) – Positive integer seed, np.random.RandomState object, or None to use numpy’s built-in RandomState.

  • silent (boolean, optional) – Suppress messages.

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation (0 < frac_val < 1). Only used if redefine_limit is True.

  • tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter: a list of possibilities when only one parameter is tuned, or a list of such lists (one per parameter) when several are tuned. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of trials in the hyperparameter search optimization.

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) – Contains the possibilities to be tested and the types of parameters. Must be defined as:

    pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when cross-validation is not performed.

  • n_splits (int, optional) – Number of folds used in cross-validation. If not None, percent_validation is ignored.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

initweights()

Initializes the weights of the network.
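For intuition about spectral_radius and sparsity, the sketch below follows the approach of the pyESN repository this class is adapted from, as we understand it: draw random recurrent weights centered on zero, set a fraction `sparsity` of them to zero, then rescale the matrix so its largest absolute eigenvalue equals spectral_radius. The helper name `init_reservoir`, the `n_inputs` value and the uniform ranges are assumptions for illustration:

```python
import numpy as np

def init_reservoir(n_reservoir=400, n_inputs=3, spectral_radius=0.95,
                   sparsity=0.95, seed=0):
    """Hypothetical sketch of reservoir weight initialization."""
    rng = np.random.default_rng(seed)
    # Random recurrent weights, uniform in [-0.5, 0.5)
    W = rng.random((n_reservoir, n_reservoir)) - 0.5
    # Set roughly a `sparsity` fraction of the recurrent weights to zero
    W[rng.random(W.shape) < sparsity] = 0.0
    # Rescale so the spectral radius (largest |eigenvalue|) is as requested
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    W *= spectral_radius / radius
    # Input weights, uniform in [-1, 1)
    W_in = rng.random((n_reservoir, n_inputs)) * 2 - 1
    return W, W_in

W, W_in = init_reservoir()
print(np.max(np.abs(np.linalg.eigvals(W))), np.mean(W == 0))
```

Keeping the spectral radius below 1 is the usual heuristic for giving the reservoir a fading memory of past inputs.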

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.

map_from_X(X, continuation=True)

Applies the learned weights to the network’s reactions to new inputs.

Parameters:
  • X (numpy.array) – Inputs of shape (N_test_samples x n_inputs).

  • continuation (boolean, optional) – If True, starts the network from the last training state.

Returns:

Output activation matrix.

Return type:

numpy.array

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plots the temporal evolution of the SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – Whether the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – Whether the alarmOutlier should be plotted.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plots the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Number of points in the window that must be in alarm for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.

pre_train(*args, **kwargs)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Number of points in the window that must be in alarm for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

set_hyperparameters(params_dict)

Receives a dict of hyperparameters to be assigned to the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values.

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data, applying a model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

train_core()

Harvests the network’s reactions to the training data and trains the readout weights. Returns the output of the network on the training data, computed with the trained weights.
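The harvest-then-fit procedure can be sketched as follows. The sketch assumes the standard ESN state update x(n) = tanh(W x(n-1) + W_in u(n)) plus a small noise term, no teacher forcing, and a linear readout fitted by least squares on the concatenated states and inputs (pyESN uses a pseudoinverse for this step). The helper `esn_fit_predict`, the washout length and the toy sine task are illustrative choices, not BibMon's code:

```python
import numpy as np

def esn_fit_predict(U, Y, n_reservoir=200, spectral_radius=0.95,
                    sparsity=0.95, noise=0.01, washout=50, seed=0):
    """Hypothetical sketch: harvest reservoir states for U, fit a linear readout to Y."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_reservoir, n_reservoir)) - 0.5
    W[rng.random(W.shape) < sparsity] = 0.0
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.random((n_reservoir, U.shape[1])) * 2 - 1

    # Harvest the reservoir's reaction to the training inputs
    states = np.zeros((U.shape[0], n_reservoir))
    for n in range(1, U.shape[0]):
        preact = W @ states[n - 1] + W_in @ U[n]
        states[n] = np.tanh(preact) + noise * (rng.random(n_reservoir) - 0.5)

    # Readout on [states, inputs], discarding the initial transient (washout)
    extended = np.hstack([states, U])
    W_out = np.linalg.pinv(extended[washout:]) @ Y[washout:]
    return extended @ W_out  # network output on the training data

t = np.arange(1000) * 0.05
U = np.sin(t)[:, None]          # input signal
Y = np.sin(t + 0.5)[:, None]    # target: phase-shifted copy of the input
Y_fit = esn_fit_predict(U, Y)
print(np.mean((Y_fit[50:] - Y[50:]) ** 2))
```

Only the readout weights are trained; the recurrent and input weights stay fixed after initialization, which is what makes ESN training a single linear least-squares problem.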

class bibmon.PCA(ncomp=0.9)

Bases: GenericModel

Principal Component Analysis.

For details on the technique, see https://doi.org/10.3390/pr12020251

Parameters:

ncomp (int or float) – If a float between 0.0 and 1.0, the minimum fraction of accumulated variance used for component selection; if an int, the number of components.
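The two meanings of ncomp can be illustrated with a NumPy sketch that derives the explained-variance fractions from the singular values of the centered data and converts a float ncomp into a component count. The helper `n_components_from` is hypothetical, and treating the threshold as "reaches or exceeds ncomp" is an assumption:

```python
import numpy as np

def n_components_from(ncomp, X):
    """Hypothetical helper: turn ncomp into a number of principal components."""
    Xc = X - X.mean(axis=0)                   # center the data
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    explained = s ** 2 / np.sum(s ** 2)       # variance fraction of each component
    cumulative = np.cumsum(explained)
    if isinstance(ncomp, float):
        # smallest number of components whose accumulated variance reaches ncomp
        k = int(np.searchsorted(cumulative, ncomp)) + 1
        return min(k, len(s))
    return int(ncomp)                         # an int is used as-is

rng = np.random.default_rng(0)
# Synthetic data whose variables have very different variances
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
print(n_components_from(0.9, X), n_components_from(2, X))
```

With the default ncomp=0.9, the first two components already account for more than 90% of the variance in this example, so two components are kept.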

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation (0 < frac_val < 1). Only used if redefine_limit is True.

  • tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter: a list of possibilities when only one parameter is tuned, or a list of such lists (one per parameter) when several are tuned. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of trials in the hyperparameter search optimization.

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) – Contains the possibilities to be tested and the types of parameters. Must be defined as:

    pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when cross-validation is not performed.

  • n_splits (int, optional) – Number of folds used in cross-validation. If not None, percent_validation is ignored.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, S, V, n)

Receives parameters from a previously trained model to perform predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • S (numpy.array) – PCA-specific parameter.

  • V (numpy.array) – PCA-specific parameter.

  • n (int) – PCA-specific parameter.

map_from_X(X)

Receives a data matrix X and returns a matrix of predicted or reconstructed values.

Parameters:

X (numpy.array) – Window X of data for prediction or reconstruction.

Returns:

  • numpy.array – Reconstructed X (or predicted Y).
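For PCA, a textbook way to reconstruct X is to project the normalized data onto the first n principal directions and map it back, X_hat = X P Pᵀ, where the columns of P are the first n right singular vectors. This formulation is assumed here; whether it matches BibMon's internal use of the S, V and n parameters of load_model is not confirmed. The data and the mixing matrix are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up data: 4 variables driven by 2 latent signals plus small noise
mixing = np.array([[1.0, 0.5, -0.8, 0.3],
                   [0.2, -1.0, 0.6, 0.9]])
X = rng.normal(size=(300, 2)) @ mixing + 0.05 * rng.normal(size=(300, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalized training data

# Rows of Vt are the principal directions, ordered by singular value
_, S, Vt = np.linalg.svd(X, full_matrices=False)
n = 2
P = Vt[:n].T                 # loadings of the first n components, shape (4, n)

X_rec = X @ P @ P.T          # reconstruction from n components
spe = np.sum((X - X_rec) ** 2, axis=1)  # residual left out of the model
print(X_rec.shape, spe.mean())
```

Because the data has two underlying signals, two components reconstruct it almost exactly and the remaining SPE mostly reflects the added noise.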

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plots the temporal evolution of the SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – Whether the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – Whether the alarmOutlier should be plotted.

plot_cumulative_variance(ax=None)

Plots the cumulative explained variance of the principal components.

Parameters:

ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plots the temporal evolution of the predictions along with the respective true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for the model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Number of points in the window that must be in alarm for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the data for model training and prepares them for training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
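In the {‘functionname__argname’: argvalue} convention, each key targets one keyword argument of one preprocessing function. The sketch below shows one way to split such a dictionary into per-function keyword arguments. split_a_pp is a hypothetical helper. The example values are only illustrations of parameters documented in the PreProcess class.

```python
from collections import defaultdict

def split_a_pp(a_pp):
    """Hypothetical helper: group {'func__arg': value} into {'func': {'arg': value}}."""
    kwargs = defaultdict(dict)
    for key, value in (a_pp or {}).items():
        # Function names use single underscores, so split on the first double underscore.
        func_name, arg_name = key.split("__", 1)
        kwargs[func_name][arg_name] = value
    return dict(kwargs)

a_pp = {
    "normalize__mode": "robust",
    "remove_frozen_variables__threshold": 1e-4,
    "apply_lag__lag": 2,
}
print(split_a_pp(a_pp))
```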

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.
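The count_window_size and count_limit parameters define a count alarm. Below is a minimal sketch of one plausible rule. It assumes an alarm is raised at a time step when at least count_limit of the last count_window_size points exceed the detection limit. A window size of 0 is assumed to mean that only the current point is checked. BibMon's exact rule may differ, and count_alarm is a hypothetical name.

```python
import numpy as np

def count_alarm(index_values, limit, count_window_size, count_limit):
    """Flag time steps where enough recent points exceed the detection limit.

    Assumed rule: with window size W, the alarm at step t is raised when at
    least `count_limit` of the points t-W+1..t are above `limit`.
    A window size of 0 or 1 checks only the current point.
    """
    exceed = np.asarray(index_values, dtype=float) > limit
    window = max(int(count_window_size), 1)
    alarms = np.zeros(len(exceed), dtype=bool)
    for t in range(len(exceed)):
        start = max(0, t - window + 1)
        alarms[t] = exceed[start:t + 1].sum() >= count_limit
    return alarms

spe_values = [0.5, 2.0, 0.4, 2.5, 3.0, 0.1]
alarms = count_alarm(spe_values, limit=1.0, count_window_size=3, count_limit=2)
print(alarms.tolist())  # [False, False, False, True, True, True]
```

With the defaults (count_window_size=0, count_limit=1), this sketch reduces to flagging every point above the limit. Larger windows trade detection delay for fewer isolated false alarms.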

set_hyperparameters(params_dict)

Receives a dict of hyperparameters and assigns them to the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values.

test(redefine_limit=False, delete_testing_data=False)

Tests the model on a window of data. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
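lim_conf sets the confidence level of the detection limit. The sketch below shows one common approach. It assumes the SPE of each observation is the sum of its squared residuals. It also assumes the limit is the empirical lim_conf quantile of the training SPE. BibMon may use a different estimator. The "reconstruction" here is synthetic stand-in data, not a trained model.

```python
import numpy as np

def spe(X, X_hat):
    """Squared prediction error per observation: sum of squared residuals across variables."""
    residuals = np.asarray(X, dtype=float) - np.asarray(X_hat, dtype=float)
    return np.sum(residuals ** 2, axis=1)

def detection_limit(spe_train, lim_conf=0.99):
    """Assumed rule: empirical lim_conf quantile of the training SPE."""
    return float(np.quantile(spe_train, lim_conf))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
X_hat_train = X_train + rng.normal(scale=0.1, size=X_train.shape)  # stand-in reconstruction
spe_train = spe(X_train, X_hat_train)
lim = detection_limit(spe_train, lim_conf=0.99)
# By construction, about 1% of the training points lie above the limit.
print(lim, np.mean(spe_train > lim))
```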

train_core()

Core of the training algorithm: all the steps required between pre_train() and the computation of the training predictions or reconstructions by map_from_X().

class bibmon.PreProcess(f_pp=None, a_pp=None, is_Y=False)

Bases: object

Class used to encapsulate data preprocessing methods.

Parameters:
  • f_pp (list, optional) – List containing strings with names of methods to be used in the preprocessing of the training data. The list of methods is shown below.

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform preprocessing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • is_Y (boolean, optional) – Whether the data being preprocessed is Y (that is, the variables to be predicted).

Methods:

  • Variable selection:

    remove_empty_variables(); remove_frozen_variables()

  • Missing values imputation:

    ffill_nan(); remove_observations_with_nan(); replace_nan_with_values()

  • Normalization:

    back_to_units(); normalize()

  • Adding dynamics:

    apply_lag(); add_moving_average()

  • Noise treatment:

    moving_average_filter()

add_moving_average(df, train_or_test='train', WS=10)

Adds variables filtered by moving average, keeping the original variables in the dataset. Note: do not confuse with moving_average_filter, in which the original variables are not kept in the dataset.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • WS (int, optional) – Window size of the filter.

Returns:

Processed data.

Return type:

pandas.DataFrame
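The difference between add_moving_average and moving_average_filter is whether the original columns survive. Below is a pandas sketch of both behaviours. It assumes a trailing rolling mean with min_periods=1 and a hypothetical "_MA" column suffix. BibMon's column names and handling of the first WS-1 rows may differ.

```python
import pandas as pd

def add_moving_average_sketch(df, WS=10):
    """Keep the original columns and append moving-average versions of them."""
    filtered = df.rolling(window=WS, min_periods=1).mean().add_suffix("_MA")
    return pd.concat([df, filtered], axis=1)

def moving_average_filter_sketch(df, WS=10):
    """Replace each column with its moving average (the originals are not kept)."""
    return df.rolling(window=WS, min_periods=1).mean()

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
print(add_moving_average_sketch(df, WS=2).columns.tolist())  # ['x', 'x_MA']
print(moving_average_filter_sketch(df, WS=2)["x"].tolist())  # [1.0, 1.5, 2.5, 3.5]
```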

apply(df, train_or_test='train')

Sequentially applies the preprocessing functions defined during initialization.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

Returns:

Processed data.

Return type:

pandas.DataFrame
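apply runs the functions listed in f_pp in order. The sketch below illustrates that dispatch pattern using a hypothetical MiniPreProcess class with two toy steps. Only the general idea comes from the description above: each function name is looked up, its a_pp arguments are passed, and the output feeds the next step. The method bodies are simplified stand-ins, not BibMon's code.

```python
import pandas as pd

class MiniPreProcess:
    """Hypothetical stand-in illustrating sequential, name-based preprocessing."""

    def __init__(self, f_pp=None, a_pp=None):
        self.f_pp = list(f_pp or [])
        # Route 'func__arg' keys to per-function keyword arguments.
        self.kwargs = {}
        for key, value in (a_pp or {}).items():
            func_name, arg_name = key.split("__", 1)
            self.kwargs.setdefault(func_name, {})[arg_name] = value

    def remove_empty_variables(self, df, train_or_test="train"):
        return df.dropna(axis=1, how="all")

    def replace_nan_with_values(self, df, train_or_test="train", val=0):
        return df.fillna(val)

    def apply(self, df, train_or_test="train"):
        for name in self.f_pp:
            func = getattr(self, name)  # look up the method by its string name
            df = func(df, train_or_test=train_or_test, **self.kwargs.get(name, {}))
        return df

df = pd.DataFrame({"a": [1.0, None], "b": [None, None]})
pp = MiniPreProcess(f_pp=["remove_empty_variables", "replace_nan_with_values"],
                    a_pp={"replace_nan_with_values__val": -1})
out = pp.apply(df)
print(out)
```

Because the steps run in list order, reordering f_pp can change the result. For example, filling NaNs before removing empty variables would keep column "b".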

apply_lag(df, train_or_test='train', lag=1)

Generates time-delayed (lagged) variables.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • lag (int, optional) – Number of delays to be considered.

Returns:

Processed data.

Return type:

pandas.DataFrame
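Below is a pandas sketch of lagged-variable generation. It assumes lag=k appends copies of every column shifted by 1..k steps, using a hypothetical "_lag{k}" suffix. It also assumes the first k rows, which lack a complete history, are dropped. The sketch keeps the original columns. Whether BibMon does the same, and how it names the new columns, is not specified above.

```python
import pandas as pd

def apply_lag_sketch(df, lag=1):
    """Append columns shifted by 1..lag time steps, then drop rows with incomplete history."""
    lagged = [df.shift(k).add_suffix(f"_lag{k}") for k in range(1, lag + 1)]
    out = pd.concat([df] + lagged, axis=1)
    return out.iloc[lag:]  # the first `lag` rows lack past values

df = pd.DataFrame({"x": [10, 20, 30, 40]})
out = apply_lag_sketch(df, lag=2)
print(out)
```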

back_to_units(df)

Returns the variables to the original scale, reverting effects of a normalization.

Parameters:

df (pandas.DataFrame) – Data to be processed.

Returns:

Processed data.

Return type:

pandas.DataFrame

ffill_nan(df, train_or_test='train')

Fills missing (NaN) values with the last valid value. If no previous valid value exists, the next valid value is used.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

Returns:

Processed data.

Return type:

pandas.DataFrame
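The behaviour described above corresponds to a forward fill followed by a backward fill. Below is a pandas sketch of that combination (not necessarily BibMon's implementation).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [np.nan, 1.0, np.nan, 3.0, np.nan]})
# Forward-fill from the last valid value; the leading NaN has no previous value,
# so a backward fill then takes the next valid one.
filled = df.ffill().bfill()
print(filled["x"].tolist())  # [1.0, 1.0, 1.0, 3.0, 3.0]
```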

moving_average_filter(df, train_or_test='train', WS=10)

Moving average noise filter. The original variables are replaced by their filtered versions.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • WS (int, optional) – Window size of the filter.

Returns:

Processed data.

Return type:

pandas.DataFrame

normalize(df, train_or_test='train', mode='standard')

Normalizes the variables.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • mode (string, optional) – Type of normalization (standard, robust, m-robust or s-robust).

Returns:

Processed data.

Return type:

pandas.DataFrame
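For mode='standard', normalization is typically a z-score. The sketch below assumes the per-variable training means and standard deviations (the quantities load_model calls Mux and SDx) are computed on the training window and reused for test data. It also assumes back_to_units inverts the same transformation. The robust modes are not shown, and the function names are hypothetical.

```python
import pandas as pd

def fit_standard(df_train):
    """Store per-variable means and standard deviations from the training window.

    pandas uses the sample standard deviation (ddof=1); BibMon's choice is assumed here.
    """
    return df_train.mean(), df_train.std()

def normalize_sketch(df, mu, sd):
    """z-score each column with the stored training statistics."""
    return (df - mu) / sd

def back_to_units_sketch(df_norm, mu, sd):
    """Undo the z-score, returning values to their original units."""
    return df_norm * sd + mu

train = pd.DataFrame({"T": [10.0, 20.0, 30.0], "P": [1.0, 1.0, 4.0]})
test = pd.DataFrame({"T": [40.0], "P": [2.5]})

mu, sd = fit_standard(train)
test_norm = normalize_sketch(test, mu, sd)         # T becomes (40 - 20) / 10 = 2.0
restored = back_to_units_sketch(test_norm, mu, sd)
print(test_norm)
print(restored)
```

Reusing the training statistics on test data keeps both windows on the same scale. That scale is what a detection limit learned in training refers to.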

remove_empty_variables(df, train_or_test='train')

Removes variables with no values.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

Returns:

Processed data.

Return type:

pandas.DataFrame

remove_frozen_variables(df, train_or_test='train', threshold=1e-06)

Removes variables whose variance falls below a given threshold.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • threshold (float, optional) – Variance limit to consider a variable as frozen.

Returns:

Processed data.

Return type:

pandas.DataFrame
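The two variable-selection steps, remove_empty_variables and remove_frozen_variables, can be sketched in pandas as follows. The sketch assumes "no values" means a column that is entirely NaN. It also assumes "frozen" means a sample variance below threshold, as the parameter description suggests. It is not BibMon's code.

```python
import pandas as pd

def remove_empty_variables_sketch(df):
    """Drop columns that contain no values at all."""
    return df.dropna(axis=1, how="all")

def remove_frozen_variables_sketch(df, threshold=1e-06):
    """Drop columns whose variance is below the threshold."""
    variances = df.var()
    return df.loc[:, variances >= threshold]

nan = float("nan")
df = pd.DataFrame({
    "flow": [1.0, 2.0, 3.0],
    "stuck_valve": [5.0, 5.0, 5.0],  # frozen: zero variance
    "dead_sensor": [nan, nan, nan],  # empty: no values
})
out = remove_frozen_variables_sketch(remove_empty_variables_sketch(df))
print(out.columns.tolist())  # ['flow']
```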

remove_observations_with_nan(df, train_or_test='train')

Removes observations with missing data (NaN).

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

Returns:

Processed data.

Return type:

pandas.DataFrame

replace_nan_with_values(df, train_or_test='train', val=0)

Replaces missing data (NaN) with a predefined value.

Parameters:
  • df (pandas.DataFrame) – Data to be processed.

  • train_or_test (string, optional) – Indicates which step the data corresponds to.

  • val (int or float, optional) – Value to be used in the replacement.

Returns:

Processed data.

Return type:

pandas.DataFrame
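The two remaining missing-value strategies, remove_observations_with_nan and replace_nan_with_values, correspond to standard pandas operations. Below is a sketch, not BibMon's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# remove_observations_with_nan: drop every row (observation) that has any NaN.
rows_kept = df.dropna(axis=0, how="any")

# replace_nan_with_values: substitute a fixed value (val=0 by default) for each NaN.
replaced = df.fillna(0)

print(rows_kept.index.tolist())  # [0]
print(replaced["a"].tolist())    # [1.0, 0.0, 3.0]
```

Dropping rows shortens the series and removes time steps. Replacement keeps every observation in place.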

class bibmon.SBM(p=2.0, functional_form='rbf', gamma=1.0, eta=1e-10, train_method='geometrical_median', tau=1e-10, verbose=False)

Bases: GenericModel

Similarity-based method (SBM).

For details on the technique, see:

http://www.pee.ufrj.br/index.php/pt/producao-academica/teses-de-doutorado/tese-1/2016033299-similarity-based-methods-for-machine-diagnosis/file

Parameters:
  • p (float, optional) – Order of the p-norm used in the similarity calculation.

  • functional_form (string, optional) – Functional form to be used in the similarity calculation. rbf = radial basis function; ies = inverse euclidean similarity; iqk = inverse quadratic kernel; exp_kernel = exponential kernel; cauchy_kernel = Cauchy kernel.

  • gamma (float, optional) – Parameter used in the various functional forms of similarity.

  • eta (float, optional) – Minimum value to be returned in similarity calculations.

  • train_method (string, optional) – Training method. Options: ‘all_archetypes’ and ‘geometrical_median’.

  • tau (float, optional) – Similarity threshold.

  • verbose (boolean, optional) – Whether to print information during execution.
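The parameters p, gamma and eta combine in the similarity calculation. Below is a NumPy sketch for two of the functional forms. It assumes the distance is the p-norm of the difference between two observations. It also assumes rbf = exp(-gamma * d**2) and ies = 1 / (1 + gamma * d). These expressions are common textbook choices, not confirmed from BibMon's source. eta is applied as the documented lower bound on the returned similarity.

```python
import numpy as np

def minkowski_distance(a, b, p=2.0):
    """p-norm of the difference between two observation vectors."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

def similarity(a, b, p=2.0, functional_form="rbf", gamma=1.0, eta=1e-10):
    """Similarity in [eta, 1]; identical vectors give 1."""
    d = minkowski_distance(a, b, p)
    if functional_form == "rbf":
        s = np.exp(-gamma * d ** 2)   # assumed radial basis form
    elif functional_form == "ies":
        s = 1.0 / (1.0 + gamma * d)   # assumed inverse euclidean form
    else:
        raise ValueError(f"form not covered by this sketch: {functional_form}")
    return max(float(s), eta)         # never return less than eta

print(similarity([0, 0], [0.6, 0.8]))                         # about exp(-1), i.e. 0.368
print(similarity([0, 0], [0.6, 0.8], functional_form="ies"))  # about 0.5
print(similarity([0, 0], [30, 40]))                           # floored at eta
```

Without the eta floor, distant observations would yield a similarity of exactly zero. That can cause divisions by zero later in a similarity-weighted reconstruction.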

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.

  • tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter. It must be a list of possibilities (for a single parameter) or a list of such lists, one per parameter. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.
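params, params_types and params_possibilities accept either a single parameter or a list of them. hyperparameter_tuning expects the same information as a DataFrame. Below is a sketch of a hypothetical helper that normalizes the single-parameter case and builds that DataFrame. The type strings ('float', 'categorical') and the [low, high] range format are assumptions about what the Optuna-based tuner accepts. The parameter names gamma and functional_form are taken from the SBM constructor.

```python
import pandas as pd

def build_params_frame(params, params_types, params_possibilities):
    """Hypothetical helper: build the DataFrame layout expected by hyperparameter_tuning."""
    if isinstance(params, str):
        # Single parameter: wrap everything so it looks like the multi-parameter case.
        params = [params]
        params_types = [params_types]
        params_possibilities = [params_possibilities]
    return pd.DataFrame(
        {"possibilities": params_possibilities, "types": params_types},
        index=params,
    )

single = build_params_frame("gamma", "float", [0.01, 10.0])
multi = build_params_frame(
    ["gamma", "functional_form"],
    ["float", "categorical"],
    [[0.01, 10.0], ["rbf", "ies", "cauchy_kernel"]],
)
print(multi)
```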

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) – Contains the possibilities to be tested and the types of the parameters. Must be defined as: pd.DataFrame({‘possibilities’: [list_possibilities1, list_possibilities2, …], ‘types’: [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when no cross-validation is performed.

  • n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.

map_from_X(X)

Receives a data matrix X and returns a matrix of predicted or reconstructed values.

Parameters:

X (numpy.array) – Window X of data for prediction or reconstruction.

Returns:

Reconstructed X (or predicted Y).

Return type:

numpy.array

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plots the temporal evolution of the SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – Whether the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – Whether the outlier alarm (alarmOutlier) should be plotted.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plots the temporal evolution of the predictions along with the corresponding true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the training data and prepares it for model training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in count alarm calculation.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

set_hyperparameters(params_dict)

Receives a dict of hyperparameters and assigns them to the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values.

test(redefine_limit=False, delete_testing_data=False)

Tests the model on a window of data. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

train_core()

Core of the training algorithm: all the steps required between pre_train() and the computation of the training predictions or reconstructions by map_from_X().

bibmon.align_dfs_by_rows(df1, df2)

Aligns two DataFrames by rows.

Parameters:
  • df1 (pandas.DataFrame) – First DataFrame.

  • df2 (pandas.DataFrame) – Second DataFrame.

Returns:

new_df1, new_df2 – Aligned DataFrames.

Return type:

tuple of pandas.DataFrame
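One plausible reading of "aligns by rows" is keeping only the index labels present in both DataFrames, in a common order. Below is a pandas sketch under that assumption. BibMon may align differently, for example by position.

```python
import pandas as pd

def align_dfs_by_rows_sketch(df1, df2):
    """Keep only rows whose index appears in both DataFrames, in sorted order."""
    common = df1.index.intersection(df2.index).sort_values()
    return df1.loc[common], df2.loc[common]

X = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
Y = pd.DataFrame({"y": [20, 30, 40]}, index=[1, 2, 3])
X_al, Y_al = align_dfs_by_rows_sketch(X, Y)
print(X_al.index.tolist(), Y_al.index.tolist())  # [1, 2] [1, 2]
```

This kind of alignment matters when X and Y come from separate sources. Each remaining row of X must then correspond to the same time step in Y.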

bibmon.comparative_table(models, X_train, X_validation, X_test, Y_train=None, Y_validation=None, Y_test=None, lim_conf=0.99, f_pp_train=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp_train=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, logy=True, metrics=None, X_pred_to_plot=None, count_limit=1, count_window_size=0, fault_start=None, fault_end=None, mask=None, times=True, plot_SPE=True, plot_predictions=True, fit_model=True)

Performs complete monitoring analysis of multiple models and builds comparative result tables.

Parameters:
  • models (list of BibMon models) – Models to be considered in the analysis.

  • X_train (pandas.DataFrame or numpy.array) – Training data X.

  • X_validation (pandas.DataFrame or numpy.array) – Validation data X.

  • X_test (pandas.DataFrame or numpy.array) – Test data X.

  • Y_train (pandas.DataFrame or numpy.array, optional) – Training data Y.

  • Y_validation (pandas.DataFrame or numpy.array, optional) – Validation data Y.

  • Y_test (pandas.DataFrame or numpy.array, optional) – Test data Y.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • f_pp_train (list, optional) – List containing strings with names of functions to be used in pre-processing the train data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_train (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the train data, in the format {‘functionname__argname’: argvalue, …}.

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the test data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the test data, in the format {‘functionname__argname’: argvalue, …}.

  • logy (boolean, optional) – Whether to use a logarithmic scale in the SPE plots.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • fault_start (string, optional) – Start timestamp of the fault.

  • fault_end (string, optional) – End timestamp of the fault.

  • mask (numpy.array, optional) – Boolean array indicating the indices where the process is in fault.

  • times (boolean, optional) – Whether execution times should be calculated.

  • plot_SPE (boolean, optional) – Whether SPE plots should be generated.

  • plot_predictions (boolean, optional) – Whether prediction plots should be generated.

  • fit_model (boolean, optional) – Whether the models should be trained.

Returns:

List with the generated tables (prediction and/or detection).

Return type:

list of pandas.DataFrame

bibmon.complete_analysis(model, X_train, X_validation, X_test, Y_train=None, Y_validation=None, Y_test=None, lim_conf=0.99, f_pp_train=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp_train=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, logy=True, metrics=None, X_pred_to_plot=None, count_limit=1, count_window_size=0, fault_start=None, fault_end=None)

Performs a complete monitoring analysis, covering training, validation, and testing.

Parameters:
  • model (BibMon model) – Model to be considered in the analysis.

  • X_train (pandas.DataFrame or numpy.array) – Training data X.

  • X_validation (pandas.DataFrame or numpy.array) – Validation data X.

  • X_test (pandas.DataFrame or numpy.array) – Test data X.

  • Y_train (pandas.DataFrame or numpy.array, optional) – Training data Y.

  • Y_validation (pandas.DataFrame or numpy.array, optional) – Validation data Y.

  • Y_test (pandas.DataFrame or numpy.array, optional) – Test data Y.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • f_pp_train (list, optional) – List containing strings with names of functions to be used in pre-processing the train data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_train (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the train data, in the format {‘functionname__argname’: argvalue, …}.

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the test data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the test data, in the format {‘functionname__argname’: argvalue, …}.

  • logy (boolean, optional) – Whether to use a logarithmic scale in the SPE plots.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

  • X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.

  • count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • fault_start (string, optional) – Start timestamp of the fault.

  • fault_end (string, optional) – End timestamp of the fault.

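bibmon.create_df_with_dates(array, start='2020-01-01 00:00:00', freq='1min')

Creates a DataFrame from the data, indexed by timestamps beginning at start and spaced by the sampling interval freq.

Parameters: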
  • array (pandas.DataFrame or numpy.array) – Original data.

  • start (string, optional) – Start timestamp.

  • freq (string, optional) – Sampling interval.

Returns:

df – Processed data.

Return type:

pandas.DataFrame
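Below is a pandas sketch of the behaviour the parameters suggest. It wraps the data in a DataFrame whose index is a regular timestamp sequence beginning at start, with spacing freq. This is an assumption about the function, not its source.

```python
import numpy as np
import pandas as pd

def create_df_with_dates_sketch(array, start="2020-01-01 00:00:00", freq="1min"):
    """Return the data as a DataFrame indexed by evenly spaced timestamps."""
    df = pd.DataFrame(array).copy()
    df.index = pd.date_range(start=start, periods=len(df), freq=freq)
    return df

df = create_df_with_dates_sketch(np.zeros((3, 2)))
print(df.index[-1])  # 2020-01-01 00:02:00
```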

bibmon.create_df_with_noise(array, noise_frac, max_index_for_noise)

Adds artificial measurement noise to data.

Parameters:
  • array (pandas.DataFrame or numpy.array) – Original data.

  • noise_frac (float) – Fraction (between 0 and 1) of the total amplitude of the variable that will be used as the noise standard deviation.

  • max_index_for_noise (int) – Maximum row index of the data used to compute the variable amplitude in the noise standard deviation calculation.

Returns:

df – Processed data.

Return type:

pandas.DataFrame
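Following the parameter descriptions, the noise standard deviation for each variable is noise_frac times its amplitude over the first max_index_for_noise rows. Below is a NumPy/pandas sketch. It makes three assumptions. First, the amplitude is taken as max minus min. Second, the noise is zero-mean Gaussian. Third, a hypothetical seed argument is added for reproducibility; the documented function has no such argument.

```python
import numpy as np
import pandas as pd

def create_df_with_noise_sketch(array, noise_frac, max_index_for_noise, seed=None):
    """Add zero-mean Gaussian noise scaled to each variable's amplitude."""
    df = pd.DataFrame(array, dtype=float)
    reference = df.iloc[:max_index_for_noise]
    amplitude = reference.max() - reference.min()  # assumed definition of amplitude
    noise_sd = noise_frac * amplitude               # one standard deviation per column
    rng = np.random.default_rng(seed)               # hypothetical seed argument
    noise = rng.normal(size=df.shape) * noise_sd.to_numpy()
    return df + noise

clean = np.column_stack([np.linspace(0.0, 10.0, 100), np.full(100, 5.0)])
noisy = create_df_with_noise_sketch(clean, noise_frac=0.05, max_index_for_noise=50, seed=0)
print(noisy.head())
```

In this sketch a constant variable has zero amplitude and therefore receives no noise.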

bibmon.load_real_data()

Loads a sample of real process data. The variables have been anonymized for availability in the library.

Returns:

Process data.

Return type:

pandas.DataFrame

bibmon.load_tennessee_eastman(train_id=0, test_id=0)

Loads the ‘Tennessee Eastman Process’ benchmark data.

Parameters:
  • train_id (int, optional) – Identifier of the training data. No fault: 0. With faults: 1 to 20.

  • test_id (int, optional) – Identifier of the test data. No fault: 0. With faults: 1 to 20.

Returns:

  • train_df (pandas.DataFrame) – Training data.

  • test_df (pandas.DataFrame) – Test data.

class bibmon.sklearnRegressor(regressor, permutation_importance=False)

Bases: GenericModel

Interface for sklearn regressors.

Parameters:
  • regressor (any regressor implementing the sklearn interface) –

    For example:
    • sklearn.svm.SVR,

    • sklearn.ensemble.RandomForestRegressor,

    • sklearn.neural_network.MLPRegressor,

    • etc.

  • permutation_importance (boolean, optional) – Whether permutation variable importance should be calculated.

fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)

Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.

  • frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.

  • tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.

  • params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.

  • params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.

  • params_possibilities (list, optional) – Possibilities to be tested for each parameter. It must be a list of possibilities (for a single parameter) or a list of such lists, one per parameter. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.

  • n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.

hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)

Performs hyperparameter tuning using the Optuna library.

Parameters:
  • params (pandas.DataFrame) – Contains the possibilities to be tested and the types of the parameters. Must be defined as: pd.DataFrame({‘possibilities’: [list_possibilities1, list_possibilities2, …], ‘types’: [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])

  • n_trials (int, optional) – Number of trials in the optimization performed by Optuna.

  • lim_conf (float, optional) – Confidence limit for the detection index.

  • percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when no cross-validation is performed.

  • n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.

load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)

Receives parameters from a previously trained model for making predictions and tests without the need for training.

Parameters:
  • limSPE (float) – Detection limit of the SPE.

  • SPE_mean (float) – Mean of the SPE.

  • count_window_size (int) – Window size used in the count alarm calculation.

  • Mux (pandas.Series) – Means of the X variables in the training period.

  • SDx (pandas.Series) – Standard deviations of the X variables in the training period.

  • Muy (pandas.Series, optional) – Means of the Y variables in the training period.

  • SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.
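Mux and SDx (and Muy, SDy) are the training-period statistics needed to scale new data the same way as the training data. Assuming the 'normalize' pre-processing step is a z-score (subtract the mean, divide by the standard deviation), the sketch below shows how stored statistics would be applied and inverted. Plain dicts stand in for the pandas.Series, the values are illustrative, and the function names are hypothetical.

```python
def normalize(row, mu, sd):
    """z-score each variable with stored training statistics (assumed scheme)."""
    return {tag: (row[tag] - mu[tag]) / sd[tag] for tag in row}


def denormalize(row, mu, sd):
    """Map scaled values back to the original units."""
    return {tag: row[tag] * sd[tag] + mu[tag] for tag in row}


# Statistics as they might be passed to load_model (illustrative values).
Mux = {"temperature": 80.0, "pressure": 2.0}
SDx = {"temperature": 5.0, "pressure": 0.5}

x_new = {"temperature": 90.0, "pressure": 1.5}
x_scaled = normalize(x_new, Mux, SDx)     # {'temperature': 2.0, 'pressure': -1.0}
x_back = denormalize(x_scaled, Mux, SDx)  # recovers x_new
```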

map_from_X(X)

Receives a data matrix X and returns a matrix of predicted or reconstructed values.

Parameters:

X (numpy.array) – Window X of data for prediction or reconstruction.

Returns:

Reconstructed X (or predicted Y).

Return type:

numpy.array
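For a reconstruction model such as this autoencoder, map_from_X returns a reconstruction of X, and the monitoring statistic is the SPE (squared prediction error). The sketch below uses the usual sum-of-squares definition, per observation, over the normalized variables, and flags observations whose SPE exceeds a detection limit such as limSPE. It works on plain lists; the function names are hypothetical, and the strict "greater than" comparison is an assumption.

```python
def spe(X, X_hat):
    """Squared prediction error per row: sum of squared residuals."""
    return [sum((x - xh) ** 2 for x, xh in zip(row, row_hat))
            for row, row_hat in zip(X, X_hat)]


def spe_alarms(spe_values, lim_spe):
    """1 where the SPE exceeds the detection limit, else 0."""
    return [int(value > lim_spe) for value in spe_values]


X = [[0.0, 1.0], [2.0, -1.0]]       # normalized observations
X_hat = [[0.5, 1.0], [0.0, 0.0]]    # e.g. the output of map_from_X
values = spe(X, X_hat)              # [0.25, 5.0]
alarms = spe_alarms(values, lim_spe=1.0)  # [0, 1]
```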

plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)

Plots the temporal evolution of the SPE.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).

  • legends (boolean, optional) – Whether the graph should display legends.

  • plot_alarm_outlier (boolean, optional) – Whether the alarmOutlier should be plotted.

plot_importances(n=None, permutation_importance=False)

Plots the permutation importances of the variables.

Parameters:
  • n (int, optional) – Maximum number of variables to be plotted.

  • permutation_importance (boolean, optional) – Whether permutation importances should be prioritized over coefficients in linear models.

plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)

Plots the temporal evolution of the predictions along with the corresponding true values.

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.

  • train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.

  • X_pred_to_plot (string, optional) – If the model is a reconstruction model (i.e., self.has_Y is False), the column of X to plot along with its prediction.

  • metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.

pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)

Receives a window of data and prepares it for model testing.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Threshold on the number of points within the window that exceed the detection limit, used to trigger the count alarm.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.
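count_window_size and count_limit control a count alarm that filters isolated single-point alarms. The sketch below assumes that, at each time step, the count alarm is raised when the number of points above the detection limit within the last count_window_size points is at least count_limit, and that a window size of 0 disables it. The exact window alignment and comparison used by BibMon are not specified here, and count_alarm is a hypothetical name.

```python
from collections import deque


def count_alarm(point_alarms, count_window_size, count_limit=1):
    """Hypothetical sketch of a count alarm over a sliding window.

    point_alarms: sequence of 0/1 flags (1 = point above the detection limit).
    Returns one 0/1 flag per time step.
    """
    if count_window_size <= 0:
        # Assumed: a window size of 0 disables the count alarm.
        return [0] * len(point_alarms)
    window = deque(maxlen=count_window_size)  # keeps only the last N flags
    result = []
    for flag in point_alarms:
        window.append(flag)
        result.append(int(sum(window) >= count_limit))
    return result


flags = [0, 1, 0, 0, 1, 1, 0, 0]
print(count_alarm(flags, count_window_size=3, count_limit=2))
# → [0, 0, 0, 0, 0, 1, 1, 0]
```

With count_window_size=1 and count_limit=1 the count alarm reduces to the point alarm itself.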

pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)

Receives the training data and prepares it for model training.

Parameters:
  • X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.

  • Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.

  • a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
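f_pp lists pre-processing steps by name, and a_pp passes keyword arguments to them using the 'functionname__argname' key format. The sketch below shows how such a specification can be split into per-function keyword arguments and applied in order. The toy steps work on plain lists and only stand in for the real PreProcess methods in BibMon_Tools.py, whose implementations and arguments may differ; the names scale, STEPS, split_kwargs, run_pipeline and the argument factor are hypothetical.

```python
def ffill_nan(values):
    """Toy stand-in: forward-fill None entries."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled


def scale(values, factor=1.0):
    """Hypothetical step, used only to show argument passing."""
    return [v * factor for v in values]


STEPS = {"ffill_nan": ffill_nan, "scale": scale}


def split_kwargs(a_pp):
    """{'func__arg': value} -> {'func': {'arg': value}}."""
    kwargs = {}
    for key, value in (a_pp or {}).items():
        func_name, arg_name = key.split("__", 1)
        kwargs.setdefault(func_name, {})[arg_name] = value
    return kwargs


def run_pipeline(values, f_pp, a_pp=None):
    """Apply the named steps in order, each with its own keyword arguments."""
    kwargs = split_kwargs(a_pp)
    for name in f_pp:
        values = STEPS[name](values, **kwargs.get(name, {}))
    return values


out = run_pipeline([1.0, None, 3.0], ["ffill_nan", "scale"], {"scale__factor": 2.0})
# out == [2.0, 2.0, 6.0]
```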

predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)

Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.

Parameters:
  • X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.

  • Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.

  • count_window_size (int, optional) – Window size used in the count alarm calculation.

  • count_limit (int, optional) – Threshold on the number of points within the window that exceed the detection limit, used to trigger the count alarm.

  • f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).

  • a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

  • redefine_limit (boolean, optional) – Whether the detection limit should be redefined during testing.

set_hyperparameters(params_dict)

Receives a dict of hyperparameters and assigns them to the model.

Parameters:

params_dict (dict) – Dictionary with hyperparameter values.
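Because the Autoencoder wraps sklearn's MLPRegressor, the keys of params_dict are presumably constructor argument names from the class signature (for example alpha or hidden_layer_sizes). The sketch below shows the general pattern of assigning a dict of hyperparameters while rejecting unknown names, using a stand-in class; it does not reproduce BibMon's implementation, and TinyModel is hypothetical.

```python
class TinyModel:
    """Hypothetical stand-in with a few MLPRegressor-style hyperparameters."""

    def __init__(self, hidden_layer_sizes=(2,), alpha=0.0001, max_iter=200):
        self.hidden_layer_sizes = hidden_layer_sizes
        self.alpha = alpha
        self.max_iter = max_iter

    def set_hyperparameters(self, params_dict):
        # Reject typos instead of silently creating new attributes.
        for name, value in params_dict.items():
            if not hasattr(self, name):
                raise ValueError(f"Unknown hyperparameter: {name!r}")
            setattr(self, name, value)


model = TinyModel()
model.set_hyperparameters({"alpha": 1e-3, "hidden_layer_sizes": (4, 2, 4)})
```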

test(redefine_limit=False, delete_testing_data=False)

Analyzes a window of data by applying the model test. Must be called after the pre_test function.

Parameters:
  • redefine_limit (boolean, optional) – Whether the detection limit should be redefined during testing.

  • delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.

train(lim_conf=0.99, delete_training_data=False)

Performs the model training. Must be called after the pre_train function.

Parameters:
  • lim_conf (float, optional) – Confidence limit for the detection index.

  • delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
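lim_conf sets the detection limit from the training values of the detection index. One common, distribution-free choice, assumed here, is the empirical lim_conf quantile of the training SPE, so that roughly a fraction (1 - lim_conf) of the training points lies above the limit. The sketch uses the same linear-interpolation rule as NumPy's default percentile method; the function name is hypothetical, and BibMon's exact procedure may differ.

```python
import math


def empirical_limit(values, lim_conf=0.99):
    """Linear-interpolation quantile of the training detection index."""
    if not 0 < lim_conf < 1:
        raise ValueError("lim_conf must be in (0, 1)")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * lim_conf  # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


spe_train = [float(i) for i in range(1, 101)]
lim_spe = empirical_limit(spe_train, lim_conf=0.99)  # approximately 99.01
```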

train_core()

The core of the training algorithm, that is, all the steps needed between pre_train() and the computation of the training predictions or reconstructions by map_from_X().

update_importances()

Calculates permutation importances of the variables.
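Permutation importance measures how much a model's error grows when one input variable is shuffled, breaking its relationship with the other variables. The sketch below implements the generic technique for any callable model on plain lists, scoring the reconstruction against the unshuffled X (similar in spirit to sklearn.inspection.permutation_importance with X as the target). The function names are hypothetical, and the error metric and number of repeats used by BibMon are not specified here.

```python
import random


def mse(X, X_hat):
    """Mean squared error over all entries."""
    n = sum(len(row) for row in X)
    return sum((a - b) ** 2 for row, row_hat in zip(X, X_hat)
               for a, b in zip(row, row_hat)) / n


def permutation_importances(model, X, n_repeats=5, seed=0):
    """Mean increase in reconstruction MSE when each column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(X, model(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            # Error is measured against the original X: the model must
            # reconstruct the true data from a corrupted input.
            increases.append(mse(X, model(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy "model" that copies column 0 and ignores column 1.
X = [[float(i), float(10 - i)] for i in range(6)]
toy_model = lambda rows: [[r[0], 0.0] for r in rows]
importances = permutation_importances(toy_model, X)
```

As expected, the column the model ignores gets zero importance.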

bibmon.spearmanr_dendrogram(df, figsize=(18, 8))

Generates a dendrogram of Spearman correlations.

Parameters:
  • df (pandas.DataFrame) – Data to be analyzed.

  • figsize (tuple of ints, optional) – Figure dimensions.
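spearmanr_dendrogram groups variables by their pairwise Spearman rank correlations. Spearman's rho is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below computes it for two variables in plain Python. A dendrogram would then typically be built from a distance such as 1 - |rho| with hierarchical clustering (for example scipy.cluster.hierarchy); that step is an assumption about, not a description of, BibMon's implementation. The function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("Spearman's rho is undefined for a constant variable")
    return cov / (sx * sy)


# Monotonic but nonlinear relationship: rho is (numerically) 1.
rho = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])
```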

bibmon.train_val_test_split(data, start_train, end_train, end_validation, end_test, tags_X=None, tags_Y=None)

Separates the data into consecutive portions of train, validation, and test, returning 3 DataFrames. It can also separate into predictor variables (X) and predicted variables (Y), which in this case will return 6 DataFrames.

Parameters:
  • data (pandas.DataFrame) – Data to be separated.

  • start_train (string) – Start timestamp of the train portion.

  • end_train (string) – End timestamp of the train portion.

  • end_validation (string) – End timestamp of the validation portion.

  • end_test (string) – End timestamp of the test portion.

  • tags_X (list of strings, optional) – Variables to be considered in the X set.

  • tags_Y (list of strings, optional) – Variables to be considered in the Y set.

Returns:

Separated data.

Return type:

pandas.DataFrames
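The sketch below illustrates a consecutive train/validation/test split on time-stamped records in plain Python, returning three lists instead of DataFrames. It assumes each portion starts right after the previous one ends, with intervals [start_train, end_train], (end_train, end_validation] and (end_validation, end_test]. BibMon's boundary inclusivity is not specified here, so treat this as an approximation; consecutive_split is a hypothetical name.

```python
from datetime import datetime


def consecutive_split(records, start_train, end_train, end_validation, end_test):
    """records: list of (ISO timestamp string, row) pairs."""
    t0, t1, t2, t3 = (datetime.fromisoformat(t) for t in
                      (start_train, end_train, end_validation, end_test))
    train, validation, test = [], [], []
    for stamp, row in records:
        t = datetime.fromisoformat(stamp)
        if t0 <= t <= t1:
            train.append(row)
        elif t1 < t <= t2:
            validation.append(row)
        elif t2 < t <= t3:
            test.append(row)
        # Records outside [start_train, end_test] are ignored.
    return train, validation, test


records = [(f"2024-01-0{d} 00:00", d) for d in range(1, 8)]
train, validation, test = consecutive_split(
    records, "2024-01-01", "2024-01-03", "2024-01-05", "2024-01-06")
# train == [1, 2, 3], validation == [4, 5], test == [6]
```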