API Reference
- class bibmon.Autoencoder(hidden_layer_sizes=(2,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)
Bases: GenericModel
Autoencoder using sklearn’s MLPRegressor interface. For details on the input parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
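Because the class wraps sklearn's MLPRegressor, the reconstruction idea can be sketched with sklearn alone: an MLP with a narrow hidden layer (the default hidden_layer_sizes=(2,)) is trained to map X onto itself. This is a standalone illustration that does not call bibmon; it assumes that, without Y_train, the wrapper fits X against itself, which is consistent with map_from_X returning "Reconstructed X (or predicted Y)". The synthetic data is made up for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic data: three correlated variables driven by one latent signal
t = rng.normal(size=(300, 1))
X = np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(300, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score, like the 'normalize' step

# Two-neuron bottleneck, matching the default hidden_layer_sizes=(2,)
ae = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=0)
ae.fit(X, X)  # input and target are the same array: a reconstruction model

X_rec = ae.predict(X)                 # reconstructed X
spe = ((X - X_rec) ** 2).sum(axis=1)  # squared prediction error per sample
```

The squared reconstruction error per sample is the kind of quantity monitored as SPE by the plotting and limit-related methods below.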
- fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)
Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.
frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.
tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.
params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.
params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.
params_possibilities (list, optional) – Possibilities to be tested for each parameter: a list of possibilities when only one parameter is tuned, or a list of such lists, one per parameter. Possibilities must be given according to the parameter type, as specified in the Optuna library API.
n_trials (int, optional) – Number of trials in the hyperparameter search.
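The `{‘functionname__argname’: argvalue}` convention used by a_pp and a_pp_test packs arguments for several pre-processing functions into one flat dict. The helper below, `split_pp_args`, is hypothetical and only illustrates how such keys split into per-function keyword arguments; it is not bibmon's own code, and the argument names in the example dict are placeholders rather than real PreProcess arguments.

```python
from collections import defaultdict


def split_pp_args(a_pp):
    """Group a flat {'function__arg': value} dict into {function: {arg: value}}.

    Hypothetical helper illustrating the documented key format.
    """
    grouped = defaultdict(dict)
    for key, value in (a_pp or {}).items():
        func_name, arg_name = key.split("__", 1)  # split on the first double underscore
        grouped[func_name][arg_name] = value
    return dict(grouped)


f_pp = ["remove_empty_variables", "ffill_nan", "remove_frozen_variables", "normalize"]
# Hypothetical argument names, for illustration only
a_pp = {"ffill_nan__limit": 5, "remove_frozen_variables__threshold": 1e-6}

kwargs = split_pp_args(a_pp)
for name in f_pp:
    print(name, kwargs.get(name, {}))  # functions without entries get no extra kwargs
```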
- hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)
Performs hyperparameter tuning using the Optuna library.
- Parameters:
params (pandas.DataFrame) –
Contains the possibilities to be tested and the types of the parameters. Must be defined as: pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])
n_trials (int, optional) – Number of trials in the optimization performed by Optuna.
lim_conf (float, optional) – Confidence limit for the detection index.
percent_validation (float (0<value<1), optional) – Fraction of the data set aside for internal validation when no cross-validation is performed.
n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
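For example, a params DataFrame in the required layout could be built as follows. The parameter names are Autoencoder constructor arguments; the type strings ('float', 'categorical') are an assumption that they mirror Optuna's suggest_float/suggest_categorical, and how each possibility list is interpreted (bounds versus choices) should be checked against the library source.

```python
import pandas as pd

params = pd.DataFrame(
    {
        "possibilities": [
            [1e-5, 1e-2],        # assumed: lower/upper bounds for a float parameter
            ["relu", "tanh"],    # assumed: choices for a categorical parameter
        ],
        "types": ["float", "categorical"],  # assumed Optuna-style type names
    },
    index=["alpha", "activation"],  # Autoencoder constructor arguments
)
print(params)
```

The resulting DataFrame is what would be passed as the params argument of hyperparameter_tuning.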
- load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)
Receives parameters from a previously trained model for making predictions and tests without the need for training.
- Parameters:
limSPE (float) – Detection limit of the SPE.
SPE_mean (float) – Mean of the SPE.
count_window_size (int) – Window size used in the count alarm calculation.
Mux (pandas.Series) – Means of the X variables in the training period.
SDx (pandas.Series) – Standard deviations of the X variables in the training period.
Muy (pandas.Series, optional) – Means of the Y variables in the training period.
SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.
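The values passed to load_model are statistics of the training period. A minimal sketch of where they could come from, assuming that SPE is the row-wise sum of squared reconstruction errors on normalized data and that limSPE is the empirical lim_conf quantile of the training SPE (bibmon may compute the limit differently). The "reconstruction" here is a hypothetical stand-in, not a trained model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
X_train = pd.DataFrame(rng.normal(size=(1000, 4)), columns=list("abcd"))

# Training statistics, analogous to Mux and SDx
Mux, SDx = X_train.mean(), X_train.std()
Xn = (X_train - Mux) / SDx

# Hypothetical stand-in for a model's reconstruction
X_rec = 0.5 * Xn

SPE = ((Xn - X_rec) ** 2).sum(axis=1)        # squared prediction error per sample
lim_conf = 0.99
limSPE = float(np.quantile(SPE, lim_conf))   # assumed: empirical-quantile limit
SPE_mean = float(SPE.mean())
print(limSPE, SPE_mean)
```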
- map_from_X(X)
Receives a data matrix X and returns a matrix of predicted or reconstructed values.
- Parameters:
X (numpy.array) – Window X of data for prediction or reconstruction.
- Returns:
numpy.array
Reconstructed X (or predicted Y).
- plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)
Plots the temporal evolution of the SPE.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).
legends (boolean, optional) – Whether the graph should display legends.
plot_alarm_outlier (boolean, optional) – Whether the alarmOutlier should be plotted.
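A plot of this kind can be approximated with plain matplotlib. The sketch below uses a synthetic SPE series and an assumed quantile-based limit; it shows what the logy and legends options toggle and how passing an existing ax works, not bibmon's exact styling.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
spe = rng.chisquare(df=3, size=300)  # synthetic SPE series
lim = np.quantile(spe, 0.99)         # assumed detection limit

fig, ax = plt.subplots()             # an existing ax could be passed in instead
ax.plot(spe, label="SPE")
ax.axhline(lim, color="r", linestyle="--", label="limit")
ax.set_yscale("log")                 # what logy=True toggles
ax.set_xlabel("sample")
ax.legend()                          # what legends=True toggles
```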
- plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)
Plots the temporal evolution of the predictions along with the respective true values.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.
metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.
- pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)
Receives a window of data and prepares it for the model testing.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in the count alarm calculation.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}
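count_window_size and count_limit describe a count-based alarm. One common reading, assumed here rather than taken from the bibmon source, is that the alarm is active at a sample when at least count_limit of the samples in the trailing window of length count_window_size exceed the detection limit. The helper `count_alarm` is hypothetical.

```python
import numpy as np


def count_alarm(spe, lim, count_window_size, count_limit=1):
    """Boolean array, True where the (assumed) count alarm is active.

    At sample i, count how many samples in the trailing window of length
    count_window_size (ending at i) exceed lim, and compare with count_limit.
    """
    exceed = (np.asarray(spe, dtype=float) > lim).astype(int)
    w = max(int(count_window_size), 1)           # assumption: window 0 means single points
    csum = np.concatenate(([0], np.cumsum(exceed)))
    end = np.arange(1, len(exceed) + 1)          # exclusive end index of each window
    start = np.maximum(end - w, 0)               # window start, clipped at 0
    counts = csum[end] - csum[start]             # exceedances inside each window
    return counts >= count_limit


spe = np.array([0.2, 1.5, 0.3, 1.8, 2.1, 0.4])
alarm = count_alarm(spe, lim=1.0, count_window_size=3, count_limit=2)
print(alarm)
```

Under this reading, a window of 1 with count_limit=1 reduces to flagging every point above the limit, while larger windows suppress isolated exceedances.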
- pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)
Receives the data for model training and prepares them for training.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
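The defaults pair 'normalize' in f_pp with 'replace_nan_with_values' and 'normalize' in f_pp_test, so test data is scaled with statistics from the training window. A pandas sketch of that idea, assuming missing test values are replaced by the training means before scaling (the replacement value actually used by bibmon may differ):

```python
import numpy as np
import pandas as pd

X_train = pd.DataFrame({"T": [300.0, 302.0, 304.0], "P": [1.0, 1.2, 1.4]})
X_test = pd.DataFrame({"T": [306.0, np.nan], "P": [1.2, 1.6]})

# Statistics computed on the training window only (cf. Mux and SDx)
Mux, SDx = X_train.mean(), X_train.std()

X_train_n = (X_train - Mux) / SDx
# Assumed: NaNs in the test window are filled with the training means, column by column
X_test_n = (X_test.fillna(Mux) - Mux) / SDx
print(X_test_n)
```

Reusing the training statistics keeps the test window on the same scale as the data the model was fitted on.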
- predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)
Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in count alarm calculation.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.
- set_hyperparameters(params_dict)
Receives a dict with hyperparameters to be assigned to the model.
- Parameters:
params_dict (dict) – Dictionary with hyperparameter values.
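For a model whose hyperparameters are plain attributes, assigning such a dict usually reduces to a setattr loop. The class `ToyModel` below is a hypothetical stand-in, not bibmon's implementation; for the Autoencoder the keys would presumably be the constructor arguments shown in its signature.

```python
class ToyModel:
    """Hypothetical stand-in with two hyperparameters."""

    def __init__(self, alpha=1e-4, activation="relu"):
        self.alpha = alpha
        self.activation = activation

    def set_hyperparameters(self, params_dict):
        for name, value in params_dict.items():
            if not hasattr(self, name):  # reject names the model does not define
                raise ValueError(f"unknown hyperparameter: {name}")
            setattr(self, name, value)


m = ToyModel()
m.set_hyperparameters({"alpha": 1e-3, "activation": "tanh"})
print(m.alpha, m.activation)
```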
- test(redefine_limit=False, delete_testing_data=False)
Analyzes a window of data, applying a model test. Must be called after the pre_test function.
- Parameters:
redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
- train(lim_conf=0.99, delete_training_data=False)
Performs the model training. Must be called after the pre_train function.
- Parameters:
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
- train_core()
The core of the training algorithm, that is, all the necessary steps between pre_train() and the calculation of the prediction or reconstruction in training by map_from_X().
- class bibmon.ESN(n_reservoir=400, spectral_radius=0.95, sparsity=0.95, noise=0.01, input_shift=None, input_scaling=None, teacher_forcing=False, feedback_scaling=None, teacher_scaling=None, teacher_shift=None, out_activation=<function linear>, inverse_out_activation=<function linear>, random_state=None, silent=True)
Bases: GenericModel
Echo State Networks.
For details on the technique, see the paper by Lemos et al. (2021) - Echo State Network Based Soft Sensor for Monitoring and Fault Detection of Industrial Processes, https://doi.org/10.1016/j.compchemeng.2021.107512
This code has been modified and adapted from the following repository: https://github.com/cknd/pyESN
- Parameters:
n_reservoir (int, optional) – Number of neurons in the reservoir.
spectral_radius (float, optional) – Spectral radius of the recurrent weight matrix.
sparsity (float, optional) – Proportion of recurrent weights set to zero.
noise (float, optional) – Noise added to each neuron (regularization).
input_shift (float or numpy.array, optional) – Scalar or vector of length n_inputs to be added to each input dimension before feeding it to the network.
input_scaling (float or numpy.array, optional) – Scalar or vector of length n_inputs by which each input dimension is multiplied before feeding it to the network.
teacher_forcing (boolean, optional) – If True, the network output is fed back into the dynamic reservoir.
teacher_scaling (float, optional) – Factor applied to the target signal.
teacher_shift (float, optional) – Additive term applied to the target signal.
out_activation (func, optional) – Output activation function (applied to the readout).
inverse_out_activation (func, optional) – Inverse of the output activation function.
random_state (int or numpy.random.RandomState, optional) – Positive integer seed, numpy.random.RandomState object, or None to use numpy’s built-in RandomState.
silent (boolean, optional) – Suppress messages.
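The first three parameters shape the recurrent weight matrix. The sketch below follows the construction used in pyESN's initweights (it is not copied from bibmon): draw uniform random weights, set roughly a fraction sparsity of them to zero, then rescale so that the largest absolute eigenvalue equals spectral_radius. The function name `init_reservoir` is hypothetical.

```python
import numpy as np


def init_reservoir(n_reservoir=400, spectral_radius=0.95, sparsity=0.95, seed=None):
    rng = np.random.RandomState(seed)
    W = rng.rand(n_reservoir, n_reservoir) - 0.5   # uniform in [-0.5, 0.5)
    W[rng.rand(*W.shape) < sparsity] = 0           # zero out ~sparsity of the weights
    radius = np.max(np.abs(np.linalg.eigvals(W)))  # current spectral radius
    return W * (spectral_radius / radius)          # rescale to the target radius


W = init_reservoir(n_reservoir=100, seed=42)
print(np.max(np.abs(np.linalg.eigvals(W))))
```

Keeping the spectral radius below 1 is the usual heuristic for giving the reservoir a fading memory of past inputs.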
- fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)
Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.
frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.
tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.
params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.
params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.
params_possibilities (list, optional) – Possibilities to be tested for each parameter: a list of possibilities when only one parameter is tuned, or a list of such lists, one per parameter. Possibilities must be given according to the parameter type, as specified in the Optuna library API.
n_trials (int, optional) – Number of trials in the hyperparameter search.
- hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)
Performs hyperparameter tuning using the Optuna library.
- Parameters:
params (pandas.DataFrame) –
Contains the possibilities to be tested and the types of the parameters. Must be defined as: pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])
n_trials (int, optional) – Number of trials in the optimization performed by Optuna.
lim_conf (float, optional) – Confidence limit for the detection index.
percent_validation (float (0<value<1), optional) – Fraction of the data set aside for internal validation when no cross-validation is performed.
n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
- initweights()
Initializes the weights of the network.
- load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)
Receives parameters from a previously trained model for making predictions and tests without the need for training.
- Parameters:
limSPE (float) – Detection limit of the SPE.
SPE_mean (float) – Mean of the SPE.
count_window_size (int) – Window size used in the count alarm calculation.
Mux (pandas.Series) – Means of the X variables in the training period.
SDx (pandas.Series) – Standard deviations of the X variables in the training period.
Muy (pandas.Series, optional) – Means of the Y variables in the training period.
SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.
- map_from_X(X, continuation=True)
Applies the learned weights to the network’s response to new inputs.
- Parameters:
X (numpy.array) – Inputs of shape (N_test_samples x n_inputs).
continuation (boolean, optional) – If True, start the network from the last training state.
- Returns:
Output activation matrix.
- Return type:
numpy.array
- plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)
Plots the temporal evolution of the SPE.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).
legends (boolean, optional) – Whether the graph should display legends.
plot_alarm_outlier (boolean, optional) – Whether the alarmOutlier should be plotted.
- plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)
Plots the temporal evolution of the predictions along with the respective true values.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.
metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.
- pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)
Receives a window of data and prepares it for the model testing.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in the count alarm calculation.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}
- pre_train(*args, **kwargs)
Receives the data for model training and prepares them for training.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
- predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)
Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in count alarm calculation.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.
- set_hyperparameters(params_dict)
Receives a dict with hyperparameters to be assigned to the model.
- Parameters:
params_dict (dict) – Dictionary with hyperparameter values.
- test(redefine_limit=False, delete_testing_data=False)
Analyzes a window of data, applying a model test. Must be called after the pre_test function.
- Parameters:
redefine_limit (boolean, optional) – Whether to redefine the detection limit during testing.
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
- train(lim_conf=0.99, delete_training_data=False)
Performs the model training. Must be called after the pre_train function.
- Parameters:
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
- train_core()
Harvests the network’s reaction to the training data and trains the readout weights. The result is the network output on the training data, computed with the trained weights.
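A compact sketch of that procedure, leaving out teacher forcing, noise, and input/teacher scaling: drive the reservoir with the inputs, collect the states together with the raw inputs, and fit a linear readout by least squares through the pseudo-inverse, as pyESN does. The function name `harvest_and_train` and the washout length are hypothetical; the identity output activation matches the class default out_activation.

```python
import numpy as np


def harvest_and_train(U, Y, n_reservoir=100, spectral_radius=0.95,
                      sparsity=0.95, washout=20, seed=0):
    rng = np.random.RandomState(seed)
    # Sparse random reservoir rescaled to the requested spectral radius
    W = rng.rand(n_reservoir, n_reservoir) - 0.5
    W[rng.rand(*W.shape) < sparsity] = 0
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.rand(n_reservoir, U.shape[1]) * 2 - 1  # input weights in [-1, 1)

    # Harvest: run the reservoir over the training inputs
    states = np.zeros((U.shape[0], n_reservoir))
    for n in range(1, U.shape[0]):
        states[n] = np.tanh(W @ states[n - 1] + W_in @ U[n])

    # Train: least-squares readout on [states, inputs], skipping the washout
    ext = np.hstack([states, U])
    W_out = np.linalg.pinv(ext[washout:]) @ Y[washout:]
    return ext @ W_out  # training output with a linear (identity) activation


t = np.linspace(0, 20, 400)
U = np.sin(t)[:, None]
Y = np.sin(t - 0.3)[:, None]  # target: phase-shifted copy of the input
Y_fit = harvest_and_train(U, Y)
print(np.sqrt(np.mean((Y_fit[20:] - Y[20:]) ** 2)))
```

Only W_out is learned; W and W_in stay fixed after initialization, which is what makes ESN training a single linear regression.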
- class bibmon.PCA(ncomp=0.9)
Bases: GenericModel
Principal Component Analysis.
For details on the technique, see https://doi.org/10.3390/pr12020251
- Parameters:
ncomp (int or float) –
- float: number between 0.0 and 1.0 that corresponds to the minimum fraction of accumulated variance for component selection;
- int: the number of components.
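The two meanings of ncomp can be illustrated with numpy's SVD. This sketch assumes that a float ncomp selects the smallest number of components whose cumulative explained-variance fraction reaches ncomp; bibmon's exact rule is not confirmed here. The data is only centered so that the columns keep different variances (with the default 'normalize' pre-processing they would also be scaled to unit variance first). The function `select_ncomp` is hypothetical.

```python
import numpy as np


def select_ncomp(X, ncomp=0.9):
    """Return the number of principal components to keep."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    explained = s**2 / np.sum(s**2)          # variance fraction per component
    if isinstance(ncomp, float):
        cumulative = np.cumsum(explained)
        k = int(np.searchsorted(cumulative, ncomp)) + 1  # first count reaching ncomp
        return min(k, len(s))                # guard against rounding when ncomp == 1.0
    return int(ncomp)                        # int: fixed number of components


rng = np.random.default_rng(4)
# Independent columns with very different spreads
X = rng.normal(size=(1000, 5)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])
print(select_ncomp(X, 0.9), select_ncomp(X, 3))
```

The cumulative curve computed here is the quantity that plot_cumulative_variance displays.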
- fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)
Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
redefine_limit (boolean, optional) – Whether to redefine the detection limit using a validation period taken from the training data itself.
frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.
tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.
params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.
params_types (string or list of strings, optional) – Type(s) of the parameter(s) to be tuned.
params_possibilities (list, optional) – Possibilities to be tested for each parameter: a list of possibilities when only one parameter is tuned, or a list of such lists, one per parameter. Possibilities must be given according to the parameter type, as specified in the Optuna library API.
n_trials (int, optional) – Number of trials in the hyperparameter search.
- hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)
Performs hyperparameter tuning using the Optuna library.
- Parameters:
params (pandas.DataFrame) –
Contains the possibilities to be tested and the types of the parameters. Must be defined as: pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])
n_trials (int, optional) – Number of trials in the optimization performed by Optuna.
lim_conf (float, optional) – Confidence limit for the detection index.
percent_validation (float (0<value<1), optional) – Fraction of the data set aside for internal validation when no cross-validation is performed.
n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
- load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, S, V, n)
Receives parameters from a previously trained model to perform predictions and tests without the need for training.
- Parameters:
limSPE (float) – Detection limit of the SPE.
SPE_mean (float) – Mean of the SPE.
count_window_size (int) – Window size used in the count alarm calculation.
Mux (pandas.Series) – Means of the X variables in the training period.
SDx (pandas.Series) – Standard deviations of the X variables in the training period.
S (numpy.array) – PCA-specific parameter.
V (numpy.array) – PCA-specific parameter.
n (int) – PCA-specific parameter.
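S, V and n are documented only as PCA-specific parameters. Assuming that S holds the singular values, V the loading vectors as columns, and n the number of retained components, the reconstruction in map_from_X would take the usual form X_hat = X @ V[:, :n] @ V[:, :n].T on normalized data. A numpy sketch of that assumption (the `reconstruct` helper is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
Xn = (X - X.mean(axis=0)) / X.std(axis=0)  # normalized training data

_, S, Vt = np.linalg.svd(Xn, full_matrices=False)
V = Vt.T  # assumed: loading vectors as columns
n = 2     # assumed: number of retained components


def reconstruct(Xn, V, n):
    Vn = V[:, :n]
    return Xn @ Vn @ Vn.T  # projection onto the retained principal subspace


X_hat = reconstruct(Xn, V, n)
SPE = ((Xn - X_hat) ** 2).sum(axis=1)  # residual outside the retained subspace
```

On the training data, the summed SPE equals the sum of the squared discarded singular values, which is one way to sanity-check stored S, V and n.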
- map_from_X(X)
Receives a data matrix X and returns a matrix of predicted or reconstructed values.
- Parameters:
X (numpy.array) – Window X of data for prediction or reconstruction.
- Returns:
numpy.array
Reconstructed X (or predicted Y).
- plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)
Plots the temporal evolution of the SPE.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).
legends (boolean, optional) – Whether the graph should display legends.
plot_alarm_outlier (boolean, optional) – Whether the alarmOutlier should be plotted.
- plot_cumulative_variance(ax=None)
Plots the cumulative variance.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
- plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)
Plots the temporal evolution of the predictions along with the respective true values.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.
metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.
- pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)
Receives a window of data and prepares it for the model testing.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in the count alarm calculation.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}
- pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)
Receives the data for model training and prepares them for training.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
- predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)
Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in the count alarm calculation.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
redefine_limit (boolean, optional) – Whether the detection limit should be redefined during testing.
- set_hyperparameters(params_dict)
Receives a dict of hyperparameters to be assigned to the model.
- Parameters:
params_dict (dict) – Dictionary with hyperparameter values.
- test(redefine_limit=False, delete_testing_data=False)
Analyzes a window of data, applying a model test. Must be called after the pre_test function.
- Parameters:
redefine_limit (boolean, optional) – Whether the detection limit should be redefined during testing.
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
- train(lim_conf=0.99, delete_training_data=False)
Performs the model training. Must be called after the pre_train function.
- Parameters:
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
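A sketch of how the SPE and a detection limit derived from lim_conf could be computed. It assumes the SPE is the row-wise sum of squared residuals between X and its reconstruction (or prediction), and that the limit is the empirical lim_conf quantile of the training SPE. Both functions are hypothetical; bibmon's exact limit calculation may differ.

```python
import numpy as np


def spe(X, X_hat):
    """Squared prediction error of each observation (row-wise sum of squares)."""
    residuals = np.asarray(X, dtype=float) - np.asarray(X_hat, dtype=float)
    return np.sum(residuals ** 2, axis=1)


def detection_limit(spe_train, lim_conf=0.99):
    """Assumed limit: the empirical lim_conf quantile of the training SPE."""
    return float(np.quantile(spe_train, lim_conf))


X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_hat = np.array([[1.0, 1.0], [1.0, 4.0]])
print(spe(X, X_hat))  # [1. 4.]
```

Under this assumption, roughly (1 - lim_conf) of the training samples lie above the limit, so lim_conf directly trades sensitivity against false alarms.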
- train_core()
The core of the training algorithm, that is, all the necessary steps between pre_train() and the calculation of the prediction or reconstruction in training by map_from_X().
- class bibmon.PreProcess(f_pp=None, a_pp=None, is_Y=False)
Bases: object
Class used to encapsulate data preprocessing methods.
- Parameters:
f_pp (list, optional) – List containing strings with names of methods to be used in the preprocessing of the training data. The available methods are listed below.
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform preprocessing of the training data, in the format {‘functionname__argname’: argvalue, …}.
is_Y (boolean, optional) – Whether the data being preprocessed is Y (that is, the variables to be predicted).
Methods:
- Variable selection:
remove_empty_variables(); remove_frozen_variables()
- Missing values imputation:
ffill_nan(); remove_observations_with_nan(); replace_nan_with_values()
- Normalization:
back_to_units(); normalize()
- Adding dynamics:
apply_lag(); add_moving_average()
- Noise treatment:
moving_average_filter()
- add_moving_average(df, train_or_test='train', WS=10)
Adds moving-average-filtered copies of the variables to the dataset. Attention: do not confuse with moving_average_filter, which replaces the original variables instead of keeping them in the dataset.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
WS (int, optional) – Window size of the filter.
- Returns:
pandas.DataFrame
Processed data.
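The difference between add_moving_average and moving_average_filter can be sketched with the pandas rolling mean. The choices min_periods=1 (the first rows are averaged over fewer points) and the _MA{WS} suffix for added columns are illustrative assumptions, not bibmon's behavior.

```python
import pandas as pd


def moving_average_filter_sketch(df, WS=10):
    # Replaces every variable by its moving average; originals are not kept.
    return df.rolling(window=WS, min_periods=1).mean()


def add_moving_average_sketch(df, WS=10):
    # Keeps the original variables and appends moving-average copies
    # (the "_MA{WS}" suffix is hypothetical).
    filtered = df.rolling(window=WS, min_periods=1).mean().add_suffix(f"_MA{WS}")
    return pd.concat([df, filtered], axis=1)


df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]})
print(list(moving_average_filter_sketch(df, WS=2).columns))  # ['a']
print(list(add_moving_average_sketch(df, WS=2).columns))     # ['a', 'a_MA2']
```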
- apply(df, train_or_test='train')
Sequentially applies the preprocessing functions defined during initialization.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
- Returns:
Processed data.
- Return type:
pandas.DataFrame
- apply_lag(df, train_or_test='train', lag=1)
Generates time-delayed (lagged) variables.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
lag (int, optional) – Number of delays to be considered.
- Returns:
pandas.DataFrame
Processed data.
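A minimal pandas sketch of lag generation. It assumes the original variables are kept, lagged copies for delays 1 to lag are appended, and the first lag rows (which lack a complete history) are dropped; the _lag{k} column suffix is hypothetical.

```python
import pandas as pd


def apply_lag_sketch(df, lag=1):
    # Lagged copies of every variable for delays 1..lag (hypothetical suffix).
    lagged = [df.shift(k).add_suffix(f"_lag{k}") for k in range(1, lag + 1)]
    out = pd.concat([df] + lagged, axis=1)
    # The first `lag` rows lack a complete history, so they are dropped.
    return out.iloc[lag:]


df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
print(apply_lag_sketch(df, lag=2))
```

Each remaining row then holds the current value of x together with its values one and two samples earlier, which is what lets a static model capture process dynamics.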
- back_to_units(df)
Returns the variables to their original scale, reverting the effect of a normalization.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
- Returns:
pandas.DataFrame
Processed data.
- ffill_nan(df, train_or_test='train')
Fills missing (NaN) values with the last valid value. If no previous valid value exists, the next valid value is used.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
- Returns:
Processed data.
- Return type:
pandas.DataFrame
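The behavior described for ffill_nan corresponds to a forward fill followed by a backward fill in pandas. A minimal sketch, not bibmon's source:

```python
import numpy as np
import pandas as pd


def ffill_nan_sketch(df):
    # Forward fill propagates the last valid value; the backward fill then
    # covers leading NaNs that have no previous valid value.
    return df.ffill().bfill()


df = pd.DataFrame({"a": [np.nan, 1.0, np.nan, 3.0]})
print(ffill_nan_sketch(df)["a"].tolist())  # [1.0, 1.0, 1.0, 3.0]
```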
- moving_average_filter(df, train_or_test='train', WS=10)
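Moving average noise filter. The original variables are replaced by their filtered values (unlike add_moving_average, which keeps them).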
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
WS (int, optional) – Window size of the filter.
- Returns:
pandas.DataFrame
Processed data.
- normalize(df, train_or_test='train', mode='standard')
Normalizes the variables.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
mode (string, optional) – Type of normalization (standard, robust, m-robust or s-robust).
- Returns:
pandas.DataFrame
Processed data.
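A sketch of the 'standard' mode and of the inverse operation performed by back_to_units. It assumes z-scores computed with the training means and sample standard deviations (pandas default ddof=1), analogous to the Mux and SDx accepted by load_model. The robust modes are not covered, and the function names are hypothetical.

```python
import pandas as pd


def fit_standard(df_train):
    # Training means and sample standard deviations (pandas default ddof=1).
    return df_train.mean(), df_train.std()


def normalize_standard(df, mu, sd):
    # z-score using the training statistics (aligned on column names).
    return (df - mu) / sd


def back_to_units_standard(df_norm, mu, sd):
    # Inverse of normalize_standard: back to the original units.
    return df_norm * sd + mu


train = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
mu, sd = fit_standard(train)
z = normalize_standard(train, mu, sd)
print(z["a"].tolist())                                  # [-1.0, 0.0, 1.0]
print(back_to_units_standard(z, mu, sd)["a"].tolist())  # [1.0, 2.0, 3.0]
```

Test data would be normalized with the same mu and sd obtained from training, never with its own statistics.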
- remove_empty_variables(df, train_or_test='train')
Removes variables with no values.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
- Returns:
Processed data.
- Return type:
pandas.DataFrame
- remove_frozen_variables(df, train_or_test='train', threshold=1e-06)
Removes variables whose variance falls below a given threshold.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
threshold (float, optional) – Variance below which a variable is considered frozen.
- Returns:
Processed data.
- Return type:
pandas.DataFrame
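The two variable-selection steps in the default f_pp list, remove_empty_variables and remove_frozen_variables, can be sketched in pandas. This assumes that "empty" means every value is NaN and "frozen" means a sample variance below threshold; the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def remove_empty_variables_sketch(df):
    # Drop columns in which every value is missing.
    return df.dropna(axis=1, how="all")


def remove_frozen_variables_sketch(df, threshold=1e-06):
    # Keep only columns whose sample variance reaches the threshold.
    return df.loc[:, df.var() >= threshold]


df = pd.DataFrame({
    "moving": [1.0, 2.0, 3.0],
    "frozen": [5.0, 5.0, 5.0],
    "empty": [np.nan, np.nan, np.nan],
})
selected = remove_frozen_variables_sketch(remove_empty_variables_sketch(df))
print(list(selected.columns))  # ['moving']
```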
- remove_observations_with_nan(df, train_or_test='train')
Removes observations with missing data (NaN).
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
- Returns:
Processed data.
- Return type:
pandas.DataFrame
- replace_nan_with_values(df, train_or_test='train', val=0)
Replaces missing data (NaN) with a predefined value.
- Parameters:
df (pandas.DataFrame) – Data to be processed.
train_or_test (string, optional) – Indicates which step the data corresponds to.
val (int or float, optional) – Value to be used in the replacement.
- Returns:
pandas.DataFrame
Processed data.
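The two missing-value options remove_observations_with_nan and replace_nan_with_values map naturally onto pandas dropna and fillna. A minimal sketch with hypothetical function names, not bibmon's code:

```python
import numpy as np
import pandas as pd


def remove_observations_with_nan_sketch(df):
    # Drop every row (observation) containing at least one NaN.
    return df.dropna(axis=0, how="any")


def replace_nan_with_values_sketch(df, val=0):
    # Replace every NaN with the constant `val`.
    return df.fillna(val)


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
print(len(remove_observations_with_nan_sketch(df)))      # 1
print(replace_nan_with_values_sketch(df)["a"].tolist())  # [1.0, 0.0, 3.0]
```

Dropping rows shortens the time series, while replacement keeps every timestamp; this is one reason replacement suits the test-data defaults in f_pp_test.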
- class bibmon.SBM(p=2.0, functional_form='rbf', gamma=1.0, eta=1e-10, train_method='geometrical_median', tau=1e-10, verbose=False)
Bases: GenericModel
Similarity-based method (SBM).
For details on the technique, see the papers:
Marins et al. (2018) - Improved similarity-based modeling for the classification of rotating-machine failures, http://www02.smt.ufrj.br/~sergioln/papers/IJ28.pdf
Ribeiro (2018) - Similarity-based methods for machine diagnosis,
- Parameters:
p (float, optional) – Value of p that defines the p-norm used in the similarity calculation.
functional_form (string, optional) – Functional form to be used in the similarity calculation. rbf = radial basis function; ies = inverse euclidean similarity; iqk = inverse quadratic kernel; exp_kernel = exponential kernel; cauchy_kernel = Cauchy kernel.
gamma (float, optional) – Parameter present in the various functional forms of similarity.
eta (float, optional) – Minimum value to be returned in similarity calculations.
train_method (string, optional) – Training method. Options: ‘all_archetypes’ and ‘geometrical_median’.
tau (float, optional) – Similarity threshold.
verbose (boolean, optional) – Whether to print information during execution.
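A sketch of the rbf functional form, assuming s(x, y) = exp(-gamma * d^2), where d is the p-norm of x - y, with the result floored at eta. The exact expressions bibmon uses for rbf and the other forms (ies, iqk, exp_kernel, cauchy_kernel) are not given here, so treat this as illustrative; rbf_similarity is hypothetical.

```python
import numpy as np


def rbf_similarity(x, y, p=2.0, gamma=1.0, eta=1e-10):
    """Hypothetical rbf similarity (assumed form, not bibmon's code).

    s = exp(-gamma * d**2), with d the p-norm of x - y, floored at eta.
    """
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    d = np.sum(diff ** p) ** (1.0 / p)
    return max(float(np.exp(-gamma * d ** 2)), eta)


print(rbf_similarity([0.0, 0.0], [0.0, 0.0]))  # 1.0
print(rbf_similarity([0.0, 0.0], [3.0, 4.0]))  # 1e-10 (exp(-25) is below eta)
```

Larger gamma makes the similarity decay faster with distance, so fewer stored states are considered similar to a new observation.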
- fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)
Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
redefine_limit (boolean, optional) – Whether the detection limit should be redefined using a validation period taken from the training data itself.
frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.
tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.
params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.
params_types (string or list of strings) – Type(s) of the parameter(s) to be tuned.
params_possibilities (list, optional) – Possibilities to be tested for each parameter: a list of possibilities (for a single parameter) or a list of such lists, one per parameter. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.
n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.
- hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)
Performs hyperparameter tuning using the Optuna library.
- Parameters:
params (pandas.DataFrame) – Contains the possibilities to be tested and the types of the parameters. Must be defined as: pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])
n_trials (int, optional) – Number of trials in the optimization performed by Optuna.
lim_conf (float, optional) – Confidence limit for the detection index.
percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when no cross-validation is performed.
n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
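An example params DataFrame in the required layout, for an SBM search over gamma and functional_form. The values and the 'categorical' type string are assumptions (modelled on Optuna's suggest_categorical); check which type strings bibmon accepts.

```python
import pandas as pd

# Hypothetical search space for an SBM model. The values are examples, and the
# "categorical" type string is an assumption modelled on Optuna's
# suggest_categorical.
params = pd.DataFrame(
    {
        "possibilities": [[0.1, 1.0, 10.0], ["rbf", "ies", "cauchy_kernel"]],
        "types": ["categorical", "categorical"],
    },
    index=["gamma", "functional_form"],
)
print(params)
```

Each row is one hyperparameter: the index holds its name, and the two columns hold its candidate values and its type.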
- load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)
Receives parameters from a previously trained model for making predictions and tests without the need for training.
- Parameters:
limSPE (float) – Detection limit of the SPE.
SPE_mean (float) – Mean of the SPE.
count_window_size (int) – Window size used in the count alarm calculation.
Mux (pandas.Series) – Means of the X variables in the training period.
SDx (pandas.Series) – Standard deviations of the X variables in the training period.
Muy (pandas.Series, optional) – Means of the Y variables in the training period.
SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.
- map_from_X(X)
Receives a data matrix X and returns a matrix of predicted or reconstructed values.
- Parameters:
X (numpy.array) – Window X of data for prediction or reconstruction.
- Returns:
numpy.array
Reconstructed X (or predicted Y).
- plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)
Plots the temporal evolution of the SPE.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).
legends (boolean, optional) – Whether the graph should display legends.
plot_alarm_outlier (boolean, optional) – Whether alarmOutlier should be plotted.
- plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)
Plots the temporal evolution of the predictions along with the corresponding true values.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.
metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.
- pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)
Receives a window of data and prepares it for the model testing.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in the count alarm calculation.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.
- pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)
Receives the training data and prepares it for model training.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
- predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)
Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in the count alarm calculation.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
redefine_limit (boolean, optional) – Whether the detection limit should be redefined during testing.
- set_hyperparameters(params_dict)
Receives a dict of hyperparameters to be assigned to the model.
- Parameters:
params_dict (dict) – Dictionary with hyperparameter values.
- test(redefine_limit=False, delete_testing_data=False)
Analyzes a window of data, applying a model test. Must be called after the pre_test function.
- Parameters:
redefine_limit (boolean, optional) – Whether the detection limit should be redefined during testing.
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
- train(lim_conf=0.99, delete_training_data=False)
Performs the model training. Must be called after the pre_train function.
- Parameters:
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
- train_core()
The core of the training algorithm, that is, all the necessary steps between pre_train() and the calculation of the prediction or reconstruction in training by map_from_X().
- bibmon.align_dfs_by_rows(df1, df2)
Aligns DataFrames by rows.
- Parameters:
df1 (pandas.DataFrame) – Original data.
df2 (pandas.DataFrame) – Original data.
- Returns:
new_df1, new_df2 – Processed data.
- Return type:
tuple of pandas.DataFrame
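One plausible reading of "aligns by rows" is keeping only the index labels common to both DataFrames. Under that assumption, pandas DataFrame.align with an inner join on the row axis does the job; align_dfs_by_rows_sketch is hypothetical.

```python
import pandas as pd


def align_dfs_by_rows_sketch(df1, df2):
    # Keep only the row labels shared by both DataFrames (inner join on the
    # index); columns are left untouched because axis=0.
    new_df1, new_df2 = df1.align(df2, join="inner", axis=0)
    return new_df1, new_df2


df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"b": [4, 5]}, index=[1, 2])
new_df1, new_df2 = align_dfs_by_rows_sketch(df1, df2)
print(list(new_df1.index), list(new_df2.index))  # [1, 2] [1, 2]
```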
- bibmon.comparative_table(models, X_train, X_validation, X_test, Y_train=None, Y_validation=None, Y_test=None, lim_conf=0.99, f_pp_train=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp_train=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, logy=True, metrics=None, X_pred_to_plot=None, count_limit=1, count_window_size=0, fault_start=None, fault_end=None, mask=None, times=True, plot_SPE=True, plot_predictions=True, fit_model=True)
Performs complete monitoring analysis of multiple models and builds comparative result tables.
- Parameters:
models (list of BibMon models) – Models to be considered in the analysis.
X_train (pandas.DataFrame or numpy.array) – Training data X.
X_validation (pandas.DataFrame or numpy.array) – Validation data X.
X_test (pandas.DataFrame or numpy.array) – Test data X.
Y_train (pandas.DataFrame or numpy.array, optional) – Training data Y.
Y_validation (pandas.DataFrame or numpy.array, optional) – Validation data Y.
Y_test (pandas.DataFrame or numpy.array, optional) – Test data Y.
lim_conf (float, optional) – Confidence limit for the detection index.
f_pp_train (list, optional) – List containing strings with names of functions to be used in pre-processing the train data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp_train (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the train data, in the format {‘functionname__argname’: argvalue, …}.
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the test data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the test data, in the format {‘functionname__argname’: argvalue, …}.
logy (boolean, optional) – Whether to use a logarithmic scale in the SPE plots.
metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.
X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
count_window_size (int, optional) – Window size used in the count alarm calculation.
fault_start (string, optional) – Start timestamp of the fault.
fault_end (string, optional) – End timestamp of the fault.
mask (numpy.array, optional) – Boolean array indicating the indices where the process is in fault.
times (boolean, optional) – Whether execution times should be calculated.
plot_SPE (boolean, optional) – Whether SPE plots should be generated.
plot_predictions (boolean, optional) – Whether prediction plots should be generated.
fit_model (boolean, optional) – Whether the models should be trained.
- Returns:
List with the generated tables (prediction and/or detection).
- Return type:
list of pandas.DataFrames
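The mask parameter marks the samples where the process is in fault. As an illustration only (the exact contents of the detection table are not specified here), such a mask allows computing a detection rate (alarms within the fault) and a false alarm rate (alarms outside it); detection_rates is hypothetical.

```python
import numpy as np


def detection_rates(alarms, mask):
    """Hypothetical metrics from boolean alarms and a boolean fault mask."""
    alarms = np.asarray(alarms, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    # Fraction of in-fault samples that raised an alarm.
    detection_rate = alarms[mask].mean() if mask.any() else np.nan
    # Fraction of normal samples that raised an alarm.
    false_alarm_rate = alarms[~mask].mean() if (~mask).any() else np.nan
    return float(detection_rate), float(false_alarm_rate)


alarms = [False, True, True, True, False, False]
mask = [False, False, True, True, True, False]
print(detection_rates(alarms, mask))
```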
- bibmon.complete_analysis(model, X_train, X_validation, X_test, Y_train=None, Y_validation=None, Y_test=None, lim_conf=0.99, f_pp_train=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp_train=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, logy=True, metrics=None, X_pred_to_plot=None, count_limit=1, count_window_size=0, fault_start=None, fault_end=None)
Performs a complete monitoring analysis, with train, validation, and test.
- Parameters:
model (BibMon model) – Model to be considered in the analysis.
X_train (pandas.DataFrame or numpy.array) – Training data X.
X_validation (pandas.DataFrame or numpy.array) – Validation data X.
X_test (pandas.DataFrame or numpy.array) – Test data X.
Y_train (pandas.DataFrame or numpy.array, optional) – Training data Y.
Y_validation (pandas.DataFrame or numpy.array, optional) – Validation data Y.
Y_test (pandas.DataFrame or numpy.array, optional) – Test data Y.
lim_conf (float, optional) – Confidence limit for the detection index.
f_pp_train (list, optional) – List containing strings with names of functions to be used in pre-processing the train data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp_train (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the train data, in the format {‘functionname__argname’: argvalue, …}.
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the test data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the test data, in the format {‘functionname__argname’: argvalue, …}.
logy (boolean, optional) – Whether to use a logarithmic scale in the SPE plots.
metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.
X_pred_to_plot (string, optional) – In case the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.
count_limit (int, optional) – Limit of points to be considered in the window for the count alarm to trigger.
count_window_size (int, optional) – Window size used in the count alarm calculation.
fault_start (string, optional) – Start timestamp of the fault.
fault_end (string, optional) – End timestamp of the fault.
- bibmon.create_df_with_dates(array, start='2020-01-01 00:00:00', freq='1min')
Creates a DataFrame from the data, indexed by timestamps beginning at start and spaced by freq.
- Parameters:
array (pandas.DataFrame or numpy.array) – Original data.
start (string, optional) – Start timestamp.
freq (string, optional) – Sampling interval.
- Returns:
df – Processed data.
- Return type:
pandas.DataFrame
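A minimal sketch of this behavior, assuming the timestamps form a regular range of len(array) points beginning at start with spacing freq; the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def create_df_with_dates_sketch(array, start="2020-01-01 00:00:00", freq="1min"):
    df = pd.DataFrame(array).copy()
    # Regular timestamps: len(df) points beginning at `start`, spaced by `freq`.
    df.index = pd.date_range(start=start, periods=len(df), freq=freq)
    return df


df = create_df_with_dates_sketch(np.zeros((3, 2)))
print(df.index[-1])  # 2020-01-01 00:02:00
```

A datetime index of this kind is what timestamp arguments such as fault_start and fault_end refer to.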
- bibmon.create_df_with_noise(array, noise_frac, max_index_for_noise)
Adds artificial measurement noise to data.
- Parameters:
array (pandas.DataFrame or numpy.array) – Original data.
noise_frac (float) – Fraction (between 0 and 1) of the total amplitude of the variable that will be used as the noise standard deviation.
max_index_for_noise (int) – Maximum index to consider the amplitude in the standard deviation calculation.
- Returns:
df – Processed data.
- Return type:
pandas.DataFrame
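A sketch of the noise model described above, assuming Gaussian noise whose per-variable standard deviation is noise_frac * (max - min), with max and min taken over the first max_index_for_noise rows (positional). The seed argument is a hypothetical addition for reproducibility.

```python
import numpy as np
import pandas as pd


def create_df_with_noise_sketch(array, noise_frac, max_index_for_noise, seed=None):
    df = pd.DataFrame(array, dtype=float)
    reference = df.iloc[:max_index_for_noise]
    # Noise standard deviation per variable: noise_frac * (max - min),
    # measured on the first max_index_for_noise rows.
    sd = noise_frac * (reference.max() - reference.min())
    rng = np.random.default_rng(seed)  # `seed` is a hypothetical extra argument
    noise = rng.standard_normal(df.shape) * sd.to_numpy()
    return df + noise


arr = np.column_stack([np.linspace(0.0, 1.0, 5), np.full(5, 2.0)])
noisy = create_df_with_noise_sketch(arr, noise_frac=0.1, max_index_for_noise=5, seed=0)
print(noisy)
```

A constant variable has zero amplitude, so under this model it receives no noise.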
- bibmon.load_real_data()
Loads a sample of real process data. The variables have been anonymized so that the data could be distributed with the library.
- Returns:
Process data.
- Return type:
pandas.DataFrame
- bibmon.load_tennessee_eastman(train_id=0, test_id=0)
Loads the ‘Tennessee Eastman Process’ benchmark data.
- Parameters:
train_id (int, optional) – Identifier of the training data. No fault: 0. With faults: 1 to 20.
test_id (int, optional) – Identifier of the test data. No fault: 0. With faults: 1 to 20.
- Returns:
train_df (pandas.DataFrame) – Training data.
test_df (pandas.DataFrame) – Test data.
- class bibmon.sklearnRegressor(regressor, permutation_importance=False)
Bases:
GenericModelInterface for sklearn regressors.
- Parameters:
regressor (any regressor that implements the sklearn interface) – For example: sklearn.svm.SVR, sklearn.ensemble.RandomForestRegressor, sklearn.neural_network.MLPRegressor, etc.
permutation_importance (boolean, optional) – Whether permutation variable importance should be calculated.
- fit(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None, lim_conf=0.99, delete_training_data=False, redefine_limit=False, frac_val=0.15, tune=False, params=None, params_types=None, params_possibilities=None, n_trials=20)
Performs the complete pipeline of model training, sequentially executing the ‘pre_train’ and ‘train’ methods.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
redefine_limit (boolean, optional) – Whether the detection limit should be redefined using a validation period taken from the training data itself.
frac_val (float, optional) – Fraction of the data used for validation. 0<frac_val<1. Only used if redefine_limit==True.
tune (boolean, optional) – Whether to perform automatic hyperparameter tuning.
params (string or list of strings, optional) – Name(s) of the parameter(s) to be tuned.
params_types (string or list of strings) – Type(s) of the parameter(s) to be tuned.
params_possibilities (list, optional) – Possibilities to be tested for each parameter: a list of possibilities (for a single parameter) or a list of such lists, one per parameter. Possibilities must be provided according to the type of the parameter, as specified in the Optuna library API.
n_trials (int, optional) – Number of iterations in the hyperparameter search optimization.
- hyperparameter_tuning(params, n_trials=20, lim_conf=0.99, percent_validation=0.2, n_splits=None, delete_training_data=False)
Performs hyperparameter tuning using the Optuna library.
- Parameters:
params (pandas.DataFrame) – Contains the possibilities to be tested and the types of the parameters. Must be defined as: pd.DataFrame({'possibilities': [list_possibilities1, list_possibilities2, …], 'types': [str_type1, str_type2, …]}, index=[str_param1_name, str_param2_name, …])
n_trials (int, optional) – Number of trials in the optimization performed by Optuna.
lim_conf (float, optional) – Confidence limit for the detection index.
percent_validation (float (0<value<1), optional) – Fraction of the data to be set aside for internal validation when no cross-validation is performed.
n_splits (int, optional) – Number of sets to be used in cross-validation. If not None, the value of percent_validation is disregarded.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
- load_model(limSPE, SPE_mean, count_window_size, Mux, SDx, Muy=None, SDy=None)
Receives parameters from a previously trained model for making predictions and tests without the need for training.
- Parameters:
limSPE (float) – Detection limit of the SPE.
SPE_mean (float) – Mean of the SPE.
count_window_size (int) – Window size used in the count alarm calculation.
Mux (pandas.Series) – Means of the X variables in the training period.
SDx (pandas.Series) – Standard deviations of the X variables in the training period.
Muy (pandas.Series, optional) – Means of the Y variables in the training period.
SDy (pandas.Series, optional) – Standard deviations of the Y variables in the training period.
- map_from_X(X)
Receives a data matrix X and returns a matrix of predicted or reconstructed values.
- Parameters:
X (numpy.array) – Window X of data for prediction or reconstruction.
- Returns:
numpy.array
Reconstructed X (or predicted Y).
- plot_SPE(ax=None, train_or_test='train', logy=True, legends=True, plot_alarm_outlier=True)
Plots the temporal evolution of the SPE.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
logy (boolean, optional) – Indicates whether the y-axis scale should be logarithmic (True) or linear (False).
legends (boolean, optional) – Whether the graph should display legends.
plot_alarm_outlier (boolean, optional) – Whether alarmOutlier should be plotted.
- plot_importances(n=None, permutation_importance=False)
Plots the permutation importances of the variables.
- Parameters:
n (int, optional) – Maximum number of variables to be plotted.
permutation_importance (boolean, optional) – Whether permutation importances should take precedence over coefficients in linear models.
- plot_predictions(ax=None, train_or_test='train', X_pred_to_plot=None, metrics=None)
Plots the temporal evolution of the predictions along with the corresponding true values.
- Parameters:
ax (matplotlib.axes._subplots.AxesSubplot, optional) – Axis on which the graph will be plotted.
train_or_test (string, optional) – Indicates whether to plot the graph for ‘train’ or ‘test’.
X_pred_to_plot (string, optional) – If the model is a reconstruction model (i.e., self.has_Y = False), indicates which column of X to plot along with the prediction.
metrics (list of functions, optional) – Functions for calculating metrics to be displayed in the title of the graph.
- pre_test(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp=None, a_pp=None)
Receives a window of data and prepares it for model testing.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in the count alarm calculation.
count_limit (int, optional) – Number of alarmed points within the window required for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.
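The count alarm parameters can be read as a sliding-window vote over individual alarms. A minimal sketch, assuming the alarm fires at a sample when at least count_limit of the last count_window_size samples (current one included) exceeded the detection limit; BibMon's exact rule (at least versus strictly more, handling of the first incomplete windows) is not stated in this reference, and count_alarm is a hypothetical helper.

```python
import numpy as np


def count_alarm(above_limit, window_size, count_limit):
    """Hypothetical helper: flag samples where at least `count_limit` of the
    last `window_size` samples (current one included) were above the limit."""
    above = np.asarray(above_limit, dtype=int)
    alarm = np.zeros(len(above), dtype=bool)
    for i in range(len(above)):
        # The first windows are shorter because fewer past samples exist.
        start = max(0, i - window_size + 1)
        alarm[i] = above[start:i + 1].sum() >= count_limit
    return alarm


spe = np.array([0.1, 2.5, 0.2, 3.0, 2.8, 0.1])
lim_spe = 1.0
print(count_alarm(spe > lim_spe, window_size=3, count_limit=2))
```

Requiring several violations within a window suppresses isolated spikes (the single exceedance at index 1 raises no count alarm) at the cost of a short detection delay.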
- pre_train(X_train, Y_train=None, f_pp=['remove_empty_variables', 'ffill_nan', 'remove_frozen_variables', 'normalize'], a_pp=None, f_pp_test=['replace_nan_with_values', 'normalize'], a_pp_test=None)
Receives the data for model training and prepares them for training.
- Parameters:
X_train (pandas.DataFrame or numpy.ndarray) – Window of X data used in training.
Y_train (pandas.DataFrame or numpy.ndarray, optional) – Window of Y data used in training.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing the training data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
f_pp_test (list, optional) – List containing strings with names of functions to be used in pre-processing the testing data (the functions are defined in the PreProcess class, in the BibMon_Tools.py file).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the training data, in the format {‘functionname__argname’: argvalue, …}.
a_pp_test (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the testing data, in the format {‘functionname__argname’: argvalue, …}.
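The {‘functionname__argname’: argvalue} format packs the arguments of several pre-processing functions into one flat dictionary, similar to scikit-learn's step__param naming in Pipeline.set_params. A minimal sketch of how such a dictionary can be grouped per function; split_pp_args is hypothetical, and the keys func_a and func_b with their arguments are placeholders, not real PreProcess parameters.

```python
from collections import defaultdict


def split_pp_args(a_pp):
    """Hypothetical helper: group {'functionname__argname': value} by function."""
    kwargs = defaultdict(dict)
    for key, value in (a_pp or {}).items():
        # Split on the first double underscore only, so function names
        # containing single underscores stay intact.
        func_name, arg_name = key.split("__", 1)
        kwargs[func_name][arg_name] = value
    return dict(kwargs)


a_pp = {"func_a__window": 5, "func_a__center": True, "func_b__fill_value": 0.0}
print(split_pp_args(a_pp))
```

Each group can then be passed as keyword arguments to the function of the same name listed in f_pp.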
- predict(X_test, Y_test=None, count_window_size=0, count_limit=1, f_pp='fit', a_pp='fit', delete_testing_data=False, redefine_limit=False)
Performs the complete pipeline of model prediction (testing), sequentially executing the ‘pre_test’ and ‘test’ methods.
- Parameters:
X_test (pandas.DataFrame, pandas.Series or numpy.ndarray) – Window of data or observation X needed to perform the analysis.
Y_test (pandas.DataFrame, pandas.Series or numpy.ndarray, optional) – Window of data or observation Y needed to perform the analysis.
count_window_size (int, optional) – Window size used in the count alarm calculation.
count_limit (int, optional) – Number of alarmed points within the window required for the count alarm to trigger.
f_pp (list, optional) – List containing strings with names of functions to be used in pre-processing (the functions are defined in the PreProcess class).
a_pp (dict, optional) – Dictionary containing the parameters to be provided to each function to perform pre-processing of the data, in the format {‘functionname__argname’: argvalue, …}.
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
redefine_limit (boolean, optional) – Whether the detection limit should be redefined during testing.
- set_hyperparameters(params_dict)
Receives a dict of hyperparameters and assigns them to the model.
- Parameters:
params_dict (dict) – Dictionary of hyperparameter values, keyed by hyperparameter name.
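Assigning hyperparameters from a dictionary usually amounts to setting one attribute per key. A minimal sketch on a hypothetical ToyModel class (not BibMon code) holding two of the Autoencoder's constructor arguments; rejecting unknown keys is a design choice of this sketch, since how BibMon treats them is not documented here.

```python
class ToyModel:
    """Hypothetical model holding two Autoencoder-style hyperparameters."""

    def __init__(self, hidden_layer_sizes=(2,), alpha=0.0001):
        self.hidden_layer_sizes = hidden_layer_sizes
        self.alpha = alpha

    def set_hyperparameters(self, params_dict):
        for name, value in params_dict.items():
            # Reject typos instead of silently creating new attributes.
            if not hasattr(self, name):
                raise ValueError(f"unknown hyperparameter: {name!r}")
            setattr(self, name, value)


model = ToyModel()
model.set_hyperparameters({"hidden_layer_sizes": (8, 2, 8), "alpha": 1e-3})
print(model.hidden_layer_sizes, model.alpha)
```

For scikit-learn estimators such as MLPRegressor, the built-in equivalent is estimator.set_params(**params_dict).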
- test(redefine_limit=False, delete_testing_data=False)
Analyzes a window of data by applying the model test. Must be called after the pre_test method.
- Parameters:
redefine_limit (boolean, optional) – Whether the detection limit should be redefined during testing.
delete_testing_data (boolean, optional) – If True, the data is deleted at the end of testing. Useful to save memory.
- train(lim_conf=0.99, delete_training_data=False)
Performs the model training. Must be called after the pre_train method.
- Parameters:
lim_conf (float, optional) – Confidence limit for the detection index.
delete_training_data (boolean, optional) – If True, the data is deleted at the end of training. Useful to save memory.
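One common way to turn lim_conf into a detection limit is to take the empirical lim_conf quantile of the detection index over the training data, so that roughly a fraction 1 - lim_conf of training samples lies above it. A minimal sketch under that assumption; detection_limit is hypothetical, and BibMon may instead use a parametric limit.

```python
import numpy as np


def detection_limit(spe_train, lim_conf=0.99):
    """Hypothetical helper: empirical lim_conf-quantile of the training SPE."""
    return float(np.quantile(spe_train, lim_conf))


spe_train = np.arange(1, 101, dtype=float)  # SPE values 1, 2, ..., 100
lim = detection_limit(spe_train, lim_conf=0.99)
print(lim)
```

With NumPy's default linear interpolation the limit falls between the two largest values, leaving 1% of the training samples above it.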
- train_core()
Core of the training algorithm: all the steps required between pre_train() and the computation, by map_from_X(), of the predictions or reconstructions for the training data.
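The call order of the training and testing methods can be illustrated with a toy class. A minimal sketch of the sequencing only: ToyMonitor is hypothetical, its map_from_X simply reconstructs every sample as the training mean, and its z-score pre-processing and quantile limit are assumptions, not BibMon's implementation.

```python
import numpy as np


class ToyMonitor:
    """Hypothetical class mirroring fit = pre_train + train and
    predict = pre_test + test."""

    def pre_train(self, X_train):
        # Store training statistics and keep the normalized training data.
        self.mu, self.sd = X_train.mean(axis=0), X_train.std(axis=0)
        self.X_train = (X_train - self.mu) / self.sd

    def map_from_X(self, X):
        # Trivial stand-in: reconstruct every sample as the training mean
        # (all zeros after normalization).
        return np.zeros_like(X)

    def train_core(self):
        pass  # a real model would be fitted on self.X_train here

    def train(self, lim_conf=0.99):
        self.train_core()
        spe = np.sum((self.X_train - self.map_from_X(self.X_train)) ** 2, axis=1)
        self.limSPE = np.quantile(spe, lim_conf)

    def fit(self, X_train, lim_conf=0.99):
        self.pre_train(X_train)
        self.train(lim_conf)

    def pre_test(self, X_test):
        # Reuse the training statistics, never statistics of the test window.
        self.X_test = (X_test - self.mu) / self.sd

    def test(self):
        spe = np.sum((self.X_test - self.map_from_X(self.X_test)) ** 2, axis=1)
        return spe > self.limSPE

    def predict(self, X_test):
        self.pre_test(X_test)
        return self.test()


rng = np.random.default_rng(0)
monitor = ToyMonitor()
monitor.fit(rng.normal(size=(500, 3)))
alarms = monitor.predict(np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]))
print(alarms)
```

Keeping train_core separate from train lets subclasses swap the fitting step while the limit calculation stays shared.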
- update_importances()
Calculates permutation importances of the variables.
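Permutation importance measures how much a model's error grows when one variable's values are shuffled, which breaks that variable's relationship with the output. A minimal NumPy sketch of the idea, using a hypothetical linear predictor and mean squared error as the score; the error measure BibMon uses internally is not stated in this reference. For fitted scikit-learn estimators, sklearn.inspection.permutation_importance provides a ready-made implementation.

```python
import numpy as np


def permutation_importances(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column j only, breaking its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            importances[j] += np.mean((y - predict(X_perm)) ** 2) - base
        importances[j] /= n_repeats
    return importances


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
w = np.array([3.0, 0.0])  # hypothetical model: only the first variable matters
y = X @ w


def predict(X_):
    return X_ @ w


imp = permutation_importances(predict, X, y)
print(imp)
```

The first variable receives a large importance, while shuffling the second leaves the error unchanged.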
- bibmon.spearmanr_dendrogram(df, figsize=(18, 8))
Generates a dendrogram of Spearman correlations.
- Parameters:
df (pandas.DataFrame) – Data to be analyzed.
figsize (tuple of ints, optional) – Figure dimensions.
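A Spearman dendrogram groups variables by how monotonically related they are. A minimal pandas sketch of the distance matrix such a dendrogram is typically built from; using 1 - |rho| (so that strongly anti-correlated variables also cluster together) is an assumption of this sketch, not necessarily BibMon's choice.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 7.0, 8.0, 20.0],  # monotone in "a": rank correlation 1
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Spearman correlation is the Pearson correlation of the column ranks.
rho = df.corr(method="spearman")

# Distance 0 for perfectly monotone pairs, 1 for rank-uncorrelated ones.
dist = 1.0 - rho.abs()
print(dist.round(3))
```

In a common recipe, this square matrix is converted to condensed form with scipy.spatial.distance.squareform (using checks=False to tolerate round-off on the diagonal) and passed to scipy.cluster.hierarchy.linkage and scipy.cluster.hierarchy.dendrogram for plotting.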
- bibmon.train_val_test_split(data, start_train, end_train, end_validation, end_test, tags_X=None, tags_Y=None)
Separates the data into consecutive portions of train, validation, and test, returning 3 DataFrames. It can also separate the data into predictor variables (X) and predicted variables (Y), in which case it returns 6 DataFrames.
- Parameters:
data (pandas.DataFrame) – Data to be separated.
start_train (string) – Start timestamp of the train portion.
end_train (string) – End timestamp of the train portion.
end_validation (string) – End timestamp of the validation portion.
end_test (string) – End timestamp of the test portion.
tags_X (list of strings, optional) – Variables to be considered in the X set.
tags_Y (list of strings, optional) – Variables to be considered in the Y set.
- Returns:
Separated data: 3 DataFrames, or 6 when the data are also separated into X and Y.
- Return type:
pandas.DataFrame
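Consecutive time-based portions can be cut from a DataFrame indexed by timestamps. A minimal sketch of the train/validation/test part only; split_by_time is hypothetical, not BibMon's implementation. Label slicing with .loc[start:end] on a DatetimeIndex includes both endpoints, so this sketch uses boolean masks with half-open boundaries to keep any sample from landing in two portions; how BibMon treats boundary timestamps is not documented here.

```python
import pandas as pd


def split_by_time(data, start_train, end_train, end_validation, end_test):
    """Hypothetical helper: three consecutive, non-overlapping portions."""
    idx = data.index
    t0, t1, t2, t3 = (pd.Timestamp(t) for t in
                      (start_train, end_train, end_validation, end_test))
    train = data[(idx >= t0) & (idx <= t1)]
    validation = data[(idx > t1) & (idx <= t2)]
    test = data[(idx > t2) & (idx <= t3)]
    return train, validation, test


index = pd.date_range("2024-01-01", periods=10, freq="D")
data = pd.DataFrame({"x": range(10)}, index=index)

train, validation, test = split_by_time(
    data,
    start_train="2024-01-01",
    end_train="2024-01-06",
    end_validation="2024-01-08",
    end_test="2024-01-10",
)
print(len(train), len(validation), len(test))
```

Selecting columns afterwards (for example, train[tags_X] and train[tags_Y]) would give the 6-DataFrame variant.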