KFold

class dpeeg.exps.KFold(trainer: BaseClassifier, out_folder: str | None = None, k: int = 5, isolate_testset: bool = True, shuffle: bool = True, seed: int = 42, timestamp: bool = True, verbose: int | str = 'INFO')

K-Fold cross validation experiment.

The KFold experiment divides the dataset into K non-overlapping subsets (“folds”) and trains and tests the model once per fold. This makes the evaluation less dependent on any single way of splitting the data, so the results are more stable and reliable. The price is computational cost: the model is trained K times, which can take a long time for large datasets and complex models.
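As a rough illustration of what dividing a dataset into K non-overlapping folds means, the sketch below partitions sample indices using only the standard library. The function name `kfold_indices` is hypothetical and is not part of dpeeg; its `k`, `shuffle` and `seed` arguments only mirror the constructor parameters by name, and the library's actual splitting logic may differ.

```python
import random


def kfold_indices(n_samples, k=5, shuffle=True, seed=42):
    """Split range(n_samples) into k non-overlapping folds of near-equal size.

    Hypothetical helper for illustration only, not dpeeg's implementation.
    """
    indices = list(range(n_samples))
    if shuffle:
        # A seeded generator makes the shuffle repeatable across runs.
        random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


folds = kfold_indices(10, k=3)
for i, held_out in enumerate(folds):
    # Every fold except the i-th one is used for training in round i.
    train = [j for m, fold in enumerate(folds) if m != i for j in fold]
    print(f"fold {i}: {len(train)} training samples, {len(held_out)} held out")
```

Each sample is held out exactly once, which is why the model has to be trained K times.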

The experiment provides two validation methods, selected by the isolate_testset parameter. When it is True, the test set is kept independent of the k-fold cross-validation: in each fold the data is divided into a training set and a validation set, which is used to find the optimal parameters for that fold, and the model is then evaluated on the independent test set. When it is False, each fold in turn serves as the test set and the remaining folds are used to train the model. The average of the evaluations over all folds is used as the model’s performance metric. The figure below illustrates the procedure for a 3-fold cross-validation experiment:

The isolate_testset parameter is particularly useful when the training set and test set come from different sessions.
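The difference between the two modes can be sketched with plain index lists. This is an assumption-laden illustration, not dpeeg's code: `split_folds` and `kfold_plan` are hypothetical helpers, shuffling is left out for readability, and how the library assembles the data pool in each mode is not shown in this page.

```python
def split_folds(indices, k):
    """Partition indices into k contiguous, non-overlapping folds."""
    base, extra = divmod(len(indices), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(indices[start:start + size]))
        start += size
    return folds


def kfold_plan(indices, k=3, isolate_testset=True, test_indices=None):
    """Return one {'train', 'valid', 'test'} dict per fold (hypothetical)."""
    if isolate_testset and test_indices is None:
        raise ValueError("an isolated test set must be supplied")
    folds = split_folds(indices, k)
    plan = []
    for i, held_out in enumerate(folds):
        rest = [j for m, fold in enumerate(folds) if m != i for j in fold]
        if isolate_testset:
            # The held-out fold is a validation set; the test set never changes.
            plan.append({"train": rest, "valid": held_out,
                         "test": list(test_indices)})
        else:
            # The held-out fold is the test set; there is no validation set.
            plan.append({"train": rest, "valid": None, "test": held_out})
    return plan


isolated = kfold_plan(range(6), k=3, isolate_testset=True,
                      test_indices=range(6, 9))
rotating = kfold_plan(range(6), k=3, isolate_testset=False)
for name, plan in (("isolated", isolated), ("rotating", rotating)):
    for i, split in enumerate(plan):
        print(name, i, split)
```

In the isolated plan the same test indices appear in every fold, while in the rotating plan the test indices change from fold to fold and together cover the whole pool.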

Parameters:

trainer (Trainer) – Trainer used to train the model on the dataset.
out_folder (str, optional) – Folder in which all experimental results are stored, in a subfolder named after the model class name (‘~/dpeeg/out/model/exp/dataset/timestamp’).
k (int, optional) – Number of folds in the k-fold split.
isolate_testset (bool) – If True (the default), the test set is independent: the k-fold cross-validation only splits the training set into a training set and a validation set, so that early stopping can be applied, and the final evaluation is done on the isolated test set. If False, each fold of the k-fold cross-validation serves as the test set in turn.
shuffle (bool) – Whether to shuffle the data before splitting it into folds.
seed (int) – Random seed, used to make the split reproducible.
timestamp (bool) – Whether output folders are timestamped.

Notes

If isolate_testset is False, pass the transforms parameter to the run function so that operations such as data augmentation are not applied in advance, which would cause data leakage. If isolate_testset is True, the trainer must support a validation set.
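The leakage concern can be made concrete with a small sketch. It does not call dpeeg's run function, whose signature is not shown here; `augment` and `leak_free_folds` are hypothetical stand-ins, and the interleaved split is chosen only for brevity. The point is the ordering: split first, then augment only the training portion of each fold, so that no augmented copy of a test trial ends up in the training data.

```python
def augment(samples):
    """Hypothetical augmentation: add one tagged copy of every sample."""
    return samples + [f"{s}_aug" for s in samples]


def leak_free_folds(samples, k):
    """Yield (train, test) pairs, augmenting only after the split."""
    folds = [samples[i::k] for i in range(k)]  # simple interleaved split
    for i, test in enumerate(folds):
        train = [s for m, fold in enumerate(folds) if m != i for s in fold]
        # Augmentation sees the training trials only.
        yield augment(train), test


samples = [f"trial{i}" for i in range(6)]
for train, test in leak_free_folds(samples, k=3):
    # Map each training item back to the trial it was derived from.
    origins = {s.split("_aug")[0] for s in train}
    print(sorted(origins), "| test:", test)
```

Had `augment` been applied to `samples` before splitting, a trial and its augmented copy could have landed on opposite sides of the train/test boundary.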