Training

Functions for training models

TRAININGCONFIGS

CLASS relax.trainer.TrainingConfigs (n_epochs, batch_size, monitor_metrics=None, seed=42, log_dir='log', logger_name='debug', log_on_step=False, max_n_checkpoints=3)

Configurator of train_model.

Parameters:
  • n_epochs (int) – Number of epochs.
  • batch_size (int) – Batch size.
  • monitor_metrics (Optional[str]) – Metric monitored to evaluate the training result after each epoch.
  • seed (int, default=42) – Seed for random number generation.
  • log_dir (str, default='log') – Name of the directory that holds data logged during training.
  • logger_name (str, default='debug') – Name of the subdirectory under log_dir that holds data logged during training.
  • log_on_step (bool, default=False) – Whether to log the evaluation metrics at the current step.
  • max_n_checkpoints (int, default=3) – Maximum number of checkpoints stored.
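
As a rough illustration of how these fields and defaults fit together, the sketch below mirrors the documented signature with a plain dataclass. `TrainingConfigsSketch` is a hypothetical stand-in written for this example, not the class shipped in `relax.trainer`; only the field names and default values are taken from the signature above, and the batch size of 128 is an arbitrary choice.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TrainingConfigsSketch:
    """Hypothetical stand-in mirroring the documented fields of TrainingConfigs."""
    n_epochs: int                           # required
    batch_size: int                         # required
    monitor_metrics: Optional[str] = None   # e.g. 'val/val_loss'
    seed: int = 42
    log_dir: str = "log"
    logger_name: str = "debug"
    log_on_step: bool = False
    max_n_checkpoints: int = 3


# Only the two required fields are supplied; the rest fall back to the defaults.
configs = TrainingConfigsSketch(n_epochs=10, batch_size=128)
print(asdict(configs))

# The same fields can be carried around as a plain dict and unpacked later.
as_dict = {"n_epochs": 10, "batch_size": 128, "monitor_metrics": "val/val_loss"}
configs_from_dict = TrainingConfigsSketch(**as_dict)
print(configs_from_dict.monitor_metrics)
```

Keeping the required fields first and giving every optional field a default means a caller only has to state what differs from the defaults, which is the same shape as the dict passed in the pretraining script further down.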

TRAIN_MODEL_WITH_STATES

relax.trainer.train_model_with_states (training_module, params, opt_state, data_module, t_configs)

Train models with params and opt_state.


TRAIN_MODEL

relax.trainer.train_model (training_module, data_module, t_configs)

Train models.

Parameters:
  • training_module (BaseTrainingModule) – Training module
  • data_module (TabularDataModule) – Data module
  • t_configs (Dict[str, Any] | TrainingConfigs) – Training configurator
Returns:

    (Tuple[hk.Params, optax.OptState])


DOWNLOAD_MODEL

relax.module.download_model (data_name)

High-level utility function for downloading a trained model.

Parameters:
  • data_name (str) – The name of the dataset

LOAD_PRED_MODEL

relax.module.load_pred_model (data_name)

High-level utility function for loading a trained model.

Parameters:
  • data_name (str) – The name of the dataset
Returns:

    (Tuple[hk.Params, PredictiveTrainingModule])

Pretrain model

import os
import shutil
from pathlib import Path

for data_name in DEFAULT_DATA_CONFIGS.keys():
    datamodule = load_data(data_name = data_name)

    # Fetch the layer sizes, learning rate, and batch size from the configs file
    config_path = Path(os.getcwd()) / "cf_data" / data_name / "configs.json"
    configs = load_json(config_path)
    mlp_configs = configs['mlp_configs']
    sizes = mlp_configs["sizes"]
    lr = mlp_configs["lr"]
    batch_size = configs["data_configs"]['batch_size']

    params, opt_state = train_model(
        PredictiveTrainingModule({'sizes': sizes, 'lr': lr}),
        datamodule, t_configs={
            'n_epochs': 10, 'batch_size': batch_size, 'monitor_metrics': 'val/val_loss',
            'max_n_checkpoints': 1, 'logger_name': data_name
        }
    )

    # Get the most recent version directory and the checkpoint epoch stored in it
    version_dir = "log/{data_name}/".format(data_name=data_name)
    latest_version = max(
        [os.path.join(version_dir, v) for v in os.listdir(version_dir) if v.startswith("version_")],
        key=os.path.getmtime,
    )
    checkpoint_dir = os.path.join(latest_version, "checkpoints")
    epoch = [d for d in os.listdir(checkpoint_dir) if d.startswith("epoch")][0]  # name of the epoch directory
    model_dir = os.path.join(checkpoint_dir, epoch, "model")

    # Replace the model stored under assets with the newly trained one
    shutil.rmtree("assets/{data_name}/model".format(data_name=data_name), ignore_errors=True)
    shutil.copytree(model_dir, "assets/{data_name}/model".format(data_name = data_name))

    # test: save model under cf_data
    shutil.rmtree("cf_data/{data_name}/model".format(data_name=data_name), ignore_errors=True)
    shutil.copytree(model_dir, "cf_data/{data_name}/model".format(data_name = data_name))
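
The checkpoint lookup in the script above can also be written as a small reusable helper. The following is a minimal sketch using `pathlib`, assuming the directory layout the script relies on (`<log_dir>/<logger_name>/version_*/checkpoints/epoch*/model`). The function name `find_latest_checkpoint`, the `demo` logger name, and the `epoch_0`/`epoch_1` directory names are hypothetical and exist only for this example. Unlike the script, which takes the first entry in `os.listdir` order, the sketch sorts the epoch directories by name before taking the first one; with `max_n_checkpoints` set to 1 there is only one entry either way.

```python
import os
import tempfile
from pathlib import Path


def find_latest_checkpoint(log_root: Path, logger_name: str) -> Path:
    """Return the model directory of the most recently modified version."""
    version_root = log_root / logger_name
    versions = [p for p in version_root.iterdir() if p.name.startswith("version_")]
    if not versions:
        raise FileNotFoundError(f"no version_* directory under {version_root}")
    # The newest version is the one with the latest modification time.
    latest = max(versions, key=lambda p: p.stat().st_mtime)
    epochs = sorted(
        p for p in (latest / "checkpoints").iterdir() if p.name.startswith("epoch")
    )
    if not epochs:
        raise FileNotFoundError(f"no epoch* directory under {latest / 'checkpoints'}")
    return epochs[0] / "model"


# Build a throwaway directory tree to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i, name in enumerate(["version_0", "version_1"]):
        (root / "demo" / name / "checkpoints" / f"epoch_{i}" / "model").mkdir(parents=True)
        # Set explicit modification times so that version_1 is unambiguously newer.
        os.utime(root / "demo" / name, (1_000 + i, 1_000 + i))
    found = find_latest_checkpoint(root, "demo")
    relative = found.relative_to(root).as_posix()
    print(relative)
```

Raising `FileNotFoundError` when no version or epoch directory exists gives a clearer failure than the `ValueError` from `max` on an empty list or the `IndexError` from indexing an empty list.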
0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0609]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0606]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.064] Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0661]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0658]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0607]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0555]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0567]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0632]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0647]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0657]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0536]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0592]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0575]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0705]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0488]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.072] Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.069]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0682]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0696]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0665]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0668]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0611]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0603]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.051] Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0608]Epoch 4:   0%|          
| 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0535]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.059] Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0648]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0763]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0678]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0632]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0658]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0731]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0661]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0596]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0731]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0729]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0634]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0694]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0612]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0638]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0621]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.06]  Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0646]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0574]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0628]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0639]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0639]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0615]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0703]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, 
train/train_loss_1=0.0664]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.06]  Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0648]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0559]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0682]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0543]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0622]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0678]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.056] Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0574]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0578]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0557]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0558]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0533]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0581]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0586]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0673]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0545]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0574]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0646]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0589]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0671]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0607]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 
639.52batch/s, train/train_loss_1=0.0622]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0785]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0615]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0624]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.064] Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0685]Epoch 4:  67%|######6   | 64/96 [00:00<00:00, 639.52batch/s, train/train_loss_1=0.0439]                                                                                         0%|          | 0/96 [00:00<?, ?batch/s]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0611]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0673]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.069] Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0548]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0601]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.058] Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0611]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0634]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0604]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0651]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0629]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0644]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0637]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0653]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0557]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.062] Epoch 5:   0%|          | 
0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0697]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.056] Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0495]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0632]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0592]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0646]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0698]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0602]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0691]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0592]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0642]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0556]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0663]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0625]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.056] Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0629]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0489]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.062] Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0651]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0561]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0567]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0551]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0587]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0627]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0692]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.064] Epoch 5:   0%|          
| 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0603]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0678]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0559]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0583]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0575]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0617]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0601]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0637]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0601]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0562]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0688]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0538]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0568]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0649]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0675]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0633]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0658]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0544]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0684]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0591]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0675]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0675]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0654]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0684]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0601]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 
623.19batch/s, train/train_loss_1=0.0578]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0719]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0655]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0649]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0529]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0642]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0626]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0646]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0633]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0592]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0596]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0592]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0638]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0664]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0659]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0676]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0703]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0584]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0506]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0662]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0522]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0559]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0619]Epoch 5:  66%|######5   | 63/96 
[00:00<00:00, 623.19batch/s, train/train_loss_1=0.0481]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0656]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0558]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.058] Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.069]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0612]Epoch 5:  66%|######5   | 63/96 [00:00<00:00, 623.19batch/s, train/train_loss_1=0.0525]                                                                                         0%|          | 0/96 [00:00<?, ?batch/s]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0604]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0558]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0566]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0531]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0696]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0661]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0574]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0608]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0647]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0733]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0609]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0625]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0518]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0566]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0616]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0658]Epoch 6:   
0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0666]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0648]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0617]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0611]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0626]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0602]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0586]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0556]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0545]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0556]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0607]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.05]  Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0564]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0477]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0661]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0638]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0594]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0659]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0732]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0548]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0654]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0502]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0547]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.06]  Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.062]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0672]Epoch 6:  
 0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.054] Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0693]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0608]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0605]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0675]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0522]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0613]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0639]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0573]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0632]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0514]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0632]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0599]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0609]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.059] Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0636]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0682]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0679]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.056] Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0583]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0698]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.06]  Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.052]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0682]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0709]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, 
train/train_loss_1=0.0709]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0518]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0613]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0667]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0572]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0631]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0697]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0557]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0677]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0761]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0581]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0545]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0638]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0489]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0572]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0697]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0609]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0745]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0707]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0564]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0612]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.063] Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0652]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 
660.58batch/s, train/train_loss_1=0.0573]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0682]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0626]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0519]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0551]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0431]Epoch 6:  70%|######9   | 67/96 [00:00<00:00, 660.58batch/s, train/train_loss_1=0.0678]                                                                                         0%|          | 0/96 [00:00<?, ?batch/s]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0562]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0731]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0526]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0536]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0632]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0548]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0568]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0619]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0652]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0734]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0645]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0627]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0583]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0498]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0555]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0642]Epoch 7:   0%|          | 
0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0625]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0607]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0585]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0594]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0668]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0688]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0629]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0646]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0606]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0628]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0691]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0751]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0574]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0713]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0525]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0502]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0626]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0633]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0492]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0608]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0542]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0486]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0586]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0579]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0588]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0709]Epoch 7:   0%|          
| 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0665]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0639]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0549]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0679]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0622]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0604]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0698]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.054] Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0611]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0606]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0696]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.057] Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0624]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0662]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0601]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0645]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0657]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.047] Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0591]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.057] Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0632]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0623]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0538]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0623]Epoch 7:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0553]Epoch 7:  70%|######9   | 67/96 [00:00<00:00, 664.57batch/s, train/train_loss_1=0.0553]Epoch 7:  
70%|######9   | 67/96 [00:00<00:00, 664.57batch/s, train/train_loss_1=0.0603]
Epoch 8:  65%|######4   | 62/96 [00:00<00:00, 617.42batch/s, train/train_loss_1=0.0529]
Epoch 9: 100%|##########| 96/96 [00:00<00:00, 569.03batch/s, train/train_loss_1=0.0706]
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Epoch 0: 100%|##########| 31/31 [00:02<00:00, 14.90batch/s, train/train_loss_1=0.116]
Epoch 1:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.108]
Epoch 2:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.119]
Epoch 3:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0999]
Epoch 4:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.102]
Epoch 5:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0972]
Epoch 6:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0908]
Epoch 7:   0%|          | 
0/31 [00:00<?, ?batch/s, train/train_loss_1=0.094] Epoch 7:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0942]Epoch 7:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0985]Epoch 7:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.098] Epoch 7:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0994]Epoch 7:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.103] Epoch 7:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.102]Epoch 7:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.105]Epoch 7:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.107]                                                                              0%|          | 0/31 [00:00<?, ?batch/s]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0951]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.109] Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.106]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.102]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0972]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.104] Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0905]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.1]   Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.111]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0923]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0951]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.102] Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.104]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.098]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0967]Epoch 8:   0%|     
     | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0928]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.106] Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0998]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.105] Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.103]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.106]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0942]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.104] Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0968]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0944]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.1]   Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.102]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.102]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.102]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0937]Epoch 8:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0958]                                                                               0%|          | 0/31 [00:00<?, ?batch/s]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0943]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0994]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0973]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.101] Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0989]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0977]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0948]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0991]Epoch 
9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.105] Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.1]  Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0989]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0983]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.102] Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.105]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0912]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.107] Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.103]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0984]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.111] Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0908]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0914]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0951]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0961]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0982]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0953]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0933]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0836]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.11]  Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.0964]Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.106] Epoch 9:   0%|          | 0/31 [00:00<?, ?batch/s, train/train_loss_1=0.117]Epoch 9: 100%|##########| 31/31 [00:00<00:00, 575.46batch/s, train/train_loss_1=0.117]
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
  0%|          | 0/96 [00:00<?, ?batch/s]
Epoch 0: 100%|##########| 96/96 [00:02<00:00, 38.35batch/s, train/train_loss_1=0.0858]
Epoch 1:  64%|######3   | 61/96 [00:00<00:00, 608.37batch/s, train/train_loss_1=0.0502]
Epoch 2:  64%|######3   | 
| 61/96 [00:00<00:00, 603.58batch/s, train/train_loss_1=0.0517]Epoch 2:  64%|######3   | 61/96 [00:00<00:00, 603.58batch/s, train/train_loss_1=0.0353]Epoch 2:  64%|######3   | 61/96 [00:00<00:00, 603.58batch/s, train/train_loss_1=0.0477]Epoch 2:  64%|######3   | 61/96 [00:00<00:00, 603.58batch/s, train/train_loss_1=0.0515]Epoch 2:  64%|######3   | 61/96 [00:00<00:00, 603.58batch/s, train/train_loss_1=0.04]  Epoch 2:  64%|######3   | 61/96 [00:00<00:00, 603.58batch/s, train/train_loss_1=0.0474]Epoch 2:  64%|######3   | 61/96 [00:00<00:00, 603.58batch/s, train/train_loss_1=0.0403]                                                                                         0%|          | 0/96 [00:00<?, ?batch/s]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0438]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.038] Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.037]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.05] Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0497]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0458]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0295]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.044] Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0405]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0381]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0456]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0467]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0358]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0484]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0395]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0421]Epoch 
3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0435]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.041] Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0476]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0406]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0432]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0409]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0336]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0395]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0373]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0461]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0383]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0474]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0424]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0389]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0496]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0391]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0374]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0542]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0537]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0308]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0402]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0369]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0429]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0442]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.046] Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, 
train/train_loss_1=0.0293]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0488]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.036] Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0468]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0325]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0408]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0467]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0419]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0335]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0536]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0362]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0403]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0448]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0317]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0362]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0426]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0416]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0441]Epoch 3:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0385]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0385]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0387]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0347]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0498]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0417]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0334]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, 
train/train_loss_1=0.0429]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0286]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0469]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0331]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0433]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0405]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0415]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0391]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.035] Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0395]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0387]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0389]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0401]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0334]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0395]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.034] Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0418]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0319]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0353]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0451]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0421]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0418]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0401]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 
593.03batch/s, train/train_loss_1=0.0476]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.043] Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0313]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0336]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0449]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0382]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0334]Epoch 3:  62%|######2   | 60/96 [00:00<00:00, 593.03batch/s, train/train_loss_1=0.0291]                                                                                         0%|          | 0/96 [00:00<?, ?batch/s]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0311]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0365]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0356]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0397]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0441]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0399]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0397]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0429]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0498]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0328]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0275]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0413]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0416]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0361]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0521]Epoch 4:   0%|   
       | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0338]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0486]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0401]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0404]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0398]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0473]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0302]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.037] Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0449]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0388]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0517]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0319]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0421]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.033] Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0421]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0359]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0403]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0301]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0453]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0371]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0351]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0386]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0399]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0399]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.043] Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0279]Epoch 4:   0%| 
         | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0422]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0442]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0417]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0321]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0397]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0321]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.045] Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0434]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0314]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0357]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0471]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0504]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0279]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0341]Epoch 4:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0406]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0406]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0349]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0319]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0499]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.036] Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0346]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0392]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0318]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0373]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, 
train/train_loss_1=0.0297]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0306]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.044] Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0307]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0307]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.042] Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0285]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0376]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0336]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0351]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0316]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0283]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.03]  Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0389]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0392]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0374]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0288]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0313]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.033] Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0383]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0277]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0327]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0387]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 
558.51batch/s, train/train_loss_1=0.0364]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0402]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0278]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0393]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0332]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.035] Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0344]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0292]Epoch 4:  58%|#####8    | 56/96 [00:00<00:00, 558.51batch/s, train/train_loss_1=0.0464]                                                                                         0%|          | 0/96 [00:00<?, ?batch/s]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0364]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0344]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0344]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0305]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0273]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0311]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0242]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0479]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0382]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0314]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0378]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.024] Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0426]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0277]Epoch 
5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0379]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0416]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0469]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0365]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.04]  Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0319]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0326]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0265]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0461]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0417]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0241]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0355]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0319]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0275]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0223]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.041] Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0279]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0405]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0337]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0285]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0456]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0335]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0258]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0326]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0321]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, 
train/train_loss_1=0.0365]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0346]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0383]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0415]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0247]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0363]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0363]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.037] Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0307]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0455]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0468]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0311]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0281]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0381]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0489]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0251]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0294]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0288]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0332]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0403]Epoch 5:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0346]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0346]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0469]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0407]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0411]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, 
train/train_loss_1=0.0332]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0316]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0282]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0315]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0383]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0368]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0316]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.024] Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0353]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.046] Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0437]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0357]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0256]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0278]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0289]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0354]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0404]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0272]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0295]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0386]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0375]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0353]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0274]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 
597.18batch/s, train/train_loss_1=0.0365]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0288]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0411]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0441]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0289]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0381]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0321]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0281]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0275]Epoch 5:  62%|######2   | 60/96 [00:00<00:00, 597.18batch/s, train/train_loss_1=0.0456]                                                                                         0%|          | 0/96 [00:00<?, ?batch/s]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0238]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0312]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0341]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0384]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0291]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0493]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0411]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0227]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0334]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0353]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.024] Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0309]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, 
train/train_loss_1=0.0367]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0383]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0316]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0299]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.023] Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0303]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0353]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0357]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.03]  Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0374]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0316]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0399]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0266]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0262]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0438]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0334]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0414]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0294]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0334]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.038] Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.033]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0384]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0334]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0381]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0402]Epoch 6:   0%|          | 0/96 [00:00<?, ?batch/s, train/train_loss_1=0.0391]Epoch 6:   0%|          | 0/96 [00:00<?, 
?batch/s, train/train_loss_1=0.0255]
Epoch 6:  59%|#####9    | 57/96 [00:00<00:00, 569.90batch/s, train/train_loss_1=0.0487]
Epoch 7:  64%|######3   | 61/96 [00:00<00:00, 606.23batch/s, train/train_loss_1=0.0243]
Epoch 8:  64%|######3   | 61/96 [00:00<00:00, 602.37batch/s, train/train_loss_1=0.0293]
Epoch 9: 100%|##########| 96/96 [00:00<00:00, 528.02batch/s, train/train_loss_1=0.033]
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Epoch 0:  74%|#######3  | 65/88 [00:02<00:00, 74.09batch/s, train/train_loss_1=0.0837]
Epoch 1:   0%|          | 0/88 
[00:00<?, ?batch/s, train/train_loss_1=0.0877]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0773]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.085] Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0849]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0779]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.079] Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0803]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0821]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0855]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0835]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.082] Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.088]Epoch 1:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0769]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0769]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0886]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0785]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0946]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0784]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0793]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0844]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0819]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0777]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0695]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0919]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, 
train/train_loss_1=0.0659]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0848]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0875]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.071] Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0737]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0785]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0719]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0862]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.086] Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.068]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0752]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0842]Epoch 1:  74%|#######3  | 65/88 [00:00<00:00, 648.91batch/s, train/train_loss_1=0.0672]                                                                                         0%|          | 0/88 [00:00<?, ?batch/s]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0806]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0703]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0858]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0888]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.101] Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0802]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0747]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0878]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0694]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, 
train/train_loss_1=0.0735]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0862]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0846]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0781]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0731]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0906]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0732]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0872]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0759]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0814]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0789]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0874]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0728]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0841]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0609]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0846]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0732]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0799]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0833]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0799]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0795]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0663]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0649]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0817]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0855]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0845]Epoch 2:   0%|          | 0/88 [00:00<?, 
?batch/s, train/train_loss_1=0.0744]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0808]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0778]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0695]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0781]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0827]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0851]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0789]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.068] Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0817]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0897]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0895]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0815]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0881]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0963]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0926]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0825]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0679]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0704]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0793]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0827]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0735]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0748]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0791]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0726]Epoch 2:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0732]Epoch 2:   0%|          | 0/88 
[00:00<?, ?batch/s, train/train_loss_1=0.0705]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0705]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0823]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0707]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0654]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.077] Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0789]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.075] Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0819]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0802]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0858]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0907]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0822]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0856]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0795]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0756]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0703]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0774]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0804]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.079] Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0717]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0662]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0756]Epoch 2:  70%|#######   | 62/88 
[00:00<00:00, 619.52batch/s, train/train_loss_1=0.0629]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0732]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0685]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0653]Epoch 2:  70%|#######   | 62/88 [00:00<00:00, 619.52batch/s, train/train_loss_1=0.0778]                                                                                         0%|          | 0/88 [00:00<?, ?batch/s]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0852]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0754]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0716]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0876]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0862]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0875]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0787]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0779]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0834]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0802]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0826]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0661]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0885]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0648]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0846]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0898]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0751]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0666]Epoch 3:   0%|          | 0/88 
[00:00<?, ?batch/s, train/train_loss_1=0.0661]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0818]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0657]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0733]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0798]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0713]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0747]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0695]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0764]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0642]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0774]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0739]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0754]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0818]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0826]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0788]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0702]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.077] Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0795]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0817]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0677]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.073] Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0757]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0719]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0672]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0585]Epoch 3:   0%|          | 
0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0803]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0766]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0848]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0851]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0849]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0767]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0743]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0607]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0729]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.076] Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0779]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0635]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0821]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0908]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0809]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0849]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0674]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0626]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0781]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0724]Epoch 3:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0906]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0906]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0802]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0808]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0661]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, 
train/train_loss_1=0.0584]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.074] Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0725]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0831]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0752]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0681]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0772]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0826]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0793]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0801]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0672]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0692]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0707]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0546]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0638]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0756]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.072] Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0814]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0722]Epoch 3:  74%|#######3  | 65/88 [00:00<00:00, 640.90batch/s, train/train_loss_1=0.0844]                                                                                         0%|          | 0/88 [00:00<?, ?batch/s]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0818]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, 
train/train_loss_1=0.0773]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0607]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0639]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0821]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0705]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0791]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0622]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0612]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0853]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0801]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0658]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0649]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0816]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0809]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0758]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0707]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0754]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.073] Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0781]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0731]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0791]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.072] Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0736]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0612]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0729]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0671]Epoch 4:   0%|          | 0/88 [00:00<?, 
?batch/s, train/train_loss_1=0.0697]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0718]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0846]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0727]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0774]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0691]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0785]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0798]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0813]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.071] Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.104]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0687]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0788]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0746]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.076] Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0682]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0735]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0853]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0645]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0696]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0849]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0799]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0716]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0885]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0824]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0674]Epoch 4:   0%|          | 0/88 
[00:00<?, ?batch/s, train/train_loss_1=0.0621]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0623]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0709]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0805]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0721]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0762]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0814]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0685]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0856]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.069] Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0794]Epoch 4:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0655]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0655]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0717]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0868]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.055] Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.076]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0829]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0632]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.071] Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0756]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.079] Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.072]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0814]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, 
train/train_loss_1=0.0625]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0698]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0911]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0688]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0723]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0709]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0748]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0806]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0623]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0642]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0746]Epoch 4:  74%|#######3  | 65/88 [00:00<00:00, 645.90batch/s, train/train_loss_1=0.0687]                                                                                         0%|          | 0/88 [00:00<?, ?batch/s]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0835]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0614]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0808]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.07]  Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0789]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0639]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0696]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.072] Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0693]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, train/train_loss_1=0.0643]Epoch 5:   0%|          | 0/88 [00:00<?, ?batch/s, 
train/train_loss_1=0.0612]
Epoch 5:  74%|#######3  | 65/88 [00:00<00:00, 649.48batch/s, train/train_loss_1=0.0834]
Epoch 6:  75%|#######5  | 66/88 [00:00<00:00, 653.74batch/s, train/train_loss_1=0.0877]
Epoch 7:  74%|#######3  | 65/88 [00:00<00:00, 642.88batch/s, train/train_loss_1=0.0802]
Epoch 8:  75%|#######5  | 66/88 [00:00<00:00, 652.66batch/s, train/train_loss_1=0.0531]
Epoch 9: 100%|##########| 88/88 [00:00<00:00, 536.30batch/s, train/train_loss_1=0.0869]
Epoch 0: 100%|##########| 14/14 [00:02<00:00,  7.59batch/s, train/train_loss_1=0.138]
Epoch 1:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0959]
Epoch 2:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.086]
Epoch 3:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0664]
Epoch 4:   0%|          | 0/14 [00:00<?, ?batch/s, 
train/train_loss_1=0.0476]Epoch 4:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.058] Epoch 4:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0322]Epoch 4:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0829]Epoch 4:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0456]Epoch 4:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0147]                                                                               0%|          | 0/14 [00:00<?, ?batch/s]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0356]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0393]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0349]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0427]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0495]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0487]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.031] Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0398]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0348]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0335]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0344]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0543]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0167]Epoch 5:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.04]                                                                               0%|          | 0/14 [00:00<?, ?batch/s]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0353]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0314]Epoch 6:   0%|         
 | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0487]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0218]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0414]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0316]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0547]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0464]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0789]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0394]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0575]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0444]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0266]Epoch 6:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0169]                                                                               0%|          | 0/14 [00:00<?, ?batch/s]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0412]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.04]  Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0511]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.023] Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0272]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0469]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0376]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0291]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0449]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0253]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0225]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0433]Epoch 
7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0379]Epoch 7:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0245]                                                                               0%|          | 0/14 [00:00<?, ?batch/s]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0559]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.045] Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0394]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0189]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.033] Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0354]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0277]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0358]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0504]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0284]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0242]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0464]Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.02]  Epoch 8:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0314]                                                                               0%|          | 0/14 [00:00<?, ?batch/s]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0263]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0262]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0301]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0457]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0573]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, 
train/train_loss_1=0.0381]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0312]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.00837]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.00966]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.027]  Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.031]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0196]Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.033] Epoch 9:   0%|          | 0/14 [00:00<?, ?batch/s, train/train_loss_1=0.0509]Epoch 9: 100%|##########| 14/14 [00:00<00:00, 944.10batch/s, train/train_loss_1=0.0509]
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Epoch 0: 100%|##########| 16/16 [00:02<00:00,  8.85batch/s, train/train_loss_1=0.141]
Epoch 1: train/train_loss_1=0.0387
Epoch 2: train/train_loss_1=0.0421
Epoch 3: train/train_loss_1=0.0672
Epoch 4: train/train_loss_1=0.0257
Epoch 5: train/train_loss_1=0.0323
Epoch 6: train/train_loss_1=0.0155
Epoch 7: train/train_loss_1=0.0297
Epoch 8: train/train_loss_1=0.0221
Epoch 9: 100%|##########| 16/16 [00:00<00:00, 884.96batch/s, train/train_loss_1=0.129]
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Epoch 0: 100%|##########| 11/11 [00:02<00:00,  5.54batch/s, train/train_loss_1=0.122]
Epoch 1: train/train_loss_1=0.0913
Epoch 2: train/train_loss_1=0.0937
Epoch 3: train/train_loss_1=0.0815
Epoch 4: train/train_loss_1=0.101
Epoch 5: train/train_loss_1=0.0798
Epoch 6: train/train_loss_1=0.0826
Epoch 7: train/train_loss_1=0.074
Epoch 8: train/train_loss_1=0.0767
Epoch 9: 100%|##########| 11/11 [00:00<00:00, 723.87batch/s, train/train_loss_1=0.0687]
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Epoch 0: 100%|##########| 12/12 [00:02<00:00,  6.05batch/s, train/train_loss_1=0.119]
Epoch 9: 100%|##########| 12/12 [00:00<00:00, 757.28batch/s, train/train_loss_1=0.057]
Epoch 0: 100%|##########| 14/14 [00:02<00:00,  6.83batch/s, train/train_loss_1=0.112]
Epoch 9: 100%|##########| 14/14 [00:00<00:00, 548.25batch/s, train/train_loss_1=0.0338]
Epoch 0: 100%|##########| 8/8 [00:02<00:00,  3.83batch/s, train/train_loss_1=0.0788]
Epoch 9: 100%|##########| 8/8 [00:00<00:00, 561.45batch/s, train/train_loss_1=0.0232]
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Epoch 9: 100%|##########| 7/7 [00:00<00:00, 639.63batch/s, train/train_loss_1=0.1]
Epoch 9: 100%|##########| 11/11 [00:00<00:00, 225.54batch/s, train/train_loss_1=0.0545]
Epoch 8:   0%| 
         | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0643]Epoch 8:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0807]Epoch 8:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0623]Epoch 8:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0657]Epoch 8:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0776]                                                                               0%|          | 0/21 [00:00<?, ?batch/s]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0747]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0794]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0644]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0696]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0769]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0809]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0866]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0786]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0727]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0684]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0632]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0681]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.074] Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0859]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0713]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0716]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0759]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0736]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, 
train/train_loss_1=0.0781]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0722]Epoch 9:   0%|          | 0/21 [00:00<?, ?batch/s, train/train_loss_1=0.0746]Epoch 9: 100%|##########| 21/21 [00:00<00:00, 555.71batch/s, train/train_loss_1=0.0746]
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Epoch 0:   0%|          | 0/655 [00:00<?, ?batch/s]
Epoch 0:   0%|          | 1/655 [00:01<12:08,  1.11s/batch, train/train_loss_1=0.141]
Epoch 0:  11%|#1        | 73/655 [00:01<00:07, 82.53batch/s, train/train_loss_1=0.119]
Epoch 0:  21%|##1       | 139/655 [00:01<00:03, 162.81batch/s, train/train_loss_1=0.102]
Epoch 0:  32%|###1      | 207/655 [00:01<00:01, 248.24batch/s, train/train_loss_1=0.103]
Epoch 0:  42%|####1     | 272/655 [00:01<00:01, 324.81batch/s, train/train_loss_1=0.0981]
Epoch 0:  42%|####1 
    | 272/655 [00:01<00:01, 324.81batch/s, train/train_loss_1=0.104]Epoch 0:  42%|####1     | 272/655 [00:01<00:01, 324.81batch/s, train/train_loss_1=0.103]Epoch 0:  42%|####1     | 272/655 [00:01<00:01, 324.81batch/s, train/train_loss_1=0.101]Epoch 0:  42%|####1     | 272/655 [00:01<00:01, 324.81batch/s, train/train_loss_1=0.111]Epoch 0:  42%|####1     | 272/655 [00:01<00:01, 324.81batch/s, train/train_loss_1=0.103]Epoch 0:  42%|####1     | 272/655 [00:01<00:01, 324.81batch/s, train/train_loss_1=0.113]Epoch 0:  42%|####1     | 272/655 [00:01<00:01, 324.81batch/s, train/train_loss_1=0.0989]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0989]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.102] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0925]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0917]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.111] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0935]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.108] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.105]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.109]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0908]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0974]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0864]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.106] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.103]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0972]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, 
train/train_loss_1=0.0967]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0946]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0972]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.1]   Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.105]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0966]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.11]  Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.117]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.11] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0984]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.106] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0936]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.101] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0958]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.105] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0865]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.118] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.106]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.106]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.107]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.109]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0891]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.103] Epoch 0:  52%|#####2   
 | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.119]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.106]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.106]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.1]  Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.113]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0987]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.109] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0991]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.108] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0911]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0939]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.112] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0897]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.104] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.092]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.103]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.108]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0851]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.097] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0871]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.113] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.107]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, 
train/train_loss_1=0.107]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.088]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0932]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0952]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.0913]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.106] Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.109]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.097]Epoch 0:  52%|#####2    | 343/655 [00:01<00:00, 406.53batch/s, train/train_loss_1=0.11] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.11]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.106]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0937]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0972]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.103] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.104]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.12] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0898]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.107] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.105]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.116]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.102]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.106]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.106]Epoch 0:  63%|######2   | 
411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.104]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0902]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.103] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.11] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0959]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.119] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.104]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.111]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.103]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.111]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0932]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0849]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.102] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.107]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.104]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.102]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.119]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0989]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.103] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0853]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.115] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.11] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, 
train/train_loss_1=0.0988]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.108] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.107]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.101]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.101]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.125]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.107]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0988]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.102] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0956]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.113] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.111]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.101]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0961]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.103] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0986]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.102] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0973]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.103] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.1]  Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0981]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0915]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0916]Epoch 0:  63%|######2   
| 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.104] Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.101]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.105]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.112]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.117]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.103]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0961]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.0957]Epoch 0:  63%|######2   | 411/655 [00:01<00:00, 469.89batch/s, train/train_loss_1=0.103] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.103]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0953]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0868]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0917]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.106] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.114]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0963]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0995]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.102] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0976]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0912]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.111] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.103]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, 
train/train_loss_1=0.0968]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.103] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.106]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0999]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.101] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.103]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.094]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.092]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0962]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0915]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0972]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.116] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.112]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.1]  Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0938]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.1]   Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0967]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0888]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.103] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.11] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0876]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.103] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0968]Epoch 0:  73%|#######2 
 | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.105] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.09] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.101]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.101]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.097]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.107]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.109]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.112]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.113]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.101]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0932]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.102] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.101]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.113]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.112]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0973]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0957]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.11]  Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.103]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0885]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.102] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.115]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, 
train/train_loss_1=0.101]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.104]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0973]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0977]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.114] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0991]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.1]   Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0807]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.116] Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.117]Epoch 0:  73%|#######2  | 478/655 [00:01<00:00, 519.48batch/s, train/train_loss_1=0.0973]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0973]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.102] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.109]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.101]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.11] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.114]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0997]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0882]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.094] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.107]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.12] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0938]Epoch 0:  83%|########3 
| 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.101] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.112]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.107]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0966]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.103] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.096]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0901]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.107] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.095]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.106]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0932]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.095] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.103]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.107]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.106]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0948]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0994]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.112] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.112]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.109]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0929]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.109] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, 
train/train_loss_1=0.0909]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0999]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.101] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.084]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.102]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.108]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.101]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0942]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0925]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.121] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.104]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.103]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0921]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.11]  Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0887]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.102] Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.112]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0998]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.0884]Epoch 0:  83%|########3 | 546/655 [00:01<00:00, 560.82batch/s, train/train_loss_1=0.104] Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.104]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.0818]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.0962]Epoch 0:  83%|########3 
| 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.0989]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.099] Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.106]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.102]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.111]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.0947]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.12]  Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.0988]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.0972]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.102] Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.0771]Epoch 0:  83%|########3 | 546/655 [00:02<00:00, 560.82batch/s, train/train_loss_1=0.112] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.112]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0968]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.108] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.11] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.102]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0982]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.102] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.099]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0985]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0988]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, 
train/train_loss_1=0.108] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.101]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0898]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0966]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0986]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.101] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.116]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.117]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.1]  Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0849]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.103] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0928]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.102] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.103]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0988]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0994]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.109] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0856]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0905]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.104] Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0889]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.0979]Epoch 0:  94%|#########3| 614/655 [00:02<00:00, 591.88batch/s, train/train_loss_1=0.111] Epoch 0:  
94%|#########3| 614/655 [00:03<00:00, 591.88batch/s, train/train_loss_1=0.0848]
Epoch 1:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.118]
Epoch 1:  11%|#1        | 73/655 [00:00<00:00, 718.60batch/s, train/train_loss_1=0.101]
Epoch 1:  22%|##2       | 146/655 [00:00<00:00, 723.00batch/s, train/train_loss_1=0.0945]
Epoch 1:  33%|###3      | 219/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0995]
Epoch 1:  44%|####4     | 291/655 [00:00<00:00, 715.01batch/s, train/train_loss_1=0.0952]
Epoch 1:  55%|#####5    | 363/655 [00:00<00:00, 711.17batch/s, train/train_loss_1=0.0875]
Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0989]
Epoch 1:  
67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.103] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.104]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.101]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0929]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.104] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.101]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.104]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0868]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.1]   Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0825]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.116] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0925]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0874]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.092] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.105]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0961]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0991]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.115] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0886]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0986]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0992]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.106] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 
719.35batch/s, train/train_loss_1=0.106]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0859]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0879]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0935]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.106] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.095]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.09] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.101]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.114]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0991]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0909]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0929]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0862]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.103] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0933]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0995]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.112] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0895]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0951]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.108] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.08] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.09]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0949]Epoch 1: 
 67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.082] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0925]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.107] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0993]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0973]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0957]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0944]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.114] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0975]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.103] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0962]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0982]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0973]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0841]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0998]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0995]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0969]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.108] Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.101]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0946]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0942]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0858]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 
719.35batch/s, train/train_loss_1=0.0798]Epoch 1:  67%|######6   | 437/655 [00:00<00:00, 719.35batch/s, train/train_loss_1=0.0993]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0993]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0978]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0966]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.101] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.101]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0948]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0898]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0925]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.103] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0938]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0919]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.101] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.11] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.107]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0932]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.102] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.089]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0952]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.101] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.1]  Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0865]Epoch 
1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0977]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0932]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.101] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0874]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.096] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.103]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0966]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0823]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.1]   Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0988]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0804]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0977]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.127] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0934]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.105] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0949]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0926]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0899]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.106] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0915]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0925]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0808]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 
723.33batch/s, train/train_loss_1=0.102] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0949]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0914]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0968]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0854]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0878]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0982]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0994]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0933]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0938]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.101] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0849]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0843]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.106] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.107]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0779]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0954]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.103] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.113]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0937]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.102] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0926]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.09]  
Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0976]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0977]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0864]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0909]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0811]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.117] Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0922]Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.1]   Epoch 1:  78%|#######8  | 511/655 [00:00<00:00, 723.33batch/s, train/train_loss_1=0.0837]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0837]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0882]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.079] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.087]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.102]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0975]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0959]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.11]  Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.101]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0944]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0982]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0853]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.1]   Epoch 1:  89%|########9 | 585/655 
[00:00<00:00, 727.77batch/s, train/train_loss_1=0.0819]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0835]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.105] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.108]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0965]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.108] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0993]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0849]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0964]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0977]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0967]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0798]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0859]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.102] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.104]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.107]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0788]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0887]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0936]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0865]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.107] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0849]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, 
train/train_loss_1=0.0989]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0903]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.104] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.106]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0911]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0833]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.104] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0977]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0873]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.11]  Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0898]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0991]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.11]  Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0956]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0944]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0842]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0926]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0924]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0902]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0917]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0813]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0977]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.105] Epoch 1:  
89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.103]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.099]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0706]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0852]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.112] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0914]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.094] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.108]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0907]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0853]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0966]Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.103] Epoch 1:  89%|########9 | 585/655 [00:00<00:00, 727.77batch/s, train/train_loss_1=0.0972]                                                                                           0%|          | 0/655 [00:00<?, ?batch/s]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0996]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0963]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0903]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0928]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0907]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0963]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0777]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.108] Epoch 2:   0%|          | 0/655 [00:00<?, 
?batch/s, train/train_loss_1=0.108]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.103]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0899]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0966]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0785]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.095] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0929]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0898]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0959]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0931]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0965]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0851]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.097] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0825]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.104] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0887]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.102] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.107]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0945]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0943]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0889]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0874]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.1]   Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0951]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.101] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0967]Epoch 2:   0%|   
       | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.102] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0902]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.108] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.101]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.101]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0899]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0894]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0843]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0973]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.089] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0973]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0967]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.103] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0971]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.094] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0835]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.103] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0945]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0878]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0897]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0848]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0824]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0883]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0912]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0812]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, 
train/train_loss_1=0.0969]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0981]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0883]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.099] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.11] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0841]Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.101] Epoch 2:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.106]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.106]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.113]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.0949]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.101] Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.102]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.0887]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.103] Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.0944]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.0915]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.0964]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.108] Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.101]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.101]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.0864]Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.101] Epoch 2:  10%|#         | 67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.101]Epoch 2:  10%|#         | 
67/655 [00:00<00:00, 668.55batch/s, train/train_loss_1=0.101]
Epoch 2:  21%|##1       | 140/655 [00:00<00:00, 702.81batch/s, train/train_loss_1=0.0899]
Epoch 2:  32%|###2      | 212/655 [00:00<00:00, 706.98batch/s, train/train_loss_1=0.0936]
Epoch 2:  44%|####3     | 285/655 [00:00<00:00, 713.63batch/s, train/train_loss_1=0.107]
Epoch 2:  55%|#####4    | 359/655 [00:00<00:00, 719.48batch/s, train/train_loss_1=0.0948]
Epoch 2:  66%|######5   | 431/655 [00:00<00:00, 716.22batch/s, train/train_loss_1=0.0914]
Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.102]
Epoch 2:  77%|#######6  | 503/655 
[00:00<00:00, 709.93batch/s, train/train_loss_1=0.092]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.076]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.109]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0803]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.1]   Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0967]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0971]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0911]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0936]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0926]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.113] Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0937]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0846]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0846]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0831]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0932]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0948]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0713]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0957]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.106] Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0991]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.102] Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, 
train/train_loss_1=0.0968]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0935]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.108] Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.116]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0886]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0924]Epoch 2:  77%|#######6  | 503/655 [00:00<00:00, 709.93batch/s, train/train_loss_1=0.0945]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0945]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0863]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.088] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.109]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0796]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0921]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0952]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0877]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.097] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0842]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.109] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0703]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0932]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0856]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0997]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0876]Epoch 2:  
88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0872]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.105] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0925]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.101] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0904]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0996]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0919]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0878]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0803]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0965]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0866]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0821]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.1]   Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0947]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.091] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.105]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0997]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0764]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0887]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.105] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0918]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0897]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 
709.68batch/s, train/train_loss_1=0.087] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0888]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.107] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.104]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0905]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0779]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0839]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0874]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0972]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0985]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.093] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0935]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.106] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.097]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.112]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0955]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.102] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.11] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0841]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.103] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0961]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0804]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.107] 
Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0903]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0808]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0901]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.109] Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0951]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0927]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0948]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0939]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0816]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0878]Epoch 2:  88%|########7 | 575/655 [00:00<00:00, 709.68batch/s, train/train_loss_1=0.0781]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0781]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0898]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0946]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0853]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0991]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0861]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0929]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0871]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0923]Epoch 2:  99%|#########8| 646/655 [00:00<00:00, 705.96batch/s, train/train_loss_1=0.0816]                                                                                           0%|          | 0/655 [00:00<?, 
?batch/s]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0884]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0854]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0753]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.103] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.101]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0858]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0829]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.103] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0985]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.097] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0874]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.102] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0836]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0979]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.095] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0985]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0944]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.094] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0842]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0921]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0977]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0965]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0884]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0828]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, 
train/train_loss_1=0.09]  Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0966]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0877]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0909]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.104] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0974]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0791]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0965]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0984]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0839]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0956]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.095] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.111]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0802]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.102] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.102]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0935]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0751]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0997]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0956]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0868]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.095] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0902]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0844]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.103] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0994]Epoch 3:   0%|          | 
0/655 [00:00<?, ?batch/s, train/train_loss_1=0.1]   Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0866]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0925]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0929]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0871]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0995]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0873]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0976]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0792]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0837]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0818]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.106] Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0884]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0864]Epoch 3:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0921]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0921]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0914]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0857]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0923]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0901]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0935]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0847]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0986]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.106] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, 
train/train_loss_1=0.0978]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0893]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0946]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.106] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0892]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0972]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0859]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0863]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0773]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0937]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0983]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0927]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0939]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0925]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.079] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.108]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0986]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0942]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.104] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.109]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.102]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0928]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.083] Epoch 3:  10%|9         | 65/655 
[00:00<00:00, 648.56batch/s, train/train_loss_1=0.0798]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0812]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0921]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.095] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0851]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0826]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0907]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.116] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.079]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0877]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.105] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0909]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0888]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0934]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.098] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.103]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0827]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0778]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.104] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0949]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.101] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0918]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0914]Epoch 3:  
10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0947]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.088] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0812]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0963]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0854]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0998]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.103] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0894]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0821]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0901]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0922]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0997]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.096] Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0995]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0985]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0981]Epoch 3:  10%|9         | 65/655 [00:00<00:00, 648.56batch/s, train/train_loss_1=0.0986]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0986]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0924]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0936]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0882]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.108] Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, 
train/train_loss_1=0.104]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.102]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.086]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0984]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.094] Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.1]  Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0861]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0889]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0928]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0851]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.099] Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0927]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0821]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0859]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.09]  Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0901]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.102] Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0924]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0914]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0858]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0946]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.103] Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0915]Epoch 3:  21%|##    
    | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0851]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.107] Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0911]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0997]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0813]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0953]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0872]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0846]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.104] Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.104]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0918]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0815]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0987]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0988]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0912]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0935]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0937]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.1]   Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0818]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0872]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.0882]Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.1]   Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, 
train/train_loss_1=0.0951]
Epoch 3:  21%|##        | 136/655 [00:00<00:00, 679.78batch/s, train/train_loss_1=0.093]
Epoch 3:  31%|###1      | 204/655 [00:00<00:00, 620.28batch/s, train/train_loss_1=0.0861]
Epoch 3:  42%|####2     | 276/655 [00:00<00:00, 654.70batch/s, train/train_loss_1=0.0824]
Epoch 3:  53%|#####3    | 350/655 [00:00<00:00, 682.83batch/s, train/train_loss_1=0.102]
Epoch 3:  65%|######4   | 423/655 [00:00<00:00, 695.78batch/s, train/train_loss_1=0.0864]
Epoch 3:  76%|#######5  | 496/655 [00:00<00:00, 704.31batch/s, train/train_loss_1=0.0831]
Epoch 3:  87%|########6 | 568/655 [00:00<00:00, 708.91batch/s, train/train_loss_1=0.0954]
Epoch 3:  98%|#########7| 640/655 [00:00<00:00, 680.83batch/s, train/train_loss_1=0.0883]
Epoch 3:  
98%|#########7| 640/655 [00:00<00:00, 680.83batch/s, train/train_loss_1=0.0958]Epoch 3:  98%|#########7| 640/655 [00:00<00:00, 680.83batch/s, train/train_loss_1=0.0797]Epoch 3:  98%|#########7| 640/655 [00:00<00:00, 680.83batch/s, train/train_loss_1=0.0972]Epoch 3:  98%|#########7| 640/655 [00:00<00:00, 680.83batch/s, train/train_loss_1=0.0896]Epoch 3:  98%|#########7| 640/655 [00:00<00:00, 680.83batch/s, train/train_loss_1=0.0919]Epoch 3:  98%|#########7| 640/655 [00:00<00:00, 680.83batch/s, train/train_loss_1=0.0955]Epoch 3:  98%|#########7| 640/655 [00:00<00:00, 680.83batch/s, train/train_loss_1=0.0826]                                                                                           0%|          | 0/655 [00:00<?, ?batch/s]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.102]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.1]  Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.076]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0947]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0932]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0905]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0896]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0911]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0999]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0974]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0826]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0748]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0789]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0959]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0982]Epoch 4:   0%|          | 0/655 
[00:00<?, ?batch/s, train/train_loss_1=0.0971]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0892]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.1]   Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0928]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0885]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.071] Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.088]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0889]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0903]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0802]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.108] Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0985]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0863]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.094] Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0862]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0803]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0924]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0889]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0808]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.101] Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0879]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0835]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0916]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0962]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.105] Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, 
train/train_loss_1=0.0801]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0903]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0892]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0809]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0923]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0968]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0895]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0803]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.114] Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0859]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0805]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0831]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0963]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.107] Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0883]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.079] Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0716]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0816]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0866]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0976]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0842]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0923]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0989]Epoch 4:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0859]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0859]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, 
train/train_loss_1=0.0848]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.101] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0998]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0964]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0804]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.103] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0835]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0858]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0953]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0847]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0937]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0987]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0908]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0891]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.099] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0873]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0897]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0928]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0876]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0852]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0935]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0887]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0918]Epoch 4:  10%|9         | 64/655 
[00:00<00:00, 639.03batch/s, train/train_loss_1=0.0837]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0813]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0967]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0863]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.103] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.1]  Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0813]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.097] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0916]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0837]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.103] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0928]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.089] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.1]  Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0892]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0826]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0854]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0886]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0826]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0879]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0879]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0772]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.092] Epoch 4:  
10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0925]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0832]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.1]   Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0972]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0877]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0963]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.107] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0825]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0926]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0883]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.073] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0936]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0835]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0867]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.088] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0927]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.118] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.11] Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0907]Epoch 4:  10%|9         | 64/655 [00:00<00:00, 639.03batch/s, train/train_loss_1=0.0848]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0848]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.085] Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, 
train/train_loss_1=0.0948]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0795]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0868]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.1]   Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0809]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.095] Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0929]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.09]  Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0894]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0899]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0983]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0981]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0922]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0868]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0888]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0783]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0862]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0767]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0931]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0857]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0906]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0897]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0808]Epoch 4:  
20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.102] Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0916]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0986]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.084] Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.093]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0959]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0636]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0729]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.09]  Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0989]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0851]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0967]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0908]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.09]  Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0711]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0896]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0987]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.105] Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.107]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0909]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0883]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0792]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 
647.58batch/s, train/train_loss_1=0.0923]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0911]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0897]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0852]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0772]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0934]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0919]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.104] Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.102]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.1]  Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0894]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0907]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0833]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0936]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0872]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.089] Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.104]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0816]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0771]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0903]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0885]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0909]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, 
train/train_loss_1=0.0888]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0821]Epoch 4:  20%|#9        | 130/655 [00:00<00:00, 647.58batch/s, train/train_loss_1=0.0797]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0797]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.093] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0862]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0777]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0943]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0877]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0945]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0958]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0862]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0877]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0962]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0781]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0797]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0914]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0805]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.103] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0792]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0826]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0743]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0829]Epoch 4:  
31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0953]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0921]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0971]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0801]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.089] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0832]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0816]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0966]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0936]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0943]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.105] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0845]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0784]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.081] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0977]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0952]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0888]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0821]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.102] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0887]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0942]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0921]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 
674.40batch/s, train/train_loss_1=0.0647]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0868]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0982]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0957]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0969]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0937]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0896]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0939]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0887]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.103] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0865]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0792]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0941]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0872]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0855]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0994]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0863]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0899]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0899]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0967]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0903]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0823]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.09]  
Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0934]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.102] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0806]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0974]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0952]Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.076] Epoch 4:  31%|###       | 201/655 [00:00<00:00, 674.40batch/s, train/train_loss_1=0.0839]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0839]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0805]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0958]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0923]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0761]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0848]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0888]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0967]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0938]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.101] Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.105]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0809]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0936]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0796]Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.079] Epoch 4:  42%|####1     | 272/655 
[00:00<00:00, 685.87batch/s, train/train_loss_1=0.0904]
Epoch 4:  42%|####1     | 272/655 [00:00<00:00, 685.87batch/s, train/train_loss_1=0.0985]
Epoch 4:  53%|#####2    | 344/655 [00:00<00:00, 695.11batch/s, train/train_loss_1=0.104]
Epoch 4:  64%|######3   | 416/655 [00:00<00:00, 701.11batch/s, train/train_loss_1=0.0857]
Epoch 4:  74%|#######4  | 487/655 [00:00<00:00, 673.90batch/s, train/train_loss_1=0.0852]
Epoch 4:  85%|########5 | 559/655 [00:00<00:00, 685.42batch/s, train/train_loss_1=0.0879]
Epoch 4:  96%|#########6| 629/655 [00:00<00:00, 688.98batch/s, train/train_loss_1=0.0791]
Epoch 5:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.104]
Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0944]Epoch 5:  10%|#        
 | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0882]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0832]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0947]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0911]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0965]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.075] Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0876]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.089] Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0802]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0826]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0998]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0884]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0922]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.103] Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0861]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0998]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0819]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.088] Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0788]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0845]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0986]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0808]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, 
train/train_loss_1=0.0927]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0875]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0888]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0904]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0854]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.089] Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.101]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0834]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0928]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0893]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0899]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0954]Epoch 5:  10%|#         | 66/655 [00:00<00:00, 656.57batch/s, train/train_loss_1=0.0754]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0754]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0993]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0886]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0788]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.111] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0968]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.105] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0971]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0792]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0876]Epoch 5:  21%|##        | 
136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0973]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.107] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.072]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0962]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0867]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0747]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0919]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0921]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.081] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0887]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.075] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0876]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0942]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0929]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0778]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0739]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.102] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0887]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0762]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0937]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0919]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0992]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, 
train/train_loss_1=0.08]  Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0819]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0918]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0684]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0909]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0804]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0734]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0792]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0928]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0798]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0915]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0901]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0907]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0729]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0974]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0984]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0713]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.088] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0836]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0786]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0868]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.1]   Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.103]Epoch 5:  21%|## 
       | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0891]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0946]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.108] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.081]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0786]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0935]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0806]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.089] Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.103]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0983]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0802]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0821]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0849]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0883]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0918]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0811]Epoch 5:  21%|##        | 136/655 [00:00<00:00, 676.79batch/s, train/train_loss_1=0.0825]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0825]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.082] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0943]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0712]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.103] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 
688.02batch/s, train/train_loss_1=0.0913]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.101] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0856]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0952]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0657]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0871]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0909]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0854]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.091] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.109]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0856]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0797]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0893]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0927]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0915]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0932]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0885]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0898]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0861]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0799]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0875]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.088] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, 
train/train_loss_1=0.0855]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0812]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0787]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0884]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0893]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0907]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0846]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0932]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0894]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0924]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0815]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.068] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0778]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0796]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0739]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0905]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0832]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0988]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.102] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0947]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0775]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0916]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0933]Epoch 5:  
32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0931]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0924]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.101] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.11] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.088]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.098]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0806]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.106] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0896]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0839]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0676]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0921]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0855]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.095] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0887]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0825]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0935]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.104] Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0834]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0736]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0992]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0924]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 
688.02batch/s, train/train_loss_1=0.0959]Epoch 5:  32%|###1      | 207/655 [00:00<00:00, 688.02batch/s, train/train_loss_1=0.0911]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0911]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0797]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.104] Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0778]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0845]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0915]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0936]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0845]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0807]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0848]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0907]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0847]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0926]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0737]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0886]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0842]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0976]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0846]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0898]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.101] Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, 
train/train_loss_1=0.0949]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0726]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0928]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0945]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0689]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0806]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0939]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0907]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0871]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0945]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0684]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0881]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0926]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0914]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0985]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0868]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0916]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0847]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0785]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0905]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0878]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0862]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0842]Epoch 5:  
43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0756]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0699]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0879]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0801]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0925]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0782]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0689]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0973]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.1]   Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0907]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0931]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0821]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0721]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0862]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0784]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0765]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0861]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0768]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0835]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0883]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0891]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.107] Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 
701.22batch/s, train/train_loss_1=0.0753]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0694]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.101] Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0921]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0801]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0949]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0854]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.0966]Epoch 5:  43%|####2     | 280/655 [00:00<00:00, 701.22batch/s, train/train_loss_1=0.102] Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.102]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0921]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0794]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0893]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0853]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0934]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.101] Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0805]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0827]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0809]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.101] Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0775]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0914]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.083] 
Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0908]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0795]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0745]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0802]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0857]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.104] Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0887]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0798]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0874]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0869]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0905]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0843]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0805]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0777]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0882]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0975]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0976]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0978]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0898]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.094] Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0939]Epoch 5:  54%|#####3    | 353/655 [00:00<00:00, 711.00batch/s, train/train_loss_1=0.0858]Epoch 5:  54%|#####3    | 353/655 
671.22batch/s, train/train_loss_1=0.0671]Epoch 6:  20%|##        | 134/655 [00:00<00:00, 671.22batch/s, train/train_loss_1=0.0787]Epoch 6:  20%|##        | 134/655 [00:00<00:00, 671.22batch/s, train/train_loss_1=0.0701]Epoch 6:  20%|##        | 134/655 [00:00<00:00, 671.22batch/s, train/train_loss_1=0.0903]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0903]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.086] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0941]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0833]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0931]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0752]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0806]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0892]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0848]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0847]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0776]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0833]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.077] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0846]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0737]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0831]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0934]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.103] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, 
train/train_loss_1=0.0969]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0984]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.084] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0841]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.097] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0789]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0845]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0847]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.102] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0946]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0865]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0872]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0887]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0877]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0777]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0884]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0957]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0779]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0873]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0877]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0887]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0874]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0801]Epoch 6:  
32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0932]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0791]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0796]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0775]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0849]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0917]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0847]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0852]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0858]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0938]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0651]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0826]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0978]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0996]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.094] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0881]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0879]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0846]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0702]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0767]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0762]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0632]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 
697.47batch/s, train/train_loss_1=0.0915]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0852]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.108] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0941]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0938]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0864]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0937]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.102] Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0908]Epoch 6:  32%|###1      | 207/655 [00:00<00:00, 697.47batch/s, train/train_loss_1=0.0831]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0831]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0751]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0876]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0885]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0976]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0885]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0953]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0927]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0861]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0966]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0916]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0872]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, 
train/train_loss_1=0.0743]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0953]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0884]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0808]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.083] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0898]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0882]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0818]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0931]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0962]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0884]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.095] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0689]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0833]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0946]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0961]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0829]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0769]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.088] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0768]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0851]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0787]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0811]Epoch 6:  
43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.08]  Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0824]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.104] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.102]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0825]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0737]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0783]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.096] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0724]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0921]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0957]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0884]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0812]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0749]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0793]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.101] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0905]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0744]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0876]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0995]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0772]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.101] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 
704.85batch/s, train/train_loss_1=0.0786]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0814]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.089] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0806]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0751]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0962]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0996]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0975]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0769]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0923]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.104] Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0913]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0877]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0963]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0958]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0711]Epoch 6:  43%|####2     | 279/655 [00:00<00:00, 704.85batch/s, train/train_loss_1=0.0736]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0736]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.1]   Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0822]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0772]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0865]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, 
train/train_loss_1=0.0875]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0975]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0848]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0852]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.074] Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.09] Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0871]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0977]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0854]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.086] Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.087]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0949]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0928]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0901]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0783]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0838]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.097] Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0821]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0948]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0928]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0904]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0938]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0751]Epoch 6:  
54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0798]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0858]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0972]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.08]  Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0852]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0841]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0895]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0887]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0905]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0894]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0842]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0861]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0827]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0836]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.091] Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0789]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0938]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0923]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0841]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0913]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0941]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0929]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 
711.45batch/s, train/train_loss_1=0.0798]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.108] Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.087]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0814]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0797]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0827]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0914]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0828]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0709]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0754]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0838]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0903]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.1]   Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0879]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0873]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0899]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0852]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.078] Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0854]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0959]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0752]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0785]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, 
train/train_loss_1=0.0889]Epoch 6:  54%|#####3    | 352/655 [00:00<00:00, 711.45batch/s, train/train_loss_1=0.0947]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0947]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0942]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0693]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0881]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0893]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0981]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0905]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0947]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0903]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.102] Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0937]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0961]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0913]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0848]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0925]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0921]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0913]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0822]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.091] Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0923]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0783]Epoch 6:  
65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0855]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0908]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0948]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0719]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0806]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0784]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0965]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.105] Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0908]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0991]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0996]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0826]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.105] Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0889]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0857]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0871]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0802]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0851]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0855]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0918]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0763]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0892]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 
714.47batch/s, train/train_loss_1=0.102] Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.082]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0819]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0884]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0816]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0869]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0976]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0777]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0857]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0968]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.1]   Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0894]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0916]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0803]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0903]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0954]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0827]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.109] Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0774]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0913]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0895]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.0805]Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.102] 
Epoch 6:  65%|######4   | 425/655 [00:00<00:00, 714.47batch/s, train/train_loss_1=0.098]
Epoch 6:  76%|#######5  | 497/655 [00:00<00:00, 715.30batch/s, train/train_loss_1=0.0888]
Epoch 6:  87%|########7 | 570/655 [00:00<00:00, 717.45batch/s, train/train_loss_1=0.102]
Epoch 6:  98%|#########8| 642/655 [00:00<00:00, 698.73batch/s, train/train_loss_1=0.0807]
Epoch 7:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0962]
Epoch 7:  10%|#         | 67/655 [00:00<00:00, 665.41batch/s, train/train_loss_1=0.0774]
Epoch 7:  20%|##        | 134/655 [00:00<00:00, 657.85batch/s, train/train_loss_1=0.0877]
Epoch 7:  31%|###       | 200/655 [00:00<00:00, 658.79batch/s, train/train_loss_1=0.0962]
Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0862]
Epoch 7:  
41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0847]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0885]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0833]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0897]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0723]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0875]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.09]  Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0905]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0811]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0941]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0834]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.101] Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0765]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0853]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0897]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0838]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0899]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0805]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0947]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.078] Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0962]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0935]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 
660.50batch/s, train/train_loss_1=0.0862]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0932]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0806]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0926]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0826]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0865]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.083] Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0839]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0749]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.0897]Epoch 7:  41%|####      | 267/655 [00:00<00:00, 660.50batch/s, train/train_loss_1=0.091] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.091]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.104]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.103]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0915]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0771]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0867]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.106] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.084]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0832]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.091] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0895]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, 
train/train_loss_1=0.0899]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.102] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0797]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0884]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0835]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0794]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.083] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0789]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.107] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0957]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0838]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.092] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0834]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.103] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0999]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0702]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0849]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0977]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0878]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0885]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0987]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0859]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0783]Epoch 7:  
52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0834]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0806]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0759]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0881]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0874]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0907]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0929]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0877]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.08]  Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0857]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.087] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0835]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0836]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0821]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0872]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.085] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0865]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0709]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0982]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0883]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0847]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0873]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 
677.39batch/s, train/train_loss_1=0.0894]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0793]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0713]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.08]  Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0912]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0856]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0964]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0891]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0819]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0873]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.102] Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0841]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0693]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0837]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0828]Epoch 7:  52%|#####1    | 338/655 [00:00<00:00, 677.39batch/s, train/train_loss_1=0.0913]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0913]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0805]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0747]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0862]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0701]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.079] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, 
train/train_loss_1=0.0812]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0823]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0838]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0956]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0892]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0811]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0848]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0871]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0756]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0916]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0858]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0845]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.104] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.105]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0925]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.079] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0977]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0819]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.078] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0901]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0841]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0747]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0863]Epoch 7:  
62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0776]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0799]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0965]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0745]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0963]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0845]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0751]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0883]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0882]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0909]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.083] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0842]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0744]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0951]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0952]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.088] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.082]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0925]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.083] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.105]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0932]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0892]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 
687.81batch/s, train/train_loss_1=0.0984]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0883]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.089] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0809]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0814]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0734]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0862]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0863]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.081] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0841]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0625]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0885]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0963]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0987]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0828]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0795]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0733]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0798]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.078] Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0895]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0912]Epoch 7:  62%|######2   | 409/655 [00:00<00:00, 687.81batch/s, train/train_loss_1=0.0955]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, 
train/train_loss_1=0.0955]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0834]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0794]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0723]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.101] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0824]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0934]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0835]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0864]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0793]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0837]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.086] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0837]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0899]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0865]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0939]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0867]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0951]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.079] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0783]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0741]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0804]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0905]Epoch 7:  
73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0909]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0769]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0791]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0807]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0903]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0832]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0819]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0864]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0845]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0825]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0782]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0843]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0689]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0843]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0935]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.087] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0936]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0905]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0765]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0893]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0919]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0789]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 
698.14batch/s, train/train_loss_1=0.0931]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0764]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0882]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.092] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0956]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0739]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0929]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0801]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0834]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0929]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.101] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0833]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0849]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0973]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0897]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0788]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.084] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0856]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0887]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.087] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0797]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0854]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, 
train/train_loss_1=0.0792]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0812]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0734]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.088] Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.081]Epoch 7:  73%|#######3  | 481/655 [00:00<00:00, 698.14batch/s, train/train_loss_1=0.0834]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0834]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0796]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0875]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0789]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0734]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0938]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0828]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0785]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0914]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0836]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.083] Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0791]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0947]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0865]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0862]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.081] Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0905]Epoch 7:  
84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0859]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.107] Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0826]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0728]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.105] Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0782]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.068] Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0849]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0835]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0729]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0803]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0856]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0906]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0951]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0812]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0978]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0749]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0964]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0887]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0843]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0833]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0843]Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 
703.01batch/s, train/train_loss_1=0.0797]
Epoch 7:  84%|########4 | 553/655 [00:00<00:00, 703.01batch/s, train/train_loss_1=0.0796]
Epoch 7:  95%|#########5| 625/655 [00:00<00:00, 706.33batch/s, train/train_loss_1=0.0782]
Epoch 8:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0772]
Epoch 8:  10%|#         | 67/655 [00:00<00:00, 665.99batch/s, train/train_loss_1=0.0888]
Epoch 8:  21%|##        | 135/655 [00:00<00:00, 670.13batch/s, train/train_loss_1=0.0871]
Epoch 8:  31%|###1      | 204/655 [00:00<00:00, 675.19batch/s, train/train_loss_1=0.0832]
Epoch 8:  42%|####1     | 272/655 [00:00<00:00, 668.86batch/s, train/train_loss_1=0.0846]
Epoch 8:  52%|#####1    | 340/655 [00:00<00:00, 671.34batch/s, train/train_loss_1=0.101]
Epoch 8:  52%|#####1    | 340/655 [00:00<00:00, 671.34batch/s, train/train_loss_1=0.0883]Epoch 8:  52%|#####1    | 340/655 [00:00<00:00, 671.34batch/s, train/train_loss_1=0.091] Epoch 8:  52%|#####1    | 340/655 [00:00<00:00, 671.34batch/s, train/train_loss_1=0.0901]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0901]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0857]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.106] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0934]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0786]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0888]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0815]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0955]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0971]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0809]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0806]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0788]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0912]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.076] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0895]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0806]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0861]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0834]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0869]Epoch 8:  63%|######2   | 410/655 
[00:00<00:00, 680.61batch/s, train/train_loss_1=0.0863]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0796]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0794]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.112] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0952]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.105] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0712]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0983]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0665]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.081] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0825]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0767]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0956]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.081] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0804]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.082] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0825]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0762]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0759]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0869]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0815]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0821]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, 
train/train_loss_1=0.0896]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0807]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0832]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0865]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0989]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0723]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.101] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0763]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.099] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0832]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0861]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0789]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0741]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0815]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0876]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0895]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0742]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0694]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0806]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0736]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0773]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0864]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0851]Epoch 8:  
63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0834]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0852]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0831]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0979]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0924]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.086] Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0811]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0785]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0801]Epoch 8:  63%|######2   | 410/655 [00:00<00:00, 680.61batch/s, train/train_loss_1=0.0681]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0681]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0723]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0785]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0816]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.101] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0856]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.104] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0914]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0909]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0844]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0709]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0772]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 
693.75batch/s, train/train_loss_1=0.0959]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.089] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0935]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0667]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.102] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0926]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0857]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0929]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0858]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.112] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0866]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.08]  Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0928]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0836]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0869]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0933]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0889]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0793]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.071] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0857]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0855]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0969]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, 
train/train_loss_1=0.0863]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0661]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0754]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0983]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0802]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0925]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0753]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0795]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0877]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.098] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0842]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0988]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0846]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0851]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0846]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0941]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0727]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.104] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0928]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0957]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0847]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0832]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.08]  Epoch 8:  
74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0776]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0944]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0817]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0997]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0789]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.065] Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0841]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0807]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0803]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0747]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0871]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0902]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0895]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0874]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0853]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0767]Epoch 8:  74%|#######3  | 483/655 [00:00<00:00, 693.75batch/s, train/train_loss_1=0.0963]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0963]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0914]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0939]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0815]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0947]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 
704.79batch/s, train/train_loss_1=0.0784]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0936]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0827]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0835]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0792]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.104] Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0928]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0895]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0856]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0873]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0816]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0816]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0911]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0883]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.101] Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0956]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0813]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0826]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0669]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0812]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0987]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0849]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.11]  
Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0847]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.101] Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0879]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.106] Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0865]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0894]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0874]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0911]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0863]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0782]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0908]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0856]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0966]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0923]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0864]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0833]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.112] Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0769]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0854]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0973]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0733]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0847]Epoch 8:  85%|########4 | 556/655 
[00:00<00:00, 704.79batch/s, train/train_loss_1=0.0777]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0928]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0913]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0865]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0896]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0798]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0816]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0967]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0754]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0865]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0808]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0888]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0873]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.084] Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0882]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0693]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0869]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0876]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0852]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0926]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.081] Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0835]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, 
train/train_loss_1=0.0797]Epoch 8:  85%|########4 | 556/655 [00:00<00:00, 704.79batch/s, train/train_loss_1=0.0793]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0793]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0928]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0696]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.105] Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0811]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0902]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0874]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0946]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0804]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0809]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0885]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0828]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.08]  Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0912]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0979]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0798]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0858]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0862]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0803]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0864]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0747]Epoch 8:  
96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0834]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0941]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0953]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0801]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0803]Epoch 8:  96%|#########6| 629/655 [00:00<00:00, 712.21batch/s, train/train_loss_1=0.0699]                                                                                           0%|          | 0/655 [00:00<?, ?batch/s]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0977]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0908]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0965]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0873]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0923]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0758]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0747]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0788]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.103] Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0849]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.102] Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0902]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0916]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0863]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.079] Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0934]Epoch 9:   0%|          | 0/655 [00:00<?, 
?batch/s, train/train_loss_1=0.0852]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0822]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0832]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0887]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0908]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0795]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0761]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0701]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0651]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.098] Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0857]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.086] Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0784]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0712]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0683]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0984]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0767]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0811]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0896]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0857]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0852]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0878]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0978]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0896]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0896]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0811]Epoch 9:   
0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0787]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0817]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0927]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0822]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0866]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0814]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0783]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0859]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0825]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0772]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0754]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0899]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0889]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0798]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0776]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0814]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0904]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0863]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0898]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0817]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0892]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0809]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0851]Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.085] Epoch 9:   0%|          | 0/655 [00:00<?, ?batch/s, train/train_loss_1=0.0849]Epoch 9:  10%|#         | 67/655 [00:00<00:00, 
664.54batch/s, train/train_loss_1=0.0849]
Epoch 9:  10%|#         | 67/655 [00:00<00:00, 664.54batch/s, train/train_loss_1=0.103]
Epoch 9:  21%|##        | 135/655 [00:00<00:00, 667.06batch/s, train/train_loss_1=0.0912]
Epoch 9:  31%|###       | 202/655 [00:00<00:00, 662.64batch/s, train/train_loss_1=0.0888]
Epoch 9:  41%|####1     | 269/655 [00:00<00:00, 663.42batch/s, train/train_loss_1=0.087]
Epoch 9:  52%|#####1    | 338/655 [00:00<00:00, 671.28batch/s, train/train_loss_1=0.0842]
Epoch 9:  63%|######2   | 411/655 [00:00<00:00, 689.93batch/s, train/train_loss_1=0.0935]
Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0819]
Epoch 9:  
73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0913]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0882]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.106] Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0906]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0985]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0839]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0993]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0886]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.082] Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0896]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0825]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0983]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0913]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0984]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.09]  Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0735]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0902]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.101] Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0901]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0858]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0724]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0843]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 
689.15batch/s, train/train_loss_1=0.084] Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.08] Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0737]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0937]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0809]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.091] Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.094]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0762]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0885]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0829]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0851]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0905]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0791]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0829]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0956]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0829]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.107] Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0855]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0864]Epoch 9:  73%|#######3  | 481/655 [00:00<00:00, 689.15batch/s, train/train_loss_1=0.0806]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0806]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0793]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, 
train/train_loss_1=0.0821]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0896]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.079] Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0833]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0857]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0915]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0961]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0804]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0941]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0924]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0906]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0826]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0838]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0943]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.097] Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0868]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0881]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0919]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0759]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0855]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0741]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0837]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0796]Epoch 9:  
84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0828]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0841]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0871]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0891]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0859]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0729]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0837]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0874]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0858]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0881]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0849]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0963]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0964]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0937]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.102] Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0822]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0803]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0868]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0914]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.083] Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.075]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0826]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 
682.85batch/s, train/train_loss_1=0.0816]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0924]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0874]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0745]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0833]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0879]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0703]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0723]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0941]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0772]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0755]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0895]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0993]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0851]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0978]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0792]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.106] Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0773]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0904]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0773]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0777]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0765]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, 
train/train_loss_1=0.0838]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0824]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.102] Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0801]Epoch 9:  84%|########3 | 550/655 [00:00<00:00, 682.85batch/s, train/train_loss_1=0.0715]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0715]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0825]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0657]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0835]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0993]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0823]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0825]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0718]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.081] Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0811]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0774]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0868]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0827]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0764]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.087] Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0902]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0768]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0874]Epoch 9:  
95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0894]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0972]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0896]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0904]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0861]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.103] Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0956]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.089] Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0949]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0862]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0863]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0838]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0872]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0921]Epoch 9:  95%|#########5| 623/655 [00:00<00:00, 696.75batch/s, train/train_loss_1=0.0672]Epoch 9: 100%|##########| 655/655 [00:01<00:00, 647.74batch/s, train/train_loss_1=0.0672]

Test

from jax import random
from sklearn.metrics import accuracy_score

# Collect the test accuracy of each pretrained model
log = {"name": [], "accuracy": []}
for data_name in DEFAULT_DATA_CONFIGS.keys():
    datamodule = load_data(data_name=data_name)
    params, module = load_pred_model(data_name)
    x, y_true = datamodule.test_dataset[:]
    y_pred = module.pred_fn(x=x, params=params, rng_key=random.PRNGKey(0))
    # pred_fn is expected to return one prediction per row
    assert y_pred.shape == (x.shape[0], 1)

    # calculate accuracy: threshold the predictions at 0.5 to get class labels
    y_pred = (y_pred > 0.5).astype(int)
    accuracy = accuracy_score(y_true, y_pred)

    log["name"].append(data_name)
    log["accuracy"].append(accuracy)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:868: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
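
The accuracy step in the test loop can be illustrated on its own. The following is a minimal sketch, assuming predictions and labels arrive as column vectors of shape `(n_samples, 1)` as in the assertion above. The helper name `binary_accuracy` and the toy arrays are hypothetical and not part of relax; for binary labels the function computes the same quantity as `accuracy_score`.

```python
import numpy as np


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Threshold predicted scores and return the fraction of correct labels.

    Hypothetical helper for illustration only; it is not part of relax.
    """
    # Flatten column vectors of shape (n_samples, 1) to (n_samples,)
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = (np.asarray(y_prob).reshape(-1) > threshold).astype(int)
    return float(np.mean(y_pred == y_true))


# Toy data: four rows, stored as column vectors
y_prob = np.array([[0.9], [0.2], [0.6], [0.4]])
y_true = np.array([[1.0], [0.0], [0.0], [0.0]])

acc = binary_accuracy(y_true, y_prob)
print(acc)  # three of the four thresholded predictions match the labels
```

Flattening both arrays before comparing avoids accidental broadcasting between a `(n, 1)` and a `(n,)` array, which would otherwise produce an `(n, n)` comparison matrix.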

Random Forest

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

log["rfc accuracy"] = []
for data_name in DEFAULT_DATA_CONFIGS.keys():
    rfc = RandomForestClassifier(random_state=0)
    datamodule = load_data(data_name = data_name)
    X_train, y_train = datamodule.train_dataset[:]
    # y is a column vector; flatten it to (n_samples,) to avoid sklearn's DataConversionWarning
    rfc.fit(X_train, y_train.ravel())
    X_test, y_test = datamodule.test_dataset[:]
    y_pred = rfc.predict(X_test)

    # calculate accuracy
    y_pred = y_pred > 0.5
    y_pred = y_pred.astype(int)
    accuracy = accuracy_score(y_test,y_pred)
    log["rfc accuracy"].append(accuracy)

pd.DataFrame.from_dict(log)
    name                 accuracy  rfc accuracy
0   adult                0.824100  0.806166
1   heloc                0.702868  0.719312
2   oulad                0.926494  0.940361
3   credit               0.813333  0.813467
4   cancer               0.909091  0.916084
5   student_performance  0.901840  0.920245
6   titanic              0.816143  0.802691
7   german               0.756000  0.756000
8   spam                 0.933970  0.943527
9   ozone                0.933754  0.949527
10  qsar                 0.848485  0.863636
11  bioresponse          0.763326  0.801706
12  churn                0.806360  0.781942
13  road                 0.751870  0.790738
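
To read the comparison at a glance, the rows can be tallied by which model scores higher. This is a small sketch in plain Python; the accuracy values are copied from the table above, and the names `results` and `tally` are hypothetical.

```python
# (dataset, pretrained model accuracy, random forest accuracy), copied from the table
results = [
    ("adult", 0.824100, 0.806166),
    ("heloc", 0.702868, 0.719312),
    ("oulad", 0.926494, 0.940361),
    ("credit", 0.813333, 0.813467),
    ("cancer", 0.909091, 0.916084),
    ("student_performance", 0.901840, 0.920245),
    ("titanic", 0.816143, 0.802691),
    ("german", 0.756000, 0.756000),
    ("spam", 0.933970, 0.943527),
    ("ozone", 0.933754, 0.949527),
    ("qsar", 0.848485, 0.863636),
    ("bioresponse", 0.763326, 0.801706),
    ("churn", 0.806360, 0.781942),
    ("road", 0.751870, 0.790738),
]


def tally(rows):
    """Count datasets where the pretrained model wins, ties, or loses."""
    wins = sum(1 for _, model_acc, rfc_acc in rows if model_acc > rfc_acc)
    ties = sum(1 for _, model_acc, rfc_acc in rows if model_acc == rfc_acc)
    losses = len(rows) - wins - ties
    return wins, ties, losses


wins, ties, losses = tally(results)
print(f"pretrained model wins: {wins}, ties: {ties}, losses: {losses}")
# prints "pretrained model wins: 3, ties: 1, losses: 10"
```

On these numbers the random forest baseline is ahead on most datasets, with the pretrained model ahead on adult, titanic, and churn.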

Examples

A simple example of training a predictive model.

from relax.data import TabularDataModule, load_data
from relax.module import PredictiveTrainingModule, PredictiveModelConfigs
from relax.trainer import train_model
datamodule = load_data('adult')

params, opt_state = train_model(
    PredictiveTrainingModule({'sizes': [50, 10, 50], 'lr': 0.003}), 
    datamodule, t_configs={
        'n_epochs': 10, 'batch_size': 256, 'monitor_metrics': 'val/val_loss'
    }
)
/Users/chuck/opt/anaconda3/envs/relax/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Epoch 9: 100%|██████████| 96/96 [00:00<00:00, 377.38batch/s, train/train_loss_1=0.0706]
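
The `t_configs` dictionary in the example sets only three fields; the remaining ones fall back to the defaults listed in the `TrainingConfigs` signature. The sketch below spells those defaults out in a plain dictionary. The helper `make_t_configs` is hypothetical and not part of relax, and it is an assumption that `train_model` accepts all of these keys in dictionary form.

```python
from typing import Any, Dict, Optional


def make_t_configs(
    n_epochs: int,
    batch_size: int,
    monitor_metrics: Optional[str] = None,
    seed: int = 42,
    log_dir: str = "log",
    logger_name: str = "debug",
    log_on_step: bool = False,
    max_n_checkpoints: int = 3,
) -> Dict[str, Any]:
    """Hypothetical helper: build a t_configs dict with explicit defaults.

    The default values mirror the TrainingConfigs signature documented above.
    """
    if n_epochs <= 0 or batch_size <= 0:
        raise ValueError("n_epochs and batch_size must be positive")
    return {
        "n_epochs": n_epochs,
        "batch_size": batch_size,
        "monitor_metrics": monitor_metrics,
        "seed": seed,
        "log_dir": log_dir,
        "logger_name": logger_name,
        "log_on_step": log_on_step,
        "max_n_checkpoints": max_n_checkpoints,
    }


# Same settings as the example above, with every default made visible
t_configs = make_t_configs(
    n_epochs=10, batch_size=256, monitor_metrics="val/val_loss"
)
print(t_configs)
```

Writing the defaults out this way makes it easier to see which settings, such as `seed` or `max_n_checkpoints`, a run relied on implicitly.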