
Classification model not getting trained #8

#1

@msaisumanth @w4nderlust
I tried both approaches: I used the YAML file suggested by @msaisumanth and increased the dataset to 10,000 data points, but the model's performance has not really improved; the predictions are still all 5.

I also switched from multi-class classification (5 classes) to binary classification, but it still classifies everything as label 5.

I am using the following code:

!pip install https://github.com/uber/ludwig/archive/master.zip
!pip install --upgrade scikit-image
!python -m spacy download en

#upload the training dataset
from google.colab import files
uploaded = files.upload()

#upload the dataset for prediction
from google.colab import files
uploaded = files.upload()

#upload the model definition file
from google.colab import files
uploaded = files.upload()

# Train the reduced model using the YAML file
!ludwig train --data_csv data_4900_2_Class.csv --model_definition_file model_definition.yaml

# Predict using the trained model
!ludwig predict --only_predictions --data_csv data_4900to5000_2_Class.csv --model_path results/experiment_run_3/model

The training log is given below for reference:


ludwig v0.1.0 - Train

Experiment name: experiment
Model name: run
Output path: results/experiment_run_3

ludwig_version: '0.1.0'
command: ('/usr/local/bin/ludwig train --data_csv data_4900_2_Class.csv '
'--model_definition_file model_definition.yaml')
dataset_type: 'data_4900_2_Class.csv'
model_definition: { 'combiner': {'type': 'concat'},
'input_features': [ { 'embedding_size': 28,
'encoder': 'parallel_cnn',
'filter_size': 1,
'level': 'word',
'name': 'review_text',
'num_filters': 28,
'tied_weights': None,
'type': 'text'}],
'output_features': [ { 'dependencies': [],
'loss': { 'class_distance_temperature': 0,
'class_weights': 1,
'confidence_penalty': 0,
'distortion': 1,
'labels_smoothing': 0,
'negative_samples': 0,
'robust_lambda': 0,
'sampler': None,
'type': 'softmax_cross_entropy',
'unique': False,
'weight': 1},
'name': 'rating',
'reduce_dependencies': 'sum',
'reduce_input': 'sum',
'top_k': 3,
'type': 'category'}],
'preprocessing': { 'bag': { 'fill_value': '',
'format': 'space',
'lowercase': 10000,
'missing_value_strategy': 'fill_with_const',
'most_common': False},
'binary': { 'fill_value': 0,
'missing_value_strategy': 'fill_with_const'},
'category': { 'fill_value': ' ',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 10000},
'force_split': False,
'image': {'missing_value_strategy': 'backfill'},
'numerical': { 'fill_value': 0,
'missing_value_strategy': 'fill_with_const'},
'sequence': { 'fill_value': '',
'format': 'space',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 20000,
'padding': 'right',
'padding_symbol': ' ',
'sequence_length_limit': 256,
'unknown_symbol': ' '},
'set': { 'fill_value': '',
'format': 'space',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 10000},
'split_probabilities': (0.7, 0.1, 0.2),
'stratify': None,
'text': { 'char_format': 'characters',
'char_most_common': 70,
'char_sequence_length_limit': 1024,
'fill_value': '',
'lowercase': True,
'missing_value_strategy': 'fill_with_const',
'padding': 'right',
'padding_symbol': ' ',
'unknown_symbol': ' ',
'word_format': 'space_punct',
'word_most_common': 20000,
'word_sequence_length_limit': 256},
'timeseries': { 'fill_value': '',
'format': 'space',
'missing_value_strategy': 'fill_with_const',
'padding': 'right',
'padding_value': 0,
'timeseries_length_limit': 256}},
'training': { 'batch_size': 128,
'bucketing_field': None,
'decay': False,
'decay_rate': 0.96,
'decay_steps': 10000,
'dropout_rate': 0.0,
'early_stop': 5,
'epochs': 5,
'gradient_clipping': None,
'increase_batch_size_on_plateau': 0,
'increase_batch_size_on_plateau_max': 512,
'increase_batch_size_on_plateau_patience': 5,
'increase_batch_size_on_plateau_rate': 2,
'learning_rate': 0.1,
'learning_rate_warmup_epochs': 5,
'optimizer': { 'beta1': 0.9,
'beta2': 0.999,
'epsilon': 1e-08,
'type': 'adam'},
'reduce_learning_rate_on_plateau': 0,
'reduce_learning_rate_on_plateau_patience': 5,
'reduce_learning_rate_on_plateau_rate': 0.5,
'regularization_lambda': 0,
'regularizer': 'l2',
'staircase': False,
'validation_field': 'combined',
'validation_measure': 'loss'}}

Using full raw csv, no hdf5 and json file with the same name have been found
Building dataset (it may take a while)
Loading NLP pipeline
Writing dataset
Writing train set metadata with vocabulary
Training set: 3466
Validation set: 456
Test set: 976
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.

+----------+
| TRAINING |
+----------+

2019-02-26 11:09:15.745520: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-02-26 11:09:15.745805: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5646580 executing computations on platform Host. Devices:
2019-02-26 11:09:15.745846: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>

Epoch 1
Training: 100% 28/28 [00:05<00:00, 4.82it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 16.65it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.47it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.47it/s]
Took 8.1904s
+--------+--------+----------+-----------+
| rating |   loss | accuracy | hits_at_k |
+--------+--------+----------+-----------+
| train  | 1.1489 |   0.4204 |    1.0000 |
+--------+--------+----------+-----------+
| vali   | 1.1208 |   0.4386 |    1.0000 |
+--------+--------+----------+-----------+
| test   | 1.1473 |   0.4150 |    1.0000 |
+--------+--------+----------+-----------+
Validation loss on combined improved, model saved

Epoch 2
Training: 100% 28/28 [00:05<00:00, 5.04it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 17.10it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.67it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.48it/s]
Took 7.8721s
+--------+--------+----------+-----------+
| rating |   loss | accuracy | hits_at_k |
+--------+--------+----------+-----------+
| train  | 0.6740 |   0.6030 |    1.0000 |
+--------+--------+----------+-----------+
| vali   | 0.6753 |   0.5899 |    1.0000 |
+--------+--------+----------+-----------+
| test   | 0.6663 |   0.6096 |    1.0000 |
+--------+--------+----------+-----------+
Validation loss on combined improved, model saved

Epoch 3
Training: 100% 28/28 [00:05<00:00, 5.05it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 16.84it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.90it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.44it/s]
Took 7.8884s
+--------+--------+----------+-----------+
| rating |   loss | accuracy | hits_at_k |
+--------+--------+----------+-----------+
| train  | 0.6747 |   0.6030 |    1.0000 |
+--------+--------+----------+-----------+
| vali   | 0.6792 |   0.5899 |    1.0000 |
+--------+--------+----------+-----------+
| test   | 0.6688 |   0.6096 |    1.0000 |
+--------+--------+----------+-----------+
Last improvement of loss on combined happened 1 epoch ago

Epoch 4
Training: 100% 28/28 [00:05<00:00, 5.08it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 17.15it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.84it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.40it/s]
Took 7.8209s
+--------+--------+----------+-----------+
| rating |   loss | accuracy | hits_at_k |
+--------+--------+----------+-----------+
| train  | 0.6696 |   0.6030 |    1.0000 |
+--------+--------+----------+-----------+
| vali   | 0.6731 |   0.5899 |    1.0000 |
+--------+--------+----------+-----------+
| test   | 0.6644 |   0.6096 |    1.0000 |
+--------+--------+----------+-----------+
Validation loss on combined improved, model saved

Epoch 5
Training: 100% 28/28 [00:05<00:00, 5.05it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 16.83it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.45it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.47it/s]
Took 7.8870s
+--------+--------+----------+-----------+
| rating |   loss | accuracy | hits_at_k |
+--------+--------+----------+-----------+
| train  | 0.6864 |   0.6030 |    1.0000 |
+--------+--------+----------+-----------+
| vali   | 0.6942 |   0.5899 |    1.0000 |
+--------+--------+----------+-----------+
| test   | 0.6797 |   0.6096 |    1.0000 |
+--------+--------+----------+-----------+
Last improvement of loss on combined happened 1 epoch ago

Best validation model epoch:
Best validation model loss on validation set combined: 1.1208468320076925
Best validation model loss on test set combined: 1.1472772301220504

Finished: experiment_run
Saved to: results/experiment_run_3

The input training CSV is data_4900_2_Class.csv.txt.

The file on which I am testing is data_4900to5000_2_Class.csv.txt.

I have tried building a model using Google's Universal Sentence Encoder with three dense hidden layers, and it gives decent results. I am using the following network in Keras:

import tensorflow as tf
import tensorflow_hub as hub
from keras import layers
from keras.models import Model

# Setup not shown in the original snippet: load Google's Universal
# Sentence Encoder from TF Hub; its embeddings are 512-dimensional.
embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")
embed_size = 512
category_counts = 5  # number of rating classes

# Explicitly cast the input as a string before embedding
def UniversalEmbedding(x):
    return embed(tf.squeeze(tf.cast(x, tf.string)), signature="default", as_dict=True)["default"]

input_text = layers.Input(shape=(1,), dtype=tf.string)
embedding = layers.Lambda(UniversalEmbedding, output_shape=(embed_size,))(input_text)
dense = layers.Dense(256, activation='relu')(embedding)
dense = layers.Dense(256, activation='relu')(dense)
dense = layers.Dense(256, activation='relu')(dense)
pred = layers.Dense(category_counts, activation='softmax')(dense)
model = Model(inputs=[input_text], outputs=pred)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

The model summary looks like this:

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1)                 0
_________________________________________________________________
lambda_1 (Lambda)            (None, 512)               0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               131328
_________________________________________________________________
dense_2 (Dense)              (None, 256)               65792
_________________________________________________________________
dense_3 (Dense)              (None, 256)               65792
_________________________________________________________________
dense_4 (Dense)              (None, 5)                 1285
=================================================================
Total params: 264,197
Trainable params: 264,197
Non-trainable params: 0
_________________________________________________________________

Can you please help me replicate this performance in Ludwig?

#2

pdsing created from GitHub issue Classification model not getting trained

#3

Performance is not intrinsic to a model; it depends on the dataset. The same model that is 90% accurate on one dataset can be 30% accurate on another, depending on the data.

That said, the size of the model compared to the amount of data it is trained on likely makes it overfit, so you need to counterbalance that. There are several options: using regularization (dropout or L2), reducing the size of the model, or using pretrained embeddings, as in the sketch below.
You may also want to reduce the learning rate and increase the early-stopping patience (the number of epochs without improvement before training stops).
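
As a minimal sketch of how those suggestions translate into a model_definition.yaml (assuming Ludwig 0.1.0; the training keys below all appear in the config dump above, but the pretrained-embeddings keys and the GloVe path are assumptions to verify against the documentation for your version):

input_features:
    -
        name: review_text
        type: text
        level: word
        encoder: parallel_cnn
        embedding_size: 300
        # Assumed keys: initialize from pretrained word vectors
        # (the GloVe path is a placeholder).
        pretrained_embeddings: /path/to/glove.6B.300d.txt
        embeddings_trainable: false

output_features:
    -
        name: rating
        type: category

training:
    learning_rate: 0.001           # much lower than the 0.1 in the log above
    epochs: 100
    early_stop: 10                 # more patience before stopping
    dropout_rate: 0.2              # dropout regularization
    regularization_lambda: 0.0001  # L2 penalty (regularizer: l2 is the default)

With a small dataset, lowering the learning rate and adding dropout usually make the difference between collapsing onto a single class and actually fitting the data; pretrained embeddings help most when the vocabulary is large relative to the number of examples.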

pdsing accepted post #3 as the answer
#4