Classification model not getting trained #8

@msaisumanth @w4nderlust
I tried both approaches: I used the YAML file suggested by @msaisumanth and increased the number of data points to 10,000. The model's performance has not really improved, though, and the predictions are still all 5.

I also changed the task from multi-class classification (5 classes) to binary classification, but it still classifies everything as label 5.

I am using the following code:

!pip install https://github.com/uber/ludwig/archive/master.zip
!pip install --upgrade scikit-image
!python -m spacy download en

# Upload the training dataset
from google.colab import files
uploaded = files.upload()

# Upload the dataset for prediction
uploaded = files.upload()

# Upload the model definition file
uploaded = files.upload()

# Train the reduced model using the YAML model definition
!ludwig train --data_csv data_4900_2_Class.csv --model_definition_file model_definition.yaml

# Predict the output using the trained model
!ludwig predict --only_predictions --data_csv data_4900to5000_2_Class.csv --model_path results/experiment_run_3/model

The training log is given below for reference:

ludwig v0.1.0 - Train

Experiment name: experiment
Model name: run
Output path: results/experiment_run_3

ludwig_version: '0.1.0'
command: ('/usr/local/bin/ludwig train --data_csv data_4900_2_Class.csv '
'--model_definition_file model_definition.yaml')
dataset_type: 'data_4900_2_Class.csv'
model_definition: { 'combiner': {'type': 'concat'},
'input_features': [ { 'embedding_size': 28,
'encoder': 'parallel_cnn',
'filter_size': 1,
'level': 'word',
'name': 'review_text',
'num_filters': 28,
'tied_weights': None,
'type': 'text'}],
'output_features': [ { 'dependencies': [],
'loss': { 'class_distance_temperature': 0,
'class_weights': 1,
'confidence_penalty': 0,
'distortion': 1,
'labels_smoothing': 0,
'negative_samples': 0,
'robust_lambda': 0,
'sampler': None,
'type': 'softmax_cross_entropy',
'unique': False,
'weight': 1},
'name': 'rating',
'reduce_dependencies': 'sum',
'reduce_input': 'sum',
'top_k': 3,
'type': 'category'}],
'preprocessing': { 'bag': { 'fill_value': '',
'format': 'space',
'lowercase': 10000,
'missing_value_strategy': 'fill_with_const',
'most_common': False},
'binary': { 'fill_value': 0,
'missing_value_strategy': 'fill_with_const'},
'category': { 'fill_value': ' ',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 10000},
'force_split': False,
'image': {'missing_value_strategy': 'backfill'},
'numerical': { 'fill_value': 0,
'missing_value_strategy': 'fill_with_const'},
'sequence': { 'fill_value': '',
'format': 'space',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 20000,
'padding': 'right',
'padding_symbol': '<PAD>',
'sequence_length_limit': 256,
'unknown_symbol': '<UNK>'},
'set': { 'fill_value': '',
'format': 'space',
'lowercase': False,
'missing_value_strategy': 'fill_with_const',
'most_common': 10000},
'split_probabilities': (0.7, 0.1, 0.2),
'stratify': None,
'text': { 'char_format': 'characters',
'char_most_common': 70,
'char_sequence_length_limit': 1024,
'fill_value': '',
'lowercase': True,
'missing_value_strategy': 'fill_with_const',
'padding': 'right',
'padding_symbol': '<PAD>',
'unknown_symbol': '<UNK>',
'word_format': 'space_punct',
'word_most_common': 20000,
'word_sequence_length_limit': 256},
'timeseries': { 'fill_value': '',
'format': 'space',
'missing_value_strategy': 'fill_with_const',
'padding': 'right',
'padding_value': 0,
'timeseries_length_limit': 256}},
'training': { 'batch_size': 128,
'bucketing_field': None,
'decay': False,
'decay_rate': 0.96,
'decay_steps': 10000,
'dropout_rate': 0.0,
'early_stop': 5,
'epochs': 5,
'gradient_clipping': None,
'increase_batch_size_on_plateau': 0,
'increase_batch_size_on_plateau_max': 512,
'increase_batch_size_on_plateau_patience': 5,
'increase_batch_size_on_plateau_rate': 2,
'learning_rate': 0.1,
'learning_rate_warmup_epochs': 5,
'optimizer': { 'beta1': 0.9,
'beta2': 0.999,
'epsilon': 1e-08,
'type': 'adam'},
'reduce_learning_rate_on_plateau': 0,
'reduce_learning_rate_on_plateau_patience': 5,
'reduce_learning_rate_on_plateau_rate': 0.5,
'regularization_lambda': 0,
'regularizer': 'l2',
'staircase': False,
'validation_field': 'combined',
'validation_measure': 'loss'}}

Using full raw csv, no hdf5 and json file with the same name have been found
Building dataset (it may take a while)
Loading NLP pipeline
Writing dataset
Writing train set metadata with vocabulary
Training set: 3466
Validation set: 456
Test set: 976
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:102: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.

+----------+
¦ TRAINING ¦
+----------+

2019-02-26 11:09:15.745520: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-02-26 11:09:15.745805: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5646580 executing computations on platform Host. Devices:
2019-02-26 11:09:15.745846: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>

Epoch 1
Training: 100% 28/28 [00:05<00:00, 4.82it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 16.65it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.47it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.47it/s]
Took 8.1904s
+----------------------------------------------+
¦ rating ¦ loss ¦ accuracy ¦ hits_at_k ¦
¦----------+--------+------------+-------------¦
¦ train ¦ 1.1489 ¦ 0.4204 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ vali ¦ 1.1208 ¦ 0.4386 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ test ¦ 1.1473 ¦ 0.4150 ¦ 1.0000 ¦
+----------------------------------------------+
Validation loss on combined improved, model saved

Epoch 2
Training: 100% 28/28 [00:05<00:00, 5.04it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 17.10it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.67it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.48it/s]
Took 7.8721s
+----------------------------------------------+
¦ rating ¦ loss ¦ accuracy ¦ hits_at_k ¦
¦----------+--------+------------+-------------¦
¦ train ¦ 0.6740 ¦ 0.6030 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ vali ¦ 0.6753 ¦ 0.5899 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ test ¦ 0.6663 ¦ 0.6096 ¦ 1.0000 ¦
+----------------------------------------------+
Validation loss on combined improved, model saved

Epoch 3
Training: 100% 28/28 [00:05<00:00, 5.05it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 16.84it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.90it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.44it/s]
Took 7.8884s
+----------------------------------------------+
¦ rating ¦ loss ¦ accuracy ¦ hits_at_k ¦
¦----------+--------+------------+-------------¦
¦ train ¦ 0.6747 ¦ 0.6030 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ vali ¦ 0.6792 ¦ 0.5899 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ test ¦ 0.6688 ¦ 0.6096 ¦ 1.0000 ¦
+----------------------------------------------+
Last improvement of loss on combined happened 1 epoch ago

Epoch 4
Training: 100% 28/28 [00:05<00:00, 5.08it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 17.15it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.84it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.40it/s]
Took 7.8209s
+----------------------------------------------+
¦ rating ¦ loss ¦ accuracy ¦ hits_at_k ¦
¦----------+--------+------------+-------------¦
¦ train ¦ 0.6696 ¦ 0.6030 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ vali ¦ 0.6731 ¦ 0.5899 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ test ¦ 0.6644 ¦ 0.6096 ¦ 1.0000 ¦
+----------------------------------------------+
Validation loss on combined improved, model saved

Epoch 5
Training: 100% 28/28 [00:05<00:00, 5.05it/s]
Evaluation train: 100% 28/28 [00:01<00:00, 16.83it/s]
Evaluation vali : 100% 4/4 [00:00<00:00, 18.45it/s]
Evaluation test : 100% 8/8 [00:00<00:00, 17.47it/s]
Took 7.8870s
+----------------------------------------------+
¦ rating ¦ loss ¦ accuracy ¦ hits_at_k ¦
¦----------+--------+------------+-------------¦
¦ train ¦ 0.6864 ¦ 0.6030 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ vali ¦ 0.6942 ¦ 0.5899 ¦ 1.0000 ¦
+----------+--------+------------+-------------¦
¦ test ¦ 0.6797 ¦ 0.6096 ¦ 1.0000 ¦
+----------------------------------------------+
Last improvement of loss on combined happened 1 epoch ago

Best validation model epoch:
Best validation model loss on validation set combined: 1.1208468320076925
Best validation model loss on test set combined: 1.1472772301220504

Finished: experiment_run
Saved to: results/experiment_run_3
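
Looking at the log, accuracy stays at exactly 0.6030 (train), 0.5899 (vali) and 0.6096 (test) from epoch 2 onward while the loss barely moves. That would be consistent with the model predicting a single class for every row. A minimal sketch of how this could be checked against the label distribution is below. The column name `rating` is taken from the output feature in the model definition, the helper names are hypothetical, and the toy labels are for illustration only.

```python
import csv
from collections import Counter


def load_labels(csv_path, label_column="rating"):
    """Read one column of labels from a CSV file that has a header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row[label_column] for row in csv.DictReader(handle)]


def majority_baseline(labels):
    """Return (most common label, accuracy of always predicting that label)."""
    if not labels:
        raise ValueError("no labels given")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Toy labels for illustration only, not taken from the real dataset.
toy_labels = ["5", "5", "5", "1", "1"]
print(majority_baseline(toy_labels))  # ('5', 0.6)
```

Calling `majority_baseline(load_labels("data_4900_2_Class.csv"))` on the training file would give the fraction to compare with the reported accuracy. If the two are close, the model is probably not using the text at all.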

The input training CSV is data_4900_2_Class.csv.txt.

The file I am testing on is data_4900to5000_2_Class.csv.txt.

I have also built a model using Google's Universal Sentence Encoder followed by 3 dense layers, and it gives decent results. I am using the following network in Keras:

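# Imports assumed to be standalone Keras on TF 1.x; adjust if you use tf.keras
import tensorflow as tf
from keras import layers
from keras.models import Model

# `embed` is the Universal Sentence Encoder module from TF Hub, loaded earlier (not shown here)
embed_size = 512      # embedding dimension, matches the lambda_1 output shape in the summary
category_counts = 5   # number of rating classes, matches the dense_4 output shape

# Explicitly cast the input to a string before embedding it
def UniversalEmbedding(x):
    return embed(tf.squeeze(tf.cast(x, tf.string)), signature="default", as_dict=True)["default"]

input_text = layers.Input(shape=(1,), dtype=tf.string)
embedding = layers.Lambda(UniversalEmbedding, output_shape=(embed_size,))(input_text)
# Three fully connected hidden layers on top of the sentence embedding
dense = layers.Dense(256, activation='relu')(embedding)
dense = layers.Dense(256, activation='relu')(dense)
dense = layers.Dense(256, activation='relu')(dense)
pred = layers.Dense(category_counts, activation='softmax')(dense)
model = Model(inputs=[input_text], outputs=pred)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()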
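
One difference between the two setups that I can see from the log: the Ludwig run uses Adam with `learning_rate` 0.1, while `optimizer='adam'` in Keras uses the Adam default learning rate of 0.001. The sketch below just lines up the settings. The Ludwig values are copied from the training log, the Keras value is the library default rather than something I set explicitly, and the Keras epochs and batch size are left out because they are not shown here. `differing_keys` is a hypothetical helper.

```python
# Training settings as printed in the Ludwig log above.
ludwig_training = {
    "optimizer": "adam",
    "learning_rate": 0.1,
    "epochs": 5,
    "batch_size": 128,
}

# Keras side: optimizer='adam' with its default learning rate (assumed, not set explicitly).
keras_training = {
    "optimizer": "adam",
    "learning_rate": 0.001,
}


def differing_keys(a, b):
    """Return {key: (a_value, b_value)} for keys present in both dicts with different values."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


print(differing_keys(ludwig_training, keras_training))
# {'learning_rate': (0.1, 0.001)}
```

This may or may not explain the behaviour, but a learning rate 100 times larger than the Keras default is the first thing I would try changing when replicating the setup.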

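The model summary is:

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 1)                 0
_________________________________________________________________
lambda_1 (Lambda)            (None, 512)               0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               131328
_________________________________________________________________
dense_2 (Dense)              (None, 256)               65792
_________________________________________________________________
dense_3 (Dense)              (None, 256)               65792
_________________________________________________________________
dense_4 (Dense)              (None, 5)                 1285
=================================================================
Total params: 264,197
Trainable params: 264,197
Non-trainable params: 0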
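
As a sanity check on the summary, the parameter counts can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A small sketch, with the layer sizes read off the output shapes above and `dense_params` as a hypothetical helper:

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_inputs * n_units + n_units


# 512-dim sentence embedding -> 256 -> 256 -> 256 -> 5 classes
layer_sizes = [512, 256, 256, 256, 5]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [131328, 65792, 65792, 1285]
print(sum(per_layer))  # 264197
```

The Lambda layer contributes no trainable parameters, so all 264,197 trainable parameters sit in the four dense layers.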

Can you please help me replicate the performance on Ludwig?