Predict the sequence of labels for sentences in documents. #23

Aris Fergadis afergadis · Nov'20

Hi there.

I would like to use Ludwig to predict the sequence of labels for all the sentences in a set of documents. More specifically, I would like to use the PubMed RCT dataset.

The inputs are abstracts. Each abstract consists of sentences, and each sentence has one label (BACKGROUND, OBJECTIVE, METHOD, RESULT, CONCLUSION). The task is to predict the labels of the sentences in the test set.
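A minimal sketch of loading this data in Python, assuming the layout of the PubMed RCT text files: a `###<abstract_id>` line opens each abstract, each following line is `LABEL<TAB>sentence`, and blank lines separate abstracts. The function name `parse_pubmed_rct` is hypothetical and not part of Ludwig.

```python
from typing import Dict, List, Tuple

# Each abstract becomes an ordered list of (label, sentence) pairs.
Abstract = List[Tuple[str, str]]


def parse_pubmed_rct(text: str) -> Dict[str, Abstract]:
    """Parse PubMed RCT-style text into {abstract_id: [(label, sentence), ...]}.

    Assumed format: '###<abstract_id>' starts an abstract, each following
    line is 'LABEL<TAB>sentence', and blank lines separate abstracts.
    """
    abstracts: Dict[str, Abstract] = {}
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank line between abstracts
        if line.startswith("###"):
            current_id = line[3:]
            abstracts[current_id] = []
        elif current_id is not None:
            label, sentence = line.split("\t", 1)
            abstracts[current_id].append((label, sentence))
    return abstracts
```

Keeping sentences grouped per abstract preserves their order, which matters for any approach that uses neighbouring sentences or labels.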

How should I set up the model definition file?

Thank you in advance.


Piero Molino w4nderlust · Nov'20

Hi Aris,

there is more than one way to approach this problem.
In my opinion, the most straightforward way would be to split each abstract into sentences and feed each sentence as an input, with its label as the output.
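A sketch of this per-sentence setup, assuming the abstracts are already parsed into `{abstract_id: [(label, sentence), ...]}`. The `CONFIG` dict mirrors the shape of a Ludwig model definition (`input_features` / `output_features`, each entry with a `name` and a `type`); the column names `sentence`, `label` and `abstract_id` are hypothetical and only need to match the CSV. How the definition is passed to Ludwig (YAML file or Python dict) depends on the Ludwig version, so check the docs for yours.

```python
import csv
import io
from typing import Dict, List, Tuple

# Model definition in the shape Ludwig expects; column names are hypothetical.
CONFIG = {
    "input_features": [{"name": "sentence", "type": "text"}],
    "output_features": [{"name": "label", "type": "category"}],
}


def sentence_rows(abstracts: Dict[str, List[Tuple[str, str]]]) -> List[Dict[str, str]]:
    """One row per sentence: the sentence is the input, its label the target."""
    rows = []
    for abstract_id, sentences in abstracts.items():
        for label, sentence in sentences:
            # abstract_id is kept only to regroup predictions per abstract later.
            rows.append({"abstract_id": abstract_id, "sentence": sentence, "label": label})
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["abstract_id", "sentence", "label"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Each sentence is classified independently here, which is exactly the limitation the next step addresses.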
This may work fine, but it ignores the fact that the labels in a sequence may depend on each other.
To address that, you may want to provide as inputs not only the current sentence but also the previous sentences and their labels, keeping the current label as the output.
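A sketch of adding context features, under the assumption that only the immediately preceding sentence and its label are used. The feature names and the `<START>` placeholder are hypothetical. One caveat: during training the previous label is the gold one, but at test time it is not available, so predictions have to be made left to right, feeding each predicted label into the next row. `predict_one` below is a stand-in for whatever calls the trained model on a single row.

```python
from typing import Callable, Dict, List, Tuple

START = "<START>"  # hypothetical placeholder for "no previous sentence/label"

# Ludwig-style model definition with context inputs; names are hypothetical.
CONTEXT_CONFIG = {
    "input_features": [
        {"name": "sentence", "type": "text"},
        {"name": "prev_sentence", "type": "text"},
        {"name": "prev_label", "type": "category"},
    ],
    "output_features": [{"name": "label", "type": "category"}],
}


def context_rows(sentences: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Training rows for one abstract, using the gold label of the previous sentence."""
    rows = []
    prev_sentence, prev_label = START, START
    for label, sentence in sentences:
        rows.append({"sentence": sentence, "prev_sentence": prev_sentence,
                     "prev_label": prev_label, "label": label})
        prev_sentence, prev_label = sentence, label
    return rows


def predict_abstract(sentences: List[str],
                     predict_one: Callable[[Dict[str, str]], str]) -> List[str]:
    """Greedy left-to-right decoding: each prediction becomes the next prev_label."""
    predictions = []
    prev_sentence, prev_label = START, START
    for sentence in sentences:
        row = {"sentence": sentence, "prev_sentence": prev_sentence, "prev_label": prev_label}
        label = predict_one(row)
        predictions.append(label)
        prev_sentence, prev_label = sentence, label
    return predictions
```

Greedy decoding means an early mistake can propagate through the rest of the abstract; training on gold previous labels also hides this effect, so it is worth comparing against the plain per-sentence model.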
Finally, another alternative would be to tag each word in the whole abstract with a label using a tagger model (which takes the previous and following tokens into account), and then, as a postprocessing step, assign each sentence the majority label among its words.
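A sketch of the data preparation and the majority-vote step for the tagging approach. Whitespace tokenization is a simplifying assumption, and the function names are hypothetical. The token and tag lists would be joined with spaces to form one training example per abstract; Ludwig's sequence output feature has a `tagger` decoder suited to this, but check its documentation for the encoder settings it requires.

```python
from collections import Counter
from typing import List, Optional, Tuple


def abstract_to_tags(sentences: List[Tuple[str, str]]) -> Tuple[List[str], List[str], List[int]]:
    """Flatten an abstract into tokens, one tag per token, and per-sentence lengths."""
    tokens, tags, lengths = [], [], []
    for label, sentence in sentences:
        words = sentence.split()  # assumption: whitespace tokenization
        tokens.extend(words)
        tags.extend([label] * len(words))  # every word inherits its sentence's label
        lengths.append(len(words))
    return tokens, tags, lengths


def majority_vote(token_tags: List[str], lengths: List[int]) -> List[Optional[str]]:
    """Assign each sentence the most frequent predicted tag among its tokens."""
    labels: List[Optional[str]] = []
    start = 0
    for length in lengths:
        counts = Counter(token_tags[start:start + length])
        # most_common breaks ties by first occurrence; empty sentences get None.
        labels.append(counts.most_common(1)[0][0] if counts else None)
        start += length
    return labels
```

The sentence lengths are what let the per-token predictions be mapped back to sentences after tagging.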

There are probably many more options; hopefully this helps you get started with some ideas :)

Aris Fergadis afergadis · Nov'20 Author

Thank you, Piero.

I will post again if I have satisfactory results.