The pipeline provides an easy-to-use interface for making predictions. Even a highly accurate model is useless if it is unclear how to correctly prepare the inputs and how to interpret the outputs. To make things clear, we have introduced a pipeline that is closely linked to a model. It is worth understanding how the whole process works, especially if you plan to build a custom model.
The diagram above illustrates an overview of the pipeline stages. As usual, at the very beginning, we pre-process the raw inputs. We convert the text and the aspects into a task, which keeps examples (pairs of a text and an aspect) that we can then tokenize, encode, and pass to the model. The model makes a prediction, and here the pipeline departs from the usual setup. Instead of directly post-processing the model outputs, we have added a review process wherein an independent component called the professor supervises and explains the model prediction. The professor might dismiss a model prediction if the model's internal states or outputs seem suspicious. In the next sections, we will discuss in detail how the model and the professor work.
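To make the review stage concrete, here is a minimal, hypothetical sketch of what such a supervising check could look like. This is not the library's actual Professor logic (which is explained later); the `Prediction` type, the `review` function, and the confidence threshold are all assumptions made purely for illustration:

```python
from typing import List, NamedTuple

class Prediction(NamedTuple):
    aspect: str
    sentiment: str
    scores: List[float]  # e.g. softmax scores over [neutral, negative, positive]

def review(pred: Prediction, min_confidence: float = 0.7) -> Prediction:
    # Dismiss the model's label when its top score is not confident
    # enough, falling back to a neutral sentiment.
    if max(pred.scores) < min_confidence:
        return pred._replace(sentiment='neutral')
    return pred

confident = Prediction('price', 'negative', [0.05, 0.90, 0.05])
uncertain = Prediction('slack', 'positive', [0.30, 0.25, 0.45])
print(review(confident).sentiment)  # negative (kept)
print(review(uncertain).sentiment)  # neutral (dismissed)
```

The real professor inspects richer signals than a single confidence score, but the shape is the same: an independent component sits between the model outputs and the post-processing step and may overrule the model.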
import aspect_based_sentiment_analysis as absa

name = 'absa/classifier-rest-0.2'
model = absa.BertABSClassifier.from_pretrained(name)
tokenizer = absa.BertTokenizer.from_pretrained(name)
professor = absa.Professor(...)     # Explained in detail later on.
text_splitter = absa.sentencizer()  # The English CNN model from SpaCy.
nlp = absa.Pipeline(model, tokenizer, professor, text_splitter)

# Break down the pipeline `call` method.
task = nlp.preprocess(text=..., aspects=...)
tokenized_examples = nlp.tokenize(task.examples)
input_batch = nlp.encode(tokenized_examples)
output_batch = nlp.predict(input_batch)
predictions = nlp.review(tokenized_examples, output_batch)
completed_task = nlp.postprocess(task, predictions)
Above is an example of how to initialize the pipeline directly, and the breakdown of the pipeline's `call` method shows what happens under the hood. We have omitted a lot of insignificant details, but there is one thing we would like to highlight. The sentiment of long texts tends to be fuzzy and neutral. Therefore, you might want to split a text into smaller, independent chunks, sometimes called spans. These could include just a single sentence or several sentences; it depends on how the text_splitter works. In this case, we are using the SpaCy CNN model, which splits a document into single sentences, and, as a result, each sentence can be processed independently. Note that longer spans have richer context information, so a model will have more information to consider. Please take a look at the pipeline details here.
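To illustrate what the text_splitter contributes, here is a naive stand-in for the SpaCy sentencizer, sketched with only the standard library. The regex-based `naive_sentencizer` below is an assumption for illustration, not the library's splitter; the real SpaCy CNN model handles abbreviations, quotes, and other edge cases that this toy version does not:

```python
import re

def naive_sentencizer(text: str):
    # Split on sentence-ending punctuation followed by whitespace,
    # keeping the punctuation attached to each span.
    spans = re.split(r'(?<=[.!?])\s+', text.strip())
    return [span for span in spans if span]

text = ("The screen is great. The battery life is disappointing. "
        "Overall it works well!")
for span in naive_sentencizer(text):
    print(span)
```

Each resulting span is then paired with every aspect and processed independently, which is why the choice of splitter (single sentences vs. multi-sentence chunks) directly trades context richness against the fuzziness of long-text sentiment.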