An unconstrained process of learning model reasoning can produce models that are powerful and fragile at the same time. It is a major advantage that a model can infer the logic behind a transformation directly from data. However, the discovered correlations (often vague and intricate to humans) may generalize well to a given dataset, but not necessarily to the underlying problem. As a result, it is natural for model behavior to diverge from the expected behavior.
This is a fundamental problem, because spurious correlations that encode dataset specifics can cause unpredictable model behavior on data even slightly different from the training data. Unlike a static benchmark, a model serving a real-world application is exposed to data that changes over time. If we do not test and explain model behaviors (and do not fix model weaknesses), the performance of the served model can be unstable and will likely decline. It may even be hard to notice that a model has stopped working at all, because the distribution of its predictions can remain unchanged.
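One cheap first signal worth serving alongside the model is a check on the distribution of its predictions, even though, as noted above, this signal alone can stay silent while the model fails. The sketch below is a minimal, illustrative example (the class labels and the `ALERT_THRESHOLD` value are hypothetical, not from the article): it compares the class distribution of a recent serving window against a reference window via total variation distance.

```python
from collections import Counter


def class_distribution(predictions):
    """Normalised class frequencies for a batch of predicted labels."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: count / total for label, count in counts.items()}


def distribution_shift(reference, current):
    """Total variation distance between two class distributions.

    Ranges from 0.0 (identical distributions) to 1.0 (disjoint support).
    """
    labels = set(reference) | set(current)
    return 0.5 * sum(abs(reference.get(l, 0.0) - current.get(l, 0.0)) for l in labels)


# Reference window captured at deployment time vs. a recent serving window
# (toy labels for illustration only).
reference = class_distribution(["pos", "neg", "pos", "neu", "pos", "neg"])
current = class_distribution(["pos", "pos", "pos", "pos", "pos", "neg"])

ALERT_THRESHOLD = 0.2  # hypothetical value; tune per application
if distribution_shift(reference, current) > ALERT_THRESHOLD:
    print("prediction distribution drifted -- inspect the served model")
```

A drifted distribution tells us something changed; an unchanged one proves nothing, which is exactly why the behavioral tests and explanations discussed here are needed on top of it.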
In this article, we have highlighted the value of testing and explaining model behaviors. Tests can quickly reveal severe limitations of a SOTA model, e.g. an aspect condition that does not work as one might expect (it acts as a feature instead). We would be in trouble if we tried to use such a model architecture in a real-world application. In general, it is hard to trust a model's reliability without tests that examine its behaviors, especially when a huge model is fine-tuned on a modest dataset (as is common with downstream NLP tasks).
Testing assures us that the model works properly at the time of deployment. It is also worth monitoring the model once it is served, and explanations of model decisions are extremely helpful to that end. We can use explanations to understand an individual prediction, but also to infer and monitor the general characteristics of model reasoning. Even rough explanations can help to recognize alarming behaviors. This article has given examples of how to gain more control over an ML model; it is extremely important to keep an eye on model behaviors.
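Moving from individual explanations to general characteristics can be as simple as aggregating attribution scores across many explained predictions. The sketch below assumes explanations arrive as `{token: score}` dicts from any token-level attribution method (an assumed interface, not a specific library); the resulting ranking is a rough global picture of model reasoning that can be monitored over time.

```python
from collections import defaultdict


def aggregate_attributions(explanations):
    """Average per-token attribution scores over many explained predictions.

    `explanations` is a list of {token: score} dicts, as might be produced
    by any token-level attribution method (hypothetical interface).
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for explanation in explanations:
        for token, score in explanation.items():
            totals[token] += score
            counts[token] += 1
    return {token: totals[token] / counts[token] for token in totals}


# Tokens that dominate the aggregate ranking reveal global model reasoning;
# a sudden change in this ranking is an alarming behavior worth investigating.
ranking = aggregate_attributions([
    {"great": 0.9, "service": 0.1},
    {"great": 0.7, "terrible": -0.8},
])
```

Even this rough summary is enough to notice, for example, that an irrelevant token has started to dominate the model's decisions.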