Explanations are useful only if they are correct. To build the basic pattern recognizer, we made several assumptions (prior beliefs), so we should be careful not to interpret the explanations too literally. Even if the attention values have thought-provoking properties – for example, they encode rich linguistic relationships – there is no proven chain of causation. Many articles illustrate why drawing conclusions about model reasoning directly from attention values can be misleading [1, 2, 3]. Even when the patterns seem reasonable, critical thinking is key. To truly measure how correct the explanations are, we need a quantitative analysis. Unfortunately, as with training, evaluating a pattern recognizer is hard because the true patterns are unknown. As a result, we can only validate selected properties of the predicted explanations. To keep this article concise, we cover only two tests, but much more could be done to assess the reliability of explanations.
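As one example of such a quantitative check, a simple deletion (perturbation) test compares how much the prediction changes when we mask the most-attended tokens versus randomly chosen ones. The sketch below is illustrative only: `deletion_test`, `predict_fn`, and the `[MASK]` placeholder are hypothetical names, and a real evaluation would average over many inputs and random seeds.

```python
import numpy as np

def deletion_test(predict_fn, tokens, attention, mask_token="[MASK]", top_k=3):
    """Faithfulness sanity check: if attention reflects importance,
    masking the most-attended tokens should change the prediction
    more than masking random tokens."""
    base = predict_fn(tokens)

    # Mask the top-k tokens with the highest attention weights.
    top_idx = set(np.argsort(attention)[::-1][:top_k])
    masked_top = [mask_token if i in top_idx else t for i, t in enumerate(tokens)]
    drop_top = abs(base - predict_fn(masked_top))

    # Baseline: mask k randomly chosen tokens instead.
    rng = np.random.default_rng(0)
    rand_idx = set(rng.choice(len(tokens), size=top_k, replace=False))
    masked_rand = [mask_token if i in rand_idx else t for i, t in enumerate(tokens)]
    drop_rand = abs(base - predict_fn(masked_rand))

    # If attention is a faithful explanation, we expect drop_top >= drop_rand
    # on average across many examples.
    return drop_top, drop_rand
```

A single comparison like this says little on its own; in practice the gap between the two drops would be aggregated over a test set before drawing any conclusion.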