Comparing Deep Sentiment Models using Quantified Local Explanations
Online judge systems that automatically evaluate predictive models face the problem of choosing the best model among candidates with near-equal accuracy measures. In such scenarios, interpreting competing models may provide better insight into their robustness. Sentiment analysis is a well-explored domain with many available approaches and libraries; despite this, it is context-sensitive and still poses many challenges to the research community, which makes it well suited to our analysis. Selecting the model with the highest accuracy may not be satisfactory, especially when the decision margins are narrow. Our comparative study of models with similar accuracies and F1-scores but distinct underlying architectures incorporates custom metrics and evaluation methods to assess their performance on a sentiment analysis task. We propose to include human judgment in an online judge system through a feedback acquisition mechanism that presents explanations for model decisions on selected test cases. Our initial experimental findings indicate that evaluating model robustness with a well-defined human feedback mechanism, using model-agnostic explanation approaches, enables online judge systems to make explainable decisions. The p-value of a paired t-test on the feedback collected for model preference indicates a significant preference for the model favoured by our metrics.
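The abstract's significance claim rests on a paired t-test over per-case human feedback for the two models. A minimal sketch of that statistic, assuming hypothetical 1-to-5 preference ratings collected on the same set of test cases (the actual data and rating scale are not given in the abstract):

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Paired t-test statistic over per-item feedback scores for two models.

    Each index i holds the two ratings given for the same test case,
    so the test is run on the per-case differences.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical feedback ratings on the same 8 explained test cases
model_a = [4, 5, 4, 3, 5, 4, 4, 5]
model_b = [3, 4, 3, 3, 4, 3, 4, 4]
t = paired_t_statistic(model_a, model_b)
# The p-value reported in the abstract would come from comparing |t|
# against the t-distribution with n - 1 degrees of freedom.
```

A pairwise test is the right choice here because both models are rated on identical test cases, so per-case differences cancel item difficulty and isolate the model-preference effect.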
|Journal||2021 Smart Technologies, Communication and Robotics (STCR)|
|Publisher||IEEE|