Data-driven answer selection in community QA systems

Publication Type:
Journal Article
Citation:
IEEE Transactions on Knowledge and Data Engineering, 2017, 29 (6), pp. 1186 - 1198
Issue Date:
2017-06-01
Filename: 07857066.pdf
Description: Published Version
Size: 1.7 MB
Format: Adobe PDF
© 1989-2012 IEEE. Finding similar questions in historical archives has been applied to question answering, with solid theoretical underpinnings and great practical success. Nevertheless, each question in the returned candidate pool is often associated with multiple answers, so users have to painstakingly browse many of them before finding the correct one. To alleviate this problem, we present a novel scheme that ranks answer candidates via pairwise comparisons. It consists of an offline learning component and an online search component. In the offline learning component, we first automatically construct positive, negative, and neutral training samples in the form of preference pairs, guided by our data-driven observations. We then present a novel model that jointly incorporates these three types of training samples, and we derive its closed-form solution. In the online search component, we first collect a pool of answer candidates for the given question by finding its similar questions. We then sort the answer candidates by using the offline-trained model to judge their preference orders. Extensive comparative experiments on real-world vertical and general community-based question answering datasets demonstrate the scheme's robustness and promising performance. We have also released the code and data to facilitate further research.
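The online ranking step described in the abstract can be illustrated with a minimal sketch. It is not the model from the paper: the linear preference function, the feature vectors, and the names `make_linear_preference` and `rank_by_pairwise_wins` are all assumptions made for illustration. The sketch only shows the general idea of ordering candidates by how many pairwise comparisons each one wins under some already-trained preference judge.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def make_linear_preference(weights: Vector) -> Callable[[Vector, Vector], float]:
    """Build a toy preference judge (hypothetical stand-in for a trained model).

    The returned function is positive when answer `a` is preferred over `b`.
    """
    def prefer(a: Vector, b: Vector) -> float:
        # Score the feature difference of the pair with a linear model.
        return sum(w * (x - y) for w, x, y in zip(weights, a, b))
    return prefer


def rank_by_pairwise_wins(
    candidates: List[Vector],
    prefer: Callable[[Vector, Vector], float],
) -> List[int]:
    """Return candidate indices ordered by number of pairwise wins, best first."""
    wins = [0] * len(candidates)
    for i, a in enumerate(candidates):
        for j, b in enumerate(candidates):
            if i != j and prefer(a, b) > 0:
                wins[i] += 1
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(range(len(candidates)), key=lambda i: -wins[i])


# Illustrative data only: three answer candidates with two made-up features each.
weights = [1.0, 0.5]
candidates = [[0.2, 0.1], [0.9, 0.4], [0.5, 0.5]]
order = rank_by_pairwise_wins(candidates, make_linear_preference(weights))
print(order)
```

Counting wins over all pairs costs a quadratic number of comparisons in the pool size, which is usually acceptable for the small candidate pools returned by similar-question retrieval.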