Negative Results of Image Processing for Identifying Duplicate Questions on Stack Overflow
Faiz Ahmed, Suprakash Datta, Maleknaz Nayebi

TL;DR
This study investigates the potential of image-based techniques to improve duplicate question detection on Stack Overflow, finding only marginal gains but establishing a foundation for future research in image-text integration.
Contribution
The paper explores the integration of image analysis into duplicate question detection, highlighting the limited impact of current methods and providing a replicable framework for future studies.
Findings
Image-based techniques yielded about 1% improvement in detection accuracy.
Text analysis alone overlooks significant visual information in questions.
The work provides a foundation for future research in image and text integration.
Abstract
In the rapidly evolving landscape of developer communities, Q&A platforms serve as crucial resources for crowdsourcing developers' knowledge. A notable trend is the increasing use of images to convey complex queries more effectively. However, the current state-of-the-art method of duplicate question detection, which concentrates predominantly on text-based analysis, has not kept pace with this shift. Inspired by advancements in image processing and by numerous studies in software engineering illustrating the promising future of image-based communication on social coding platforms, we delved into image-based techniques for identifying duplicate questions on Stack Overflow. When focusing solely on text analysis of Stack Overflow questions and omitting the use of images, our automated models overlook a significant aspect of the question. Previous research has demonstrated the complementary…
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling
