PseudoSeer: a Search Engine for Pseudocode
Levent Toksoz, Mukund Srinath, Gang Tan, C. Lee Giles

TL;DR
PseudoSeer is a specialized search engine that retrieves academic papers containing pseudocode. It is built on Elasticsearch and supports single-facet, combined-facet, and exact-match queries for more targeted results.
Contribution
The paper introduces PseudoSeer, a pseudocode search engine that searches across several facets of a paper (title, abstract, author information, and LaTeX code snippets) and ranks results with a weighted BM25-based algorithm.
Findings
Uses Elasticsearch for efficient indexing and search
Supports combined facet and exact-match queries
Employs BM25-based ranking for relevance
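As a rough illustration of how a combined facet search with exact-match support might be expressed, the sketch below builds an Elasticsearch Query DSL body as a plain Python dictionary. The field names (`title`, `latex`), the boost values, and the convention that double-quoted text means exact match are assumptions made for this example; the summary does not state PseudoSeer's actual index mapping or query syntax.

```python
import json


def build_query(facets, boosts=None):
    """Build an Elasticsearch bool query from {field: text} pairs.

    Text wrapped in double quotes becomes a match_phrase clause
    (exact phrase match); anything else becomes a regular match clause.
    Field names and boosts are hypothetical, not PseudoSeer's real ones.
    """
    boosts = boosts or {}
    clauses = []
    for field, text in facets.items():
        exact = len(text) >= 2 and text.startswith('"') and text.endswith('"')
        kind = "match_phrase" if exact else "match"
        query_text = text[1:-1] if exact else text
        clauses.append(
            {kind: {field: {"query": query_text, "boost": boosts.get(field, 1.0)}}}
        )
    # Every facet clause must match for a paper to be returned.
    return {"query": {"bool": {"must": clauses}}}


if __name__ == "__main__":
    body = build_query(
        {"title": "shortest path", "latex": '"\\While"'},
        boosts={"title": 2.0},
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary could be passed as the request body of a search call. Placing the clauses under `must` means a paper has to satisfy every facet, which is one plausible reading of "combined facet search"; a `should` clause would instead rank partial matches.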
Abstract
A novel pseudocode search engine is designed to facilitate efficient retrieval and search of academic papers containing pseudocode. By leveraging Elasticsearch, the system enables users to search across various facets of a paper, such as the title, abstract, author information, and LaTeX code snippets, while supporting advanced features like combined facet searches and exact-match queries for more targeted results. A description of the data acquisition process is provided, with arXiv as the primary data source, along with methods for data extraction and text-based indexing, highlighting how different data elements are stored and optimized for search. A weighted BM25-based ranking algorithm is used by the search engine, and factors considered when prioritizing search results for both single and combined facet searches are described. We explain how each facet is weighted in a combined…
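To make the idea of weighted BM25 ranking over combined facets concrete, here is a minimal pure-Python sketch. It scores each facet with the standard BM25 formula and sums the per-facet scores using per-facet weights. The weights, the whitespace tokenizer, the parameter defaults (`k1=1.2`, `b=0.75`), and the toy corpus are all assumptions for illustration; the summary does not give the weights or the exact scoring variant that PseudoSeer uses.

```python
import math
from collections import Counter


def bm25_field(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against query_terms for a single facet."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: number of docs containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            f = tf[term]
            if f == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = f + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores


def weighted_bm25(query, corpus, weights):
    """Combine per-facet BM25 scores with hypothetical facet weights.

    query:   {facet: query text}
    corpus:  list of {facet: document text}
    weights: {facet: weight}, defaulting to 1.0 for unlisted facets
    """
    total = [0.0] * len(corpus)
    for facet, text in query.items():
        docs = [doc.get(facet, "").lower().split() for doc in corpus]
        facet_scores = bm25_field(text.lower().split(), docs)
        w = weights.get(facet, 1.0)
        total = [t + w * s for t, s in zip(total, facet_scores)]
    return total


if __name__ == "__main__":
    corpus = [
        {"title": "dijkstra shortest path", "abstract": "graph search algorithm"},
        {"title": "neural networks", "abstract": "shortest path heuristics"},
        {"title": "sorting", "abstract": "quick sort"},
    ]
    query = {"title": "shortest path", "abstract": "graph"}
    print(weighted_bm25(query, corpus, {"title": 2.0, "abstract": 1.0}))
```

Raising a facet's weight increases the influence of matches in that facet on the final ordering, which is the general mechanism the abstract refers to when it says each facet is weighted in a combined search.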
Taxonomy
Topics: Advanced Malware Detection Techniques
