Matchertext: Towards Verbatim Interlanguage Embedding
Bryan Ford

TL;DR
Matchertext introduces a syntactic discipline that allows safe, straightforward embedding of text from one language into another without escaping or obfuscation, simplifying cross-language text embedding tasks.
Contribution
The paper proposes matchertext, a cross-language syntactic discipline that lets text in one compliant language be embedded in another without escaping, and applies it to commonly embedded syntaxes such as URIs, HTML, and JavaScript.
Findings
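Enables safe embedding by plain copy-and-paste, with no escaping, obfuscation, or expansion of the embedded string
Applied to URIs, HTML, and JavaScript, weighing the benefits, costs, and compatibility issues of adoption
Presents MinML, a concise but general alternative syntax for HTML or XML, as an early matchertext-based language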
Abstract
Embedding text in one language within text of another is commonplace for numerous purposes, but usually requires tedious and error-prone "escaping" transformations on the embedded string. We propose a simple cross-language syntactic discipline, matchertext, which enables the safe embedding of a string in any compliant language into a string in any other compliant language via simple "copy-and-paste", in particular with no escaping, obfuscation, or expansion of embedded strings. We apply this syntactic discipline to several common and frequently embedded language syntaxes such as URIs, HTML, and JavaScript, exploring the benefits, costs, and compatibility issues in adopting the proposed matchertext discipline. One early matchertext-based language is MinML, a concise but general alternative syntax for writing HTML or XML.
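The abstract does not spell out the rule itself, so the following is only a minimal sketch under a stated assumption: that the discipline requires the ASCII bracket characters `()`, `[]`, and `{}` to be properly matched and nested throughout a text. Under that assumption, a host language can find the end of an embedded string by bracket matching alone, which is why no escaping is needed. The names `is_matchertext` and `extract_embedded`, and the `code[...]` host construct, are hypothetical and chosen for illustration.

```python
# Illustrative sketch only. The matcher set and both function names are
# assumptions made for this example, not definitions quoted from the paper.
MATCHERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(MATCHERS.values())


def is_matchertext(text: str) -> bool:
    """Return True if every bracket in text is properly matched and nested."""
    stack = []
    for ch in text:
        if ch in MATCHERS:
            stack.append(MATCHERS[ch])      # remember which closer we expect
        elif ch in CLOSERS:
            if not stack or stack.pop() != ch:
                return False                # stray or mismatched closer
    return not stack                        # leftover openers are also invalid


def extract_embedded(host: str, open_index: int) -> str:
    """Return the text between the opener at open_index and its matching closer.

    The embedded text is returned verbatim; no unescaping step is involved.
    """
    if host[open_index] not in MATCHERS:
        raise ValueError("open_index does not point at an opening bracket")
    stack = [MATCHERS[host[open_index]]]
    for i in range(open_index + 1, len(host)):
        ch = host[i]
        if ch in MATCHERS:
            stack.append(MATCHERS[ch])
        elif ch in CLOSERS:
            if stack.pop() != ch:
                raise ValueError("host text violates the matching rule")
            if not stack:                   # the outer opener has been closed
                return host[open_index + 1:i]
    raise ValueError("no matching closer found")


embedded = 'f(a[0]) { return "x"; }'
host = "code[" + embedded + "]"             # plain concatenation, no escaping
recovered = extract_embedded(host, host.index("["))
```

Here `recovered` equals `embedded` character for character, because the embedded fragment already satisfies the assumed matching rule. A fragment such as `a)` would fail `is_matchertext` and could not be pasted in this way without first being rewritten.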
Taxonomy
Topics: Natural Language Processing Techniques · Web Application Security Vulnerabilities · Software Engineering Research
