Generating Hierarchical JSON Representations of Scientific Sentences Using LLMs
Satya Sri Rajiteswari Nimmagadda, Ethan Young, Niladri Sengupta, Ananya Jana, Aniruddha Maiti

TL;DR
This paper fine-tunes lightweight LLMs with a novel structural loss to generate hierarchical JSON representations of scientific sentences, and tests whether those representations preserve enough meaning to reconstruct the original text.
Contribution
It introduces a new structural loss function for fine-tuning LLMs to produce hierarchical JSON structures from scientific sentences, enhancing information retention.
Findings
Hierarchical JSON formats effectively retain scientific sentence information.
Reconstructed sentences show high semantic and lexical similarity to originals.
The approach demonstrates potential for structured scientific text representation.
Abstract
This paper investigates whether structured representations can preserve the meaning of scientific sentences. To test this, a lightweight LLM is fine-tuned using a novel structural loss function to generate hierarchical JSON structures from sentences collected from scientific articles. These JSONs are then used by a generative model to reconstruct the original text. Comparing the original and reconstructed sentences using semantic and lexical similarity, we show that hierarchical formats can effectively retain the information in scientific texts.
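The pipeline described in the abstract (sentence, hierarchical JSON, reconstruction, similarity comparison) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the example sentence, the JSON schema with its `subject`/`action`/`mechanism` keys, the hand-written reconstruction standing in for the generative model, and the use of Python's `difflib.SequenceMatcher` over tokens as a lexical similarity measure. None of these are taken from the paper, which does not specify its schema or metrics here.

```python
import json
from difflib import SequenceMatcher

# Illustrative sentence and a HYPOTHETICAL hierarchical JSON for it.
# The paper's actual schema is not given in this summary.
sentence = "Aspirin reduces fever by inhibiting prostaglandin synthesis."
structure = {
    "subject": "Aspirin",
    "action": {
        "verb": "reduces",
        "object": "fever",
        "mechanism": {
            "verb": "inhibiting",
            "object": "prostaglandin synthesis",
        },
    },
}


def flatten(node):
    """Yield the leaf strings of a nested dict in depth-first order."""
    if isinstance(node, dict):
        for value in node.values():
            yield from flatten(value)
    else:
        yield str(node)


def lexical_similarity(a, b):
    """Token-level similarity in [0, 1] via difflib (an assumed metric)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


# Stand-in for the generative model's output: a hand-written paraphrase.
reconstructed = "Aspirin reduces fever by inhibiting the synthesis of prostaglandins."

print(json.dumps(structure, indent=2))
print(list(flatten(structure)))
print(lexical_similarity(sentence, reconstructed))
```

A token-overlap score like this only captures the lexical side of the comparison; the semantic side mentioned in the abstract would need a separate measure, such as embedding similarity, which this sketch does not attempt.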
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
