Maximizing Use-Case Specificity through Precision Model Tuning

Pranjali Awasthi; David Recio-Mitter; Yosuke Kyle Sugi

arXiv:2212.14206 · cs.CL · January 2, 2023 · 1 citation

TL;DR

This paper analyzes how fine-tuning transformer-based language models on biomedical datasets can significantly improve their relevance, accuracy, and interpretability for domain-specific information retrieval tasks, especially with smaller models.

Contribution

It provides a comparative analysis of four transformer models, highlighting the benefits of domain-specific fine-tuning and model size considerations for biomedical information retrieval.

Findings

01. Smaller models (<10B parameters) outperform larger models on specific biomedical questions.

02. Fine-tuning on domain-specific data improves relevance and interpretability.

03. Larger models perform better on broader prompts.

Abstract

Language models have become increasingly popular in recent years for tasks like information retrieval. As use cases become oriented toward specific domains, fine-tuning becomes the default way to reach standard performance. To fine-tune these models for specific tasks and datasets, it is necessary to carefully tune the model's hyperparameters and training techniques. In this paper, we present an in-depth analysis of the performance of four transformer-based language models on the task of biomedical information retrieval. The models we consider are DeepMind's RETRO (7B parameters), GPT-J (6B parameters), GPT-3 (175B parameters), and BLOOM (176B parameters). We compare their performance on the basis of relevance, accuracy, and interpretability, using a large corpus of 480,000 research papers on protein structure/function prediction as our dataset. Our findings suggest that smaller models, with <10B…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · BLOOM · Cosine Annealing · Linear Warmup With Cosine Annealing · Linear Layer · Adam · Layer Normalization