LoFTI: Localization and Factuality Transfer to Indian Locales

Sona Elza Simon (1), Soumen Kumar Mondal (1), Abhishek Singhania (2), Sayambhu Sen (2), Preethi Jyothi (1) ((1) Indian Institute of Technology, Bombay, (2) Amazon Alexa)

arXiv:2407.11833·cs.CL·July 17, 2024

TL;DR

This paper introduces LoFTI, a benchmark for evaluating how well large language models localize factual text to Indian locales. Evaluations of models such as GPT-4 reveal geographic bias and uneven localization ability.

Contribution

The paper presents LoFTI, a novel benchmark for assessing localization and factual transfer in LLMs, and evaluates multiple models on this new dataset.

Findings

01

Evaluated models give biased or factually skewed answers when a query requires localization to an Indian locale.

02

LoFTI effectively measures localization and factual transfer capabilities.

03

GPT-4 and the other evaluated models perform unevenly across levels of hyperlocality (country, state, city).

Abstract

Large language models (LLMs) encode vast amounts of world knowledge acquired via training on large web-scale datasets crawled from the internet. However, these datasets typically exhibit a geographical bias towards English-speaking Western countries. This results in LLMs producing biased or hallucinated responses to queries that require answers localized to other geographical regions. In this work, we introduce a new benchmark named LoFTI (Localization and Factuality Transfer to Indian Locales) that can be used to evaluate an LLM's localization and factual text transfer capabilities. LoFTI consists of factual statements about entities in source and target locations; the source locations are spread across the globe and the target locations are all within India with varying degrees of hyperlocality (country, states, cities). The entities span a wide variety of categories. We use LoFTI to…
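The benchmark structure described in the abstract (paired source and target statements, an entity category, and a hyperlocality level) can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the class and field names (`LocalizationExample`, `source_text`, `level`, and so on), the three sample pairs, and the `accuracy_by_level` helper are hypothetical and are not taken from the LoFTI dataset or its code. The sketch only shows how per-level accuracy could be tallied once each model output has been judged correct or incorrect.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Hyperlocality(Enum):
    """Target granularity named in the abstract: country, states, cities."""
    COUNTRY = "country"
    STATE = "state"
    CITY = "city"


@dataclass(frozen=True)
class LocalizationExample:
    """Hypothetical record layout; the real dataset's columns may differ."""
    source_text: str       # factual statement about the source-location entity
    source_location: str
    target_text: str       # reference statement localized to an Indian locale
    target_location: str
    category: str
    level: Hyperlocality


# Illustrative pairs written for this sketch, NOT drawn from the dataset.
EXAMPLES = [
    LocalizationExample(
        "The dollar is the currency of the United States.", "United States",
        "The rupee is the currency of India.", "India",
        "currency", Hyperlocality.COUNTRY),
    LocalizationExample(
        "Sacramento is the capital of California.", "California",
        "Mumbai is the capital of Maharashtra.", "Maharashtra",
        "capital", Hyperlocality.STATE),
    LocalizationExample(
        "The Eiffel Tower is a landmark in Paris.", "Paris",
        "The Gateway of India is a landmark in Mumbai.", "Mumbai",
        "landmark", Hyperlocality.CITY),
]


def accuracy_by_level(examples, judgments):
    """Fraction of outputs judged correct, grouped by hyperlocality level.

    `judgments[i]` is True when the model output for `examples[i]` was judged
    to be correctly localized and factual (by a human or automatic evaluator).
    """
    if len(examples) != len(judgments):
        raise ValueError("examples and judgments must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for example, is_correct in zip(examples, judgments):
        totals[example.level] += 1
        correct[example.level] += int(is_correct)
    return {level: correct[level] / totals[level] for level in totals}


if __name__ == "__main__":
    # Made-up judgments, purely to exercise the helper.
    for level, acc in accuracy_by_level(EXAMPLES, [True, False, True]).items():
        print(f"{level.value}: {acc:.2f}")
```

Grouping by level keeps a weakness at one granularity (for example, city-level targets) from being hidden inside a single aggregate score.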

Code & Models

Repositories

csalt-research/lofti
Official

Datasets

sonasimon/LoFTI
dataset · 15 downloads

Taxonomy

Topics: Disaster Management and Resilience

Methods: Attention Is All You Need · Residual Connection · Adam · Dropout · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer