Meta-Referential Games to Learn Compositional Learning Behaviours

Kevin Denamganaï, Sondess Missaoui, and James Alfred Walker

arXiv:2207.08012·cs.CL·December 20, 2023·1 cites

TL;DR

This paper introduces Meta-Referential Games and the Symbolic Behaviour Benchmark to evaluate and advance artificial agents' ability to learn compositional behaviors and solve the binding problem, inspired by human cognition.

Contribution

It presents a novel meta-learning framework for referential games, creating a domain-agnostic benchmark to assess agents' compositional learning abilities and address the binding problem.

Findings

01

Baseline results demonstrate the benchmark's challenge level.

02

Error analysis highlights current limitations in agents' compositional generalization.

03

The framework encourages development of more capable artificial agents.

Abstract

Human beings use compositionality to generalise from past experiences to novel experiences. We assume a separation of our experiences into fundamental atomic components that can be recombined in novel ways to support our ability to engage with novel experiences. We frame this as the ability to learn to generalise compositionally, and we will refer to behaviours making use of this ability as compositional learning behaviours (CLBs). A central problem to learning CLBs is the resolution of a binding problem (BP). While it is another feat of intelligence that human beings perform with ease, it is not the case for state-of-the-art artificial agents. Thus, in order to build artificial agents able to collaborate with human beings, we propose to develop a novel benchmark to investigate agents' abilities to exhibit CLBs by solving a domain-agnostic version of the BP. We take inspiration from the…
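The idea of recombining familiar atomic components in novel ways can be illustrated with a held-out-combination split. The following is a minimal sketch, not the paper's benchmark: the function names `compositional_split` and `atoms_covered`, and the shape/colour attributes, are hypothetical and chosen only for illustration. It assumes stimuli are tuples of discrete attribute values, and it checks that every atomic value in the test set was seen in training even though the tested combinations were not.

```python
from itertools import product


def compositional_split(attribute_values, held_out):
    """Split all attribute combinations into train and test sets.

    attribute_values: one list of atomic values per attribute.
    held_out: set of combinations (tuples) reserved for testing.
    """
    all_combos = list(product(*attribute_values))
    train = [c for c in all_combos if c not in held_out]
    test = [c for c in all_combos if c in held_out]
    return train, test


def atoms_covered(train, test):
    """Return True if every atomic value of every test combination also
    occurs, at the same attribute position, in some training combination."""
    if not train:
        return not test
    # One set of seen values per attribute position.
    seen = [set(column) for column in zip(*train)]
    return all(
        value in seen[i]
        for combo in test
        for i, value in enumerate(combo)
    )


# Hypothetical attributes, used for illustration only.
shapes = ["circle", "square", "triangle"]
colours = ["red", "green", "blue"]
held_out = {("circle", "red"), ("square", "green"), ("triangle", "blue")}

train, test = compositional_split([shapes, colours], held_out)
print(len(train), len(test), atoms_covered(train, test))
```

Under this split, an agent that generalises compositionally should handle the three held-out pairs, since each shape and each colour appears in training in other pairings.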

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 2

Strengths

This benchmark introduces a challenging and important problem for current methods. Experiments support the claim of the paper.

Weaknesses

It is hard to assess the novelty, as the paper states that there is a previous version of the benchmark (published?). The paper is hard to follow and often confusing.

Reviewer 02 · Rating 5 · Confidence 3

Strengths

Relevance: I think this paper tries to address an important problem in that it proposes a benchmark in which successful behavior means that agents learned to generalize compositionally, instead of only having learned to generalize compositionally. The problem of compositionality and compositional generalization is of general interest to the community. Thus, this benchmark might be generally useful. Novelty: The introduction of S2B and Meta-RGs adds depth to the compositionality field by pushing

Weaknesses

Accessibility: The meta-learning setup, combined with the specialized SCS representation, might limit accessibility and reproducibility. The SCS's construction, particularly the Gaussian kernel setup, could be further detailed; I didn't quite get what was going on there. The writing is generally quite verbose and I really had some difficulties in following along. That there are many abbreviations throughout doesn't really help here either. Some of the figures are very small and complicated to rea

Reviewer 03 · Rating 6 · Confidence 2

Strengths

- The paper introduces the S2B benchmark, designed to evaluate the combinatorial learning behaviors (CLBs) of AI models.
- It proposes the SCS method for representing stimuli in a domain-independent manner, avoiding reliance on specific modalities like visual, verbal, or auditory information.
- Meta-Referential Games are presented as the primary framework within the S2B benchmark, aiming to assess agents' capabilities in symbolic learning and combinatorial learning behaviors (CLBs).

Weaknesses

- Insufficient validation of domain-agnostic BP. While the S2B benchmark and meta-referential game frameworks intend to construct domain-agnostic BP, there is a lack of sufficient experimental data to validate their applicability in various domains or applications. Whether this benchmark and framework can be extended to different fields such as vision and language still needs to be further verified.
- Terminology and lack of concrete examples: The paper contains a large number of terms (such as

Code & Models

Repositories

Near32/Regym/tree/develop/benchmark/R2D2/SymbolicBehaviourBenchmark
PyTorch · Official

Taxonomy

Topics: Topic Modeling

Methods: Test · Recurrent Replay Distributed DQN