Goals and Beliefs in Language Model Agents
This project combines interpretability methods with behavioural evaluations to investigate how goals and beliefs are represented in language model agents, and whether these representations can be reliably extracted and manipulated. Funded by Coefficient Giving.
Project Telos: Modelling, Measuring, and Intervening on Goal-directed Behaviour in AI systems
By combining behavioural and representational analyses, Project Telos develops a general framework for detecting and measuring goal-directedness in AI systems, an essential step towards solving the alignment problem. Our aim is to enable high-confidence claims about which goals an AI is pursuing and how consistently it acts towards them. Funded by SPAR and Cohere.
LM Playschool Workshop & Challenge
The LM Playschool Workshop and Challenge invites submissions on language agents that learn, adapt, and improve through situated interaction, with a focus on conversational, collaborative, goal-oriented, and multi-turn environments. A collaboration between Amazon and the ELLIS Units at UCL, Edinburgh, Amsterdam, Potsdam, Saarland, and Bozen-Bolzano.