Group Binary Search (GBS) Task
We evaluated coordination abilities using the Group Binary Search task, a common-interest game with imperfect monitoring. In GBS, players must coordinate their actions to collectively reach a target number without direct communication.
Figure: Schematic of a GBS game with three players and numerical feedback. The sum of all players' guesses is compared to the mystery number, and the players receive feedback about the difference between that sum and the mystery number. Each player can then adjust their guess (without communicating with the others or knowing the other players' guesses), and the game continues until the sum of guesses matches the mystery number or the limit of 15 rounds is reached.
Game Structure
- Objective: Group sum of all players' guesses must equal a mystery target number (51-100)
- Rounds: Maximum 15 rounds per game, 10 games per session
- Information: Players know only the group sum and feedback, not individual guesses
- Communication: No direct communication between players
- Individual range: Each player submits a number from 0 to 50
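The rules above can be sketched as a small game loop. This is a minimal illustration, not the authors' implementation: `play_game`, `Policy`, and `clamp` are hypothetical names, and for brevity the loop hands each policy the signed error, which corresponds to the numerical feedback condition.

```python
from typing import Callable, List, Optional

MAX_ROUNDS = 15                 # round limit stated in the task description
GUESS_MIN, GUESS_MAX = 0, 50    # individual guess range

# Hypothetical policy signature: (own previous guess, last signed error) -> new guess.
# Both arguments are None in the first round.
Policy = Callable[[Optional[int], Optional[int]], int]


def clamp(guess: int) -> int:
    """Keep a guess inside the allowed individual range."""
    return max(GUESS_MIN, min(GUESS_MAX, guess))


def play_game(target: int, policies: List[Policy]) -> Optional[int]:
    """Return the round in which the group sum hit the target, or None."""
    guesses: List[Optional[int]] = [None] * len(policies)
    error: Optional[int] = None
    for round_no in range(1, MAX_ROUNDS + 1):
        # Each player sees only their own last guess and the group-level error.
        guesses = [clamp(p(g, error)) for p, g in zip(policies, guesses)]
        error = sum(guesses) - target
        if error == 0:
            return round_no
    return None


# Usage: one player absorbs the whole error, the other never moves.
adaptive = lambda g, e: 25 if g is None else g - e
fixed = lambda g, e: 20
print(play_game(60, [adaptive, fixed]))
```

With these two policies the group starts at 45, the adaptive player raises their guess by 15, and the game ends in round 2.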
Group Sizes Tested
- Small groups: 2-3 players
- Medium groups: 4-7 players
- Large groups: 10-17 players
Models Evaluated Against Humans
- DeepSeek-V3 (671B parameters)
- DeepSeek-V3.1-T (685B parameters)
- Llama 3.3 (70B parameters)
- Gemini 2.0 Flash
Feedback Types
Directional Feedback
Simple qualitative information
Examples:
- "Too high" - Group sum exceeds target
- "Too low" - Group sum below target
- "Just right" - Group sum equals target (game ends)
Players must infer the magnitude of error and adjust accordingly.
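A sketch of how directional feedback could be generated, together with one possible player heuristic for guessing the error magnitude. The helper names are hypothetical, and the step-halving rule is an illustrative assumption in the spirit of binary search, not a strategy reported for the human or LLM players.

```python
from typing import Optional, Tuple


def directional_feedback(group_sum: int, target: int) -> str:
    """Map the group sum to one of the three directional messages."""
    if group_sum > target:
        return "Too high"
    if group_sum < target:
        return "Too low"
    return "Just right"


def adjust(prev_guess: int, step: int, feedback: str,
           last_feedback: Optional[str]) -> Tuple[int, int]:
    """Assumed heuristic: move against the error and halve the step
    whenever the direction flips. Call only for "Too high"/"Too low"."""
    if last_feedback is not None and feedback != last_feedback:
        step = max(1, step // 2)          # overshoot detected, search more finely
    delta = -step if feedback == "Too high" else step
    new_guess = max(0, min(50, prev_guess + delta))  # stay within 0 to 50
    return new_guess, step


guess, step = adjust(25, 8, directional_feedback(74, 62), None)
print(guess, step)  # 17 8
```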
Numerical Feedback
Precise quantitative information
Examples:
- "Too high by 12" - Group sum is 12 above target
- "Too low by 8" - Group sum is 8 below target
- "Just right" - Group sum equals target (game ends)
Players know exactly how much the group must adjust collectively, but, without communication, not how to divide that adjustment among themselves.
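The following sketch formats numerical feedback and works through why knowing the exact error still leaves a coordination problem. The function names are hypothetical, the guesses are made-up illustration values, and the even split assumes players know the group size, which the task description does not state.

```python
from typing import List


def numerical_feedback(group_sum: int, target: int) -> str:
    """Format the signed error as one of the numerical feedback messages."""
    diff = group_sum - target
    if diff > 0:
        return f"Too high by {diff}"
    if diff < 0:
        return f"Too low by {-diff}"
    return "Just right"


def split_evenly(guesses: List[int], signed_error: int) -> List[int]:
    """Each player removes an equal share of the error (assumes it divides evenly)."""
    share = signed_error // len(guesses)
    return [g - share for g in guesses]


def everyone_corrects_fully(guesses: List[int], signed_error: int) -> List[int]:
    """Each player removes the whole error on their own, so the group overshoots."""
    return [g - signed_error for g in guesses]


target = 62
guesses = [30, 24, 20]                                  # group sum 74
print(numerical_feedback(sum(guesses), target))         # Too high by 12
even = split_evenly(guesses, 12)                        # [26, 20, 16]
print(numerical_feedback(sum(even), target))            # Just right
full = everyone_corrects_fully(guesses, 12)             # [18, 12, 8]
print(numerical_feedback(sum(full), target))            # Too low by 24
```

If all three players subtract the full 12, the group overcorrects by twice the original error, which is the kind of miscoordination the task is designed to expose.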
Experimental Design
Session Structure
- 10 games per session with alternating feedback types
- Same mystery numbers across human and LLM experiments for fair comparison
- Context preservation: LLMs maintain full history of all previous rounds and games
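One way to picture the session structure is as a feedback schedule plus a running history that is rendered into each LLM prompt. This is a sketch under stated assumptions: the names `feedback_schedule`, `RoundRecord`, and `render_history` are hypothetical, the source does not say which feedback type comes first or whether alternation is strictly game by game, and the prompt wording is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoundRecord:
    game: int
    round: int
    own_guess: int
    feedback: str


def feedback_schedule(n_games: int = 10, first: str = "directional") -> List[str]:
    """Alternate feedback types across games (starting type is an assumption)."""
    kinds = ["directional", "numerical"]
    start = kinds.index(first)
    return [kinds[(start + i) % 2] for i in range(n_games)]


def render_history(records: List[RoundRecord]) -> str:
    """Render every earlier round, across all games, as prompt context."""
    return "\n".join(
        f"Game {r.game}, round {r.round}: you guessed {r.own_guess}; "
        f"feedback: {r.feedback}"
        for r in records
    )


history = [RoundRecord(1, 1, 25, "Too low"), RoundRecord(1, 2, 30, "Just right")]
print(feedback_schedule()[:4])
print(render_history(history))
```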
Experimental Conditions
- Zero-shot Prompts
- Zero-shot Chain-of-Thought (CoT) Prompts
- Mixed LLM groups: Combinations of different LLMs in the same game