
Welcome to MASArena 🏟️!

A comprehensive framework for benchmarking single and multi-agent systems across a wide range of tasks, evaluating performance, accuracy, and efficiency with built-in visualization and tool integration.


🌟 Core Features

  • 🧱 Modular Design: Swap agents, tools, datasets, prompts, and evaluators with ease.
  • 📦 Built-in Benchmarks: Single/multi-agent datasets for direct comparison.
  • 📊 Visual Debugging: Inspect interactions, accuracy, and tool use.
  • 🔧 Tool Support: Manage tool selection via pluggable wrappers.
  • 🧩 Easy Extensions: Add agents via subclassing, with no core changes.
  • 📂 Paired Datasets & Evaluators: Add new benchmarks with minimal effort.
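
To make "add agents via subclassing" and "paired datasets and evaluators" more concrete, here is a minimal sketch of the general pattern. It is not MASArena's actual API: `AgentSystem`, `AGENT_REGISTRY`, `EchoAgent`, and `exact_match_accuracy` are hypothetical names chosen for illustration only. The idea shown is that a subclass registers itself when it is defined, so nothing in the core has to change, and an evaluator scores it against a dataset it is paired with.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping an agent name to its class.
# Illustrative only; this is NOT MASArena's real API.
AGENT_REGISTRY = {}


class AgentSystem(ABC):
    """Illustrative base class: named subclasses register themselves."""

    name = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:
            # Registration happens at class-definition time,
            # so adding an agent needs no edits to core code.
            AGENT_REGISTRY[cls.name] = cls

    @abstractmethod
    def run(self, problem: str) -> str:
        """Return the agent's answer for one problem."""


class EchoAgent(AgentSystem):
    """Toy agent that returns the problem text with whitespace stripped."""

    name = "echo"

    def run(self, problem: str) -> str:
        return problem.strip()


def exact_match_accuracy(agent, dataset):
    """Toy evaluator paired with a dataset of (question, answer) tuples."""
    correct = sum(agent.run(question) == answer for question, answer in dataset)
    return correct / len(dataset)


# Toy dataset: the echo agent gets the first item right and the second wrong.
dataset = [(" 4 ", "4"), ("7", "8")]
agent = AGENT_REGISTRY["echo"]()
score = exact_match_accuracy(agent, dataset)
print(f"{agent.name}: {score:.2f}")
```

In this sketch a new agent only needs a `name` and a `run` method; the lookup by name is what would let a benchmark runner select agents without importing each one explicitly.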

🙌 Contributing

We warmly welcome contributions from the community!

You can contribute in many ways:

  • 🧠 New Agent Systems (MAS): Add novel single- or multi-agent systems to expand the diversity of strategies and coordination models.

  • 📊 New Benchmark Datasets: Bring in domain-specific or task-specific datasets (e.g., reasoning, planning, tool-use, collaboration) to broaden the scope of evaluation.

  • 🛠 New Tools & Toolkits: Extend the framework's tool ecosystem by integrating domain tools (e.g., search, calculators, code editors) and improving tool selection strategies.

  • ⚙️ Improvements & Utilities: Help with performance optimization, failure handling, asynchronous processing, or new visualizations.