Welcome to MASArena!
A comprehensive framework for benchmarking single- and multi-agent systems across a wide range of tasks, evaluating performance, accuracy, and efficiency with built-in visualization and tool integration.
Core Features
- Modular Design: Swap agents, tools, datasets, prompts, and evaluators with ease.
- Built-in Benchmarks: Single- and multi-agent datasets for direct comparison.
- Visual Debugging: Inspect interactions, accuracy, and tool use.
- Tool Support: Manage tool selection via pluggable wrappers.
- Easy Extensions: Add agents via subclassing, with no changes to the core.
- Paired Datasets & Evaluators: Add new benchmarks with minimal effort.
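The "Easy Extensions" and "Paired Datasets & Evaluators" features above can be illustrated with a minimal, self-contained sketch. It does not use MASArena's actual API: `AgentSystem`, `AGENT_REGISTRY`, `EchoAgent`, `exact_match`, and `benchmark` are hypothetical names chosen for illustration, and the real base classes and registration mechanism may differ.

```python
from abc import ABC, abstractmethod

# Hypothetical registry; MASArena's real registration mechanism may differ.
AGENT_REGISTRY = {}


class AgentSystem(ABC):
    """Illustrative base class for a single- or multi-agent system."""

    name = None  # subclasses set this to the key they register under

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each subclass registers itself, so the core code is never edited.
        AGENT_REGISTRY[cls.name or cls.__name__] = cls

    @abstractmethod
    def run(self, problem: str) -> str:
        """Return the system's answer to one benchmark problem."""


class EchoAgent(AgentSystem):
    """Toy agent: answers with the whitespace-stripped problem text."""

    name = "echo"

    def run(self, problem: str) -> str:
        return problem.strip()


def exact_match(prediction: str, expected: str) -> bool:
    """Toy evaluator paired with the toy dataset below."""
    return prediction == expected


def benchmark(agent_name, dataset):
    """Run a registered agent over (problem, expected) pairs; return accuracy."""
    agent = AGENT_REGISTRY[agent_name]()
    correct = sum(exact_match(agent.run(q), a) for q, a in dataset)
    return correct / len(dataset)


if __name__ == "__main__":
    toy_dataset = [(" 4 ", "4"), ("x", "y")]
    print(benchmark("echo", toy_dataset))  # prints 0.5
```

In this sketch, adding a new agent means writing one subclass and nothing else; the dataset and its evaluator travel together, so a new benchmark would be a new pair of the same shape.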
Contributing
We warmly welcome contributions from the community!
You can contribute in many ways:
- New Agent Systems (MAS): Add novel single- or multi-agent systems to expand the diversity of strategies and coordination models.
- New Benchmark Datasets: Bring in domain-specific or task-specific datasets (e.g., reasoning, planning, tool use, collaboration) to broaden the scope of evaluation.
- New Tools & Toolkits: Extend the framework's tool ecosystem by integrating domain tools (e.g., search, calculators, code editors) and improving tool selection strategies.
- Improvements & Utilities: Help with performance optimization, failure handling, asynchronous processing, or new visualizations.