Welcome to MASArena 🏟️!

A comprehensive framework for benchmarking single- and multi-agent systems across a wide range of tasks. It evaluates performance, accuracy, and efficiency, with built-in visualization and tool integration.

🌟 Core Features

🧱 Modular Design: Swap agents, tools, datasets, prompts, and evaluators with ease.
📦 Built-in Benchmarks: Single/multi-agent datasets for direct comparison.
📊 Visual Debugging: Inspect interactions, accuracy, and tool use.
🔧 Tool Support: Manage tool selection via pluggable wrappers.
🧩 Easy Extensions: Add agents by subclassing; no changes to the core are required.
📂 Paired Datasets & Evaluators: Add new benchmarks with minimal effort.
⏱️ Real-time Monitoring with AgentOps: Track agent calls and costs in real time through the AgentOps integration.
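
The modular design, subclass-based extension, and paired dataset/evaluator ideas listed above can be illustrated with a minimal sketch. Every name here (`AgentSystem`, `AGENT_REGISTRY`, `register`, `EchoAgent`, `exact_match`, `benchmark`) is hypothetical and chosen for illustration only; this is not MASArena's actual API, so check the project's source for the real base classes and registration mechanism.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class AgentSystem(ABC):
    """Hypothetical base class; NOT MASArena's real interface."""

    @abstractmethod
    def run(self, problem: str) -> str:
        """Return the system's answer for a single problem."""


# Hypothetical registry mapping a short name to an agent system class.
AGENT_REGISTRY: Dict[str, Type[AgentSystem]] = {}


def register(name: str) -> Callable[[Type[AgentSystem]], Type[AgentSystem]]:
    """Class decorator that adds an agent system to the registry."""
    def decorator(cls: Type[AgentSystem]) -> Type[AgentSystem]:
        AGENT_REGISTRY[name] = cls
        return cls
    return decorator


@register("echo")
class EchoAgent(AgentSystem):
    """Toy agent added purely by subclassing; the 'core' code is untouched."""

    def run(self, problem: str) -> str:
        return problem.strip()


def exact_match(prediction: str, reference: str) -> bool:
    """Toy evaluator paired with the toy dataset used below."""
    return prediction.strip().lower() == reference.strip().lower()


def benchmark(
    agent_name: str,
    dataset: List[Dict[str, str]],
    evaluator: Callable[[str, str], bool],
) -> float:
    """Run one registered agent over a dataset and return its accuracy."""
    agent = AGENT_REGISTRY[agent_name]()
    correct = sum(
        evaluator(agent.run(item["problem"]), item["answer"]) for item in dataset
    )
    return correct / len(dataset)


if __name__ == "__main__":
    toy_dataset = [
        {"problem": " 4 ", "answer": "4"},
        {"problem": "2+2", "answer": "4"},
    ]
    print(benchmark("echo", toy_dataset, exact_match))
```

In this sketch, swapping the agent, the dataset, or the evaluator means changing a single argument to `benchmark`, which is the kind of decoupling the feature list describes.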

🙌 Contributing

We warmly welcome contributions from the community!

You can contribute in many ways:

🧠 New Agent Systems (MAS): Add novel single- or multi-agent systems to expand the diversity of strategies and coordination models.
📊 New Benchmark Datasets: Bring in domain-specific or task-specific datasets (e.g., reasoning, planning, tool-use, collaboration) to broaden the scope of evaluation.
🛠 New Tools & Toolkits: Extend the framework's tool ecosystem by integrating domain tools (e.g., search, calculators, code editors) and improving tool selection strategies.
⚙️ Improvements & Utilities: Help with performance optimization, failure handling, asynchronous processing, or new visualizations.