Extending a Multi-Agent System

Implementation Logic

  1. Extend the AgentSystem base class
     - Initialize with name and config
     - Store the workers dictionary

  2. Implement the _create_agents method
     - Create specialized AgentNode instances
     - Set agent names, models, and prompts
     - Return a dictionary of agents

  3. Implement the run_agent method
     - Get agents from _create_agents
     - Implement the agent interaction logic
     - Return messages and the final answer

  4. Register the system with AgentSystemRegistry

Example MAS Implementation

from typing import Any, Dict, Optional

from mas_arena.agents.base import AgentSystem, AgentSystemRegistry
# AgentNode is provided by the framework; import it from the appropriate mas_arena module

class SimpleMAS(AgentSystem):
    def __init__(self, name: str = "simple_mas", config: Optional[Dict[str, Any]] = None):
        super().__init__(name, config or {})
        self.workers = None

    def _create_agents(self, problem_input: Optional[Any] = None) -> Dict[str, AgentNode]:
        # Build a single "solver" agent; the model defaults to gpt-4 unless overridden in config
        agent = AgentNode(
            name="solver",
            model_name=self.config.get("model_name", "gpt-4"),
            prompt=f"Solve the problem: {self.format_prompt}"
        )
        return {"solver": agent}

    def run_agent(self, problem: Dict[str, Any], **kwargs) -> Dict[str, Any]:
        # Create the agents, invoke the solver on the problem text,
        # and package the response in the expected result schema
        workers = self._create_agents(problem["problem"])
        response = workers["solver"](problem["problem"])
        return {
            "messages": [response],
            "final_answer": response.content
        }


AgentSystemRegistry.register("simple_mas", SimpleMAS)
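Registration lets the framework look a system up by name at run time and instantiate it on demand. The sketch below is a minimal, self-contained illustration of that registry pattern; the `Registry`, `EchoMAS`, and `get` names are illustrative stand-ins, not the actual mas_arena API.

```python
from typing import Any, Callable, Dict

class Registry:
    """Minimal name-to-class registry mirroring the register/lookup pattern."""
    _systems: Dict[str, Callable[..., Any]] = {}

    @classmethod
    def register(cls, name: str, system_cls: Callable[..., Any]) -> None:
        cls._systems[name] = system_cls

    @classmethod
    def get(cls, name: str, config: Dict[str, Any] = None) -> Any:
        # Instantiate the registered class by its registered name
        return cls._systems[name](name, config or {})

class EchoMAS:
    """Toy system with the same run_agent contract as SimpleMAS."""
    def __init__(self, name: str, config: Dict[str, Any]):
        self.name = name
        self.config = config

    def run_agent(self, problem: Dict[str, Any]) -> Dict[str, Any]:
        answer = problem["problem"].upper()  # stand-in for real agent work
        return {"messages": [answer], "final_answer": answer}

Registry.register("echo_mas", EchoMAS)
mas = Registry.get("echo_mas")
result = mas.run_agent({"problem": "hello"})
print(result["final_answer"])  # HELLO
```

The benefit of the pattern is that benchmark runners never import concrete system classes; they only need the registered name and a config dictionary.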

Extending an Evaluator

  1. Extend the BaseEvaluator class
     - Initialize with name and config
     - Set up data and log paths
     - Configure logging

  2. Implement the evaluate method
     - Process the problem and run result
     - Extract the final answer from messages
     - Calculate the score
     - Return the evaluation results

  3. Implement the verify_answer method (optional)
     - Compare the prediction with the reference
     - Return a boolean result

  4. Register the evaluator with the register_benchmark decorator
     - Specify the evaluator name
     - Define normalization keys

Example Evaluator Implementation

from typing import Any, Dict, Optional

from mas_arena.evaluators.base_evaluator import BaseEvaluator
from mas_arena.evaluators.registry import register_benchmark

@register_benchmark(
    name="simple",
    normalization_keys={
        "id": "id",
        "problem": "problem",
        "solution": "solution"
    }
)
class SimpleEvaluator(BaseEvaluator):
    def __init__(self, name: str, config: Optional[Dict[str, Any]] = None):
        super().__init__(name, config)

    def evaluate(self, problem: Dict[str, Any], run_result: Dict[str, Any]) -> Dict[str, Any]:
        # Exact-match scoring: 1 if the system's answer equals the reference solution
        final_answer = run_result.get("final_answer", "")
        score = 1 if final_answer == problem["solution"] else 0
        return {
            "final_answer": final_answer,
            "score": score
        }

    def verify_answer(self, prediction: str, reference: str) -> bool:
        return prediction == reference
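To see the evaluate/verify_answer contract in action, here is a self-contained sketch with the same interface. The base class is stubbed out and `MiniEvaluator` is an illustrative name (no mas_arena dependency); the normalization inside `verify_answer` is one possible choice, not the framework's behavior.

```python
from typing import Any, Dict

class MiniEvaluator:
    """Stand-in with the same evaluate/verify_answer contract as SimpleEvaluator."""

    def evaluate(self, problem: Dict[str, Any], run_result: Dict[str, Any]) -> Dict[str, Any]:
        # Pull the final answer from the run result and score it against the reference
        final_answer = run_result.get("final_answer", "")
        score = 1 if self.verify_answer(final_answer, problem["solution"]) else 0
        return {"final_answer": final_answer, "score": score}

    def verify_answer(self, prediction: str, reference: str) -> bool:
        # Normalize whitespace and case before comparing, so "42 " still matches "42"
        return prediction.strip().lower() == reference.strip().lower()

evaluator = MiniEvaluator()
problem = {"id": "p1", "problem": "6 * 7 = ?", "solution": "42"}
run_result = {"messages": ["42 "], "final_answer": "42 "}
print(evaluator.evaluate(problem, run_result))  # {'final_answer': '42 ', 'score': 1}
```

Routing the score through `verify_answer` keeps the comparison logic in one place, so a benchmark-specific evaluator only has to override that one method to change how answers are matched.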