📊 Supported Benchmarks

| Benchmark | Description | Dataset File |
|-----------|-------------|--------------|
| `math` | Mathematical problem solving | `math_test.jsonl` |
| `humaneval` | Python code generation | `humaneval_test.jsonl` |
| `mbpp` | Python programming problems | `mbpp_test.jsonl` |
| `drop` | Reading comprehension | `drop_test.jsonl` |
| `bbh` | Complex reasoning tasks | `bbh_test.jsonl` |
| `ifeval` | Instruction following | `ifeval_test.jsonl` |
| `aime` | Math competition problems | `aime_*_test.jsonl` |
| `mmlu_pro` | Multi-domain knowledge | `mmlu_pro_test.jsonl` |
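
Each benchmark name listed above corresponds to a JSONL dataset file, and the `aime` entry is a wildcard that may match several files. The following is a minimal sketch of how a name could be resolved to its file(s) and loaded. It assumes the files sit in a directory such as `data/` (a hypothetical location, not stated here) and that each non-blank line holds one JSON object. The helper names `resolve_dataset_files` and `load_jsonl` are illustrative only and are not part of the project's API.

```python
import glob
import json
import os

# Benchmark name -> dataset file pattern, copied from the list above.
BENCHMARK_FILES = {
    "math": "math_test.jsonl",
    "humaneval": "humaneval_test.jsonl",
    "mbpp": "mbpp_test.jsonl",
    "drop": "drop_test.jsonl",
    "bbh": "bbh_test.jsonl",
    "ifeval": "ifeval_test.jsonl",
    "aime": "aime_*_test.jsonl",  # wildcard: may match several files
    "mmlu_pro": "mmlu_pro_test.jsonl",
}


def resolve_dataset_files(benchmark, data_dir="data"):
    """Return the sorted list of existing dataset files for a benchmark.

    `data_dir` is an assumed location; point it at wherever the files live.
    """
    if benchmark not in BENCHMARK_FILES:
        raise ValueError(f"Unknown benchmark: {benchmark!r}")
    pattern = os.path.join(data_dir, BENCHMARK_FILES[benchmark])
    return sorted(glob.glob(pattern))


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Returning a list even for single-file benchmarks keeps the calling code uniform: a caller can loop over the result without special-casing `aime`, and an empty list signals that no matching file was found in the assumed directory.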

🤖 Supported Agent Systems

| Agent System | File | Description |
|--------------|------|-------------|
| `single_agent` | `single_agent.py` | Single LLM agent |
| `supervisor_mas` | `supervisor_mas.py` | Supervisor-based multi-agent system |
| `swarm` | `swarm.py` | Swarm-based agent system |
| `agentverse` | `agentverse.py` | Dynamic recruitment agent system |
| `chateval` | `chateval.py` | Debate-based multi-agent system |
| `evoagent` | `evoagent.py` | Evolutionary agent system |
| `jarvis` | `jarvis.py` | Task-planning agent system |
| `metagpt` | `metagpt.py` | Code generation agent system |