Supported¶

📊 Supported Benchmarks¶

Benchmark	Description	Dataset File
`math`	Mathematical problem solving	`math_test.jsonl`
`humaneval`	Python code generation	`humaneval_test.jsonl`
`mbpp`	Python programming problems	`mbpp_test.jsonl`
`drop`	Reading comprehension	`drop_test.jsonl`
`bbh`	Complex reasoning tasks	`bbh_test.jsonl`
`ifeval`	Instruction following	`ifeval_test.jsonl`
`aime`	Math competition problems	`aime_*_test.jsonl`
`mmlu_pro`	Multi-domain knowledge	`mmlu_pro_test.jsonl`

Agent System	File	Description
`single_agent`	`single_agent.py`	Single LLM agent
`supervisor_mas`	`supervisor_mas.py`	Supervisor-based multi-agent system
`swarm`	`swarm.py`	Swarm-based agent system
`agentverse`	`agentverse.py`	Dynamic recruitment agent system
`chateval`	`chateval.py`	Debate-based multi-agent system
`evoagent`	`evoagent.py`	Evolutionary agent system
`jarvis`	`jarvis.py`	Task-planning agent system
`metagpt`	`metagpt.py`	Code generation agent system