Test Workflow in GitHub Actions

Testing Framework

Framework Choice

We use pytest as our primary testing framework because it offers:

  • Excellent support for async testing via pytest-asyncio
  • A rich fixture system for test setup and teardown
  • Comprehensive assertion introspection
  • An extensive plugin ecosystem
  • Coverage reporting through the pytest-cov plugin

Test Structure

Our test suite is organized into the following categories:

tests/
├── conftest.py              # Shared fixtures and configuration
├── test_agents.py           # Agent system tests
├── test_evaluators.py       # Evaluator and benchmark tests
└── test_benchmark_runner.py # Benchmark runner integration tests
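
Shared fixtures defined in conftest.py are injected into tests by name. A minimal sketch of what such fixtures might look like (the sample_config fields and the tmp_results_dir fixture are illustrative, not the project's actual schema):

"""Shared fixtures for the test suite."""

import pytest


@pytest.fixture
def sample_config():
    """Return a minimal agent configuration for tests (illustrative values)."""
    return {
        "name": "test_agent",
        "model": "mock-model",
        "max_steps": 3,
    }


@pytest.fixture
def tmp_results_dir(tmp_path):
    """Provide a temporary directory for benchmark output files."""
    results = tmp_path / "results"
    results.mkdir()
    return results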

Test Categories

  1. Unit Tests (@pytest.mark.unit)
     • Test individual functions and classes in isolation
     • Fast execution, no external dependencies
     • Mock external services and APIs

  2. Integration Tests (@pytest.mark.integration)
     • Test component interactions
     • May involve file I/O or network calls
     • Use test fixtures and temporary directories

  3. Async Tests (@pytest.mark.asyncio)
     • Test asynchronous functionality
     • Agent evaluation and benchmark processing
     • Concurrent execution scenarios
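
Markers are applied with the @pytest.mark decorator and selected at the command line with pytest -m. A short sketch (the test bodies are illustrative):

import asyncio

import pytest
from unittest.mock import Mock


@pytest.mark.unit
def test_mocked_api_call():
    """Unit test: the external API is a Mock, so no network is touched."""
    api = Mock()
    api.complete.return_value = {"answer": "42"}
    result = api.complete(prompt="What is 6 x 7?")
    api.complete.assert_called_once()
    assert result["answer"] == "42"


@pytest.mark.asyncio
async def test_concurrent_evaluation():
    """Async test: evaluate several inputs concurrently."""
    async def double(x):
        return x * 2

    results = await asyncio.gather(double(1), double(2))
    assert results == [2, 4]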

Key Test Cases

Our test suite covers:

  • Agent Systems: Creation, configuration, and evaluation interfaces
  • Evaluators: Math, GSM8K, and other benchmark evaluators
  • Benchmark Runner: Problem processing, result aggregation, and error handling
  • Registry Systems: Agent and benchmark registration
  • Utilities: JSON serialization, file handling, and metrics collection

GitHub Actions CI/CD

Workflow Overview

Our CI/CD pipeline (.github/workflows/test.yml) runs automatically on:

  • Pushes to the main and develop branches
  • Pull requests targeting the main and develop branches

Workflow Jobs

1. Test Job

Matrix Strategy: Tests run on Python 3.11 and 3.12

Steps:

  1. Checkout: Get the latest code
  2. Python Setup: Install the specified Python version
  3. Dependency Caching: Cache pip dependencies for faster builds
  4. Install Dependencies: Install project and test dependencies
  5. Environment Setup: Set PYTHONPATH and test mode flags
  6. Unit Tests: Run individual test files with verbose output
  7. Coverage Tests: Generate coverage reports
  8. Upload Coverage: Send coverage data to Codecov (Python 3.11 only)
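
Condensed into YAML, the job looks roughly like this (a sketch; the runner image, action versions, and step details are illustrative, and .github/workflows/test.yml remains the authoritative definition):

test:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      python-version: ["3.11", "3.12"]
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: pip install -r requirements.txt
    - name: Run tests with coverage
      run: pytest --cov=mas_arena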

2. Lint Job

Purpose: Ensure code quality and formatting consistency

Steps:

  1. Ruff Linting: Check code style and potential issues
  2. Format Checking: Verify code formatting (non-blocking)

Caching Strategy

We cache pip dependencies to improve build performance:

  • Cache Key: Based on the hash of requirements.txt
  • Cache Path: ~/.cache/pip
  • Fallback: OS-specific pip cache key
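
With the actions/cache action, the corresponding step might look like this (a sketch consistent with the strategy above; the action version is an assumption):

- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-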

Environment Variables

  • PYTHONPATH: Ensures proper module imports
  • MAS_ARENA_TEST_MODE: Enables test-specific configurations
  • API keys are mocked in the test environment
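
In practice, pytest's built-in monkeypatch fixture is a convenient way to enforce this. A sketch (the key name and test-mode value are assumptions, not necessarily what the project reads):

import pytest


@pytest.fixture(autouse=True)
def test_environment(monkeypatch):
    """Run every test in test mode with dummy credentials."""
    # "1" as the enabling value is an assumption; check the project's config.
    monkeypatch.setenv("MAS_ARENA_TEST_MODE", "1")
    # OPENAI_API_KEY is a hypothetical example of a mocked provider key.
    monkeypatch.setenv("OPENAI_API_KEY", "test-key-not-real")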

Running Tests Locally

Basic Test Execution

# Run all tests
pytest

# Run specific test file
pytest tests/test_agents.py -v

# Run tests with coverage
pytest --cov=mas_arena --cov-report=html

# Run only unit tests
pytest -m unit

# Run async tests
pytest -m asyncio

Test Configuration

Our pytest.ini configuration specifies:

  • Test Discovery: test_*.py files, Test* classes, test_* functions
  • Markers: slow, integration, unit, asyncio
  • Warnings: Deprecation warnings are filtered
  • Options: Strict marker enforcement, quiet output

Debugging Tests

# Verbose output with full traceback
pytest -v --tb=long

# Stop on first failure
pytest -x

# Run specific test method
pytest tests/test_agents.py::TestAgentCreation::test_create_single_agent

Code Quality Standards

Linting with Ruff

We use Ruff for fast Python linting and formatting:

# Check code style
ruff check mas_arena/

# Format code
ruff format mas_arena/

# Check formatting without changes
ruff format --check mas_arena/

Configuration

  • Line Length: 120 characters
  • Target Version: Python 3.11+
  • Rules: Standard Python style guidelines

Coverage Requirements

  • Maintain test coverage above 80%
  • New features must include corresponding tests
  • Critical paths require comprehensive test coverage
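
The 80% floor can be checked locally with pytest-cov's fail-under option:

# Fail the run if total coverage drops below 80%
pytest --cov=mas_arena --cov-fail-under=80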

Contributing Workflow

Before Submitting a Pull Request

  1. Run Tests Locally:

    pytest tests/ -v
    

  2. Check Code Quality:

    ruff check mas_arena/
    ruff format mas_arena/
    

  3. Verify Coverage:

    pytest --cov=mas_arena --cov-report=term-missing
    

Pull Request Process

  1. Create Feature Branch: git checkout -b feature/your-feature-name
  2. Make Changes: Implement your feature with tests
  3. Test Locally: Ensure all tests pass
  4. Commit Changes: Use descriptive commit messages
  5. Push Branch: git push origin feature/your-feature-name
  6. Create PR: Submit pull request with description
  7. CI Validation: Wait for GitHub Actions to pass
  8. Code Review: Address reviewer feedback
  9. Merge: Squash and merge after approval

Writing New Tests

When adding new functionality:

  1. Create Test File: Follow naming convention test_*.py
  2. Use Fixtures: Leverage shared fixtures from conftest.py
  3. Mock External Dependencies: Use unittest.mock for API calls
  4. Test Edge Cases: Include error conditions and boundary cases
  5. Add Markers: Use appropriate pytest markers
  6. Document Tests: Include docstrings for complex test scenarios

Example Test Structure

"""Tests for new feature."""

import pytest
from unittest.mock import Mock, patch  # for stubbing external services and APIs

from mas_arena.your_module import YourClass


class TestYourClass:
    """Test your new class functionality."""

    def test_basic_functionality(self, sample_config):
        """Test basic functionality with valid input."""
        instance = YourClass(sample_config)
        result = instance.method()
        assert result is not None

    @pytest.mark.asyncio
    async def test_async_method(self, sample_config):
        """Test async method execution."""
        instance = YourClass(sample_config)
        result = await instance.async_method()
        assert result["status"] == "success"

    def test_error_handling(self, sample_config):
        """Test error handling for invalid input."""
        instance = YourClass(sample_config)
        with pytest.raises(ValueError):
            instance.method(invalid_input=True)
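
While iterating, run just the new file (the path mirrors the hypothetical your_module above):

# Run only the new tests with verbose output
pytest tests/test_your_module.py -v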

Troubleshooting

Common Issues

  1. Import Errors: Ensure PYTHONPATH includes the project root
  2. API Key Errors: Check that the test environment uses mocked APIs
  3. Async Test Failures: Verify that pytest-asyncio is installed
  4. Coverage Issues: Exclude test files from coverage reports

Getting Help

  • Check existing issues and discussions
  • Review test output and error messages
  • Consult project documentation
  • Ask questions in pull request comments