SREGym

Running Your Own Agent

A guide to registering and running your custom agent in SREGym, covering the evaluation phases and task list configuration.

Agent Registration

SREGym uses agents.yaml to register agents for execution. This is how SREGym knows which agent to run when you start the benchmark. The Stratus agent is already registered:

agents:
- name: stratus
  kickoff_command: python -m clients.stratus.stratus_agent.driver.driver --server http://localhost:8000
  kickoff_workdir: .
  kickoff_env: null

Each entry in agents.yaml has four fields:

  • name: A unique identifier for your agent
  • kickoff_command: The command SREGym will execute to start your agent
  • kickoff_workdir: The working directory from which to run the command
  • kickoff_env: Optional environment variables (use null if none needed)

Add a new entry to agents.yaml following this format to register your custom agent.
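
Before adding an entry, it can help to sanity-check it. The sketch below is a hypothetical helper (check_agent_entry is not part of SREGym) that checks an entry, written as a Python dict, against the four fields described above and against the names already registered. The my_agent entry and its command are made-up examples; substitute your own.

```python
# Hypothetical pre-flight check for an agents.yaml entry; not part of SREGym.
REQUIRED_FIELDS = ("name", "kickoff_command", "kickoff_workdir", "kickoff_env")


def check_agent_entry(entry, registered_names):
    """Return a list of problems found in one agent entry (empty if none)."""
    problems = []
    # Every field shown in the stratus entry should be present.
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")
    # The name must be unique among registered agents.
    name = entry.get("name")
    if name in registered_names:
        problems.append(f"name already registered: {name}")
    # An empty command gives SREGym nothing to execute.
    if "kickoff_command" in entry and not entry["kickoff_command"]:
        problems.append("kickoff_command is empty")
    return problems


# Made-up entry that mirrors the format of the stratus entry.
my_agent = {
    "name": "my_agent",
    "kickoff_command": "python -m my_agent.main --server http://localhost:8000",
    "kickoff_workdir": ".",
    "kickoff_env": None,  # YAML null
}

print(check_agent_entry(my_agent, registered_names={"stratus"}))
```

An empty list means the entry has all four fields and a name that does not collide with an existing agent.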

Understanding Evaluation Phases

Each SREGym problem has at most two evaluation phases:

  1. Fault Diagnosis: The agent should localize where the incident originates and explain the root cause.

    Expected submission: The root cause of the incident in natural language.

  2. Incident Mitigation: The agent should try to mitigate the incident and bring the cluster back online.

    Expected submission: None; mitigation takes no arguments. Note that not all problems are evaluated for mitigation.
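
The difference between the two submissions can be summarized in a few lines. This is an illustrative sketch only: expected_submission is a hypothetical function that models what each phase expects, not SREGym's submission API, and the root-cause text is a made-up example.

```python
# Illustrative only: models what each phase expects, not SREGym's API.
def expected_submission(phase, root_cause=None):
    """Return the arguments an agent would submit for the given phase."""
    if phase == "diagnosis":
        # Diagnosis expects the root cause described in natural language.
        if not root_cause:
            raise ValueError("diagnosis expects a natural-language root cause")
        return (root_cause,)
    if phase == "mitigation":
        # Mitigation takes no arguments.
        return ()
    raise ValueError(f"unknown phase: {phase}")


# Made-up root cause, for illustration.
print(expected_submission("diagnosis", "Service targetPort does not match the container port"))
print(expected_submission("mitigation"))
```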

Configuring Task Lists

By default, SREGym runs the common evaluation with all available problems and tasks. If you want to run a custom evaluation with a specific subset of problems or tasks, you can configure this using tasklist.yaml.

The task list follows this format for each problem:

k8s_target_port-misconfig:
  - diagnosis
  - mitigation

If tasklist.yaml has no entry for a problem, all of that problem's tasks run by default. A diagnosis or mitigation task may also be skipped when the problem has no corresponding oracle attached.
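
The two rules above can be sketched as a small function. This is a minimal sketch of the described behavior, not SREGym's implementation: tasks_to_run is a hypothetical name, the task list is assumed to be already loaded into a dict, and the set of tasks that have an oracle is passed in by hand.

```python
# Sketch of the selection rules described above; not SREGym's own code.
ALL_TASKS = ("diagnosis", "mitigation")


def tasks_to_run(problem, tasklist, tasks_with_oracle):
    """Return the tasks that would run for one problem.

    tasklist: dict mapping problem name to a list of task names,
              as in tasklist.yaml.
    tasks_with_oracle: set of task names for which this problem has
                       an oracle attached (assumed to be known).
    """
    # Rule 1: a problem without an entry runs every task.
    requested = tasklist.get(problem, ALL_TASKS)
    # Rule 2: a task without an oracle is skipped.
    return [t for t in ALL_TASKS if t in requested and t in tasks_with_oracle]


tasklist = {"k8s_target_port-misconfig": ["diagnosis"]}
both = {"diagnosis", "mitigation"}

# Listed problem: only the listed task runs.
print(tasks_to_run("k8s_target_port-misconfig", tasklist, both))
# Unlisted problem (hypothetical name): all tasks run.
print(tasks_to_run("some_other_problem", tasklist, both))
# Unlisted problem with no mitigation oracle: mitigation is skipped.
print(tasks_to_run("some_other_problem", tasklist, {"diagnosis"}))
```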

Monitoring Your Agent

SREGym provides a dashboard to monitor the status of your evaluation. The dashboard runs automatically when you start the benchmark with python main.py and can be accessed at http://localhost:11451 in your web browser.
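
If you start the benchmark from a script, you may want to confirm that the dashboard is reachable before opening it. The sketch below uses only the Python standard library; dashboard_is_up is a hypothetical helper, and it assumes the dashboard answers a plain HTTP GET on its root URL with a 2xx status.

```python
import urllib.request

DASHBOARD_URL = "http://localhost:11451"


def dashboard_is_up(url=DASHBOARD_URL, timeout=2.0):
    """Return True if a GET on the URL succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, and timeouts are all OSError subclasses,
        # so refused connections and error statuses both land here.
        return False


if __name__ == "__main__":
    print("dashboard reachable:", dashboard_is_up())
```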

Next Steps