
SREGym

Introduction

SREGym is a unified platform for designing, developing, and evaluating AI agents for Site Reliability Engineering (SRE).

SRE Problems

Each problem in SREGym consists of three components: an application, a fault, and an oracle. To evaluate a problem, SREGym first deploys the application the problem specifies. Once the application is deployed, SREGym injects the fault into the system to cause the incident. It then runs the agent and evaluates it, using the oracle as the ground truth for the problem's solution.
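The deploy, inject, evaluate sequence described above can be sketched roughly as follows. This is an illustrative model only, not SREGym's actual API: the `Problem` class, the `evaluate` function, and the `demo-shop` and `pod-failure` names are all hypothetical, and the real steps act on a live deployment rather than recording strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    """Hypothetical container for the three components of a problem."""
    application: str               # application to deploy (assumed name)
    fault: str                     # fault to inject (assumed name)
    oracle: Callable[[str], bool]  # checks an answer against the ground truth


def evaluate(problem: Problem, agent: Callable[[str], str]) -> dict:
    """Walk through deploy -> inject -> evaluate, recording each step."""
    steps: List[str] = []
    steps.append(f"deploy:{problem.application}")  # 1. deploy the application
    steps.append(f"inject:{problem.fault}")        # 2. inject the fault
    answer = agent(problem.application)            # 3. let the agent investigate
    steps.append("evaluate")
    # The oracle decides whether the agent's answer matches the ground truth.
    return {"steps": steps, "passed": problem.oracle(answer)}


# Hypothetical problem: the oracle accepts only the injected fault's name.
problem = Problem(
    application="demo-shop",
    fault="pod-failure",
    oracle=lambda answer: answer == "pod-failure",
)
result = evaluate(problem, agent=lambda app: "pod-failure")
```

In this sketch the oracle is a plain predicate over the agent's answer; an actual oracle may instead inspect the state of the system.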

See our registry for a complete list of problems.

SREGym is built to be extensible, and we always welcome new contributions. See Contributing to get started.

Getting Started

Using SREGym

Development

SREGym is built to be extensible, and we always welcome new contributions. See Contributing to get started.