The open-source framework for AI SRE agents, and the training and evaluation environment they need to improve. Connect the 40+ tools you already run, define your own workflows, and investigate incidents on your own infrastructure.
Quickstart · Docs · FAQ · Security
When something breaks in production, the evidence is scattered across logs, metrics, traces, runbooks, and Slack threads. OpenSRE is an open-source framework for AI SRE agents that resolve production incidents, built to run on your own infrastructure.
We built it because SWE-bench1 gave coding agents scalable training data and a clear feedback signal; production incident response still lacks an equivalent.
Distributed failures are slower, noisier, and harder to simulate and evaluate than local code tasks, which is why AI SRE, and AI for production debugging more broadly, remains unsolved.
OpenSRE is building that missing layer:
an open reinforcement learning environment for agentic infrastructure incident response, with end-to-end tests and synthetic incident simulations for realistic production failures
We do that by:
- building easy-to-deploy, customizable AI SRE agents for production incident investigation and response
- running scored synthetic RCA suites that check root-cause accuracy, required evidence, and adversarial red herrings (tests/synthetic)
- running real-world end-to-end tests across cloud-backed scenarios including Kubernetes, EC2, CloudWatch, Lambda, ECS Fargate, and Flink (tests/e2e)
- keeping semantic test-catalog naming so e2e vs synthetic and local vs cloud boundaries stay obvious (tests/README.md)
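A scored synthetic RCA check of the kind described above might, conceptually, compare an agent's reported root cause and cited evidence against a labeled scenario. The sketch below is illustrative only: the scenario fields, scoring weights, and red-herring penalty are assumptions, not the actual format used in tests/synthetic.

```python
from dataclasses import dataclass


@dataclass
class SyntheticScenario:
    """A labeled incident: ground-truth cause, evidence the agent must cite,
    and planted red herrings it must not fall for."""
    root_cause: str
    required_evidence: set[str]
    red_herrings: set[str]


def score_report(scenario: SyntheticScenario, reported_cause: str,
                 cited_evidence: set[str]) -> float:
    """Score in [0, 1]: half for the correct cause, half for evidence
    recall, minus a penalty for each cited red herring."""
    cause_ok = 1.0 if reported_cause == scenario.root_cause else 0.0
    recall = (len(cited_evidence & scenario.required_evidence)
              / len(scenario.required_evidence))
    herring_penalty = len(cited_evidence & scenario.red_herrings) * 0.25
    return max(0.0, 0.5 * cause_ok + 0.5 * recall - herring_penalty)


scenario = SyntheticScenario(
    root_cause="oom_kill",
    required_evidence={"pod_restart_events", "memory_metrics"},
    red_herrings={"unrelated_dns_warning"},
)
print(score_report(scenario, "oom_kill",
                   {"pod_restart_events", "memory_metrics"}))  # 1.0
```

A scalar reward like this is what makes the suite usable as a reinforcement learning signal, not just a pass/fail test.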
Our mission is to build AI SRE agents on top of this, scale it to thousands of realistic infrastructure failure scenarios, and establish OpenSRE as the benchmark and training ground for AI SRE.
1 https://arxiv.org/abs/2310.06770
Install with the one-line installer (macOS/Linux):

```sh
curl -fsSL https://raw.githubusercontent.com/Tracer-Cloud/opensre/main/install.sh | bash
```

Or with Homebrew:

```sh
brew install Tracer-Cloud/opensre/opensre
```

On Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/Tracer-Cloud/opensre/main/install.ps1 | iex
```

Then onboard and run your first investigation:

```sh
opensre onboard
opensre investigate -i tests/e2e/kubernetes/fixtures/datadog_k8s_alert.json
```
Keep OpenSRE up to date with:

```sh
opensre update
```

New to OpenSRE? See SETUP.md for detailed platform-specific setup instructions, including Windows setup, environment configuration, and more.
```sh
git clone https://github.com/Tracer-Cloud/opensre
cd opensre
make install

# run opensre onboard to configure your local LLM provider
# and optionally validate/save Grafana, Datadog, Honeycomb, Coralogix, Slack, AWS, GitHub MCP, and Sentry integrations
opensre onboard
opensre investigate -i tests/e2e/kubernetes/fixtures/datadog_k8s_alert.json
```
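The `-i` flag takes an alert payload as a JSON file. As a rough illustration of preparing your own input instead of using a bundled fixture, the field names below are hypothetical; consult the fixtures under tests/e2e/ for the real schema.

```python
import json

# Hypothetical minimal alert payload -- field names are illustrative only.
alert = {
    "source": "datadog",
    "alert_type": "kubernetes",
    "title": "Pod CrashLoopBackOff in namespace payments",
    "tags": ["kube_namespace:payments", "pod_name:checkout-7f9c"],
    "timestamp": "2025-01-15T09:42:00Z",
}

# Write it to disk so the CLI can read it.
with open("my_alert.json", "w") as f:
    json.dump(alert, f, indent=2)

# then run: opensre investigate -i my_alert.json
```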
When an alert fires, OpenSRE automatically:
- Fetches the alert context and correlated logs, metrics, and traces
- Reasons across your connected systems to identify anomalies
- Generates a structured investigation report with probable root cause
- Suggests next steps and, optionally, executes remediation actions
- Posts a summary directly to Slack or PagerDuty - no context switching needed
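The steps above can be sketched as a small pipeline. This is a conceptual sketch only: the function names and report shape are assumptions, and the real agent's reasoning step is LLM-driven rather than a simple ranking.

```python
from typing import Callable


def investigate(alert: dict,
                fetch_signals: Callable[[dict], dict],
                find_anomalies: Callable[[dict], list[str]],
                notify: Callable[[str], None]) -> dict:
    """Sketch of the loop: gather evidence, reason, report, notify."""
    signals = fetch_signals(alert)        # correlated logs/metrics/traces
    anomalies = find_anomalies(signals)   # cross-system reasoning
    report = {
        "alert": alert["title"],
        "anomalies": anomalies,
        "probable_root_cause": anomalies[0] if anomalies else "unknown",
    }
    notify(f"Investigation done: {report['probable_root_cause']}")
    return report


# Stub integrations to show the flow end to end.
report = investigate(
    {"title": "High 5xx rate"},
    fetch_signals=lambda a: {"metrics": ["error_rate spike at 09:42"]},
    find_anomalies=lambda s: ["error_rate spike at 09:42"],
    notify=print,
)
```

Passing the integrations in as callables is just a way to show where each connected system plugs into the loop.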
Generate the benchmark report:
```sh
make benchmark
```

| Feature | Description |
|---|---|
| 🔍 Structured incident investigation | Correlated root-cause analysis across all your signals |
| 📋 Runbook-aware reasoning | OpenSRE reads your runbooks and applies them automatically |
| 🔮 Predictive failure detection | Catch emerging issues before they page you |
| 🔗 Evidence-backed root cause | Every conclusion is linked to the data behind it |
| 🤖 Full LLM flexibility | Bring your own model — Anthropic, OpenAI, Ollama, Gemini, OpenRouter, NVIDIA NIM |
OpenSRE connects to 40+ tools and services across the modern cloud stack, from LLM providers and observability platforms to infrastructure, databases, and incident management.
| Category | Integrations | Roadmap |
|---|---|---|
| AI / LLM Providers | Anthropic · OpenAI · Ollama · Google Gemini · OpenRouter · NVIDIA NIM · Bedrock | |
| Observability | Splunk · New Relic · Victoria Logs | |
| Infrastructure | Helm · ArgoCD | |
| Database | MongoDB · ClickHouse | PostgreSQL · MySQL · MariaDB · MongoDB Atlas · Azure SQL · RDS · Snowflake |
| Data Platform | Apache Airflow · Apache Kafka · Apache Spark · Prefect | RabbitMQ |
| Dev Tools | GitLab | |
| Incident Management | ServiceNow · incident.io · Alertmanager · Linear · Trello | |
| Communication | Discord · Teams · WhatsApp · Confluence · Notion | |
| Agent Deployment | Railway | |
| Protocols | | |
OpenSRE is community-built. Every integration, improvement, and bug fix makes it better for thousands of engineers. We actively review PRs and welcome contributors of all experience levels.
Good first issues are labeled good first issue. Ways to contribute:
- 🐛 Report bugs or missing edge cases
- 🔌 Add a new tool integration
- 📖 Improve documentation or runbook examples
- ⭐ Star the repo - it helps other engineers find OpenSRE
See CONTRIBUTING.md for the full guide.
Thanks goes to these amazing people:
OpenSRE is designed with production environments in mind:
- No storing of raw log data beyond the investigation session
- All LLM calls use structured, auditable prompts
- Log transcripts are kept locally - never sent externally by default
See SECURITY.md for responsible disclosure.
opensre collects anonymous usage statistics with Posthog to help us understand adoption
and demonstrate traction to sponsors and investors who fund the project.
What we collect: command name, success/failure, rough runtime, CLI version,
Python version, OS family, machine architecture, and a small amount of
command-specific metadata such as which subcommand ran. For opensre onboard
and opensre investigate, we may also collect the selected model/provider and
whether the command used flags such as --interactive or --input.
A randomly generated anonymous ID is created on first run and stored in
~/.config/opensre/. We never collect alert contents, file contents,
hostnames, credentials, or any personally identifiable information.
Telemetry is automatically disabled in GitHub Actions and pytest runs.
To opt out locally, set the environment variable before running:
```sh
export OPENSRE_NO_TELEMETRY=1
```

The legacy alias `OPENSRE_ANALYTICS_DISABLED=1` also still works.
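A CLI that honors both variables would check them before sending anything. This is an illustrative guard, not the project's actual implementation:

```python
import os


def telemetry_enabled() -> bool:
    """Respect both the current and legacy opt-out variables."""
    return not (os.environ.get("OPENSRE_NO_TELEMETRY") == "1"
                or os.environ.get("OPENSRE_ANALYTICS_DISABLED") == "1")


os.environ["OPENSRE_NO_TELEMETRY"] = "1"
print(telemetry_enabled())  # False
```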
To inspect the payload locally without sending anything, use:
```sh
export OPENSRE_TELEMETRY_DEBUG=1
```

Apache 2.0 - see LICENSE for details.