[AWS] Introduce initial alert rule templates #15346
@gpop63: The template will be usable from 9.2 onwards. Can you please share a screenshot of what a particular alert looks like? Also, are we not adding any information about alert support in the READMEs?
| "type": "alerting_rule_template", | ||
| "attributes": { | ||
| "name": "EC2 High CPU Utilization", | ||
| "tags": [], |
What tags were you thinking of, Muthu?
The tags can include the service name and the alert metric name, similar to what I added for Azure AI Foundry.
e.g., [AWS EC2, AWS EC2 CPU Utilization].
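As a rough illustration of the tagging convention suggested above (service name plus service-and-metric name), here is a minimal Python sketch. The helper name `template_tags` is hypothetical and not part of the PR or of any Kibana API.

```python
def template_tags(service: str, metric: str) -> list[str]:
    """Build tags as [service tag, service tag + metric name].

    Hypothetical helper: it only mirrors the convention from the comment
    above and is not part of the PR or of any Kibana API.
    """
    service_tag = f"AWS {service}"
    return [service_tag, f"{service_tag} {metric}"]


# Tags for the EC2 High CPU Utilization template discussed in this thread.
ec2_cpu_tags = template_tags("EC2", "CPU Utilization")
print(ec2_cpu_tags)
```

The resulting list could then replace the empty `"tags": []` array in the template.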
| @@ -0,0 +1,37 @@ | |||
| { | |||
| "id": "b6513de4-6c36-499a-8f0a-98431cd4dbee", | |||
Should the id match the file name of the rule_template?
Error: defines non-matching ID
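A minimal sketch of the kind of check that produces this error, assuming the simplest convention where the file name without its `.json` extension must equal the template `id`. The real validator may apply a different rule (for example a package prefix); the function name and file name below are hypothetical.

```python
import json
from pathlib import Path


def id_matches_filename(path: Path, template: dict) -> bool:
    """Return True when the template "id" equals the file name without ".json".

    Assumption: the convention is "<id>.json"; the actual package validator
    may use a different rule.
    """
    return template.get("id") == path.stem


# Hypothetical file name, used only for illustration.
template_path = Path("b6513de4-6c36-499a-8f0a-98431cd4dbee.json")
template = json.loads(
    '{"id": "b6513de4-6c36-499a-8f0a-98431cd4dbee", "type": "alerting_rule_template"}'
)
print(id_matches_filename(template_path, template))
```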
@ishleenk17 Right now the support is not fully there; we only see the templates under Assets and in Saved Objects.
| "groupBy": "all", | ||
| "termSize": 5, | ||
| "sourceFields": [], | ||
| "timeField": "event.ingested", |
Can the time field be @timestamp? Is there a reason for choosing event.ingested instead of @timestamp?
I tried using the @timestamp field, but it wasn't generating alerts. For some AWS data streams, @timestamp is the time the metric was actually recorded in AWS, which can be earlier than when the document is ingested.
| "esql": "FROM metrics-aws.ec2_metrics-default\n| STATS cpuutilization=avg(aws.ec2.metrics.CPUUtilization.avg) by cloud.account.id, cloud.region, aws.dimensions.InstanceId\n| WHERE cpuutilization >= 80" | ||
| }, | ||
| "aggType": "count", | ||
| "groupBy": "all", |
Is this groupBy not applicable when using an ESQL query?
The grouping of the actual data happens in the ESQL query itself; this field has to be a property of the alert.
| "thresholdComparator": ">", | ||
| "size": 100, | ||
| "esqlQuery": { | ||
| "esql": "FROM metrics-aws.ec2_metrics-default\n| STATS cpuutilization=avg(aws.ec2.metrics.CPUUtilization.avg) by cloud.account.id, cloud.region, aws.dimensions.InstanceId\n| WHERE cpuutilization >= 80" |
Applying a dataset filter would help fetch only the data relevant to the alerting metrics. WDYT?
How do we do that? Also, this ESQL query targets documents from a specific data stream/index (metrics-aws.ec2_metrics-default).
We can ignore this, as we directly target a specific data stream.
| "searchType": "esqlQuery", | ||
| "timeWindowSize": 5, | ||
| "timeWindowUnit": "m", | ||
| "threshold": [ |
Similar to groupBy: please check whether the threshold value is applied directly from the ESQL query and not from here.
The threshold is set in the ESQL query; this is a different property of the alert.
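To make the point from the replies above concrete (both the grouping keys and the threshold live inside the ESQL query, not in the rule's `groupBy` or `threshold` params), here is a small Python sketch that assembles the same query string used in this template. The function `build_esql` and its parameters are hypothetical, for illustration only.

```python
def build_esql(index: str, metric: str, group_by: list[str], threshold: int) -> str:
    """Assemble an ES|QL query where grouping and threshold are part of the query.

    Hypothetical helper: it reproduces the shape of the query in this PR and is
    not an API of Kibana or Elasticsearch.
    """
    keys = ", ".join(group_by)
    return (
        f"FROM {index}\n"
        f"| STATS cpuutilization=avg({metric}) by {keys}\n"
        f"| WHERE cpuutilization >= {threshold}"
    )


query = build_esql(
    index="metrics-aws.ec2_metrics-default",
    metric="aws.ec2.metrics.CPUUtilization.avg",
    group_by=["cloud.account.id", "cloud.region", "aws.dimensions.InstanceId"],
    threshold=80,
)
print(query)
```

Changing the alerting condition therefore means editing the `WHERE` clause, while the grouping is controlled by the `by` keys of `STATS`.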
Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
Where do we declare which service (entity) this alert template applies to? Something like resource : aws.ec2
I have included the service name in the name of the alert rule template. I suppose Kibana should allow us to filter by tags or by partial matches on the title of the alert rule template.
| "schedule": { | ||
| "interval": "1m" | ||
| }, |
This applies to all the configurations.
Should we run this so frequently? I suggest setting it equal to the default period value for metrics ingestion. Doing so helps avoid any no-data-found alerts (when a user decides to extend the configuration).
Should we set timeWindowSize to match the integration period? That way, for example, every 5 minutes we’d check for alerts in documents from the past 5 minutes.
I think that's a reasonable thing to do. The impact, I assume, will be that instead of an alert being notified at the period + 1m interval, the alert will be notified at a 2 x period interval. Here the period is 5m for most AWS services.
@tommyers-elastic, what would be your recommendation?
i don't think we have any way to couple configs in agent policy templates with these rule configurations, so whatever we choose will have to be always added by hand.
my only thinking here is that it doesn't make sense to run a rule more frequently than the integration collection period. matching the rule frequency with the collection period seems sensible to me.
it's a shame there's no way to put hints in the form such that we could have something that shows up and says "should match the integration collection period" or something. if we think it's worthwhile we could suggest this as a feature.
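Since the rule settings cannot be coupled to the agent policy, the alignment discussed above has to be applied by hand. A minimal Python sketch of that adjustment follows, assuming `schedule` sits under `attributes` and the time window fields sit under `attributes.params` (the nesting is an assumption; the diff fragments in this thread do not show it in full). The function name `align_with_period` is hypothetical.

```python
import copy


def align_with_period(template: dict, period_minutes: int) -> dict:
    """Return a copy whose check interval and time window equal the collection period.

    Assumed layout (not confirmed by the PR): attributes.schedule.interval and
    attributes.params.timeWindowSize / timeWindowUnit.
    """
    aligned = copy.deepcopy(template)
    attributes = aligned["attributes"]
    attributes["schedule"]["interval"] = f"{period_minutes}m"
    attributes["params"]["timeWindowSize"] = period_minutes
    attributes["params"]["timeWindowUnit"] = "m"
    return aligned


# Trimmed-down template with the 1m schedule from the original diff.
original = {
    "attributes": {
        "schedule": {"interval": "1m"},
        "params": {"timeWindowSize": 5, "timeWindowUnit": "m"},
    }
}
aligned = align_with_period(original, period_minutes=5)
print(aligned["attributes"]["schedule"])
```

With a 5m collection period, the rule would then run every 5 minutes over documents from the past 5 minutes.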
packages/aws/manifest.yml
Outdated
| subscription: basic | ||
| kibana: | ||
| - version: "^8.19.0 || ^9.1.0" | ||
| + version: "^9.2.1" |
@elastic/security-service-integrations team, this feature is supported starting from the 9.2.1 release. The minimum stack version is being raised to 9.2.1. Since the AWS integrations are co-owned, could you confirm whether the stack version upgrade is fine for the integrations managed by the security team?
As discussed elsewhere, I think this version constraint should be left unchanged. The alerting rule template files will be built into the package, and they will be installed and used on stack versions that support them.
/test
Sharing a suggestion here: could we follow a more structured comment style, helping users easily identify the purpose, default value, condition, and group-by information, and making suggestions easier? If followed, these descriptions could be combined with other platform capabilities, including the AI assistant (in future, if and when needed). Examples:
@agithomas - Apart from the version dependency, can you help with the review and approval if everything looks good?
agithomas
left a comment
LGTM for the alerts configuration. Looking forward to reaching a common agreement on the version dependency before proceeding.
💚 Build Succeeded
cc @gpop63
Package aws - 5.4.0 containing this change is available at https://epr.elastic.co/package/aws/5.4.0/



Overview
This PR introduces the first set of alert rule templates for key AWS data streams. For each stream, we selected the two most critical metrics to monitor.
ec2_metrics
lambda
sqs
sns
Checklist
changelog.yml file.
Author's Checklist
How to test this PR locally
Related issues
Screenshots