Conversation
✅ Tests: 🎉 All green! ❄️ No new flaky tests detected. 🔗 Commit SHA: 36010c2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8969a6cdb
Review comment on the new system test:

```python
logger.info(f"Observed _dd.agent_psr values (ramp phase): {sorted(agent_psr_values)}")
...
assert any(abs(v - LOW_RATE) < 0.01 for v in agent_psr_values), (
```
Avoid requiring low rate in post-cutoff span set
This assertion can fail for a correctly implemented tracer because _spans_before_ramp is captured immediately after the first observed low-rate span, so the post-cutoff window may legitimately contain only ramped values (e.g., if no additional low-rate requests are emitted after the cutoff). In that case CI reports a regression even though capped increase behavior is correct, making the new test flaky across tracer flush timings.
**brettlangdon** left a comment:
`manifests/python.yml` change LGTM
When the trace-agent is restarted, it initially reports a sampling rate of 100%, dramatically increasing the number of traces sampled. A rate can jump suddenly from 0.1% to 100% and back to 0.1% once the trace-agent eventually computes the new sampling rate. In particular, it is observed that when the agent restarts, the payload buffering that waits for new container tags breaches its memory limit and we send spans without container tags.

This PR caps sampling rate increases at x2 every 1s, which means:

- a x10 increase completes in 3-4s
- 1% -> 100% takes 7s
- 0.1% -> 100% takes 10s

Matching system-test: DataDog/system-tests#6412

Below is a before/after screenshot of the [dd-trace-go implementation](#4488), with `go_span_new` using the PR tracer and `go_spam_old` using the latest release of dd-trace-go, both applications generating 500 traces/s. Notice how the new code does not burst in throughput.

<img width="671" height="242" alt="Screenshot 2026-03-06 at 14 46 31" src="https://github.com/user-attachments/assets/df1496dd-1213-40d3-9d12-ad886052fc47" />

### Motivation

<!--
* What inspired you to submit this pull request?
* Link any related GitHub issues or PRs here.
* If this resolves a GitHub issue, include "Fixes #XXXX" to link the issue and auto-close it on merge.
-->

### Reviewer's Checklist

<!--
* Authors can use this list as a reference to ensure that there are no problems during the review but the signing off is to be done by the reviewer(s).
-->

- [ ] Changed code has unit tests for its functionality at or near 100% coverage.
- [ ] [System-Tests](https://github.com/DataDog/system-tests/) covering this feature have been added and enabled with the va.b.c-dev version tag.
- [ ] There is a benchmark for any new code, or changes to existing code.
- [ ] If this interacts with the agent in a new way, a system test has been added.
- [ ] New code is free of linting errors. You can check this by running `make lint` locally.
- [ ] New code doesn't break existing tests. You can check this by running `make test` locally.
- [ ] Add an appropriate team label so this PR gets put in the right place for the release notes.
- [ ] All generated files are up to date. You can check this by running `make generate` locally.
- [ ] Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild. Make sure all nested modules are up to date by running `make fix-modules` locally.

Unsure? Have a question? Request a review!

Co-authored-by: kemal.akkoyun <kemal.akkoyun@datadoghq.com>
System test for the following RFC changes.
Normally the agent responds with rate_by_service containing the sampling rates it wants the tracer to use. A mock overrides that response with a controlled sequence:
first returning 0.1, then switching to 1.0, so the test can verify the tracer caps the increase gradually rather than jumping straight to the new rate.
Workflow
🚀 Once your PR is reviewed and CI is green, you can merge it!
🛟 #apm-shared-testing 🛟
Reviewer checklist
- [ ] If `tests/` or `manifests/` is modified: I have the approval from the R&P team
- [ ] The `build-XXX-image` label is present