Inject APM trace headers into SQS _datadog message attribute on send #10843
…attribute Documents the missing send-side feature: Java never writes x-datadog-trace-id or x-datadog-parent-id into _datadog, breaking Java→other-tracer propagation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, the Java SQS instrumentation never wrote APM trace context (x-datadog-trace-id, x-datadog-parent-id) into the _datadog message attribute, breaking distributed tracing from Java to other language tracers (Python, Node, etc.) that rely on that attribute.

Changes:
- SqsInterceptor (v1 + v2): call defaultPropagator().inject() before DSM inject so APM headers are written into _datadog on SendMessage and SendMessageBatch
- MessageAttributeInjector (v1 + v2): accumulate multiple key-value pairs into the _datadog JSON blob instead of short-circuiting on the second key
- SqsClientInstrumentation (v1 + v2): register SqsInterceptor when isSqsInjectDatadogAttributeEnabled() is true, not only when DSM is enabled, so APM injection works even without Data Streams
- Tests: add APM trace context injection test for both v1 and v2, covering DSM-on and DSM-off variants

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
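The accumulation described for MessageAttributeInjector can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name DatadogAttributeSketch is hypothetical, message attributes are modeled as a plain Map<String, String> instead of the SDK's attribute type, and header values are assumed to need no JSON escaping.

```java
import java.util.Map;

// Hypothetical sketch; a plain map stands in for the SDK's message attributes.
final class DatadogAttributeSketch {
  static final String DATADOG_KEY = "_datadog";

  // Called once per propagation header. Each call adds one pair to the
  // JSON object stored under _datadog instead of stopping after the first key.
  public static void set(Map<String, String> attributes, String key, String value) {
    // Assumption: key and value contain no characters that need JSON escaping.
    String pair = "\"" + key + "\":\"" + value + "\"";
    String existing = attributes.get(DATADOG_KEY);
    if (existing == null) {
      attributes.put(DATADOG_KEY, "{" + pair + "}");
    } else {
      // Reopen the object: drop the trailing '}' and append the new pair.
      String body = existing.substring(0, existing.length() - 1);
      attributes.put(DATADOG_KEY, body + "," + pair + "}");
    }
  }
}
```

Calling set() twice, once for x-datadog-trace-id and once for x-datadog-parent-id, leaves a single _datadog value holding both keys.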
When a message is sent with an existing _datadog attribute of Binary type (e.g. from the SQS-JMS library), getStringValue() returns null. Skip injection in that case rather than throwing a NullPointerException. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
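A rough sketch of the null guard, under stated assumptions: Attr is a hypothetical stand-in for the SDK's message attribute value type (only the data type and string value are modeled), and BinaryGuardSketch is an illustrative name, not a class from the PR.

```java
import java.util.Map;

// Hypothetical sketch; Attr stands in for the SDK's message attribute value.
final class BinaryGuardSketch {
  static final class Attr {
    final String dataType;
    final String stringValue; // null when dataType is "Binary"

    Attr(String dataType, String stringValue) {
      this.dataType = dataType;
      this.stringValue = stringValue;
    }
  }

  // Returns true if the header was written, false if injection was skipped.
  public static boolean set(Map<String, Attr> attributes, String key, String value) {
    // Assumption: key and value contain no characters that need JSON escaping.
    String pair = "\"" + key + "\":\"" + value + "\"";
    Attr existing = attributes.get("_datadog");
    if (existing == null) {
      attributes.put("_datadog", new Attr("String", "{" + pair + "}"));
      return true;
    }
    String json = existing.stringValue;
    if (json == null) {
      // Binary-typed _datadog (for example from SQS-JMS): leave it untouched.
      return false;
    }
    String body = json.substring(0, json.length() - 1);
    attributes.put("_datadog", new Attr("String", body + "," + pair + "}"));
    return true;
  }
}
```

Returning early keeps the caller's Binary attribute intact instead of dereferencing a null string.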
I heard some suggestions that this was deliberate. Or at least, it was deliberate for baggage.
DSM_CONCERN is registered with usedAsDefault=true, so defaultPropagator() already includes DSM injection when Data Streams is enabled. The separate dsmPropagator.inject() call was therefore always redundant: a no-op when DSM is off, a duplicate when DSM is on. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SqsInterceptor: skip injection if _datadog is already present in the message attributes. Appending to a caller-provided attribute could produce duplicate JSON keys or corrupt propagation context set intentionally upstream. MessageAttributeInjector: add comments explaining why set() accumulates into the existing _datadog value rather than overwriting — a single propagator.inject() call fires set() once per header key, so all keys (x-datadog-trace-id, x-datadog-parent-id, dd-pathway-ctx-base64, ...) must land in the same JSON attribute. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
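The interceptor-level check can be illustrated with a simplified sketch. It assumes attributes and headers are plain string maps, builds the JSON in one pass instead of through per-key set() calls, and uses the hypothetical name SkipIfPresentSketch; none of this is taken from the PR's code.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of "skip injection if _datadog is already present".
final class SkipIfPresentSketch {
  // Returns true if _datadog was written, false if the caller already set it.
  public static boolean inject(Map<String, String> attributes, Map<String, String> headers) {
    if (attributes.containsKey("_datadog")) {
      // A caller-provided attribute is left as-is to avoid duplicate JSON keys.
      return false;
    }
    // All headers from one inject pass land in the same JSON attribute.
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> header : headers.entrySet()) {
      // Assumption: keys and values need no JSON escaping.
      json.add("\"" + header.getKey() + "\":\"" + header.getValue() + "\"");
    }
    attributes.put("_datadog", json.toString());
    return true;
  }
}
```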
@codex review

Codex Review (reviewed commit: ad0d115b56): automated review suggestion on .../aws-java-sqs-1.0/src/main/java/datadog/trace/instrumentation/aws/v1/sqs/SqsInterceptor.java (outdated, resolved).
Benchmarks (candidate=1.60.0-SNAPSHOT~e0d5df4268, baseline=1.61.0-SNAPSHOT~3123b191d9)

Startup: Found 0 performance improvements and 0 performance regressions. Performance is the same for 57 metrics, 14 unstable metrics. [Gantt charts: startup time reports for insecure-bank and petclinic, global startup overhead and breakdown per module.]

Load: Found 3 performance improvements and 1 performance regression. Performance is the same for 17 metrics, 15 unstable metrics. [Gantt charts: request duration reports for insecure-bank and petclinic, CI 0.99.]

Dacapo: Found 0 performance improvements and 0 performance regressions. Performance is the same for 11 metrics, 1 unstable metric. [Gantt charts: execution time for biojava and tomcat, CI 0.99.]
Moving newContext/getContext outside the per-entry loop prevents creating a new span per entry where each overwrites the same contextStore slot, leaking all but the last span and injecting mismatched parent IDs into earlier batch messages. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
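The batch fix can be pictured with a toy model. Everything here is an assumption for illustration: newContext() is a hypothetical stand-in that only counts how many contexts are created, and strings stand in for batch entries and span contexts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of creating the context once per batch request.
final class BatchContextSketch {
  static int contextsCreated = 0;

  // Stand-in for creating the span context for a send request.
  static String newContext() {
    contextsCreated++;
    return "ctx-" + contextsCreated;
  }

  // Returns, for each batch entry, the context that would be injected into it.
  public static List<String> injectBatch(List<String> entryIds) {
    String context = newContext(); // created once, outside the per-entry loop
    List<String> injected = new ArrayList<>();
    for (String entryId : entryIds) {
      injected.add(entryId + ":" + context);
    }
    return injected;
  }
}
```

With the context created inside the loop instead, a three-entry batch would create three contexts, and earlier entries would carry parent IDs that no longer match the one kept for the request.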
/merge
What Does This Do
Java's SQS instrumentation never wrote APM trace context (x-datadog-trace-id, x-datadog-parent-id) into the _datadog SQS message attribute on send. The receive side already knows how to read them, but the send side only injected the DSM pathway context, and only when Data Streams was enabled.

Motivation
This broke distributed tracing from Java to other language tracers (Python, Node.js, etc.) that rely on the _datadog attribute for context propagation.

Additional Notes
- SqsInterceptor (v1 + v2): Call defaultPropagator().inject() before the DSM inject on SendMessage and SendMessageBatch, so APM trace headers are written into _datadog.
- MessageAttributeInjector (v1 + v2): Accumulate multiple key-value pairs into the _datadog JSON blob instead of short-circuiting on the second call. Also guard against a null stringValue (Binary-typed _datadog attributes from SQS-JMS).
- SqsClientInstrumentation (v1 + v2): Register SqsInterceptor when isSqsInjectDatadogAttributeEnabled() is true, not only when Data Streams is enabled.
- Tests: "APM trace context is injected into _datadog message attribute on send" for both AWS SDK v1 and v2, covering both DSM-off (V0Test) and DSM-on (V0DataStreamsTest) variants.

DD context extraction was added in #5920
Contributor Checklist
- Assign the type: and (comp: or inst:) labels in addition to any other useful labels
- Avoid using close, fix, or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue

Jira ticket: APMS-18847

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.