Fix nil pointer dereference in node allocation #2764
Merged
anshulpundir merged 1 commit into moby:master on Oct 17, 2018
When the network allocator starts, it performs two passes of allocation. The first, with existingAddressesOnly set to "true", simply re-allocates any already reserved addresses, which makes the local driver state consistent with the state in swarmkit's object store. The second pass then performs any new allocations still outstanding from when the allocator last stopped.

Since moby#2725, nodes only have attachments allocated for them if they have a task currently scheduled which requires those networks. This happens after a task is allocated and scheduled.

Before this change, a Task could be correctly allocated while the allocator stopped before the Node was also allocated. In that case, during the restore phase, an empty api.NetworkAttachment object was added to the Node's attachments. Then, in the new allocations phase, while processing all attachments, we unconditionally looked at the NetworkAttachment object's Network field, which was nil. This caused a segfault and crash.

With this change, we no longer add these errant NetworkAttachment objects to nodes.

Signed-off-by: Drew Erny <drew.erny@docker.com>
anshulpundir approved these changes on Oct 17, 2018
wk8 approved these changes on Oct 17, 2018
Codecov Report

Coverage Diff

| | master | #2764 | +/- |
|---|---|---|---|
| Coverage | ? | 61.8% | |
| Files | ? | 134 | |
| Lines | ? | 21867 | |
| Branches | ? | 0 | |
| Hits | ? | 13514 | |
| Misses | ? | 6901 | |
| Partials | ? | 1452 | |
@anshulpundir @dperny we need this to be backported to the 18.09 branch (if that branch does not exist yet, I guess it must be created and this change cherry-picked?).
This was referenced Oct 18, 2018
- What I did
Fix a nil pointer dereference in the
- How I did it
https://www.youtube.com/watch?v=bLHL75H_VEM
When the network allocator starts, it performs two passes of allocation. The first, with existingAddressesOnly set to "true", simply re-allocates any already reserved addresses, which makes the local driver state consistent with the state in swarmkit's object store. The second pass then performs any new allocations still outstanding from when the allocator last stopped.
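As a rough illustration of why the restore pass runs first, here is a minimal sketch of the two-pass idea. It is not swarmkit's allocator: the `attachment` and `allocator` types, the `restoreAndAllocate` helper, and the `10.0.0.x` address scheme are all hypothetical stand-ins. Only the `existingAddressesOnly` flag name comes from the description above.

```go
package main

import "fmt"

// attachment is a hypothetical, simplified record. Address is empty until an
// address has been reserved for it.
type attachment struct {
	Name    string
	Address string
}

// allocator is a toy stand-in for local driver state (not swarmkit's type).
type allocator struct {
	inUse map[string]bool
	next  int
}

func newAllocator() *allocator {
	return &allocator{inUse: make(map[string]bool)}
}

// allocate handles one attachment. With existingAddressesOnly set, it only
// re-registers addresses that are already recorded and skips everything else.
func (a *allocator) allocate(att *attachment, existingAddressesOnly bool) {
	if att.Address != "" {
		a.inUse[att.Address] = true // restore a previously reserved address
		return
	}
	if existingAddressesOnly {
		return // new allocations wait for the second pass
	}
	for {
		a.next++
		candidate := fmt.Sprintf("10.0.0.%d", a.next)
		if !a.inUse[candidate] {
			a.inUse[candidate] = true
			att.Address = candidate
			return
		}
	}
}

// restoreAndAllocate runs the restore pass, then the new-allocation pass.
func restoreAndAllocate(a *allocator, atts []*attachment) {
	for _, att := range atts {
		a.allocate(att, true)
	}
	for _, att := range atts {
		a.allocate(att, false)
	}
}

func main() {
	atts := []*attachment{
		{Name: "new"},                      // nothing reserved yet
		{Name: "old", Address: "10.0.0.1"}, // reserved before the restart
	}
	restoreAndAllocate(newAllocator(), atts)
	for _, att := range atts {
		fmt.Println(att.Name, att.Address)
	}
}
```

Because the restore pass registers `10.0.0.1` before any new address is handed out, the "new" attachment in this toy cannot be given an address that the "old" one already holds, even though it appears first in the list.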
Since #2725, nodes only have attachments allocated for them if they have a task currently scheduled which requires those networks. This happens after a task is allocated and scheduled.
Before this change, a Task could be correctly allocated while the allocator stopped before the Node was also allocated. In that case, during the restore phase, an empty api.NetworkAttachment object was added to the Node's attachments. Then, in the new allocations phase, while processing all attachments, we unconditionally looked at the NetworkAttachment object's Network field, which was nil. This caused a segfault and crash.
With this change, we no longer add these errant NetworkAttachment objects to nodes.
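The failure mode can be reproduced in isolation with a small sketch. The types below are simplified stand-ins for the swarmkit `api` types, keeping only a `Network` pointer, and every function name is hypothetical. The nil check in `checkedNetworkIDs` is shown only for contrast: the change in this PR avoids adding the empty attachments in the first place, and its actual diff is not reproduced here.

```go
package main

import "fmt"

// Simplified stand-ins for the swarmkit api types (assumption: only the
// fields needed for this illustration are modeled).
type Network struct{ ID string }

type NetworkAttachment struct {
	Network *Network // nil on an empty attachment
}

type Node struct {
	Attachments []*NetworkAttachment
}

// uncheckedNetworkIDs reads Network.ID unconditionally, so it panics with a
// nil pointer dereference when it reaches an empty attachment.
func uncheckedNetworkIDs(n *Node) []string {
	var ids []string
	for _, a := range n.Attachments {
		ids = append(ids, a.Network.ID)
	}
	return ids
}

// checkedNetworkIDs skips attachments that have no Network set.
func checkedNetworkIDs(n *Node) []string {
	var ids []string
	for _, a := range n.Attachments {
		if a == nil || a.Network == nil {
			continue
		}
		ids = append(ids, a.Network.ID)
	}
	return ids
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

// exampleNode builds a node with one valid and one empty attachment.
func exampleNode() *Node {
	return &Node{Attachments: []*NetworkAttachment{
		{Network: &Network{ID: "net-a"}},
		{}, // errant empty attachment
	}}
}

func main() {
	node := exampleNode()
	fmt.Println(panics(func() { uncheckedNetworkIDs(node) })) // true
	fmt.Println(checkedNetworkIDs(node))                      // [net-a]
}
```

Not creating the empty attachment at all, as this PR does, keeps the stored node state clean rather than relying on every consumer to remember the guard.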
- How to test it
Includes an automated test, which I have verified fails before the change and passes after.
- Description for the changelog
Fix nil pointer dereference that could crash swarmkit.