Blackbird Collections
In-situ Stream Processing in HBase
Ishan Chhabra, Nitin Aggarwal, Deepankar Reddy
Rocketfuel Inc.
Quick Recap:
Blackbird @ Rocketfuel Inc.
Collections:
At the center of all non-trivial applications
Usual first attempt:
Read-Modify-Write
Event arrives → read the existing 100 elements → write back 101 elements
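The read-modify-write pattern can be sketched with a toy in-memory stand-in for an HBase cell; the dict, `record_event`, and row key below are illustrative, not Blackbird's actual API:

```python
# Toy stand-in for one HBase cell: row key -> the full stored collection.
# Illustrative only; a real client would issue HBase Get/Put calls.
table = {}

def record_event(row, element):
    existing = table.get(row, [])   # read the existing N elements
    updated = existing + [element]  # modify in client memory
    table[row] = updated            # write back all N + 1 elements
    return len(updated)             # write size grows with collection size

for i in range(100):
    record_event("profile-1", i)
# the 100th event rewrote the full 100-element collection, not just 1 element
```

Note that every write ships the entire collection, which is what makes the next three problems possible.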
Problem:
Not sympathetic to
HBase Internals
Memstore bloat
Constant flushing
Compactions
Problem:
Asymptotic network usage
Cross-DC replication (DC 1 → DC 2) ships the whole collection on every write: write 1 replicates 101 elements, write 2 replicates 48 elements, not just the new entries
Problem: Concurrency Bug
Event 1: reads the existing 100 elements, writes 101 elements
Event 2 (concurrent): reads the same 100 elements, writes 101 elements
Final state: 101 elements instead of 102; one update is silently lost
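The lost update can be reproduced with the same kind of toy in-memory model, by interleaving the two reads before either write (the names are illustrative):

```python
# Toy model of the race: both events read before either one writes.
table = {"profile-1": list(range(100))}

snapshot1 = list(table["profile-1"])  # event 1 reads 100 elements
snapshot2 = list(table["profile-1"])  # event 2 reads the same 100 elements

table["profile-1"] = snapshot1 + ["event-1"]  # event 1 writes 101 elements
table["profile-1"] = snapshot2 + ["event-2"]  # event 2 overwrites event 1

# Result: 101 elements instead of 102; "event-1" was silently lost.
```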
Append-Only Collections:
Be sympathetic to HBase internals
Trivial example: Lists
But what about Sets, Maps, Counters, etc., and domain-specific collections?
Working example:
SegmentSet
Keep only the latest entry for each segment
At most 1000 of the most recently updated segments
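These two properties can be expressed as pure functions applied at read time. The sketch below assumes entries of the form `(segment_id, timestamp, value)`; the names and representation are illustrative, not Blackbird's actual code:

```python
def latest_per_segment(entries):
    # Keep only the most recent entry for each segment.
    latest = {}
    for seg, ts, val in entries:
        if seg not in latest or ts >= latest[seg][0]:
            latest[seg] = (ts, val)
    return [(seg, ts, val) for seg, (ts, val) in latest.items()]

def limit_segments(entries, cap=1000):
    # Keep at most `cap` of the most recently updated segments.
    return sorted(entries, key=lambda e: e[1], reverse=True)[:cap]

# Segment "a" was appended twice; only its latest value survives the read.
entries = [("a", 1, "x"), ("b", 2, "y"), ("a", 3, "z")]
view = limit_segments(latest_per_segment(entries), cap=2)
```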
Blackbird Collections:
Logical Model
Collection of entries
Can only add elements to it
Apply a series of functions during read to enforce properties
For every collection:
Define the structure
Define the series of functions f1, f2, f3, … to apply during read
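The read path is then just a left-to-right fold of those functions over the raw appended entries; a minimal sketch (the function names are hypothetical):

```python
def read_view(raw_entries, functions):
    # Properties are enforced lazily at read time: raw appended entries
    # go in, f1, f2, f3, ... are applied in order, the clean view comes out.
    view = list(raw_entries)
    for f in functions:
        view = f(view)
    return view

# e.g. a set-like collection: f1 deduplicates, f2 caps the size
f1 = lambda xs: list(dict.fromkeys(xs))  # dedupe, keeping first occurrence
f2 = lambda xs: xs[:3]                   # keep at most 3 entries
read_view([1, 2, 2, 3, 1, 4], [f1, f2])  # -> [1, 2, 3]
```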
So how would SegmentSet work?
Initial state in DB
Adding a new element in DB
Enforce properties during read at the client
Blackbird Collections:
HBase Implementation
[Diagram: one logical collection stored as a combined column (100 entries) plus small per-write entry columns (views:2398, views:4587, views:6798, views:2983, each holding 1 or 2 entries)]
Step 1: Write appends to separate columns
Step 2: Apply the functions during reads
[Diagram: a read merges the combined column (100 entries) with the small entry columns (1 + 2 + 1 entries) into 104 entries, then applies f1, f2, … to produce a 92-entry view]
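Steps 1 and 2 can be modeled with a dict per row, mapping column qualifiers to entry lists. The qualifier naming and merge order below are assumptions for illustration; the real implementation appends to separate HBase columns:

```python
# Toy model of one HBase row: column qualifier -> list of entries.
row = {"combined": list(range(100))}

def append_write(row, entries, seq):
    # Step 1: each write lands in its own small column, so there is
    # no read and no rewrite of the big combined column.
    row[f"entry-{seq:08d}"] = list(entries)

def read_row(row, functions):
    # Step 2: merge the combined column with the per-write columns
    # (oldest first), then apply f1, f2, ... to enforce the properties.
    merged = list(row["combined"])
    for qualifier in sorted(q for q in row if q != "combined"):
        merged.extend(row[qualifier])
    for f in functions:
        merged = f(merged)
    return merged

append_write(row, [100], seq=1)
append_write(row, [101, 102], seq=2)
dedupe = lambda xs: list(dict.fromkeys(xs))
read_row(row, [dedupe])  # 103 entries after merging the three columns
```

Each write is now small and constant-size, which is what makes the pattern sympathetic to the memstore and to cross-DC replication.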
But what about all the garbage that is building up?
Step 3: Normalization
[Diagram: normalization merges the combined column (100 entries) and the entry columns (1 + 2 + 1 entries), applies the functions, and writes back a single 92-entry combined column]
2 kinds of runs: nightly and weekly
Nightly run looks only at a subset of the data (the data changed that day)
Weekly run looks at all the data
Heavily optimized: < 1h for the nightly run and 2-3h for the weekly run (~50TB of data)
Made fast by MR over snapshots and bulkloads
No impact on live read performance
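Normalization can be sketched as the same merge-and-apply step as the read path, except the result is written back as the new combined column and the per-write columns are dropped. This is a self-contained toy model; the real runs are MapReduce jobs over snapshots, written back via bulkloads:

```python
def normalize(row, functions):
    # Merge the combined column with the per-write entry columns...
    merged = list(row.get("combined", []))
    for qualifier in sorted(q for q in row if q != "combined"):
        merged.extend(row[qualifier])
    # ...apply the same f1, f2, ... as the read path...
    for f in functions:
        merged = f(merged)
    # ...and compact the row: one combined column, garbage columns gone.
    row.clear()
    row["combined"] = merged
    return row

row = {"combined": [1, 2, 3], "entry-1": [3, 4], "entry-2": [5]}
dedupe = lambda xs: list(dict.fromkeys(xs))
normalize(row, [dedupe])
# row is now {"combined": [1, 2, 3, 4, 5]}: one column, no duplicates
```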
Blackbird Collections:
Updated Logical Model
Collection of entries
Can only add elements to it
Apply a series of functions during reads
Apply a series of functions during daily normalization
Apply a series of functions during weekly normalization
Another Example:
Transient Counters
Be able to increment/decrement counts
Remove entries if timestamp + time-to-live < current time
Keep the latest 1000 entries only
aggregate()
expire()
limit_to_1000()
During read: aggregate(), expire()
Daily normalization: aggregate(), expire(), limit_to_1000()
Weekly normalization: aggregate(), expire(), limit_to_1000()
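The three functions might look like the sketch below, assuming entries of the form `(key, delta, timestamp)`; the representation and TTL handling are assumptions for illustration, and the usage example runs the full normalization chain:

```python
def aggregate(entries):
    # Fold increments/decrements into one net count per key,
    # remembering each key's latest update time.
    totals = {}
    for key, delta, ts in entries:
        total, latest = totals.get(key, (0, 0))
        totals[key] = (total + delta, max(latest, ts))
    return [(key, total, ts) for key, (total, ts) in totals.items()]

def expire(entries, ttl, now):
    # Drop entries whose timestamp + time-to-live is in the past.
    return [(k, v, ts) for k, v, ts in entries if ts + ttl >= now]

def limit_to_1000(entries):
    # Keep only the 1000 most recently updated counters.
    return sorted(entries, key=lambda e: e[2], reverse=True)[:1000]

events = [("clicks", 1, 10), ("clicks", 1, 20), ("views", 5, 5), ("views", -2, 8)]
counters = limit_to_1000(expire(aggregate(events), ttl=100, now=50))
# clicks net to 2 (latest ts 20); views net to 3 (latest ts 8)
```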
Conclusion
Thank you!
Questions?
Reach us at:
ishan@rocketfuel.com
naggarwal@rocketfuel.com
dreddy@rocketfuel.com

HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase