Blackbird
Billions of rows, couple of milliseconds away
Ishan Chhabra
Shrijeet Paliwal
Abhijit Pol
[Slide image: per-impression bid prices (e.g. $0.56–$2.78) computed from signals such as Site/Page, Geo/Weather, Time of Day, Brand Affinity, and User]
[Diagram: bid flow — 1. Page Request, 2. Ad Request, 3. Bid Request, 4. Bid & Ad, 5. Rocket Fuel Winning Ad, 6. Ad Served — spanning Browser, Publishers, Exchange Partners, Data Partners, User Segments, and User Engagements feeding Optimize]
Simple View of Rocket Fuel Platform
[Diagram: Real-time Bidder, Model Scoring, User Data Store, User Engagement]
So what is Blackbird?
Requests per day
» Facebook likes: 5 B
» Searches on Google: 6 B
» Bid requests considered by Rocket Fuel: 45 B
Time (ms)
» Blink of an eye: 400
» SF to Tokyo network round trip: 100
» One beat of a hummingbird's wing: 20
» Look up in Blackbird: 2
Powered by
HBase, we have a problem..
Object NoSQL Mapper (List&lt;KeyValue&gt;):
» Data loss on concurrent modification
» Read per write
» High amount of data per write
» O(n)

High Performance Collections:
» Significantly reduced flushes, compaction, network usage, GC
» O(1)
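The contrast can be sketched with plain dicts standing in for an HBase row (a minimal illustration; the function and column names are ours, not Blackbird's API):

```python
import uuid

def read_modify_write(row, items):
    """O(n): fetch the whole serialized collection, merge, write it back.
    Two concurrent writers can each read the same old value, so one
    update is silently lost."""
    combined = list(row.get("c1:combined", []))  # read per write
    combined.extend(items)
    row["c1:combined"] = combined                # rewrites all n entries

def append_only(row, items):
    """O(1) per write: drop the new entries into a fresh, randomly named
    column; nothing is read and nothing existing is overwritten."""
    row["c1:" + uuid.uuid4().hex] = list(items)

row = {"c1:combined": ["a", "b"]}
append_only(row, ["c"])
append_only(row, ["d", "e"])
# The row now holds the original combined column plus two delta columns.
```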
Append Only, the HBase view
[Diagram: one logical collection stored as c1:combined (100 entries) plus append-only delta columns c1:rand1 (1 entry), c1:rand2 (2 entries), c1:rand3 (1 entry)]
Optimizing reads using normalization
[Diagram: c1:combined (100 entries) and delta columns c1:rand1 (1 entry), c1:rand2 (2 entries), c1:rand3 (1 entry) are folded into a single c1:combined column with 103 entries]
.filter(λ), .transform(λ), ƒ
Secondary Indexes
High throughput, low latency lookups
Not so easy!
HBase is designed for high throughput writes
Key Ideas
» Read as little as possible
» Stay stable, uniform, data local
» Don't go to disk
» Even if you have to go to disk, make it fast
Protobufs, Protobufs, everywhere
Stay stable, uniform, data local at all times
Good quality hardware
Properly designed row keys
Off peak daily major compaction
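One common recipe behind "properly designed row keys" is a short hash-salt prefix so sequential IDs spread uniformly across regions instead of hot-spotting one region server; a sketch under our own assumptions (bucket count and key format are illustrative, not necessarily Blackbird's scheme):

```python
import hashlib

def row_key(user_id: str, salt_buckets: int = 256) -> bytes:
    """Prefix the natural key with a stable 1-byte hash bucket: adjacent
    IDs land in (usually) different buckets, keeping write load uniform,
    while the same ID always maps to the same key, keeping reads
    data-local."""
    digest = hashlib.md5(user_id.encode()).digest()
    bucket = digest[0] % salt_buckets
    return bytes([bucket]) + user_id.encode()

k1 = row_key("user:1000001")
k2 = row_key("user:1000002")
```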
Give me all your Cache!
128 GB machines with 50% block cache
High cache hit ratio (90%+) through effective utilization
It’s time to disk(o)
15K SAS drives
Local & short-circuit reads (20-30% improvement)
High throughput writes are supported too!
Small writes:
• Append only
• Protobufs
Large memstores:
• 4 GB
• Avoids flushes, memory churn, compaction
• Maintains read performance by avoiding multiple seeks
Tuned compaction:
• Avoid minor compactions
• Off-peak major compaction
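These tunings map onto standard hbase-site.xml properties; the property names below are real HBase settings, but the values are illustrative rather than a recommendation:

```xml
<!-- Sketch only: values shown for illustration, not a tuned config. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>4294967296</value> <!-- 4 GB memstores: far fewer flushes -->
</property>
<property>
  <name>hbase.hstore.compaction.min</name>
  <value>10</value> <!-- raise the minor-compaction trigger -->
</property>
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>2</value> <!-- off-peak window for major compaction -->
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>6</value>
</property>
```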
Reliability & Availability
Organize the chaos or pay the cost..
» Blind writes can grow rows & tables too big
» Newbie clients 'guess' a lot
» Simple queries such as row counts can be hard to compute on the fly
Be aware…
Multitenant Blackbird
[Diagram: tenants — Web app, Bid serving, Ad serving, Data augmentation, Batch data pipelines, Real-time data pipelines, Ops housekeeping]
Multi-tenancy makes it hard to find the defaulter
Use ACLs & client side metrics in all access paths
Draft guidelines for new clients, help them estimate the growth
Keep track of growth, row count, row size, column size etc.
Maintaining SLA Guarantees
It’s a delicate equilibrium that is hard to maintain
Shield it with aggressive alerting, dashboards & canary monitoring
1st region server dies after several hours of clogged RPC queue
Bad region moves to another region server & soon kills it too!
Clients can go rogue; it can get as bad as a DoS attack
Protection via dynamic blacklists & size limit filters
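A minimal sketch of how such protections might look on the client side (class name, size limit, and ban duration are all hypothetical):

```python
import time

MAX_VALUE_BYTES = 1 << 20   # hypothetical 1 MB per-row cap
BAN_SECONDS = 300           # hypothetical ban duration

class Guard:
    """Dynamic blacklist + size-limit filter applied before HBase is hit."""

    def __init__(self):
        self.banned_until = {}   # row key -> unix time the ban expires

    def blacklist(self, key):
        """Temporarily ban a row key that is generating outsized traffic."""
        self.banned_until[key] = time.time() + BAN_SECONDS

    def allow(self, key, value_bytes=0):
        if self.banned_until.get(key, 0) > time.time():
            return False                       # rogue key: shed the load
        return value_bytes <= MAX_VALUE_BYTES  # size-limit filter

g = Guard()
g.blacklist("hot-row")
```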
Surviving the failures
» In absence of proxy: ‘The client is part of the cluster’ [1]
» Client must report availability errors to the calling application thread within a short time span
» Follow circuit breaker pattern for read calls (Anecdote)
» ‘pseudo’ puts (local file) for write calls
[1] Blog post from Lars Hofhansl http://hadoop-hbase.blogspot.com/2012/09/hbase-client-timeouts.html
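The circuit-breaker idea for read calls can be sketched as follows (a simplified version under our own assumptions, not the production client): after N consecutive failures the breaker opens and the client fails fast for a cooldown instead of blocking on a sick cluster.

```python
import time

class CircuitBreaker:
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold   # failures before opening
        self.cooldown = cooldown     # seconds to fail fast
        self.failures = 0
        self.opened_at = None

    def call(self, read_fn):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.cooldown:
                # Report the availability error immediately to the caller.
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None    # half-open: let one probe through
            self.failures = 0
        try:
            result = read_fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0            # success closes the circuit again
        return result
```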
Shoutouts!
Obligatory “we are hiring” slide!
http://rocketfuel.com/careers
ishan@rocketfuel.com
shrijeet@rocketfuel.com
apol@rocketfuel.com
Case studies session 2