/events @ Box: Using HBase as a message queue
HBaseCon 2015
David MacKenzie, Box Engineering
@davrmac @BoxEng
Share, manage and access your content from any device, anywhere
What is the /events API?
• Realtime stream of all activity happening within a user’s account
• GET /events?stream_position=1234&stream_type=all
• Persistent and re-playable
[Diagram: a client consuming events 1–5, in order, from its stream]
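As a rough illustration of how a client consumes the stream, here is a minimal Python sketch against the public GET /events endpoint (the token handling and the response fields entries/next_stream_position are assumptions about the payload, not taken from the slides):

```python
import time
import requests

API_URL = "https://api.box.com/2.0/events"   # Box /events endpoint
ACCESS_TOKEN = "..."                          # hypothetical: obtained via OAuth

def poll_events(stream_position="now", stream_type="all"):
    """Repeatedly read a user's event stream from the given position."""
    while True:
        resp = requests.get(
            API_URL,
            params={"stream_position": stream_position, "stream_type": stream_type},
            headers={"Authorization": "Bearer " + ACCESS_TOKEN},
        )
        resp.raise_for_status()
        body = resp.json()
        for event in body.get("entries", []):
            print(event.get("event_type"), event.get("event_id"))
        # The response tells the client where to resume from; re-reading from
        # an older, stored position replays the same events (re-playability).
        stream_position = body.get("next_stream_position", stream_position)
        time.sleep(5)
```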
Why did we build it?
• Main use case was sync → switch from batch to incremental diffs
• Several requirements arose from the sync use case:
‒ Guaranteed delivery
‒ Clients can be offline for days at a time
‒ Arbitrary number of clients consuming each user’s stream
• Together, these requirements translate into persistence and re-playability
How is it implemented?
• Each user assigned a separate section of the HBase key-space
• Messages are stored in order from oldest to newest within a user’s
section of the key-space
• Reads map directly to scans from the provided position to the user’s end
key
• Row key structure: <pseudo-random prefix>_<user_id>_<position>
‒ Pseudo-random prefix: 2 bytes of the user_id’s SHA-1
‒ Position: millisecond timestamp
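A minimal sketch of how such a row key could be assembled, assuming a simple "_"-delimited encoding (the exact byte layout and delimiter are assumptions; only the prefix/user_id/timestamp structure comes from the slide):

```python
import hashlib
import time

def make_row_key(user_id, position_ms=None):
    """Build <pseudo-random prefix>_<user_id>_<position>.

    The prefix is the first 2 bytes of sha1(user_id), which spreads users
    across the key-space; the position is a millisecond timestamp, so rows
    inside one user's section sort from oldest to newest.
    """
    if position_ms is None:
        position_ms = int(time.time() * 1000)
    prefix = hashlib.sha1(str(user_id).encode()).digest()[:2]
    return b"_".join([prefix.hex().encode(),
                      str(user_id).encode(),
                      str(position_ms).encode()])

# A read for one user is then a scan from the row key at the client's stream
# position up to that user's end key (same prefix and user_id, maximum position).
```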
Using a timestamp as a queue position
• Pro: Allows for allocating roughly monotonically increasing positions
with no co-ordination between write requests
• Con: Isn’t sufficient to guarantee append-only semantics in the presence
of parallel writes
[Diagram: write 1 starts before write 2 but commits after it; a reader that has already advanced past position 2 never sees write 1]
Time-bounding and Back-scanning
• Need to ensure that clients don’t advance their stream positions past
writes that will eventually succeed
‒ But clients do need to advance position eventually
‒ How do we know when it’s safe?
• Solution: time-bound writes and back-scan reads
‒ Time-bounding: every write to HBase must complete within a fixed time-bound to be
considered successful
‒ No guaranteed delivery for unsuccessful writes.
‒ Clients should retry failed writes at higher stream positions.
‒ Back-scanning: clients cannot advance their stream positions further than (current
time – back-scan interval)
‒ Back-scan interval >= write time-bound
• Provides guaranteed delivery but at the cost of duplicate events
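A rough sketch of the two rules working together (the HBase put call and the helper names are placeholders; the constants are illustrative, not Box’s real settings):

```python
import time

WRITE_TIME_BOUND_S = 5       # a write must finish within this bound to count
BACK_SCAN_INTERVAL_S = 5     # must be >= WRITE_TIME_BOUND_S

def timed_write(table, row_key, event):
    """Time-bounding: a write that overruns the bound is treated as failed."""
    start = time.time()
    table.put(row_key, event)                    # placeholder for the HBase put
    if time.time() - start > WRITE_TIME_BOUND_S:
        # No guaranteed delivery for this attempt; the caller should retry the
        # event at a higher (newer) stream position.
        raise TimeoutError("write exceeded time bound; retry at a new position")

def safe_stream_position(candidate_position_ms):
    """Back-scanning: never advance past (current time - back-scan interval),
    so any write that started before the read has either landed or timed out."""
    horizon_ms = int((time.time() - BACK_SCAN_INTERVAL_S) * 1000)
    return min(candidate_position_ms, horizon_ms)
```

Because each poll re-covers the back-scan window, clients will see some events more than once and need to de-duplicate them.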
[Diagram: time-bounded writes and back-scanning reads; a reader holds its position behind the back-scan horizon, so a delayed write at position 3 is still delivered, at the cost of re-reading positions it has already seen]
Replication
• Master/slave architecture
‒ One cluster per DC
‒ Master cluster handles all reads and writes
‒ Slave clusters are passive replicas
• On promotion, clients transparently fail over to the new master cluster
• Can’t use native HBase replication directly
‒ Could cause clients to miss events when failing over to a lagging cluster
[Diagram: master cluster with events 1–2 replicating to a lagging slave holding only event 1; after failover to the lagging cluster, a client that had already read event 2 could miss events]
Replication Contd.
• Replication system needs to be aware of master/slave failovers
‒ Stop exactly replicating messages. Start appending messages to the current ends of
the queues.
• Currently, use a client-level replication system piggybacking on MySQL
replication
• Plan to switch to a system that hooks into HBase replication by
configuring itself as a slave HBase cluster
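A purely illustrative sketch of the failover-aware behaviour described above (the queue interface is invented for the example):

```python
def replicate_event(event, target_queue, failed_over):
    """Mirror the master's positions exactly until a failover happens; after a
    failover, append at the target queue's current end so clients that have
    already switched over never see positions jump backwards or get skipped."""
    if failed_over:
        position = target_queue.next_position()   # current end of this queue
    else:
        position = event.position                 # preserve the master's position
    target_queue.append(position, event.payload)
```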
[Diagram: after failover, replicated events are appended at the current end of the new master’s queue (positions 3, 4) instead of at their original positions, so readers never miss them]
Why HBase?
• Closest off-the-rack queuing system is Kafka
‒ Developed at LinkedIn. Open sourced in 2011.
‒ Originally built to power LinkedIn’s analytics pipeline
‒ Very similar model built around “ordered commit logs”
‒ Allow for easy addition of new subscribers
‒ Allow for varying subscriber consumption patterns → slow subscribers don’t back up the
pipeline
Why HBase and not Kafka?
• Better consistency vs. availability tradeoffs
‒ No automatic rack-aware replica placement
‒ No automatic replica re-assignment upon replica failure
‒ On replica failure, no fast failover of new writes to new replicas.
‒ Can’t require minimum replication factor for new writes without significantly impacting
availability on replica failure
• Replication support
‒ Not enough control over Kafka queue positions to implement transparent client
failovers between replica clusters
• Unable to scale to millions of topics
‒ Currently tops out in the tens of thousands of topics.
‒ Design requires very granular topic tracking. Barrier to scale.
In conclusion…
• We were able to leverage HBase to store millions of guaranteed-delivery
message queues, each of which was:
‒ replicated between data centers
‒ independently consumable by an arbitrary number of clients
• Cluster metrics:
‒ ~30 nodes per cluster
‒ 15K writes/sec at peak. Bursts of up to 40K writes/sec.
‒ 50K-60K requests/sec at peak.
Questions?
Twitter: @davrmac, @BoxEng
Engineering Blog: tech.blog.box.com
Platform: developers.box.com
Open Source: opensource.box.com
