HBase Scale and Multi-tenancy @ Y!
PRESENTED BY
Francis Liu | toffer@apache.org
Vandana Ayyalasomayajula | avandana@apache.org
Virag Kothari | virag@apache.org
Outline
▪ HBase @ Y!
▪ Group Favored Nodes
▪ Scaling to 1M Regions and beyond
Y! Grid
▪ Off-Stage Processing
▪ Hosted Service
▪ Multi-tenant
Y! HBase
▪ Hosted Multi-tenant Service
▪ Isolation
› Isolated Deployment
› Region Server Groups
› Namespace
▪ Security
› ACLs
› Audit Logging
▪ Cross-Colo Replication
Isolated Deployment
[Diagram: HBase clients and an MR client run from a Gateway/Launcher; a Compute Cluster (JobTracker, Namenode, TaskTrackers and DataNodes running M/R tasks) is deployed separately from the HBase Cluster (HBase Master, ZooKeeper quorum, Namenode, and RegionServers co-located with DataNodes).]
Region Server Groups - Overview
▪ Member Tables
▪ Resource Isolation
▪ Flexibility with configuration
[Diagram: Group Foo (Region Servers 1…4) hosts Table1 and Table2; Group Bar (Region Servers 5…8) hosts Table3 and Table4; each group carries its own configs.]
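Group membership is managed through an admin API. Below is a minimal sketch using the RSGroupAdminClient API from the later upstreamed version of this feature (HBASE-6721); the Y! implementation described here predates it and differs in detail, and the host, port, group, and table names are illustrative.

```java
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.net.Address;
import org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient;

public class GroupSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      RSGroupAdminClient groups = new RSGroupAdminClient(conn);
      groups.addRSGroup("foo");                         // create Group Foo
      groups.moveServers(                               // move RS1 into the group
          Collections.singleton(Address.fromParts("rs1.example.com", 16020)), "foo");
      groups.moveTables(                                // pin Table1 to the group
          Collections.singleton(TableName.valueOf("Table1")), "foo");
    }
  }
}
```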
Region Server Groups - Implementation
[Diagram: the HMaster runs a GroupBasedLoadBalancer (wrapping the stock LoadBalancer and filtering assignments by group, e.g. foo vs. bar) plus GroupAdminEndpoint and GroupMasterObserver coprocessors; a GroupInfoManager persists group membership in a group table and a group znode.]
Namespace
▪ Analogous to Database
▪ Full Table Name: <table namespace>:<table name>
▪ e.g. my_ns:my_table
▪ Reserved namespaces
› default – tables with no explicit namespace
› hbase – system tables (e.g. hbase:meta, hbase:acl)
▪ Table Path: /<hbaseRoot>/data/<namespace>/<tableName>
Namespace
▪ Default Region Server Group
▪ Quota
› Max Tables
› Max Regions
▪ Per Tenant
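Both the namespace and its quotas are set through the standard admin API. A minimal sketch using the namespace quota properties that shipped with HBASE-8410; the namespace name and limits are illustrative.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TenantNamespace {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // One namespace per tenant, capped at 10 tables and 100 regions.
      NamespaceDescriptor ns = NamespaceDescriptor.create("my_ns")
          .addConfiguration("hbase.namespace.quota.maxtables", "10")
          .addConfiguration("hbase.namespace.quota.maxregions", "100")
          .build();
      admin.createNamespace(ns);
      // Tables are then created under fully qualified names, e.g. my_ns:my_table.
    }
  }
}
```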
Replication
▪ Sinks are randomly picked
▪ Sources recover any queue
▪ Shared RPC Quality of Protection config
source: https://hbase.apache.org/replication.html
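Sink selection is deliberately simple: each source samples a fraction of the slave cluster's region servers and spreads edits across them. The sketch below is a simplified illustration of that behavior, not the actual ReplicationSinkManager code; the ratio parameter mirrors the replication.source.ratio setting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: a replication source picking a random subset of the
// slave cluster's region servers to use as sinks.
public class SinkSelection {
  static List<String> chooseSinks(List<String> slaveRegionServers, float ratio) {
    List<String> shuffled = new ArrayList<>(slaveRegionServers);
    Collections.shuffle(shuffled);                        // random pick, no coordination
    int n = Math.max(1, (int) (shuffled.size() * ratio)); // e.g. ratio = 0.1
    return shuffled.subList(0, n);
  }
}
```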
Replication + Group
▪ Region Server Group Aware
▪ Rule-based API
› Source: {namespace}, [Table], [CF]
› Slave: {Peer}
› Effective Time
[Diagram: Table1 and Table2 in Group Foo on the source cluster replicate to Group Foo on the slave cluster; Group Bar is not involved.]
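The rule-based API itself was not upstreamed, so the sketch below is purely hypothetical: a rule object carrying exactly the fields named above, with all identifiers invented for illustration.

```java
// Hypothetical shape of a group-aware replication rule; the actual Y! API
// may differ. Fields follow the slide: Source {namespace},[Table],[CF];
// Slave {Peer}; Effective Time.
public class ReplicationRule {
  String namespace;      // required source namespace
  String table;          // optional source table
  String columnFamily;   // optional source column family
  String peerId;         // slave: the target replication peer
  long effectiveTimeMs;  // when the rule takes effect
}
```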
Replication + Thrift
▪ Encryption via SASL
▪ 0.94 <-> 0.96+ interoperability
Favored Nodes
▪ What are Favored Nodes?
› When writing data, we can pass a set of preferred hosts to the HDFS client for replica placement.
› Preferred hosts => "Favored Nodes"
› Usually 3 hosts: primary, secondary, tertiary.
› Constraint: primary host on one rack; secondary and tertiary hosts on a different rack.
▪ Favored nodes of regions are scattered across various groups.
› No guarantees about data locality within a region server group.
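On the HDFS side, favored nodes are a create-time hint. A sketch assuming the (private-audience) DistributedFileSystem#create overload that accepts a favoredNodes array, the same hook HBase itself invokes; the path, hostnames, and ports are illustrative.

```java
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class FavoredNodesWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();             // assumes fs.defaultFS is HDFS
    DistributedFileSystem fs = (DistributedFileSystem) FileSystem.get(conf);
    // Primary on one rack; secondary and tertiary on a different rack.
    InetSocketAddress[] favoredNodes = {
        new InetSocketAddress("dn1.rack1.example.com", 50010),
        new InetSocketAddress("dn2.rack2.example.com", 50010),
        new InetSocketAddress("dn3.rack2.example.com", 50010)
    };
    try (FSDataOutputStream out = fs.create(new Path("/tmp/favored-demo"),
        FsPermission.getFileDefault(), true /* overwrite */, 4096,
        (short) 3, 128L * 1024 * 1024, null /* progress */, favoredNodes)) {
      out.writeBytes("block replicas placed on the favored nodes above");
    }
  }
}
```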
Example
[Diagram: RS Group A (RS1…RS4 co-located with DN1…DN4) and RS Group B (RS5…RS8 co-located with DN5…DN8); without group awareness, a region's favored nodes can land in either group.]
Example
▪ Locality is lost when region server RS1 dies.
[Diagram: the same two groups after RS1 dies; the replicas on DN1 are no longer under a live region server in Group A.]
Group Aware Favored Nodes
▪ Fix the data locality problem by:
› choosing favored nodes within the region server group
› assigning regions only to their favored nodes
[Diagram: the two groups again, with each region's three favored nodes now all falling inside its own group.]
FavoredGroupLoadBalancer
▪ Region server group aware
▪ Region assignment on favored nodes
▪ Region balancing done using the Stochastic Load Balancer
▪ Favored Node Management
› Generates favored nodes for regions
› Favored nodes are inherited during region split/merge events
› Favored nodes do not change unless required
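A simplified sketch of generating a region's favored nodes under the constraints described earlier (all candidates drawn from within the region server group; primary on one rack, secondary and tertiary on another). This illustrates the placement rule, not the FavoredGroupLoadBalancer source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative only: pick 3 favored nodes from one region server group.
public class FavoredNodePicker {
  static List<String> pickFavoredNodes(Map<String, String> serverToRack) {
    List<String> servers = new ArrayList<>(serverToRack.keySet());
    Collections.shuffle(servers);
    String primary = servers.get(0);
    String primaryRack = serverToRack.get(primary);
    List<String> favored = new ArrayList<>();
    favored.add(primary);
    for (String s : servers) {
      // Secondary and tertiary must sit on a rack other than the primary's.
      if (favored.size() < 3 && !serverToRack.get(s).equals(primaryRack)) {
        favored.add(s);
      }
    }
    return favored; // may be shorter than 3 in a single-rack group
  }
}
```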
Favored Node Management APIs
▪ Redistribute
› Ability to expand region block replicas to newly added nodes
› Changes favored nodes of regions so that replicas spread onto the newly added nodes
[Diagram: RS Group A before and after a new node (RS5/DN5) joins; after redistribute, some regions' favored nodes include the new node.]
Favored Node Management APIs
▪ Complete_Redistribute
› Ability to recreate the entire set of favored nodes in a balanced fashion
› Balances the replica load evenly among all the nodes
[Diagram: RS Group A before and after complete_redistribute; replicas shift toward the host with the least number of replicas until the load is even.]
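Since these management calls were not upstreamed in this form, the sketch below only pins down their contract; the interface and method names are invented for illustration.

```java
// Hypothetical admin interface named after the operations on these slides;
// the actual Y! API may differ.
public interface FavoredNodeAdmin {
  // Minimally reshuffle favored nodes so newly added servers receive replicas.
  void redistribute(String groupName);
  // Regenerate every region's favored nodes from scratch, balancing the
  // replica load evenly across all group members.
  void completeRedistribute(String groupName);
}
```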
Enhancements
▪ Improvements to the Stochastic Load Balancer (HBASE-13376)
▪ Improvements to the Region Placement Maintainer tool
› Ability to view the locality of a region on each of its favored nodes
› Ability to view the primary, secondary, and tertiary node distribution of region servers
▪ Hadoop JIRAs
› HDFS-7300
› HDFS-7795
▪ Configuration changes made on the Hadoop side
› Set “dfs.namenode.replication.considerLoad” to false on small clusters
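The considerLoad key is a real NameNode setting; applying it through Hadoop's Configuration API rather than hdfs-site.xml is just for illustration here.

```java
import org.apache.hadoop.conf.Configuration;

public class NamenodeTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // On small clusters, load-aware replica placement can make the NameNode
    // pass over the favored-node hints; disabling it keeps placement on the
    // favored nodes.
    conf.setBoolean("dfs.namenode.replication.considerLoad", false);
    System.out.println(conf.getBoolean("dfs.namenode.replication.considerLoad", true));
  }
}
```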
Scaling to 1M Regions and beyond (HBASE-11165)
▪ Store Petabytes of data
▪ Support mixed workload (batch and near real-time)
▪ Performance
› Latency, throughput
▪ Operability
› Load balancing, compactions, etc.
Experience at Scale
▪ Web Crawl Cache
› ~2.3PB Table
› 80GB regions -> 20GB regions
› Batch workload
▪ Hot Regions
▪ Large compactions (Write amplification)
▪ Longer failover time
▪ Less parallel / imbalanced MapReduce tasks
▪ Large MapReduce tasks
Scaling Region Count
▪ Master Region Management
› Creation, assignment, balancing, etc.
› Meta table
▪ Metadata
› HDFS scalability
› Zookeeper
› Region Server density
Observations
[Diagram: the Master, the ZooKeeper quorum, and RegionServers hosting regions; assignment communication flows through ZooKeeper while write ops land on the single meta region.]
▪ Assignment
› ZK assignment - complex and requires more storage
› High CPU usage on the master
▪ Single hot meta
› 7GB in size for 1M regions
› Master writing at 400 ops/second
› Longer scanning times
▪ HDFS
› Longer directory creation time
Enhancements - Assignment
▪ Assignment
› ZK-less assignment (HBASE-11059)
› Simpler
› No involvement of ZooKeeper
› Unlock region states (HBASE-11290)
[Diagram: the Master talks to RegionServers directly; one RS hosts the meta region alongside user regions, and region state no longer passes through ZooKeeper.]
Enhancements - Split Meta
▪ Split meta (HBASE-11288)
› Distributed IO load
› Distributed caching
› Shorter scan time
› Distributed compaction
[Diagram: several meta regions spread across multiple RegionServers alongside user regions, instead of a single meta region.]
Enhancements - Hierarchical region dir
● Scaling namenode operations - the table dir holds millions of region files
● Approach - buckets within the table directory
● E.g. 3 hex characters of the bucket name give 4k (16^3 = 4096) buckets

Region dir creation time with 4k buckets:

                   1M regions        5M regions       10M regions
  normal table     20 mins           4 hrs 23 mins    doesn't finish
  humongous table  15 mins 48 secs   1 hr 27 mins     2 hrs 53 mins
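One plausible bucketing scheme consistent with those numbers: a region's encoded name is an MD5 hash rendered as 32 hex characters, so its first 3 characters yield 16^3 = 4096 evenly distributed buckets. A sketch, with the path layout assumed for illustration:

```java
public class RegionBucketing {
  // Derive a bucketed region path from the region's encoded name. The first
  // 3 hex chars of the (MD5-based) encoded name give 16^3 = 4096 buckets.
  static String bucketedRegionPath(String hbaseRoot, String namespace,
      String table, String encodedRegionName) {
    String bucket = encodedRegionName.substring(0, 3);
    return hbaseRoot + "/data/" + namespace + "/" + table
        + "/" + bucket + "/" + encodedRegionName;
  }

  public static void main(String[] args) {
    // e.g. /hbase/data/my_ns/my_table/a9b/a9b3f40c4df2f5d7e5b1f0c2d3e4f5a6
    System.out.println(bucketedRegionPath("/hbase", "my_ns", "my_table",
        "a9b3f40c4df2f5d7e5b1f0c2d3e4f5a6"));
  }
}
```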
Thank You!
(We’re Hiring)
