We are running an HBase (currently 0.20.4) cluster on EC2. I thought it would be useful to share some tips about running HBase on EC2.
1) Use private DNS addresses in config files such as hdfs-site.xml and hbase-site.xml. On EC2 Ubuntu instances, Java's getHost() resolves to the private DNS address.
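As a rough illustration (the hostnames below are placeholders; substitute the ip-*.ec2.internal names EC2 assigns to your own instances), the relevant entries look something like this:

<!-- core-site.xml / hdfs-site.xml: point HDFS at the namenode's private DNS name -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://ip-10-251-27-12.ec2.internal:9000</value>
</property>

<!-- hbase-site.xml: use the same private DNS names for HBase and ZooKeeper -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://ip-10-251-27-12.ec2.internal:9000/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>ip-10-251-27-12.ec2.internal,ip-10-251-30-57.ec2.internal,ip-10-251-42-88.ec2.internal</value>
</property>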
2) Use c1.xlarge or bigger nodes to start with. I have seen Andrew Purtell (an HBase committer) recommend this on the HBase mailing list. We tried m1.large machines, and they worked well while our traffic was small. We hit HBase in real time, and as traffic increased we started getting CPU maxouts. Currently we use c1.xlarge machines.
3) Define a security group for all of your Hadoop/HBase nodes. Hadoop, HBase, and ZooKeeper nodes need to talk to each other on many ports. It's better to assign one security group to all the nodes and give that group permission to talk to itself.
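A minimal sketch with the classic ec2-api-tools (the group name, account id, AMI id, and key name are placeholders):

# create one group for all Hadoop/HBase/ZooKeeper nodes
ec2-add-group hbase-cluster -d "Hadoop/HBase cluster"
# allow members of the group to talk to each other on any port
ec2-authorize hbase-cluster -o hbase-cluster -u 1234-5678-9012
# launch every node into that group
ec2-run-instances ami-xxxxxxxx -t c1.xlarge -g hbase-cluster -k my-keypair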
4) Use the dfs.hosts property in hdfs-site.xml. This property points to a file listing the DNS names of the nodes allowed to join the HDFS cluster; only nodes listed in the file can join. This becomes important if you spin up a QA cluster in EC2: you don't want a QA node to accidentally join your production cluster. When you add a new node, run the following command to refresh the allowed-nodes list and avoid a full cluster restart (a sample configuration follows at the end of this tip):
hadoop dfsadmin -refreshNodes
I am assuming here that you are not running MapReduce on the same cluster. If you do run MapReduce on the same cluster, do not use the similar setting in mapred-site.xml: there is no refreshNodes option available for mradmin yet; it's coming in 0.21.
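For reference, a minimal sketch of the setup (the file path and hostnames are just examples). In hdfs-site.xml:

<property>
  <name>dfs.hosts</name>
  <value>/mnt/hadoop/conf/dfs.hosts</value>
</property>

And in /mnt/hadoop/conf/dfs.hosts, one private DNS name per allowed node:

ip-10-251-27-12.ec2.internal
ip-10-251-30-57.ec2.internal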
5) Use EBS-backed instances for your Hadoop/HBase nodes. There are many benefits to using EBS volumes instead of S3-backed instances. For example, let's say you tried different JVM args on a node to test performance; if that very node goes down, you don't have to worry about losing those settings. All you need to do is take a snapshot of the volume and spawn a new instance from it. I generally find EBS-backed instances easier to deal with than S3-backed ones, and the added cost is small compared with the convenience of not losing data.
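As a rough sketch (the volume, snapshot, and instance ids are placeholders), the recovery path looks something like this:

# snapshot the node's EBS volume so configs and data survive the node
ec2-create-snapshot vol-xxxxxxxx
# later, restore a volume from the snapshot in the right availability zone
ec2-create-volume --snapshot snap-xxxxxxxx -z us-east-1a
# attach it to the replacement instance
ec2-attach-volume vol-yyyyyyyy -i i-xxxxxxxx -d /dev/sdh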
6) Prepare an AMI with Hadoop and HBase installed and all the configuration in place. This will allow you to add a node quickly when you need to.
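With EBS-backed instances, baking such an image from an already configured node can be as simple as this sketch (the instance id and AMI id are placeholders):

# create an image from a node that already has Hadoop/HBase set up
ec2-create-image i-xxxxxxxx -n "hadoop-hbase-node" -d "Hadoop/HBase with our configs"
# bring up a new node from that image later
ec2-run-instances ami-xxxxxxxx -t c1.xlarge -g hbase-cluster -k my-keypair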
7) Use Elastic MapReduce to run MapReduce jobs against your HBase. We have found this much more cost effective. Let's say you have a 7-node HBase cluster, but your MapReduce jobs need at least 10 nodes to finish on time. Elastic MapReduce lets you spawn a 10-node job flow against your HBase. The load on your HBase cluster goes down, and you don't have to pay for 3 extra nodes 24x7. Furthermore, you can use cheaper instances for your EMR job than you use for HBase (c1.xlarge) if you want to. We use c1.mediums.
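As a sketch with the elastic-mapreduce command line client (the job name, jar location, and argument are placeholders), spawning such a job flow looks something like this:

# run a transient 10-node job flow on cheaper instances
elastic-mapreduce --create --name "hbase-mapreduce" \
  --num-instances 10 --instance-type c1.medium \
  --jar s3://my-bucket/my-job.jar --arg my_table

The job still has to be told where your HBase cluster is, for example by bundling your hbase-site.xml into the jar or passing the ZooKeeper quorum as an argument.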
The HBase folks have written scripts to spawn a cluster on EC2. The scripts make it easier to bring up an entire cluster, so don't hesitate to try them out. Let me know if you have any other tips that you would like to share. I am sure many people are running their HBase clusters on EC2, and I am eager to learn from their experiences.