Apache HBase - getting started on Mac
Quick reference for a standalone HBase installation on a Mac. Please note that these are quick working notes for reference, not exhaustive documentation.
Apache HBase Reference Guide: https://hbase.apache.org/book.html
Prerequisite: JDK needs to be installed
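A quick pre-flight check for the JDK prerequisite (not part of the original notes; the messages are illustrative):

```shell
# Confirm a JDK is on the PATH and report JAVA_HOME (prints <unset> if missing).
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "No JDK found on PATH"
fi
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
```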
Download Apache HBase: https://hbase.apache.org/downloads.html => download the binary tarball for the stable version => e.g. https://www.apache.org/dyn/closer.lua/hbase/2.4.9/hbase-2.4.9-bin.tar.gz
Extract and install: tar -xvzf hbase-2.4.9-bin.tar.gz
Standalone mode using local filesystem
A standalone instance of HBase has all the HBase daemons included:
- Master
- RegionServers
- ZooKeeper
All three daemons run in a single JVM, persisting to the local filesystem.
Configure:
Check JAVA_HOME: echo $JAVA_HOME
`/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre`
Update conf/hbase-env.sh in HBase installation directory with:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre
Update ~/.zshrc file:
export PATH="$PATH:<install path>/hbase-2.4.9/bin"
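The two configuration edits together, as a sketch (the JDK path and `<install path>` are placeholders for your own locations):

```shell
# conf/hbase-env.sh -- point HBase at the JDK
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre

# ~/.zshrc -- put the HBase scripts on the PATH, then reload with: source ~/.zshrc
export PATH="$PATH:<install path>/hbase-2.4.9/bin"
```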
Start HBase: start-hbase.sh
Received the following warnings:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/...../hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/...../hbase-2.4.9/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
running master, logging to /..../hbase-2.4.9/bin/../logs/hbase-.....local.out
Verify that an HMaster process is running. In standalone mode HBase runs all three daemons (HMaster, HRegionServer and ZooKeeper) within this single JVM.
jps
80356 Jps
80156 HMaster
HBase Web UI at http://localhost:16010 was not accessible
Fixed the warnings/error above:
Commented out the Hadoop path in ~/.zshrc (with hadoop on the PATH, the HBase start script appears to pull Hadoop's jars, and with them a second SLF4J binding, onto the classpath):
#export PATH="$PATH:/Users/shouvik/opt/hadoop-3.3.1/bin"
Start HBase: bin/start-hbase.sh
The warning is gone!
...
running master, logging to /Users/shouvik/opt/hbase-2.4.9/bin/../logs/hbase-.......local.out
...
jps
81571 HMaster
81732 Jps
HBase Web UI: http://localhost:16010 is now accessible => http://localhost:16010/master-status
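To script the UI check, a small helper can probe the port (port_open is written here for illustration, not an HBase tool; 16010 is the Master UI's default port):

```shell
# Print "open" if something is listening on host:port, else "closed".
# Uses bash's /dev/tcp virtual device, so run this with bash.
port_open() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then echo open; else echo closed; fi
}
port_open localhost 16010
```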
Connect to HBase
Run HBase shell:
hbase shell
Version 2.4.9, ........
Took 0.0031 seconds
hbase:001:0>
Works!!
Exploring the HBase shell commands:
Quick commands to get started from the reference guide.
Create a table - at least one column family must be provided along with the table name
hbase:003:0> create 'test', 'cf'
Created table test
Took 1.2568 seconds
=> Hbase::Table - test
hbase:004:0> list 'test'
TABLE
test
1 row(s)
Took 0.0358 seconds
=> ["test"]
hbase:005:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s)
Quota is disabled
Took 0.1286 seconds
hbase:006:0> put 'test', 'row1', 'cf:a', 'value1'
Took 0.1017 seconds
hbase:007:0> put 'test', 'row2', 'cf:b', 'value2'
Took 0.0069 seconds
hbase:008:0> put 'test', 'row3', 'cf:c', 'value3'
Took 0.0150 seconds
hbase:009:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=2022-02-09T15:53:39.281, value=value1
row2 column=cf:b, timestamp=2022-02-09T15:54:02.022, value=value2
row3 column=cf:c, timestamp=2022-02-09T15:54:14.169, value=value3
3 row(s)
Took 0.0546 seconds
hbase:010:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=2022-02-09T15:53:39.281, value=value1
1 row(s)
Took 0.0136 seconds
hbase:011:0> disable 'test'
Took 0.3587 seconds
hbase:012:0> list 'test'
TABLE
test
1 row(s)
Took 0.0049 seconds
=> ["test"]
hbase:013:0> put 'test', 'row4', 'cf:d', 'value4'
ERROR: Table test is disabled!
For usage try 'help "put"'
Took 0.4446 seconds
hbase:014:0> enable 'test'
Took 0.6702 seconds
hbase:015:0> put 'test', 'row4', 'cf:d', 'value4'
Took 0.0094 seconds
hbase:016:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=2022-02-09T15:53:39.281, value=value1
row2 column=cf:b, timestamp=2022-02-09T15:54:02.022, value=value2
row3 column=cf:c, timestamp=2022-02-09T15:54:14.169, value=value3
row4 column=cf:d, timestamp=2022-02-09T15:56:33.403, value=value4
4 row(s)
Took 0.0160 seconds
hbase:017:0> disable 'test'
Took 0.3560 seconds
hbase:018:0> drop test
Traceback (most recent call last):
ArgumentError (wrong number of arguments (given 0, expected 2..3))
The table name must be passed as a quoted string:
hbase:019:0> drop 'test'
Took 0.1380 seconds
hbase:020:0> list
TABLE
0 row(s)
Took 0.0116 seconds
=> []
Standalone mode with HBase over HDFS
To run in standalone mode using HDFS instead of the local filesystem:
Run jps to confirm nothing is running; stop HBase if it is.
stop-hbase.sh
stopping hbase..............
jps
86473 Jps
To configure this standalone variant, edit your hbase-site.xml setting hbase.rootdir to point at a directory in your HDFS instance but then set hbase.cluster.distributed to false.
Update conf/hbase-site.xml
<property>
<name>hbase.cluster.distributed</name>
<value>false</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
Remove the existing entries for hbase.tmp.dir and hbase.unsafe.stream.capability.enforce (present in the default conf/hbase-site.xml):
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
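After these edits, the relevant part of conf/hbase-site.xml for this variant would look like the sketch below (assuming HDFS's NameNode is at localhost:9000, matching fs.defaultFS in Hadoop's core-site.xml):

```xml
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
```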
Start Hadoop
sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [Shouviks-MacBook-Pro.local]
jps
87649 NameNode
87761 DataNode
87991 Jps
87898 SecondaryNameNode
Verify http://localhost:9870/dfshealth.html#tab-overview
Start HBase
bin/start-hbase.sh
The authenticity of host '127.0.0.1 (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:.....
Are you sure you want to continue connecting (yes/no/[fingerprint])? y
Please type 'yes', 'no' or the fingerprint: yes
127.0.0.1: Warning: Permanently added '127.0.0.1' (ECDSA) to the list of known hosts.
127.0.0.1: running zookeeper, logging to /..../opt/hbase-2.4.9/bin/../logs/hbase-shouvik-zookeeper-....local.out
running master, logging to /..../opt/hbase-2.4.9/bin/../logs/hbase-shouvik-master-....local.out
: running regionserver, logging to /..../opt/hbase-2.4.9/bin/../logs/hbase-shouvik-....local.out
Issue faced:
jps
3075 DataNode
2964 NameNode
4134 Jps
3214 SecondaryNameNode
jps should also show an HMaster process, but HMaster was not running.
Solution:
- Cleaned up the Hadoop state: stopped dfs, deleted the Hadoop data directory under /tmp, formatted the NameNode, and restarted Hadoop
- If Hadoop is installed and its bin path is in the ~/.zshrc file, comment it out to prevent the SLF4J/Log4j warning
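The cleanup steps above can be sketched as follows (paths assume Hadoop's defaults; formatting the NameNode DESTROYS all existing HDFS data):

```shell
# Stop HDFS, wipe stale state, re-initialize the NameNode, restart.
stop-dfs.sh
rm -rf /tmp/hadoop-${USER}      # default hadoop.tmp.dir: NameNode/DataNode data
hdfs namenode -format           # destroys all existing HDFS data
start-dfs.sh
```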
bin/start-hbase.sh
127.0.0.1: running zookeeper, logging to /..../opt/hbase-2.4.9/bin/../logs/hbase-shouvik-zookeeper-.....local.out
running master, logging to /..../opt/hbase-2.4.9/bin/../logs/hbase-shouvik-master-.....local.out
: running regionserver, logging to /..../opt/hbase-2.4.9/bin/../logs/hbase-shouvik-regionserver-......local.out
jps
3075 DataNode
2964 NameNode
4084 HMaster
4134 Jps
3214 SecondaryNameNode
HMaster process running after the cleanup.
Exploring HBase and data directories
bin/hdfs dfs -ls /
drwxr-xr-x - shouvik supergroup 0 2022-02-09 21:08 /hbase
hbase shell
hbase:001:0> create 'test', 'cf'
Created table test
Took 1.9204 seconds
=> Hbase::Table - test
hbase:002:0> put 'test', 'row1', 'cf:a', 'value1'
hbase:003:0> put 'test', 'row2', 'cf:b', 'value2'
hbase:004:0> put 'test', 'row3', 'cf:c', 'value3'
hbase:007:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=2022-02-09T21:24:50.673, value=value1
row2 column=cf:b, timestamp=2022-02-09T21:24:58.527, value=value2
row3 column=cf:c, timestamp=2022-02-09T21:25:04.519, value=value3
3 row(s)
Took 0.0625 seconds
The following directories were created after table creation, confirming that HBase is using HDFS.
bin/hadoop fs -ls /hbase
/hbase/.hbck
/hbase/.tmp
/hbase/MasterData
/hbase/WALs
/hbase/archive
/hbase/corrupt
/hbase/data
/hbase/hbase.id
/hbase/hbase.version
/hbase/mobdir
/hbase/oldWALs
/hbase/staging
Pseudo-Distributed mode with HBase over HDFS
In pseudo-distributed mode HBase runs each of the following daemons as a separate process on a single host:
- HMaster
- HRegionServer
- ZooKeeper
Stop HBase if running: bin/stop-hbase.sh
Update conf/hbase-site.xml:
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
Note: HBase will create /hbase when it is started. No need for manual creation.
Start HBase: bin/start-hbase.sh
jps
37088 DataNode
**38305 HRegionServer**
38434 Jps
36978 NameNode
37226 SecondaryNameNode
**38139 HMaster**
38063 HQuorumPeer
hdfs dfs -ls /
drwxr-xr-x - shouvik supergroup 0 2022-02-17 15:50 /hbase
hdfs dfs -ls /hbase shows the list of directories:
/hbase/MasterData
/hbase/WALs
/hbase/archive
/hbase/corrupt
/hbase/data
/hbase/hbase.id
/hbase/hbase.version
/hbase/mobdir
/hbase/oldWALs
/hbase/staging