Version: 1.3.1.0

Worker Node Sizing

Worker nodes in an ODP cluster provide the storage and compute capacity for data workloads. Each worker node typically runs HDFS DataNode, YARN NodeManager, and optionally co-located services such as HBase RegionServer, Impala daemon, Kafka broker, or Kudu tablet server.

This page covers HDFS storage planning, YARN memory and vcore allocation, disk layout best practices, and Kafka co-location sizing.

HDFS Storage Planning

Replication Factor

HDFS stores each data block across multiple nodes for fault tolerance. The default replication factor is 3, meaning each block consumes 3x the logical data size in raw disk space.

Raw storage formula:

Required raw storage per node = (total logical data / number of data nodes) × replication factor × 1.25

The 1.25 factor accounts for HDFS overhead: block metadata, intermediate files, temporary job data, and free-space headroom to keep DataNode disks from filling.

Example:

  • 100 TB logical data, 10 worker nodes, replication factor 3
  • Per-node raw: (100 TB / 10) × 3 × 1.25 = 37.5 TB raw disk per worker
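The formula above can be sketched as a quick calculation (the function name and default overhead factor are illustrative):

```python
def raw_storage_per_node_tb(logical_tb, num_nodes, replication=3, overhead=1.25):
    """Raw disk (TB) each worker needs: share of logical data, times
    replication factor, times overhead for metadata and headroom."""
    return logical_tb / num_nodes * replication * overhead

# 100 TB logical data across 10 workers at replication factor 3
print(raw_storage_per_node_tb(100, 10))  # 37.5
```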

Disk Configuration — JBOD Only

warning

Do not use hardware RAID (RAID 5, RAID 6, RAID 10) for HDFS data disks. HDFS provides its own fault tolerance through replication. Hardware RAID adds cost and complexity without benefit, and can reduce throughput due to parity overhead.

Configure HDFS data disks as JBOD (Just a Bunch of Disks). Each disk is mounted independently (e.g., /data/disk1, /data/disk2, ..., /data/diskN) and listed in dfs.datanode.data.dir.

Example dfs.datanode.data.dir configuration for a node with 8 data disks:

/data/disk1/hdfs,/data/disk2/hdfs,/data/disk3/hdfs,/data/disk4/hdfs,
/data/disk5/hdfs,/data/disk6/hdfs,/data/disk7/hdfs,/data/disk8/hdfs

This allows HDFS to distribute blocks across all spindles, maximizing aggregate I/O throughput.

DataNode Heap

The DataNode JVM heap scales with the number of blocks managed per node. A commonly used formula is 1 GB per 1 million blocks stored on the node.

JVM heap example (hadoop-env):

HADOOP_DATANODE_OPTS="-Xms2g -Xmx4g -XX:+UseG1GC"

For most worker nodes in medium clusters, 4 GB DataNode heap is sufficient.
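The 1 GB per million blocks rule can be sketched as follows (the helper name and the 2 GB floor are assumptions, not ODP defaults):

```python
def datanode_heap_gb(blocks_on_node, min_heap_gb=2):
    """DataNode heap sizing: roughly 1 GB per 1 million blocks,
    with a minimum floor for small nodes."""
    return max(min_heap_gb, blocks_on_node / 1_000_000)

# A node holding 3.5 million blocks
print(datanode_heap_gb(3_500_000))  # 3.5
```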

YARN Memory and vCore Allocation

YARN allocates container resources from the NodeManager's configured capacity. Proper sizing avoids both resource underutilization and OOM conditions.

Total Available Memory

The available YARN memory (yarn.nodemanager.resource.memory-mb) should be:

yarn.nodemanager.resource.memory-mb = total RAM - OS overhead - co-located services heap

Example for a 128 GB worker node with HBase RegionServer (16 GB heap) and DataNode (4 GB heap):

128 GB - 8 GB (OS) - 16 GB (HBase) - 4 GB (DataNode) - 4 GB (buffer) = 96 GB → 98304 MB

Total Available vCores

Set yarn.nodemanager.resource.cpu-vcores to the number of logical CPUs on the node minus overhead for OS and co-located services:

yarn.nodemanager.resource.cpu-vcores = total logical CPUs - 2 (OS) - [cores reserved for co-located services]

For a 24-core node: 24 - 2 = 22 vcores available to YARN.
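The two formulas above can be combined into a small sizing sketch (the function name and default reservations are illustrative; adjust for your co-located services):

```python
def yarn_resources(total_ram_gb, total_vcores, os_gb=8, colocated_heaps_gb=0,
                   buffer_gb=4, os_vcores=2, colocated_vcores=0):
    """Compute yarn.nodemanager.resource.memory-mb and .cpu-vcores
    by subtracting OS and co-located service reservations."""
    memory_mb = (total_ram_gb - os_gb - colocated_heaps_gb - buffer_gb) * 1024
    vcores = total_vcores - os_vcores - colocated_vcores
    return memory_mb, vcores

# 128 GB, 24-core node with HBase (16 GB) and DataNode (4 GB) co-located
print(yarn_resources(128, 24, colocated_heaps_gb=16 + 4))  # (98304, 22)
```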

Container Memory Settings

| Parameter | Recommended value |
| --- | --- |
| yarn.scheduler.minimum-allocation-mb | 1024 MB (1 GB) |
| yarn.scheduler.maximum-allocation-mb | Equal to yarn.nodemanager.resource.memory-mb |
| yarn.scheduler.minimum-allocation-vcores | 1 |
| yarn.scheduler.maximum-allocation-vcores | Equal to yarn.nodemanager.resource.cpu-vcores |

MapReduce and Spark Memory Defaults

For MapReduce:

  • mapreduce.map.memory.mb: 2048–4096 MB
  • mapreduce.reduce.memory.mb: 4096–8192 MB

For Spark (per executor):

  • spark.executor.memory: 4–16 GB depending on workload
  • spark.executor.cores: 2–5 (avoid single-core executors for efficiency)
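Given the NodeManager capacity computed earlier, concurrent container count per node is bounded by whichever resource runs out first, memory or vcores. A rough sketch (names are illustrative):

```python
def max_containers(node_mem_mb, node_vcores, container_mem_mb, container_vcores):
    """Concurrent containers per node: limited by the scarcer resource."""
    return min(node_mem_mb // container_mem_mb, node_vcores // container_vcores)

# 96 GB / 22-vcore node running 4 GB, 2-vcore Spark executors
print(max_containers(98304, 22, 4096, 2))  # 11 (vcore-bound, not memory-bound)
```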

Disk Layout Best Practices

Separation of OS and Data Disks

Always use separate disks for the operating system (and service logs) versus HDFS data:

| Disk role | Recommended type | Mount points |
| --- | --- | --- |
| OS + logs | SSD, 200–500 GB | /, /var/log/ |
| HDFS data | HDD, 4–12 TB each | /data/disk1 ... /data/diskN |
| Kudu data (if deployed) | NVMe / SATA SSD | /kudu/disk1 ... /kudu/diskN |
| Kafka logs (if deployed) | HDD or SSD | /kafka/disk1 ... /kafka/diskN |

HDFS Short-Circuit Reads

Enable HDFS short-circuit reads (dfs.client.read.shortcircuit=true) to allow co-located compute tasks (MapReduce, Spark, Impala) to read HDFS blocks directly from local disk via Unix domain socket, bypassing the DataNode network stack. This significantly reduces latency for local reads.
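A typical hdfs-site.xml fragment enabling short-circuit reads is shown below. Note that dfs.domain.socket.path is also required; the path shown is a common convention (not mandated), and its parent directory must be writable only by the DataNode user:

```xml
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```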

Kafka Broker Sizing (Co-Located)

When Kafka brokers are co-located with HDFS DataNodes on worker nodes (acceptable for small to medium clusters), follow these guidelines:

| Resource | Guidance |
| --- | --- |
| Kafka heap | 4–8 GB (e.g. -Xms6g -Xmx6g) |
| Kafka log disks | Separate disks from HDFS data disks |
| Kafka log retention | Size per broker: partitions × retention size per partition |
| Network | Kafka is network-intensive; 10 GbE minimum, 25 GbE recommended for high-throughput topics |
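The per-broker retention sizing in the table can be sketched as follows (the 1.3 headroom factor for open segments and in-flight compaction is an assumption, not an ODP default):

```python
def kafka_disk_per_broker_gb(partitions_per_broker, retention_gb_per_partition,
                             headroom=1.3):
    """Disk per broker: partition count times retained data per partition,
    plus headroom for open segments and compaction."""
    return partitions_per_broker * retention_gb_per_partition * headroom

# 200 partition replicas per broker, ~10 GB retained per partition
print(kafka_disk_per_broker_gb(200, 10))  # 2600.0
```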

JVM heap example (kafka-env):

export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"

Key Kafka broker settings for co-located deployments:

  • Set log.dirs to dedicated disks (not shared with HDFS or OS)
  • Set num.io.threads and num.network.threads based on available cores (typically num.io.threads = 2 × number of data disks, num.network.threads = 3–8)
  • Tune replica.fetch.max.bytes and message.max.bytes for your expected message sizes

note

For large Kafka deployments (high message throughput or large retention), consider dedicated Kafka nodes rather than co-location with HDFS DataNodes to avoid I/O and network contention.

HBase RegionServer Sizing (Co-Located)

HBase RegionServers are typically co-located on worker nodes alongside HDFS DataNode and YARN NodeManager.

| Resource | Guidance |
| --- | --- |
| RegionServer heap | 16–32 GB |
| hfile.block.cache.size | 0.40 (40% of heap for read cache) |
| hbase.regionserver.global.memstore.size | 0.40 (40% of heap for write buffer) |
| Regions per RegionServer | 20–200 (start with 50 and tune based on workload) |
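HBase rejects configurations where the block cache and memstore fractions together exceed 0.8 of the heap, so the two table values above leave exactly the required 20% for working memory. A sketch of that budget (function name illustrative):

```python
def hbase_heap_budget(heap_gb, block_cache=0.40, memstore=0.40):
    """Split RegionServer heap; HBase rejects block_cache + memstore > 0.8."""
    if block_cache + memstore > 0.8:
        raise ValueError("block cache + memstore fractions must not exceed 0.8")
    return {"block_cache_gb": heap_gb * block_cache,
            "memstore_gb": heap_gb * memstore,
            "working_gb": heap_gb - heap_gb * block_cache - heap_gb * memstore}

# 16 GB heap with the recommended 0.40 / 0.40 split
print(hbase_heap_budget(16))
```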

JVM heap example (hbase-env):

HBASE_REGIONSERVER_OPTS="-Xms16g -Xmx16g -XX:+UseG1GC -XX:MaxGCPauseMillis=100"

Use G1GC for HBase RegionServers to minimize GC pause times, which can cause region server timeouts and unnecessary region reassignment.