Scaling Amazon Kinesis Data Streams in provisioned capacity mode means managing the number of shards in your stream. Shards are the fundamental unit of scaling, and you adjust their number to match your application’s throughput requirements. Here’s how to scale a stream with shard splitting and merging, how to calculate the throughput you need, and how to monitor the stream.

### Shard Basics
A shard in Kinesis Data Streams has a fixed capacity:
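– **Write capacity**: A shard can ingest up to 1,000 records per second and up to 1 MB/sec of data, whichever limit is reached first.
– **Read capacity**: A shard can serve up to 2 MB/sec of data egress. Standard consumers share this limit; consumers registered with enhanced fan-out each get their own 2 MB/sec per shard.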

### Scaling with Shard Splitting
When you anticipate an increase in data throughput, you can increase the number of shards by splitting one or more existing shards.
– **Splitting a Shard**: You can split a shard into two, which doubles the capacity available to the hash key range the original shard covered. To choose exactly which shard to split, use the `SplitShard` API. The `UpdateShardCount` API and the AWS Management Console instead take a target shard count and split or merge shards uniformly on your behalf.
– **Throughput Calculations**: A split does not raise the per-shard limit. If the original shard supported 1 MB/sec of input, each of the two child shards also supports 1 MB/sec, so the stream gains capacity through parallelism: more data can be written to and read from the stream concurrently.
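
As a rough illustration of an explicit split, the sketch below computes the midpoint of a shard's hash key range and passes it to `split_shard`. It assumes `client` is a boto3 Kinesis client (or anything exposing the same `split_shard` method) and that `shard` is one entry from a `list_shards` response. The helper names `midpoint_hash_key` and `split_shard_evenly` are made up for this example.

```python
def midpoint_hash_key(shard: dict) -> str:
    """Return the hash key halfway through the shard's range, as a decimal string."""
    key_range = shard["HashKeyRange"]
    start = int(key_range["StartingHashKey"])
    end = int(key_range["EndingHashKey"])
    return str((start + end) // 2)


def split_shard_evenly(client, stream_name: str, shard: dict) -> str:
    """Split `shard` into two children that each cover about half its hash key range.

    `client` is assumed to be a boto3 Kinesis client. Returns the hash key
    at which the second child shard starts.
    """
    new_start = midpoint_hash_key(shard)
    if int(new_start) <= int(shard["HashKeyRange"]["StartingHashKey"]):
        # A range holding a single hash key cannot be divided any further.
        raise ValueError("hash key range is too narrow to split")
    client.split_shard(
        StreamName=stream_name,
        ShardToSplit=shard["ShardId"],
        NewStartingHashKey=new_start,
    )
    return new_start
```

With boto3 installed, `client` would typically come from `boto3.client("kinesis")` and `shard` from `client.list_shards(StreamName=...)["Shards"]`. An even split only helps when traffic is spread evenly across the hash key range; a shard that is hot because of a few partition keys may need a different split point.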

### Scaling with Shard Merging
When your data throughput decreases, reducing the number of shards can help save costs.
– **Merging Shards**: You can merge two adjacent shards, meaning two shards whose hash key ranges are contiguous, into a single shard with the `MergeShards` API. This makes sense when the combined traffic of the two shards has dropped far enough to fit comfortably within one shard.
– **Throughput Calculations**: After merging, the resulting shard has the same capacity limits (1 MB/sec ingress and 2 MB/sec egress) as any other single shard, so the combined load of the two original shards must stay below those limits.
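
The adjacency requirement can be checked from the shard list before merging. The following is a minimal sketch that assumes `shards` has the shape of the `Shards` list returned by boto3's `list_shards`, and that `client` is a boto3 Kinesis client. The helper names `is_open`, `adjacent_pairs`, and `merge_pair` are hypothetical.

```python
def is_open(shard: dict) -> bool:
    """A shard is open (still accepting writes) if it has no ending sequence number."""
    return "EndingSequenceNumber" not in shard["SequenceNumberRange"]


def adjacent_pairs(shards: list) -> list:
    """Return (shard_id, next_shard_id) pairs of open shards with contiguous hash key ranges."""
    open_shards = sorted(
        (s for s in shards if is_open(s)),
        key=lambda s: int(s["HashKeyRange"]["StartingHashKey"]),
    )
    pairs = []
    for left, right in zip(open_shards, open_shards[1:]):
        left_end = int(left["HashKeyRange"]["EndingHashKey"])
        right_start = int(right["HashKeyRange"]["StartingHashKey"])
        # Adjacent means the second range starts right after the first one ends.
        if left_end + 1 == right_start:
            pairs.append((left["ShardId"], right["ShardId"]))
    return pairs


def merge_pair(client, stream_name: str, pair: tuple) -> None:
    """Merge one adjacent pair; `client` is assumed to be a boto3 Kinesis client."""
    client.merge_shards(
        StreamName=stream_name,
        ShardToMerge=pair[0],
        AdjacentShardToMerge=pair[1],
    )
```

The sketch only finds candidates. Whether a given pair should be merged still depends on its combined traffic, which has to fit within a single shard's limits.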

### Throughput Calculations
To determine the number of shards you need:
1. **Estimate Peak Throughput**: Know your peak data input rate in MB/second and the number of records per second.
2. **Calculate Required Shards**:
– Use the formula for data size: `Number of Shards = ceil(Total data input in MB/sec / 1 MB/sec per shard)`
– Use the formula for record count: `Number of Shards = ceil(Total number of records/sec / 1000 records/sec per shard)`
3. **Take the Larger Result**: Choose the higher of the two numbers above so that your stream can handle both throughput limits.
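
The steps above can be sketched as a small function. The write-side limits come straight from the formulas in this section. The optional read-side term is an addition for illustration, based on the 2 MB/sec per-shard read limit for consumers that share a shard's read throughput. The function name and the example numbers are hypothetical.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # ingress limit per shard, MB/sec
WRITE_RECORDS_PER_SHARD = 1000  # ingress limit per shard, records/sec
READ_MB_PER_SHARD = 2.0         # shared egress limit per shard, MB/sec


def required_shards(peak_write_mb_per_sec: float,
                    peak_records_per_sec: float,
                    peak_read_mb_per_sec: float = 0.0) -> int:
    """Return the smallest shard count that satisfies every per-shard limit."""
    by_bytes = math.ceil(peak_write_mb_per_sec / WRITE_MB_PER_SHARD)
    by_records = math.ceil(peak_records_per_sec / WRITE_RECORDS_PER_SHARD)
    by_reads = math.ceil(peak_read_mb_per_sec / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_bytes, by_records, by_reads)


# 3.5 MB/sec needs 4 shards by size; 6,200 records/sec needs 7 by count.
print(required_shards(3.5, 6200))  # 7
```

In this example the record count, not the data size, is the binding limit, which is typical for streams of many small records.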

### Monitoring
Monitoring is critical to effectively manage and scale your Kinesis Data Streams.
– **Amazon CloudWatch**: Use the `IncomingBytes` and `IncomingRecords` metrics to compare actual traffic against the stream's provisioned capacity, and watch `ReadProvisionedThroughputExceeded` and `WriteProvisionedThroughputExceeded`, where sustained nonzero values mean requests are being throttled.
– **Kinesis Data Analytics**: For deeper insights and real-time analytics, you can process and analyze the stream data with Kinesis Data Analytics (now Amazon Managed Service for Apache Flink).
– **Scaling Automation**: Consider automating scaling with AWS Lambda functions, or with AWS Application Auto Scaling through a custom resource, triggered by CloudWatch alarms that indicate the stream is nearing its capacity.
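
A scaling decision of this kind might look like the sketch below, for example inside a Lambda function triggered by a CloudWatch alarm. It assumes you have already fetched the `Sum` of `IncomingBytes` for a period, treats 1 MB as 1,048,576 bytes, and uses made-up thresholds of 80% and 30% utilization. The function names are hypothetical, and `client` is assumed to be a boto3 Kinesis client.

```python
import math

SHARD_WRITE_BYTES_PER_SEC = 1024 * 1024  # assumes 1 MB/sec means 1,048,576 bytes/sec


def write_utilization(incoming_bytes_sum: float, period_seconds: int, open_shards: int) -> float:
    """Fraction of the stream's write capacity used over the period (by bytes only)."""
    capacity = open_shards * SHARD_WRITE_BYTES_PER_SEC * period_seconds
    return incoming_bytes_sum / capacity


def target_shard_count(current: int, utilization: float,
                       high: float = 0.8, low: float = 0.3) -> int:
    """Double the shard count when busy, halve it when idle, otherwise keep it."""
    if utilization > high:
        return current * 2
    if utilization < low and current > 1:
        return math.ceil(current / 2)
    return current


def apply_scaling(client, stream_name: str, current: int, target: int) -> bool:
    """Call UpdateShardCount only when the target differs; returns True if a call was made."""
    if target == current:
        return False
    client.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target,
        ScalingType="UNIFORM_SCALING",
    )
    return True
```

Average utilization hides hot shards, so a stream can throttle on one shard while this figure looks healthy. Checking `WriteProvisionedThroughputExceeded` alongside it is a reasonable safeguard.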

By consistently monitoring these metrics and understanding the throughput patterns, you can make informed decisions about when to split or merge shards, ensuring optimal performance and cost-efficiency of your Kinesis Data Streams.
