Example – Data Lake consumer component
On AWS, the Kinesis Firehose service provides the essential components for processing a stream and storing the events in an S3 bucket, an Elasticsearch domain, or a Redshift cluster. The following fragment from a serverless.yml file shows an example of connecting a Kinesis stream to an S3 bucket via Firehose. The entire solution can be configured declaratively as CloudFormation resources.

In this example, a basic S3 bucket is defined to hold the data lake; life cycle management and replication configurations are excluded for brevity. Next, a Firehose delivery stream is defined and connected to the Kinesis stream source and the S3 bucket destination. All of the necessary roles and permissions are also assigned, though their details are likewise excluded for brevity. A compressed object is written to S3 when the buffered data reaches 50 megabytes or when 60 seconds have elapsed, whichever comes first. To support multiple streams, a delivery stream would be defined for each stream in the system's topology. Configuring a delivery stream for an Elasticsearch or Redshift destination is virtually identical.
resources:
  Resources:
    Bucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ${opt:stage}-${opt:region}-${self:service}-datalake
    DeliveryStream:
      Type: AWS::KinesisFirehose::DeliveryStream
      Properties:
        DeliveryStreamType: KinesisStreamAsSource
        KinesisStreamSourceConfiguration:
          KinesisStreamARN:
            Fn::GetAtt:
              - Stream
              - Arn
          RoleARN:
            Fn::GetAtt:
              - KinesisRole
              - Arn
        ExtendedS3DestinationConfiguration:
          BucketARN:
            Fn::GetAtt:
              - Bucket
              - Arn
          Prefix: stream1/
          BufferingHints:
            IntervalInSeconds: 60
            SizeInMBs: 50
          CompressionFormat: GZIP
          RoleARN:
            Fn::GetAtt:
              - DeliveryRole
              - Arn
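The delivery stream above references a KinesisRole and a DeliveryRole whose details were excluded for brevity. A minimal sketch of what these roles might contain follows; the policy names and the exact sets of actions are illustrative assumptions, and a production deployment would typically scope them more tightly.

    # Hypothetical sketch of the omitted roles; adjust the actions to fit the deployment.
    KinesisRole:
      Type: AWS::IAM::Role
      Properties:
        AssumeRolePolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Principal:
                Service: firehose.amazonaws.com
              Action: sts:AssumeRole
        Policies:
          - PolicyName: source # assumed name
            PolicyDocument:
              Version: '2012-10-17'
              Statement:
                # Allows Firehose to read records from the source stream
                - Effect: Allow
                  Action:
                    - kinesis:DescribeStream
                    - kinesis:GetShardIterator
                    - kinesis:GetRecords
                    - kinesis:ListShards
                  Resource:
                    Fn::GetAtt:
                      - Stream
                      - Arn
    DeliveryRole:
      Type: AWS::IAM::Role
      Properties:
        AssumeRolePolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Principal:
                Service: firehose.amazonaws.com
              Action: sts:AssumeRole
        Policies:
          - PolicyName: delivery # assumed name
            PolicyDocument:
              Version: '2012-10-17'
              Statement:
                # Allows Firehose to write objects to the data lake bucket
                - Effect: Allow
                  Action:
                    - s3:AbortMultipartUpload
                    - s3:GetBucketLocation
                    - s3:GetObject
                    - s3:ListBucket
                    - s3:ListBucketMultipartUploads
                    - s3:PutObject
                  Resource:
                    - Fn::GetAtt:
                        - Bucket
                        - Arn
                    - Fn::Sub: '${Bucket.Arn}/*'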
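To illustrate the point that other destinations are configured in the same way, the following sketch swaps the S3 destination for an Elasticsearch destination. The Domain resource, index settings, and backup settings are illustrative assumptions, and the delivery role would additionally need es: permissions on the domain. Note that Firehose still requires an S3 configuration here for backing up documents.

    # Hypothetical variant: deliver to an Elasticsearch domain (replacing the S3 variant above)
    DeliveryStream:
      Type: AWS::KinesisFirehose::DeliveryStream
      Properties:
        DeliveryStreamType: KinesisStreamAsSource
        KinesisStreamSourceConfiguration:
          # same source configuration as above
          KinesisStreamARN:
            Fn::GetAtt:
              - Stream
              - Arn
          RoleARN:
            Fn::GetAtt:
              - KinesisRole
              - Arn
        ElasticsearchDestinationConfiguration:
          DomainARN:
            Fn::GetAtt:
              - Domain # assumed AWS::Elasticsearch::Domain resource
              - DomainArn
          IndexName: stream1 # assumed index name
          IndexRotationPeriod: OneDay
          BufferingHints:
            IntervalInSeconds: 60
            SizeInMBs: 50
          RetryOptions:
            DurationInSeconds: 300
          RoleARN:
            Fn::GetAtt:
              - DeliveryRole
              - Arn
          # Firehose backs up documents to S3 alongside the Elasticsearch delivery
          S3BackupMode: AllDocuments
          S3Configuration:
            BucketARN:
              Fn::GetAtt:
                - Bucket
                - Arn
            Prefix: stream1/
            CompressionFormat: GZIP
            RoleARN:
              Fn::GetAtt:
                - DeliveryRole
                - Arn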