Real-Time Reddit Streaming Solution: Self-Guided Tutorial
A Kinesis Firehose, S3, Glue, Athena Use-case
Updated July 2019

- Set up Kinesis Firehose Delivery Stream
- Create a Key Pair for your streaming server
- Deploy the EC2 streaming server in CloudFormation
- Deploy the AWS Glue data catalog in CloudFormation

AWS provides several key services for quickly deploying and managing data streaming in the cloud. Reddit is a popular social news aggregation, web content rating, and discussion website. At peak times, Reddit can see over 300,000 comments and 35,000 submissions an hour. The Reddit API offers developers a simple way to collect all of this data, which makes it a perfect use case for learning how to use Kinesis Firehose, S3, Glue, and Athena.

In this tutorial, you will play the role of a data architect looking to modernize a company's streaming pipeline. You will create a Kinesis Firehose delivery stream from an EC2 server to an S3 data lake. With the help of AWS Glue and Amazon Athena, you'll be able to develop insights on the data as it accumulates in your data lake.

You will:

- Create a Reddit App using the Reddit developer site.
- Provision an S3 bucket to act as a data lake and the target for your stream data.
- Provision a Kinesis Firehose Delivery Stream that will accept data from various sources and deliver it to the S3 bucket.
- Deploy and run an EC2 streaming Python app via CloudFormation.
- Create a Glue data catalog via CloudFormation to provide schemas and structure to your data.
- Use Athena to directly query your S3 bucket with SQL.

You will need:

- A laptop with Wi-Fi running Microsoft Windows, Mac OS X, or Linux.
- A web browser (Chrome or Firefox).
- An AWS account to provision the AWS infrastructure.
- Skill level: A basic understanding of desktop computing is helpful but not required.
- AWS experience: Prior knowledge of base AWS infrastructure (VPC, EC2, S3) is helpful, but not required to complete this exercise.
- Basic Linux experience: Needed to troubleshoot any errors in the EC2 instance.

Note: This tutorial only works in region us-east-1.

Follow the prompts to create a new Reddit account. Once your account is created, go to the Reddit developer console and select “are you a developer? Create an app.”

After you create the bucket you cannot change its name, so choose wisely. Choose a bucket name that reflects the objects in the bucket, because the bucket name is visible in the URL that points to the objects that you're going to put in your bucket. For information about naming buckets, see Rules for Bucket Naming in the Amazon Simple Storage Service Developer Guide. For Region, choose US East (N. Virginia) as the region where you want the bucket to reside.

Now that you’ve created a bucket, let’s set up a delivery stream for your data.

Deploy the AWS Glue data catalog in CloudFormation

S3 is a place to store many different kinds of data and files. To provide the data files with structure that services can reference, you need to set up a data catalog. AWS Glue is the perfect service for this use case. In this step we will be using a tool called CloudFormation.
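As a rough illustration of how a streaming app might hand Reddit comments to a Kinesis Firehose delivery stream, here is a minimal Python sketch. It is not the tutorial's actual EC2 app. The comment fields and the stream name in the usage note are hypothetical, and the client is assumed to be a boto3 Firehose client created with `boto3.client("firehose", region_name="us-east-1")`. Each comment is written as one newline-terminated JSON document, and records are grouped to respect the 500-record limit of `put_record_batch`.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# PutRecordBatch accepts at most 500 records per call.
MAX_BATCH = 500


def to_record(comment: Dict[str, Any]) -> Dict[str, bytes]:
    """Encode one comment as a newline-terminated JSON Firehose record."""
    line = json.dumps(comment, separators=(",", ":")) + "\n"
    return {"Data": line.encode("utf-8")}


def batches(records: Iterable[Dict[str, bytes]],
            size: int = MAX_BATCH) -> Iterator[List[Dict[str, bytes]]]:
    """Yield lists of at most `size` records."""
    batch: List[Dict[str, bytes]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_comments(client: Any, stream_name: str,
                  comments: Iterable[Dict[str, Any]]) -> int:
    """Send comments in batches; return how many records Firehose rejected."""
    failed = 0
    for batch in batches(to_record(c) for c in comments):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,  # hypothetical name supplied by caller
            Records=batch,
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

A call such as `send_comments(client, "reddit-stream", comments)` (stream name assumed) returns the number of rejected records; a real app would likely retry those rather than only counting them.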
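Once the data has landed in S3 and a Glue data catalog describes it, a query can be started through Athena from Python. The sketch below is only one way this might look. The database, table, `subreddit` column, and output location are all assumptions that depend on how your catalog is defined, and the client is assumed to come from `boto3.client("athena", region_name="us-east-1")`.

```python
from typing import Any


def top_subreddits_sql(database: str, table: str, limit: int = 10) -> str:
    """Build a query counting comments per subreddit (column name assumed)."""
    return (
        f'SELECT subreddit, COUNT(*) AS comment_count '
        f'FROM "{database}"."{table}" '
        f'GROUP BY subreddit ORDER BY comment_count DESC LIMIT {int(limit)}'
    )


def start_query(client: Any, database: str, table: str,
                output_location: str, limit: int = 10) -> str:
    """Start the query in Athena and return its execution ID."""
    response = client.start_query_execution(
        QueryString=top_subreddits_sql(database, table, limit),
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location (hypothetical path).
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

The returned ID can then be passed to the Athena client's `get_query_execution` and `get_query_results` calls to poll for completion and read the rows.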