Posts/Projects - Joseph-T-Gordon

Post #176 (AWS) – Rekognition and where I’ve been

Posted on January 9, 2024 (updated January 10, 2024) by admin

Amazon Rekognition is an AI tool that can be used to locate text, faces, people, or scenes in images and videos. You can also create your own database of “familiar faces” for Rekognition to use. Rekognition can be used to look for a particular individual, a set of features, a specific object, text detection,…
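
As a rough illustration of the text-detection use case, the sketch below wraps the `detect_text` call that boto3 exposes for Rekognition. The helper name `find_text_lines`, the bucket and key arguments, and the 80% confidence cutoff are assumptions made for this example and do not come from the post. The client is passed in so the function can be tried out without AWS credentials.

```python
def find_text_lines(client, bucket, key, min_confidence=80.0):
    """Return the lines of text Rekognition detects in an image stored in S3.

    `client` is expected to behave like boto3.client("rekognition").
    The 80.0 default threshold is an arbitrary choice for illustration.
    """
    response = client.detect_text(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Rekognition reports both whole lines and individual words;
    # keep only the LINE entries that meet the confidence threshold.
    return [
        detection["DetectedText"]
        for detection in response["TextDetections"]
        if detection["Type"] == "LINE"
        and detection["Confidence"] >= min_confidence
    ]
```

With real credentials you would pass `boto3.client("rekognition")` as the client, along with the name of a bucket and image you own.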

Post #175 (AWS) – Big Data Ingestion Pipeline

Posted on December 13, 2023 by admin

The Big Data Ingestion Pipeline is the architecture for your big data analytics. For instance, it could be fully serverless: data is transformed and queries can be run on it via SQL, the reports from the queries are stored in S3, and the data is then sent to a data warehouse where dashboards can be…

Post #174 (AWS) – Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Posted on December 13, 2023 by admin

Amazon MSK is best described as an alternative to Kinesis. Both let you stream data, taking input from producers and having consumers pull data from them. MSK allows you to run a fully managed Apache Kafka cluster on AWS: MSK will create and manage nodes, is highly available across 3 AZs, automatic…

Post #173 (AWS) – Kinesis Data Analytics

Posted on December 13, 2023 by admin

Kinesis Data Analytics for SQL applications can accept data from Kinesis Data Streams and Kinesis Data Firehose and run SQL queries on that data. You can send the results of the analytics back through Kinesis Data Firehose to a Firehose destination such as S3 or Redshift. You can send the data through Kinesis…

Post #172 (AWS) – Lake Formation

Posted on December 12, 2023 by admin

Lake Formation simplifies the creation of data lakes: pools of data where multiple data sources are brought together for analytics. It sits on top of AWS Glue; the data goes from Glue to your data lake. You can have fine-grained access control at the row and column level. A key usage of Lake Formation is the…

Post #171 (AWS) – AWS Glue

Posted on December 11, 2023 by admin

Glue is an ETL (Extract, Transform, Load) service. It is serverless and used to transform and prepare data for analytics. For example, you could take data from S3 and RDS and load it into Redshift. You can also use Glue to convert data into the Parquet format, which is columnar, making it better suited for analysts…
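
To see why a columnar layout suits analytics, here is a small plain-Python sketch. It is not Glue or Parquet code; the sample records and the `to_columns` helper are made up for illustration. It pivots row-oriented records into one list per column, so an aggregate over a single field only has to read that column.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys, as in a simple table.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical sample data in row-oriented form.
rows = [
    {"order_id": 1, "region": "us-east-1", "amount": 20.0},
    {"order_id": 2, "region": "eu-west-1", "amount": 35.5},
    {"order_id": 3, "region": "us-east-1", "amount": 12.5},
]

columns = to_columns(rows)

# An analytical query such as "total amount" reads only one column
# instead of scanning every field of every row.
total_amount = sum(columns["amount"])
print(total_amount)
```

Real columnar formats such as Parquet also add compression and per-column metadata, which this sketch leaves out.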

Post #170 (AWS) – QuickSight

Posted on December 11, 2023 by admin

QuickSight is used for business analytics and offers dashboards for reviewing data. It can perform in-memory computation using SPICE by having the data loaded directly into it, allowing for very fast compute times. You can integrate QuickSight with a data source as well so you don’t have to load data directly into…

Post #169 (AWS) – EMR, Elastic MapReduce

Posted on December 11, 2023 by admin

Amazon EMR is made for data analytics on big data clusters. It comes pre-packaged with tools for big data specialists. A cluster can be made up of hundreds of EC2 instances. EMR also supports autoscaling and Spot Instances. EMR can be used for data processing, machine learning, web indexing, etc. In your EMR cluster you have…

Post #168 (AWS) – OpenSearch

Posted on December 11, 2023 by admin

Amazon OpenSearch is a service that allows you to search your data, not just by primary key, but by any field, including partial matches. OpenSearch is best used as a complement to your database to add search functionality. It is also worth noting that the service was formerly called Amazon Elasticsearch Service. You can deploy OpenSearch via…
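
As a minimal sketch of the partial-match idea, the function below builds an OpenSearch query DSL body that uses a `wildcard` query on a single field. The helper name `partial_match_query` and the `title` field in the usage line are assumptions for illustration; the post does not show a query.

```python
def partial_match_query(field, fragment):
    """Build a query body matching documents whose `field` contains `fragment`.

    The leading and trailing '*' make this a substring match. Leading
    wildcards can be slow on large indexes, so treat this as a sketch.
    """
    return {
        "query": {
            "wildcard": {
                field: {"value": f"*{fragment}*"}
            }
        }
    }


# Example: find documents whose (hypothetical) "title" field contains "kine".
body = partial_match_query("title", "kine")
print(body)
```

The resulting dictionary is the kind of body you would send to the search endpoint of an index, for example through the `search` method of an OpenSearch client library.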

Post #167 (AWS) – Redshift for analytics

Posted on December 7, 2023 (updated December 13, 2023) by admin

Redshift is based on PostgreSQL and is used for OLAP (online analytical processing). Data must be loaded into Redshift before you can run queries on it, but once the data is loaded, Redshift is extremely efficient at analysis. Data can be stored directly in Redshift, but I would recommend storing data in RDS or S3….