Previously, I wrote about a web scraping architecture on AWS: https://www.appilize.com/web-scraping-architecture-on-aws/

Here is an overview of the different projects.

Horoscrape

The name “Horoscrape” (horo-scrape) was coined because this project scrapes daily horoscopes.
An AWS Lambda function fetches and parses the HTML, then stores the data in DynamoDB.
Each Lambda has an EventBridge rule that defines when the web scraping should start.
The web scraping is done in Python 3.
The infrastructure as code is written with the AWS Cloud Development Kit (CDK) in Python. CDK synthesizes into lower-level CloudFormation templates. CloudFormation templates in YAML or JSON are verbose, whereas CDK code is much shorter and can use control flow (loops, if/else conditions, ...).
Each Lambda function is granted permission to write to its own DynamoDB table.
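A minimal sketch of what one scraping Lambda might look like, assuming the horoscope text sits in `<p class="horoscope">` elements. The real page markup, URL and table names are not given in this post, so `make_handler`, the CSS class and the item attributes are hypothetical. The fetcher and the table are passed in so the parsing can be tested without AWS; in Lambda, `table` would be a boto3 `Table`, whose `put_item(Item=...)` writes one item.

```python
from datetime import date
from html.parser import HTMLParser


class HoroscopeParser(HTMLParser):
    """Collects the text of every <p class="horoscope"> element.

    ASSUMPTION: the class name "horoscope" is hypothetical; adapt it
    to the markup of the page actually being scraped.
    """

    def __init__(self):
        super().__init__()
        self._inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "p" and ("class", "horoscope") in attrs:
            self._inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


def parse_item(html, sign, day):
    """Turns raw HTML into one DynamoDB item (attribute names are hypothetical)."""
    parser = HoroscopeParser()
    parser.feed(html)
    parser.close()
    text = " ".join(t.strip() for t in parser.texts if t.strip())
    return {"sign": sign, "date": day.isoformat(), "text": text}


def make_handler(fetch, table, sign, url):
    """Builds a Lambda entry point bound to one sign, one URL and one table.

    In AWS, the dependencies could be (hypothetical names):
        table = boto3.resource("dynamodb").Table("horoscope-aries")
        fetch = lambda u: urllib.request.urlopen(u, timeout=10).read().decode("utf-8")
    """

    def handler(event, context):
        html = fetch(url)
        item = parse_item(html, sign, date.today())
        table.put_item(Item=item)
        return {"stored": item["sign"], "date": item["date"]}

    return handler
```

In a deployed module, `handler = make_handler(...)` would be assigned at module level so that Lambda can be configured with `module.handler` as its entry point.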

Serverless web scraper orchestrated at specific times.
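To illustrate both the per-Lambda EventBridge rule and the point that CDK code can use ordinary loops, here is a plain-Python sketch (no CDK import, so it runs anywhere) that generates one daily schedule per zodiac sign. The staggered start times and rule names are hypothetical. EventBridge rule cron expressions have six fields (minutes, hours, day of month, month, day of week, year), are evaluated in UTC, and need `?` in either the day-of-month or the day-of-week field.

```python
SIGNS = [
    "aries", "taurus", "gemini", "cancer", "leo", "virgo",
    "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces",
]


def daily_cron(hour, minute=0):
    """EventBridge rule expression firing every day at hour:minute UTC."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"invalid time {hour}:{minute}")
    # fields: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour} * * ? *)"


def build_schedules(signs, start_hour=5, gap_minutes=5):
    """One rule per sign, staggered by gap_minutes (hypothetical policy)."""
    schedules = {}
    for i, sign in enumerate(signs):
        total = start_hour * 60 + i * gap_minutes
        schedules[f"{sign}-scrape-rule"] = daily_cron((total // 60) % 24, total % 60)
    return schedules


# In a CDK stack, the same loop would create real rules, e.g.:
#   events.Rule(self, name, schedule=events.Schedule.expression(expr),
#               targets=[targets.LambdaFunction(fn)])
```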

Goroscope

“Goroscope” stands for “Go-roscope”, because it is written in Go (Golang).
Go was chosen to save on compute resources: its main advantages are fast execution and a small memory footprint. It is also easy to write and get started with.

The infrastructure as code (IaC) is written in Terraform. The DynamoDB tables are not created by this project (they are the tables Horoscrape writes to); Terraform only manages the policies that allow reading from DynamoDB.

Each Lambda has an AWS policy on its associated DynamoDB table that grants GetItem (permission to read items from the table).
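Terraform would typically express this with an `aws_iam_policy_document` data source; the JSON it renders has roughly the shape below. This Python sketch builds one least-privilege policy per Lambda. The function names, region, account ID and table names are placeholders, not the project's actual values.

```python
import json

# Hypothetical mapping: Lambda function name -> DynamoDB table name
FUNCTION_TABLES = {
    "goroscope-aries": "horoscope-aries",
    "goroscope-leo": "horoscope-leo",
}


def table_arn(table, region="eu-west-1", account="123456789012"):
    # Placeholder region and account ID
    return f"arn:aws:dynamodb:{region}:{account}:table/{table}"


def read_only_policy(table):
    """IAM policy allowing only dynamodb:GetItem on a single table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem"],
                "Resource": table_arn(table),
            }
        ],
    }


# One JSON policy document per function
policies = {fn: json.dumps(read_only_policy(t)) for fn, t in FUNCTION_TABLES.items()}
```

Scoping each policy to a single table ARN means a compromised function can only read its own table, and cannot write to it.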

API Gateway is an API management service (similar to Apigee, which was acquired by Google). Several endpoints expose public APIs (accessible without authentication).
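Goroscope itself is written in Go; to keep all examples in one language, this sketch uses Python to show the shape of a public endpoint behind API Gateway with Lambda proxy integration. The path parameter arrives in `event["pathParameters"]`, and the function returns `statusCode`, `headers` and a string `body`. The `/horoscope/{sign}` route and the `get_item` lookup are hypothetical.

```python
import json

VALID_SIGNS = {
    "aries", "taurus", "gemini", "cancer", "leo", "virgo",
    "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces",
}


def response(status, payload):
    """Lambda proxy integration response format."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def make_public_handler(get_item):
    """get_item(sign) -> dict or None; hypothetical lookup (e.g. a DynamoDB GetItem)."""

    def handler(event, context):
        # pathParameters is None when the request carries no path parameters
        params = event.get("pathParameters") or {}
        sign = (params.get("sign") or "").lower()
        if sign not in VALID_SIGNS:
            return response(400, {"error": "unknown sign"})
        item = get_item(sign)
        if item is None:
            return response(404, {"error": "no horoscope yet"})
        return response(200, item)

    return handler
```

Validating the sign before touching the database keeps arbitrary input from a public, unauthenticated endpoint away from the table lookup.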

Because frequent requests for the same data can overload the database, it is good practice to cache the data. For now, DynamoDB Accelerator (DAX) handles the caching.
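DAX is a managed cache in front of DynamoDB, so Goroscope does not implement the cache itself. The underlying idea can still be shown with a small read-through cache with a time-to-live (TTL): the first request for a key goes to the database, and later requests within the TTL are served from memory. This is a conceptual sketch, not the DAX API; the 300-second default TTL is an arbitrary choice.

```python
import time


class ReadThroughCache:
    """Serves cached values until they expire, then reloads from the source."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # e.g. a function doing a DynamoDB GetItem
        self._ttl = ttl_seconds
        self._clock = clock        # injectable for testing
        self._entries = {}         # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # cache hit: no database call
        value = self._loader(key)  # miss or expired: read the source
        self._entries[key] = (value, now + self._ttl)
        return value
```

Since the scrapers write new data once a day, a TTL much shorter than a day bounds how long a stale horoscope can be served after a new one is stored.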

Goroscope: a proxy providing the API to any public client.

Later, end users will be able to post about, comment on, and rate their daily horoscope.
This part will expose a POST endpoint. The database choice is still open and will depend on what needs to be stored.

The frontend

The frontend reads the data through the Goroscope API rather than accessing DynamoDB directly.
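A minimal sketch of how a Python frontend could read from the Goroscope API using only the standard library. The base URL and the `/horoscope/{sign}` path are hypothetical, since the post does not document the endpoints; `opener` is injectable so the function can be tested without a network.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://goroscope.example.com"  # hypothetical base URL


def fetch_horoscope(sign, base_url=API_BASE, opener=urllib.request.urlopen, timeout=10):
    """GETs one horoscope as JSON from the (hypothetical) /horoscope/{sign} route."""
    url = f"{base_url}/horoscope/{quote(sign)}"
    with opener(url, timeout=timeout) as resp:
        return json.load(resp)
```

Inside an async Sanic handler, a blocking call like `urlopen` would stall the event loop; an async HTTP client, or running it via `loop.run_in_executor`, would be preferable there.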

For now, for simplicity, I created a web frontend with the Python Sanic web framework, using its HTML response.
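Returning an HTML response from Sanic can be as simple as building a string. The helper below renders horoscope rows as an HTML table and escapes the scraped text with `html.escape`, so markup coming from the scraped source cannot be injected into the page. The row fields and the route are hypothetical; in Sanic, the string would be returned with `sanic.response.html(...)`.

```python
from html import escape


def render_table(rows):
    """Renders a list of {"sign": ..., "text": ...} dicts as an HTML table."""
    lines = ["<table>", "<tr><th>Sign</th><th>Horoscope</th></tr>"]
    for row in rows:
        lines.append(
            f"<tr><td>{escape(row['sign'])}</td><td>{escape(row['text'])}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical Sanic route (not executed here):
#   from sanic import Sanic
#   from sanic.response import html
#   app = Sanic("astro")
#
#   @app.get("/")
#   async def index(request):
#       return html(render_table(rows))
```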

Later, I will integrate Jinja2 templates to add nicer CSS styling. Yes, the table on the index page looks broken without CSS.

It is hosted on https://sanic-astro.fly.dev

The first version was hosted on https://sanic-start.onrender.com/ . For this one, please allow about 50 seconds for the service to start up.

Overall

Overall, the diagram below shows these three projects/microservices.
The data flows from left to right. When the frontend requests data, the request stops at the service in the middle (Goroscope); the frontend never reads DynamoDB directly. Goroscope acts as a public proxy: it provides a public, unauthenticated API to read the data that was previously scraped and stored.

Overall architecture

Later, a native mobile app will be added as another frontend.