Favorite Icons Of Internet

Art project that aims to depict the vastness and colorfulness of the internet.

You can see the result of all the crawling and image-crunching at FavoriteIconsOfInternet.com

Our current goal is to bring the project to a state where we can keep a history of daily favicon changes for at least a million web sites.

Workers

This project uses the Phantom Of The Cloud image to launch workers for the parallelizable steps (3, 4, 6 and eventually 8). AWS auto-scaling groups can be used to speed up or slow down processing.

Processing Steps

Step 1. Load domains

Updates the list of domains in the database. Currently takes a list of Alexa rankings as input.

Runs on central box. See steps_1_and_2.sh

Step 2. Queue domains

Gets a list of domains to crawl (currently only active Alexa domains) and uploads them to a queue in chunks for crawlers to pick up.

Runs on central box. See steps_1_and_2.sh
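
The chunked upload described above could look roughly like the sketch below. It only shows how a domain list might be split into queue message bodies. The function names, the chunk size of 100 and the JSON message shape are assumptions made for illustration, not the format used by steps_1_and_2.sh, and the actual queue client call is left out.

```python
import json
from typing import Iterable, Iterator, List


def chunk_domains(domains: Iterable[str], chunk_size: int = 100) -> Iterator[List[str]]:
    """Yield successive lists of at most chunk_size domains."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[str] = []
    for domain in domains:
        chunk.append(domain)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final chunk, possibly shorter than chunk_size
        yield chunk


def build_messages(domains: Iterable[str], chunk_size: int = 100) -> List[str]:
    """Serialize each chunk as one JSON message body (hypothetical format)."""
    return [json.dumps({"domains": chunk}) for chunk in chunk_domains(domains, chunk_size)]
```

Each returned string would then be sent as one queue message, so a crawler that picks up a message gets a bounded batch of sites to work on.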

Step 3. Fetch icons

Listens for messages in a queue and crawls the sites listed in each message, finding favorite icons and comparing them to the existing versions to see if they have changed.

Runs on crawler workers. See steps_3_and_4.sh
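
One simple way to do the comparison described above is to store a digest of each icon and compare it with the digest of the freshly fetched bytes. This is a minimal sketch under that assumption. The project may compare icons differently, and the function names and status strings here are hypothetical.

```python
import hashlib
from typing import Optional


def icon_digest(icon_bytes: bytes) -> str:
    """Return a hex SHA-256 digest of the raw icon bytes."""
    return hashlib.sha256(icon_bytes).hexdigest()


def classify_icon(fetched: bytes, previous_digest: Optional[str]) -> str:
    """Return 'new', 'changed' or 'unchanged' for a freshly fetched icon.

    previous_digest is the digest stored from the last crawl,
    or None if the site has no stored icon yet.
    """
    if previous_digest is None:
        return "new"
    if icon_digest(fetched) == previous_digest:
        return "unchanged"
    return "changed"
```

Note that a byte-level digest treats a re-encoded but visually identical icon as changed. Comparing decoded pixels instead would avoid that at the cost of more work per icon.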

Step 4. Process icons

After all icons are fetched, convert them to PNG, calculate the average color of each and upload them to results storage together with a manifest describing which icons are new, which have changed, and so on.

Runs on crawler workers. See steps_3_and_4.sh
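
The average color and manifest mentioned above can be sketched as follows. The sketch assumes the icon has already been decoded into a sequence of (R, G, B) pixels by some imaging library, which is not shown. The manifest layout and all names are hypothetical, chosen only to illustrate the idea.

```python
from typing import Dict, Iterable, List, Tuple

Pixel = Tuple[int, int, int]


def average_color(pixels: Iterable[Pixel]) -> Pixel:
    """Return the per-channel mean of the pixels, rounded down."""
    total_r = total_g = total_b = count = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        raise ValueError("cannot average an empty image")
    return (total_r // count, total_g // count, total_b // count)


def build_manifest(statuses: Dict[str, str]) -> Dict[str, List[str]]:
    """Group domains by icon status ('new', 'changed', 'unchanged')."""
    manifest: Dict[str, List[str]] = {"new": [], "changed": [], "unchanged": []}
    for domain in sorted(statuses):
        manifest[statuses[domain]].append(domain)
    return manifest
```

A worker could serialize the manifest next to the converted PNG files so that the central box can tell which database rows need updating without inspecting every icon.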

Step 5. Update database and queue tiles

Gather all the results and update the database. Calculate a list of tiles that need to be updated (currently all tiles with predefined width/height ordered by Alexa ranking) and put each tile as a job into a queue.

Generate HTML and necessary JSON metadata.

Runs on central box. See step5.sh
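
The tile list described above can be sketched as a split of the rank-ordered domain list into fixed-size groups, one per tile. The grid dimensions, function name and job dictionary shape below are assumptions for illustration. step5.sh may compute tiles differently.

```python
from typing import Dict, List, Union

TileJob = Dict[str, Union[int, List[str]]]


def plan_tiles(ranked_domains: List[str],
               icons_per_row: int = 16,
               rows_per_tile: int = 16) -> List[TileJob]:
    """Split domains (already ordered by rank) into one job per tile."""
    per_tile = icons_per_row * rows_per_tile
    if per_tile < 1:
        raise ValueError("tile dimensions must be positive")
    jobs: List[TileJob] = []
    for start in range(0, len(ranked_domains), per_tile):
        jobs.append({
            "tile": start // per_tile,  # tile index, 0 holds the top-ranked sites
            "domains": ranked_domains[start:start + per_tile],
        })
    return jobs
```

For example, `plan_tiles(domains, icons_per_row=2, rows_per_tile=2)` on five domains yields two jobs: tile 0 with the first four domains and tile 1 with the remaining one.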

Step 6. Generate tiles 🔴

Grab images required for the tile (or sync them all) and generate a tile. Optimize the image using smu.sh and deploy to a CDN.
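
Since this step is not developed yet, the following is only a sketch of one piece of it: computing where each icon would be pasted inside a tile laid out as a row-major grid. The 16-pixel icon size, the row-major layout and the function name are all assumptions. Image composition, optimization and CDN deployment are not shown.

```python
from typing import Tuple


def icon_offset(index_in_tile: int, icons_per_row: int, icon_size: int = 16) -> Tuple[int, int]:
    """Return the (x, y) pixel offset of an icon inside its tile.

    index_in_tile counts icons left to right, top to bottom, starting at 0.
    """
    if index_in_tile < 0 or icons_per_row < 1:
        raise ValueError("index must be >= 0 and icons_per_row >= 1")
    row, col = divmod(index_in_tile, icons_per_row)
    return (col * icon_size, row * icon_size)
```

A tile worker could loop over the domains in its job, look up each icon, and paste it at the offset returned for its position in the list.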

Runs on tile workers. TBD (To Be Developed)

Step 7. Publish

Once all tiles are done, move HTML and metadata chunks over to production!

Runs on central box. TBD (To Be Developed)

Step 8. Notify users

Notify users (if any), send the daily newsletter, etc.

Runs on central box (and SMTP workers if load is high). TBD (To Be Developed)
