Posts

Showing posts from December, 2020

Let's learn "About web crawler system design"

In this tutorial, we'll learn about the system design of a distributed web crawler. A web crawler, also known as a spider bot, is a tool that collects web pages and stores them so that they can be accessed easily later. This process is known as web indexing.

Different use cases of web crawlers

Search engine: Whenever you search for something, the engine can return the relevant pages because the web pages have already been crawled and stored in an optimized fashion.
Copyright violation detection: Finding users who use copyrighted content so that they can be sued.
Keyword-based finding: Similar to a search engine; it helps find web pages based on keywords.
Web malware detection: Detecting whether someone has created look-alike pages for phishing attacks. For example, Amazon could crawl the web to find pages similar to its own that an attacker is using for phishing.
Web analysis: Used by some companies and data scientists to characterize the feature of web

Let's learn "System design for Pastebin (or any text sharing website)"

In this tutorial, we'll learn about the system design for Pastebin. If you don't know what Pastebin is, here is its definition: Pastebin is a type of online content hosting service where users can store text to share with others over the internet.

Pastebin system design requirements

Functional
Basic: A user should be able to paste text, generate a link, and share that link with other users.
There should be a maximum size for the text that the system supports (say 10 MB).
Users should be able to provide a custom URL path for the created link. For example, if a user wants the link www.pb.com/letlearn, they can supply this path and Pastebin should create the shareable link with it.
Each stored text should support link (paste) expiry. There can be a default expiry time, say 1 year, but it should be configurable.
Personalization: A user should be able to log in to track and manage all the pastes they have created.

Non-Functional
Durab