Let's learn about web crawler system design.
In this tutorial, we'll learn about the system design of a distributed web crawler. A web crawler, also known as a spider bot, is a tool or piece of software that collects web pages and stores them in a way that makes them easy to access later; this process is known as web indexing. (A minimal sketch of this fetch-and-index loop follows the list below.)

Different use-cases of web crawlers:

Search engine: Whenever you search for something, the engine returns the relevant pages because the crawled web pages have already been stored in an optimized fashion.

Copyright violation detection: Finding users who publish copyrighted content so that action can be taken against them.

Keyword-based finding: Similar to a search engine; it helps find web pages based on keywords.

Web malware detection: Detecting whether someone has created look-alike pages for phishing attacks. For example, Amazon could crawl the web to find pages that imitate its own site and are being used by attackers for phishing.

Web analysis: Used by some companies and data scientists to characterize features of the web.
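To make the fetch-and-index idea above concrete, here is a minimal, single-threaded sketch in Python using only the standard library. The seed URL, the in-memory index dict, and the LinkExtractor helper are illustrative assumptions for this sketch, not part of the distributed design we'll build in this tutorial (which needs a URL frontier service, politeness rules, deduplication, and durable storage instead of a dict).

```python
# Minimal crawl-and-index sketch (assumption: single machine, standard library only).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a fetched page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, max_pages=10):
    frontier = deque([seed_url])   # URLs waiting to be fetched
    visited = set()                # URLs already fetched
    index = {}                     # URL -> raw HTML (stand-in for real indexing/storage)

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        try:
            with urlopen(url, timeout=5) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip unreachable or malformed pages in this sketch

        visited.add(url)
        index[url] = html  # "save the pages in a way that makes them easy to access"

        # Extract outgoing links and add them to the frontier.
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            frontier.append(urljoin(url, link))  # resolve relative links

    return index


if __name__ == "__main__":
    pages = crawl("https://example.com")  # hypothetical seed URL
    print(f"Indexed {len(pages)} page(s)")
```

In the distributed version, the frontier, the visited set, and the index each become their own scalable components; the loop itself stays conceptually the same.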