Web Scraping Interview Questions

Four questions developers actually get asked about web scraping, with clear, practical answers you can use to prepare.

Q1. What is web scraping and how does it work?

Web scraping is the process of using a program to fetch a web page, read its HTML source code, and extract specific pieces of data. The program sends an HTTP request (just like a browser), receives the HTML back, then uses a parsing library like BeautifulSoup or Cheerio to locate elements using CSS selectors such as .price or #product-title.
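The fetch, parse, and extract steps can be sketched in a few lines. This is a minimal illustration, not a production scraper: to stay dependency-free it swaps in Python's standard-library html.parser for BeautifulSoup and matches on attributes by hand instead of using CSS selectors. The sample HTML, the example-scraper/0.1 User-Agent, and the names FieldExtractor, extract_fields, and fetch_html are all made up for the example.

```python
import urllib.request
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would get this from fetch_html().
SAMPLE_HTML = """
<html><body>
  <h1 id="product-title">Example Widget</h1>
  <span class="price">$19.99</span>
</body></html>
"""


def fetch_html(url):
    """Send a GET request and return the body as text (defined but not called here)."""
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


class FieldExtractor(HTMLParser):
    """Collects the text of the id="product-title" and class="price" elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field name while we are inside a matching element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if attrs.get("id") == "product-title":  # rough equivalent of #product-title
            self._current = "title"
        elif "price" in classes:                # rough equivalent of .price
            self._current = "price"

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        # Simplification: assumes the matched elements contain no nested tags.
        self._current = None


def extract_fields(html):
    parser = FieldExtractor()
    parser.feed(html)
    return parser.fields
```

Calling extract_fields(SAMPLE_HTML) returns a dictionary with the title and price text. With BeautifulSoup installed, the same lookup is usually written with its select_one method and the selectors #product-title and .price.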

Q2. What is the difference between web scraping and using an API?

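An API is a structured endpoint that a website provides on purpose, returning clean data in formats like JSON. Scraping parses the raw HTML that was designed for human visitors, so it can break whenever the page layout changes. APIs are usually faster and more reliable, and their use is explicitly permitted under the provider's terms. Scraping is a fallback for when no API exists or the API does not expose the data you need.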
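A small side-by-side sketch makes the contrast concrete. Both inputs below are invented for illustration: API_BODY stands in for a JSON response and PAGE_HTML for the equivalent product page. The function and class names are hypothetical too, and the HTML side uses the standard-library html.parser rather than a scraping library.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: structured data with a typed price field.
API_BODY = '{"title": "Example Widget", "price": 19.99}'

# Hypothetical HTML for the same product: markup meant for human readers.
PAGE_HTML = (
    '<div><h1 id="product-title">Example Widget</h1>'
    '<span class="price">$19.99</span></div>'
)


def price_from_api(body):
    """One call to parse, one key lookup."""
    return json.loads(body)["price"]


class PriceParser(HTMLParser):
    """Finds the text of the first-seen element carrying class="price"."""

    def __init__(self):
        super().__init__()
        self.price = None
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._in_price = "price" in classes

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.price = data.strip()

    def handle_endtag(self, tag):
        self._in_price = False


def price_from_html(html):
    """Locate the element, then clean the display string into a number."""
    parser = PriceParser()
    parser.feed(html)
    if parser.price is None:
        return None  # markup changed or element missing
    return float(parser.price.lstrip("$"))
```

Both functions return 19.99 for the sample inputs. If the site renamed the class from price to something else, price_from_html would return None, which is the kind of silent breakage that makes scraping less reliable than an API.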

Q3. How do you handle websites that block scrapers?

Websites detect scrapers by looking at request rate, missing or inconsistent browser headers, and repetitive access patterns such as fetching pages in a fixed order at fixed intervals. Common countermeasures are to send a realistic User-Agent and related headers, insert random delays between requests (for example, 1 to 3 seconds), and rotate IP addresses through proxies. Respecting the site's robots.txt file also helps avoid blocks.
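The three techniques in this answer might be combined roughly as follows. Treat this as a sketch under stated assumptions: the User-Agent strings are illustrative, the proxy addresses are placeholders from a documentation-only IP range, and the helper names (build_request, next_delay, next_proxy, fetch_all) are invented for this example. Only the standard library is used.

```python
import itertools
import random
import time
import urllib.request

# Illustrative browser User-Agent strings (assumed values, not a maintained list).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# Placeholder proxies; replace with proxies you are actually allowed to use.
PROXIES = ["http://203.0.113.10:8080", "http://203.0.113.11:8080"]
_proxy_pool = itertools.cycle(PROXIES)


def build_request(url):
    """Attach browser-like headers to a request."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)


def next_delay(low=1.0, high=3.0):
    """Pick a random pause, in seconds, so requests are not evenly spaced."""
    return random.uniform(low, high)


def next_proxy():
    """Rotate through the proxy list in round-robin order."""
    return next(_proxy_pool)


def fetch_all(urls):
    """Fetch each URL through the next proxy, pausing between requests."""
    pages = {}
    for url in urls:
        proxy = next_proxy()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
        with opener.open(build_request(url), timeout=10) as response:
            pages[url] = response.read()
        time.sleep(next_delay())
    return pages
```

In an interview it is worth adding that these measures reduce accidental blocks for a well-behaved scraper; they are not a license to ignore a site's rules or rate limits.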

Q4. What is robots.txt and why should scrapers respect it?

robots.txt is a plain-text file that many websites publish at their root (example.com/robots.txt) to tell automated programs which paths they may and may not crawl. It is advisory rather than enforced: nothing technically stops a scraper from ignoring it, but doing so can get your IP blocked, violate the site's terms of service, or create legal risk. Always check it before scraping a new site.
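Python's standard library includes urllib.robotparser for this check. The sketch below parses an inline sample so it runs without network access. The rules in SAMPLE_ROBOTS_TXT, the example-scraper user agent, and the build_rules helper are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for example.com.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a rule set that can be queried per URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = build_rules(SAMPLE_ROBOTS_TXT)

# Ask before fetching: is this user agent allowed to request this URL?
allowed = rules.can_fetch("example-scraper", "https://example.com/products")
blocked = rules.can_fetch("example-scraper", "https://example.com/private/data")

# Some sites also state a minimum pause between requests, in seconds.
delay = rules.crawl_delay("example-scraper")
```

Here allowed is True, blocked is False, and delay is 2. Against a live site you would call set_url with the robots.txt address and then read() instead of parse(), and skip any URL for which can_fetch returns False.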
