Web scraping is the act of pulling and organizing data from a website. This data could be tables, pages, images or even the body text. A website's contents could mean anything or be of use to anybody.
Python is the most commonly used language for scraping, and there are plenty of frameworks to help in scraping endeavours.
In a lot of industries there are unwritten codes, or actions that aren’t received very well. In the web dev world it can take countless hours to get a website to its peak, and having someone come along and rip off its contents can have many effects.
Essentially, how you scrape and what you do with the contents is what counts. DDoSing a website or putting it under high load while scraping isn’t cool, and neither is ripping the website’s contents to use in a spin-off website. That is not nice at all, and it is illegal (if the content is the website’s own).
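One way to keep the load on a site low is to pause between requests and to check the site's robots.txt before fetching a page. The snippet below is only a rough sketch of that idea using Python's standard library. The names PoliteThrottle and is_allowed, the one-second default delay and the "my-scraper" user agent are all made up for illustration, not taken from any framework.

```python
import time
from urllib import robotparser


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests to one site."""

    def __init__(self, min_delay_seconds=1.0):
        # The one-second default is an assumption; pick what suits the site.
        self.min_delay_seconds = min_delay_seconds
        self._last_request = None

    def wait(self):
        """Sleep if the previous call was too recent; return seconds slept."""
        waited = 0.0
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            remaining = self.min_delay_seconds - elapsed
            if remaining > 0:
                time.sleep(remaining)
                waited = remaining
        self._last_request = time.monotonic()
        return waited


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    throttle = PoliteThrottle(min_delay_seconds=1.0)
    for url in ["https://example.com/public", "https://example.com/private/x"]:
        if is_allowed(rules, "my-scraper", url):
            throttle.wait()
            print("would fetch", url)  # the real request would go here
        else:
            print("skipping", url)
```

Call wait() right before every request so the delay applies no matter how long the parsing in between takes. In a real scraper the robots.txt text would be downloaded from the site rather than hard-coded.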
Found a hidden API? Great, but don’t abuse access to it or use it to display or access inside information (if that is possible). Avoid dealing with customer or personal data.
If you’re not using an API, it is probably best not to post the full scraped contents online for all to see and use; this can leave you open to legal issues (angry web devs). Getting data to make graphs or statistics is OK because you’re not just posting the full data and allowing others to manipulate it as they see fit.
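As a small illustration of sharing statistics instead of the raw data, here is a sketch that boils scraped records down to a summary. The record fields ("category" and "price") and the summarize function are hypothetical; real scraped data will look different.

```python
from collections import Counter
from statistics import mean


def summarize(records):
    """Reduce scraped records to aggregate figures that are safe to share."""
    prices = [record["price"] for record in records]
    return {
        "count": len(records),
        # mean() raises on an empty list, so report None in that case.
        "mean_price": round(mean(prices), 2) if prices else None,
        "per_category": dict(Counter(r["category"] for r in records)),
    }


if __name__ == "__main__":
    # Made-up sample rows standing in for scraped data.
    sample = [
        {"category": "books", "price": 10.0},
        {"category": "books", "price": 20.0},
        {"category": "games", "price": 30.0},
    ]
    print(summarize(sample))
```

Only the summary dictionary gets published; the individual rows stay on your machine.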
Whether you’re making your own API, archiving data or making content for Reddit karma, be respectful and mindful of why you’re scraping.