Scraping LivestreamFails Part 1

I do not encourage web scraping, nor will I post the full code to do so. This was simply a challenge to myself, and I documented it as such.

LivestreamFails, the place to view streamers doing stupid/funny/illegal/annoying things, provides a clip that Twitch cannot remove, as well as a points score, an NSFW notice and a link to the Reddit post discussing the “fail”.

Given that these clips are 720p, heavily compressed and short, I thought about making a “backup” of the whole site. This became much more feasible once I realized how the website is laid out.

The first clip is at all the way through to the latest post at. The best way to describe this layout is predictable: thanks to this design, I know the URL of every LivestreamFails clip.

To scrape every clip, I just need to remember the last “id” I scraped, add 1 to it and repeat the process. Using random strings or the title as the URL would prevent this. I can’t predict or “”.
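Below is a minimal Python sketch of why sequential ids make the whole site enumerable. The original project used PHP, so this is not the author's code. `BASE_URL` is a hypothetical placeholder, not the site's real URL pattern, and nothing here fetches anything. It only shows how the next batch of URLs follows from the last id stored.

```python
# Hypothetical placeholder pattern; the real LivestreamFails URL format is not shown here.
BASE_URL = "https://example.invalid/post/{id}"


def next_ids(last_id: int, count: int) -> list[int]:
    """Return the `count` ids that follow `last_id` (last_id + 1 ... last_id + count)."""
    return list(range(last_id + 1, last_id + count + 1))


def clip_urls(last_id: int, count: int) -> list[str]:
    """Build the predictable URL for each of the next `count` clips."""
    return [BASE_URL.format(id=clip_id) for clip_id in next_ids(last_id, count)]


if __name__ == "__main__":
    for url in clip_urls(last_id=41, count=3):
        print(url)
```

With random or title-based URLs, no function like this could exist. You would have to discover each URL by crawling listing pages instead.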

With the looping process done, I just needed to fetch the data from each page and store it. Using PHP Simple HTML DOM Parser, this was easy. The data I needed was:

  • Title
  • Streamer
  • Game
  • Video link (MP4)
  • Thumbnail link
  • Score/upvotes
  • Reddit link
  • NSFW or not?
  • LivestreamFails URL
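One way to hold those fields before they reach the database is a small record type. This is a hypothetical Python sketch. The class and field names are my own, and the HTML selectors needed to populate it are deliberately left out.

```python
from dataclasses import dataclass, asdict


@dataclass
class Clip:
    """One scraped LivestreamFails clip (field names are illustrative, not the site's)."""
    clip_id: int        # sequential id taken from the LivestreamFails URL
    title: str
    streamer: str
    game: str
    video_url: str      # direct MP4 link
    thumbnail_url: str
    score: int          # points/upvotes at scrape time
    reddit_url: str
    nsfw: bool
    page_url: str       # the LivestreamFails URL itself


if __name__ == "__main__":
    example = Clip(
        clip_id=1,
        title="Example title",
        streamer="example_streamer",
        game="Example Game",
        video_url="https://example.invalid/video/1.mp4",
        thumbnail_url="https://example.invalid/thumb/1.jpg",
        score=123,
        reddit_url="https://example.invalid/reddit/1",
        nsfw=False,
        page_url="https://example.invalid/post/1",
    )
    print(asdict(example))
```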

Once I had the DOM parser conditions working correctly, it was just a matter of storing the results in a MySQL database.
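The storage step might look like the sketch below. It assumes a hypothetical `clips` table keyed by the LivestreamFails id. Python's built-in `sqlite3` stands in for MySQL so the example runs without a server. With MySQL the same idea applies through a driver such as PyMySQL, using `INSERT IGNORE` instead of SQLite's `INSERT OR IGNORE`.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL so the sketch is self-contained.
SCHEMA = """
CREATE TABLE IF NOT EXISTS clips (
    id            INTEGER PRIMARY KEY,  -- the sequential LivestreamFails id
    title         TEXT,
    streamer      TEXT,
    game          TEXT,
    video_url     TEXT,
    thumbnail_url TEXT,
    score         INTEGER,
    reddit_url    TEXT,
    nsfw          INTEGER,              -- 0 or 1
    page_url      TEXT
)
"""


def save_clip(conn: sqlite3.Connection, clip: dict) -> None:
    """Insert one clip; an id that is already stored is skipped rather than duplicated."""
    conn.execute(
        "INSERT OR IGNORE INTO clips VALUES "
        "(:id, :title, :streamer, :game, :video_url, :thumbnail_url, "
        ":score, :reddit_url, :nsfw, :page_url)",
        clip,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    save_clip(conn, {
        "id": 1, "title": "Example", "streamer": "someone", "game": "Some Game",
        "video_url": "https://example.invalid/1.mp4",
        "thumbnail_url": "https://example.invalid/1.jpg",
        "score": 10, "reddit_url": "https://example.invalid/r/1",
        "nsfw": 0, "page_url": "https://example.invalid/post/1",
    })
    print(conn.execute("SELECT COUNT(*) FROM clips").fetchone()[0])
```

Using the site's own id as the primary key means a page that gets scraped twice, for example after a failed run, cannot create a duplicate row.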

I was restricted to at most 12 pages per try. To get around this, I limited each loop to 10 pages, slept for 5 seconds and then did another 10. The whole run took around 25 seconds. Keeping it under 30 seconds was important so that the cron job service treated it as a live link and didn’t stop it.
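The batching workaround can be sketched as a loop that handles 10 ids, pauses, and handles 10 more, so no single burst reaches the 12-page limit. In this hypothetical Python version, `scrape_page` and `sleep` are passed in. Nothing is fetched here, and tests can skip the real pause. The constants mirror the numbers above.

```python
import time
from typing import Callable

BATCH_SIZE = 10    # stays under the observed limit of 12 pages per try
BATCHES = 2        # two bursts per cron run
PAUSE_SECONDS = 5  # pause between bursts


def run_batches(
    start_id: int,
    scrape_page: Callable[[int], None],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scrape BATCHES bursts of BATCH_SIZE ids starting at start_id.

    Returns the next id that still needs scraping.
    """
    next_id = start_id
    for batch in range(BATCHES):
        for clip_id in range(next_id, next_id + BATCH_SIZE):
            scrape_page(clip_id)  # hypothetical: fetch, parse and store one clip
        next_id += BATCH_SIZE
        if batch < BATCHES - 1:   # no need to pause after the final burst
            sleep(PAUSE_SECONDS)
    return next_id
```

Injecting `sleep` is only a testing convenience. In a real run it defaults to `time.sleep`, so one run costs the time of two bursts plus a single 5-second pause.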

The process

The overall process, run by a cron job every minute, looked like this:

  1. Fetch the highest id (the latest scrape identifier) from the database
  2. Add 1 to it and loop over the next 10 ids (10 entries)
  3. Sleep for 5 seconds
  4. Repeat once more
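Step 1 is a single query. The hypothetical Python sketch below derives the starting point for the next run and splits the run into bursts. Again, `sqlite3` stands in for MySQL, and it assumes a `clips` table keyed by the LivestreamFails id. `COALESCE` covers the very first run, when the table is still empty.

```python
import sqlite3


def next_start_id(conn: sqlite3.Connection) -> int:
    """Return the first id to scrape: one past the highest id already stored."""
    # MAX(id) is NULL on an empty table, so COALESCE makes the first run start at 1.
    (highest,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM clips").fetchone()
    return highest + 1


def ids_for_run(conn: sqlite3.Connection, per_batch: int = 10, batches: int = 2) -> list[list[int]]:
    """Split one cron run into bursts of ids (two bursts of 10 by default)."""
    start = next_start_id(conn)
    return [
        list(range(start + b * per_batch, start + (b + 1) * per_batch))
        for b in range(batches)
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clips (id INTEGER PRIMARY KEY)")
    print(next_start_id(conn))
    conn.executemany("INSERT INTO clips (id) VALUES (?)", [(1,), (2,), (3,)])
    print(ids_for_run(conn, per_batch=2, batches=2))
```

Keying off `MAX(id)` has one side effect. If every id in a run turns out to be a deleted or not-yet-posted clip, nothing is stored, and the next run retries the same range.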

This was getting 28,800 results every 24 hours. LivestreamFails has 28,000 clips, so it’s good enough.
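The daily figure follows directly from the schedule of two bursts of 10 clips per run and one run per minute. This assumes every run completes both bursts:

```latex
\underbrace{2 \times 10}_{\text{clips per run}} \times \underbrace{60 \times 24}_{\text{runs per day}}
  = 20 \times 1440 = 28{,}800 \ \text{clips per day},
\qquad
\frac{28{,}000}{28{,}800} \approx 0.97 \ \text{days} \approx 23.3 \ \text{hours}.
```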

Once I have all the details for each clip, I can move on to the second part: downloading, renaming and storing the MP4 files.