Scraping LivestreamFails part 2

Part 1 here

In part 1 the focus was on scraping and storing the data. Now let’s get into using the stored data to download the videos.

What I needed was the video's title and the video file's URL. The idea was to run a loop with FFmpeg in PHP, copying each video file and naming it after its title.

I wanted to format the would-be filename to replace spaces with dashes and to filter out characters that cannot be used in filenames. I did this with a sanitize function taken from Chyrp:

function sanitize($string, $force_lowercase = true, $anal = false) {
    // Characters that get stripped out entirely
    $strip = array("~", "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "=", "+", "[", "{", "]",
                   "}", "\\", "|", ";", ":", "\"", "'", "‘", "’", "“", "”", "–", "—",
                   ",", "<", ".", ">", "/", "?");
    $clean = trim(str_replace($strip, "", strip_tags($string)));
    // Collapse runs of whitespace into a single dash
    $clean = preg_replace('/\s+/', "-", $clean);
    // Optionally strip anything that is not alphanumeric
    $clean = ($anal) ? preg_replace("/[^a-zA-Z0-9]/", "", $clean) : $clean;
    return ($force_lowercase) ?
        (function_exists('mb_strtolower')) ?
            mb_strtolower($clean, 'UTF-8') :
            strtolower($clean) :
        $clean;
}
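
As a quick example of what it produces (the title below is made up):

// Hypothetical title, just to show the output format
echo sanitize("Streamer falls off chair LIVE!");
// prints: streamer-falls-off-chair-live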

The other aspect is FFmpeg with PHP. I have done plenty of FFmpeg posts in the past, but using it with PHP was something new to me. The process is easy: just drop ffmpeg.exe into the web server root and use the following:

// escapeshellarg() stops URLs containing ? or & from breaking the command
shell_exec("ffmpeg -i " . escapeshellarg($video_url) . " -c copy " . escapeshellarg($filename . ".mp4"));
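
shell_exec does not tell you whether FFmpeg actually succeeded, so a rough check on the output file helps. A minimal sketch, assuming the same $filename as above:

// Treat a missing or empty output file as a failed download
if (!file_exists($filename . ".mp4") || filesize($filename . ".mp4") === 0) {
    echo "Download failed: " . $filename . "\n";
}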

To finish up I simply queried the database and, for each row, fetched the title and video link, sanitized the title, downloaded the video and saved it under the cleaned title.
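
Roughly, the loop looked like the sketch below. The table and column names are placeholders for whatever was stored in part 1, and I am assuming a PDO connection in $pdo:

// Hypothetical table/column names; adjust to match what part 1 stored
$rows = $pdo->query("SELECT title, video_url FROM clips");

foreach ($rows as $row) {
    // Clean the title so it is safe to use as a filename
    $filename = sanitize($row['title']);

    // Skip clips that were already downloaded
    if (file_exists($filename . ".mp4")) {
        continue;
    }

    // Copy the stream into an mp4 named after the cleaned title
    shell_exec(
        "ffmpeg -i " . escapeshellarg($row['video_url']) .
        " -c copy " . escapeshellarg($filename . ".mp4")
    );
}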

The main issue was having a server with enough storage to cope with the dump.