Pushshift is a third-party Reddit API that is useful for finding past or otherwise archived comments and submissions (posts).
Searching submissions uses this endpoint:
https://api.pushshift.io/reddit/search/submission/
There are many parameters for narrowing down a search. Here are some of the most useful:
- `subreddit`: the name of the subreddit to search.
- `score`: submissions with a score (upvotes) equal to (=), greater than (>) or less than (<) a given value.
- `domain`: submissions linking to a given domain (e.g. youtube.com or imgur.com).
- `after`: submissions created after a date and time (Unix timestamp).
- `before`: submissions created before a date and time (Unix timestamp).
- `sort_type`: the field to sort by ("score", "num_comments" or "created_utc").
- `sort`: the sort direction, ascending ("asc") or descending ("desc").
- `title`: search post titles.
- `author`: posts from a specific user.
- `size`: the number of results returned; the maximum is 100.
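Rather than concatenating these parameters by hand, they can be kept in an array and encoded with PHP's built-in `http_build_query()`. The sketch below assumes the submission endpoint above; `pushshift_search_url()` is a hypothetical helper name, and clamping `size` to 1..100 simply mirrors the stated 100-result limit.

```php
<?php
// Hypothetical helper: turn an array of search parameters into a Pushshift
// submission search URL. http_build_query() URL-encodes keys and values.
function pushshift_search_url(array $params): string
{
    $base = 'https://api.pushshift.io/reddit/search/submission/';

    // size is capped at 100 results per request, so clamp it defensively
    if (isset($params['size'])) {
        $params['size'] = max(1, min(100, (int) $params['size']));
    }

    return $base . '?' . http_build_query($params);
}

// Example: top youtube.com submissions in r/nba, highest score first
echo pushshift_search_url([
    'subreddit' => 'nba',
    'domain'    => 'youtube.com',
    'sort_type' => 'score',
    'sort'      => 'desc',
    'size'      => 100,
]), PHP_EOL;
```

Pass string values such as `'true'` where the API expects the literal word, since `http_build_query()` encodes a PHP boolean `true` as `1`.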
The goal here is to query, for each month from 2008 to 2015, the top 100 r/nba submissions that link to youtube.com. Unlike today's rapid-fire, almost live highlight clips, video posts were still scarce back then, particularly before 2014.
Define the months and years in arrays:
```php
$months_array = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'];
$years_array = ['2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015'];
```
Loop through each month of every year and verify that it is a valid date:
```php
foreach ($years_array as $y) {
    foreach ($months_array as $m) {
        // checkdate(month, day, year) expects integers
        if (checkdate((int) $m, 1, (int) $y)) {
            // Is valid
        }
    }
}
```
Set the timezone to UTC (Reddit timestamps are in UTC), then convert the first day of each month and the first day of the following month into Unix timestamps to get the after and before parameter values.
Splitting the search into monthly windows like this helps narrow the results, since each query returns at most 100.
```php
date_default_timezone_set('UTC');

foreach ($years_array as $y) {
    foreach ($months_array as $m) {
        if (checkdate((int) $m, 1, (int) $y)) {
            // Midnight UTC on the 1st of this month: the "after" value
            $after_dt = new DateTime("{$y}-{$m}-01 00:00:00");
            $after_unix = $after_dt->getTimestamp();

            // Midnight UTC on the 1st of the next month: the "before" value
            $before_dt = (clone $after_dt)->modify('+1 month');
            $before_unix = $before_dt->getTimestamp();

            echo "{$after_dt->format('Y-m-d')} {$before_dt->format('Y-m-d')}<br>";
        }
    }
}
```
This will output:
```
2008-01-01 2008-02-01
2008-02-01 2008-03-01
2008-03-01 2008-04-01
2008-04-01 2008-05-01
2008-05-01 2008-06-01
2008-06-01 2008-07-01
2008-07-01 2008-08-01
2008-08-01 2008-09-01
2008-09-01 2008-10-01
2008-10-01 2008-11-01
2008-11-01 2008-12-01
2008-12-01 2009-01-01
2009-01-01 2009-02-01
2009-02-01 2009-03-01
2009-03-01 2009-04-01
2009-04-01 2009-05-01
2009-05-01 2009-06-01
2009-06-01 2009-07-01
2009-07-01 2009-08-01
2009-08-01 2009-09-01
2009-09-01 2009-10-01
...
```
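The same monthly windows can also be produced without the hand-written month and year arrays by iterating a `DatePeriod` with a one-month `DateInterval`. This is a swapped-in alternative, not the approach used above; `month_windows()` is a hypothetical helper name, and all dates are assumed to be UTC.

```php
<?php
// Sketch: build after/before Unix timestamp pairs for every month from
// January of $firstYear through December of $lastYear, all in UTC.
function month_windows(int $firstYear, int $lastYear): array
{
    $utc = new DateTimeZone('UTC');
    $start = new DateTimeImmutable("{$firstYear}-01-01 00:00:00", $utc);
    // DatePeriod excludes the end date, so Jan 1 of the following year is not yielded
    $end = new DateTimeImmutable(($lastYear + 1) . '-01-01 00:00:00', $utc);

    $windows = [];
    foreach (new DatePeriod($start, new DateInterval('P1M'), $end) as $monthStart) {
        $windows[] = [
            'label'  => $monthStart->format('Y-m'),
            'after'  => $monthStart->getTimestamp(),
            // modify() on an immutable date returns a new object
            'before' => $monthStart->modify('+1 month')->getTimestamp(),
        ];
    }
    return $windows;
}

foreach (month_windows(2008, 2015) as $w) {
    echo "{$w['label']}: after={$w['after']} before={$w['before']}", PHP_EOL;
}
```

Because every window starts on the 1st, adding one month always lands on the 1st of the next month, so consecutive windows share a boundary just like the output above.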
All that's left is to make a GET request to the Pushshift URL built from these values:
```php
$url = "https://api.pushshift.io/reddit/search/submission/?subreddit=nba&after={$after_unix}&before={$before_unix}&metadata=true&domain=youtube.com&sort_type=score&sort=desc&size=100";
```
You can use a cURL function or simply file_get_contents(). Either way, be mindful of rate limits; sleeping for 1 second on each month iteration should be enough:
```php
date_default_timezone_set('UTC');

foreach ($years_array as $y) {
    foreach ($months_array as $m) {
        if (checkdate((int) $m, 1, (int) $y)) {
            $after_dt = new DateTime("{$y}-{$m}-01 00:00:00");
            $after_unix = $after_dt->getTimestamp();

            $before_dt = (clone $after_dt)->modify('+1 month');
            $before_unix = $before_dt->getTimestamp();

            $url = "https://api.pushshift.io/reddit/search/submission/?subreddit=nba&after={$after_unix}&before={$before_unix}&metadata=true&domain=youtube.com&sort_type=score&sort=desc&size=100";

            // file_get_contents() returns false if the request fails
            $response = file_get_contents($url);
            if ($response !== false) {
                $data = json_decode($response, true);
                // Do stuff with response
            }

            sleep(1); // Stay within the rate limit: one request per month iteration
        }
    }
}
```
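A failed request makes file_get_contents() return false, and an error response may not be valid JSON, in which case json_decode() returns null and `$data['data']` would not exist. A minimal sketch of a more defensive decode step; `decode_posts()` is a hypothetical helper name, not part of PHP or Pushshift, and it assumes the posts are under the `data` key as in the loop that follows.

```php
<?php
// Hypothetical helper: decode a Pushshift JSON body and return its 'data' list.
// Returns an empty array if the request failed (false) or the body is not the
// expected JSON shape, so one bad month does not break the whole loop.
function decode_posts(string|false $body): array
{
    if ($body === false) {
        return [];
    }

    $decoded = json_decode($body, true);
    if (!is_array($decoded) || !isset($decoded['data']) || !is_array($decoded['data'])) {
        return [];
    }

    return $decoded['data'];
}

// Usage inside the month loop (makes a network call, so shown as a comment):
// $posts = decode_posts(file_get_contents($url));
// sleep(1);
```

The `string|false` union type requires PHP 8.0 or later.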
Accessing each post from the returned data is done with a foreach loop:
```php
foreach ($data['data'] as $p) {
    $id = $p['id'];         // Post id
    $title = $p['title'];   // Post title
    $author = $p['author']; // Username
    $post_url = $p['url'];  // Submitted URL
    // etc.
}
```
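To keep only the fields of interest, each post can be reduced to a small row. This sketch assumes each submission carries `id`, `title`, `author`, `url`, `score` and `created_utc` (a Unix timestamp); `summarize_posts()` is a hypothetical helper name, and missing keys fall back to null (or an empty date) instead of raising warnings.

```php
<?php
// Hypothetical helper: reduce Pushshift submissions to compact rows.
function summarize_posts(array $posts): array
{
    $rows = [];
    foreach ($posts as $p) {
        $rows[] = [
            'id'     => $p['id'] ?? null,     // Post id
            'title'  => $p['title'] ?? null,  // Post title
            'author' => $p['author'] ?? null, // Username
            'url'    => $p['url'] ?? null,    // Submitted URL
            'score'  => $p['score'] ?? null,
            // created_utc is a Unix timestamp; gmdate() formats it in UTC
            'date'   => isset($p['created_utc']) ? gmdate('Y-m-d', (int) $p['created_utc']) : '',
        ];
    }
    return $rows;
}
```

For example, `summarize_posts($data['data'])` gives one row per post for the month, which can then be stored or printed.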