One hundred and two million unique strings

Using mixed-character strings to identify objects makes more sense than sequential numbers: you can assign far more of them, and they are hard to predict.

Why strings?

Using the numerals 1 through 234,567,890 as an example, object identifiers can be predicted, and you will eventually run out of them or need more digits (and therefore more space).

Choosing several mixed characters from 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ gives you far more flexibility and far more identifiers to assign.

Changing just the last two characters of a single string already yields 62 × 62 = 3,844 distinct values.
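To put numbers on that flexibility, here is a small sketch (my own illustration, not from the original post) of how the keyspace scales with string length:

```php
<?php
// A 62-character keyspace raised to the string length gives the number
// of possible identifiers.
$keyspace = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$base = strlen($keyspace); // 62

// Varying only the last two characters of one string:
echo $base ** 2, "\n"; // 3844

// A full 8-character string:
echo $base ** 8, "\n"; // 218340105584896, about 218 trillion
```

That is why 102 million pre-generated strings barely dent an 8-character keyspace.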

The use of strings as identifiers is prevalent throughout the web: YouTube for videos, Reddit for posts, Imgur for URLs, GitHub for Gists; the list goes on and on.

A downside to generating random strings for database assignment, compared with incrementing ids, is having to check that the string you just generated hasn't already been used. A way around this is a pre-prepared pool of unique strings you can simply grab from and delete.
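The grab-and-delete step might look like this (a sketch with assumed table and column names, string_pool and str; not the author's code). Wrapping the SELECT … FOR UPDATE and the DELETE in one transaction stops two requests from claiming the same row:

```php
<?php
// Assumes a MariaDB/MySQL table:
//   string_pool(id INT PRIMARY KEY, str CHAR(8) UNIQUE)
function claim_string(PDO $db): string
{
    $db->beginTransaction();

    // Lock one free row so a concurrent claimer cannot grab it too.
    $row = $db->query('SELECT id, str FROM string_pool LIMIT 1 FOR UPDATE')
              ->fetch(PDO::FETCH_ASSOC);

    $db->prepare('DELETE FROM string_pool WHERE id = ?')
       ->execute([$row['id']]);

    $db->commit();
    return $row['str'];
}
```

On InnoDB the row lock is released at commit, so each caller walks away with a different string.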

Ridiculous or going the extra mile?

With PHP I generated and inserted 102 million rows. At just under 6 GB in size, the table contains two columns: one for an incrementing id and the other for a unique 8-character string.

The strings/hashes were made with this function:

function random_str($length, $keyspace = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
{
    $pieces = [];
    $max = mb_strlen($keyspace, '8bit') - 1;
    for ($i = 0; $i < $length; ++$i) {
        $pieces[] = $keyspace[random_int(0, $max)];
    }
    return implode('', $pieces);
}

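The bulk insert itself isn't shown in the post; one way it could be done (a sketch with assumed DSN, credentials, and table name, relying on a UNIQUE index plus INSERT IGNORE to swallow the occasional duplicate string) is:

```php
<?php
// random_str() is the function above; string_pool.str has a UNIQUE index,
// so INSERT IGNORE silently skips any collision.
$db = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $db->prepare('INSERT IGNORE INTO string_pool (str) VALUES (?)');

$db->beginTransaction();
for ($i = 1; $i <= 102_000_000; $i++) {
    $stmt->execute([random_str(8)]);
    if ($i % 100_000 === 0) { // commit in chunks to keep the transaction small
        $db->commit();
        $db->beginTransaction();
    }
}
$db->commit();
```

Committing in chunks rather than per row is what keeps a 102-million-row insert from taking days.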

Generating a random id and then fetching the assigned string executed in times like 0.0098 seconds; granted, the fast NVMe read speeds and a 4.2 GHz CPU were making the MariaDB 10.3.16 database server very snappy.

On the first seek the time was slower, but once cached this was no issue:

1st: 1.531 seconds for id 63,429,181

2nd: 0.012 seconds for id 11,447,994

3rd: 0.009 seconds for id 82,614,173

Considering these times versus generating a string on the spot and checking whether it has been used, I would take the former. The generating part has already been done; just grab a row and delete it.

Finding the id for a random string (DB4LyctD) took just 0.009 seconds.
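Both lookups stay fast because each goes through an index; a sketch of the two queries (table and column names assumed, as above):

```php
<?php
$db = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Forward: pick a random id and fetch its string (primary-key lookup).
$fwd = $db->prepare('SELECT str FROM string_pool WHERE id = ?');
$fwd->execute([random_int(1, 102_000_000)]);
echo $fwd->fetchColumn(), "\n";

// Reverse: find the id for a known string; this needs an index on str
// to avoid a full scan of 102 million rows.
$rev = $db->prepare('SELECT id FROM string_pool WHERE str = ?');
$rev->execute(['DB4LyctD']);
echo $rev->fetchColumn(), "\n";
```

The UNIQUE index that deduplicates the pool doubles as the index that makes the reverse lookup a ~0.009-second operation.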


The database in its raw form is massive at 5.9 GB; you wouldn't want to load it in a text editor, nor import it directly into your database client.

Putting the whole table into a zip file with HeidiSQL took 36 minutes, and the resulting file was 1.06 GB.

If you have the space, time, and care, why wouldn't you build a string pool?