06 Nov

Distributed File Storage, using PHP and MongoDB

Scenario:

  • Alice creates an entry on Server1, and uploads an image to it.
  • Bob views that entry on Server2, but can’t view the image because the server doesn’t have it.

There are a number of solutions to this.

  1. after each upload, push the new file out to all servers so they also have a copy
  2. mount an external file system, networked to all servers
  3. create a caching distributed file system centered around an external database

The first solution, pushing every uploaded file out to all servers, is wrong for an obvious reason: hard-drive space. Imagine you have 20 servers and a file is only ever likely to be read on 3 of them (maybe they’re location-based?). By uploading it to all 20, you waste space, increase storage costs, and slow the servers down with work they really don’t need to be doing.

The second solution is better – an external mounted solution such as NFS, S3QL, or Samba can store your files on file servers that are backed up and replicated, and are simultaneously available to all your web servers. But these solutions come at a huge speed cost – every file check involves network access, lock checking, POSIX compliance and other ugliness. Also, network file systems of this sort are very sensitive to network outages, however temporary they are.

The solution we will build in this article is to create an external file system that

  1. supports local caching of files for speed
  2. has immediate availability of files across all servers
  3. is “shardable”, so files only exist on servers where they are actually needed

Storage

To store the files, you need an external storage solution. For reasons that we will see later, I use MongoDB and its GridFS file store.

MongoDB is a NoSQL database that stores information as binary JSON (BSON) documents. It is extremely scalable, and shards nicely as well, allowing us to concentrate more on our application and less on database maintenance.

To store the files, we will upload them into the MongoDB network, where they will be stored as “chunks”. Retrieving and storing the files is a simple matter, as we’ll see.

Saving Files

Up until now, all your files were written directly to the local filesystem – using file_put_contents(), for example.

We need to find all instances of these calls and route them through a new function called mdbFileSet (MongoDB File Set) that will record the file as requested, but will also upload it to the database.

In most cases, this is a simple matter – if the user-files directory is $_SERVER['DOCUMENT_ROOT'].'/userfiles/', then a call such as file_put_contents($_SERVER['DOCUMENT_ROOT'].'/userfiles/'.$filename, $filecontent) will be replaced with mdbFileSet($filename, $filecontent). This is obviously more readable, and we are abstracting the user-files location as well, making it flexible.
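For example, assuming $filename and $filecontent are already populated:

// before: write straight to the local filesystem
file_put_contents($_SERVER['DOCUMENT_ROOT'].'/userfiles/'.$filename, $filecontent);

// after: route through the wrapper, which caches locally and uploads to GridFS
mdbFileSet($filename, $filecontent);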

The actual mdbFileSet() function works like this:

  1. parameters are $fname and $file, which contain the filename (including the directories, delimited by '/'), and the file content as a string.
  2. check GridFS to see if the file already exists. If it does:
    1. delete the existing file (see Deleting Files in this article)
  3. copy the uploaded file to the local user-files location (to act as a cache)
  4. upload the file using GridFS

Code for the mdbFileSet function:

function mdbFileSet($fname, $file) {
  if (strpos($fname, '..')!==false) { // hack attempt
    return false;
  }
  global $MDBVARS;
  if (strpos($fname, '/')!==false) { // create the cache sub-directories if needed
    @mkdir($MDBVARS['cache'].preg_replace('/[^\/]*$/', '', $fname), 0755, true);
  }
  file_put_contents($MDBVARS['cache'].$fname, $file); // local cache copy
  $conn=new Mongo($MDBVARS['dbhost']);
  $db=$conn->{$MDBVARS['dbname']};
  $db->authenticate($MDBVARS['username'], $MDBVARS['password']);
  $grid=$db->getGridFS();
  $existing=$grid->findOne($fname); // replace any existing version of the file
  if (!is_null($existing)) {
    $grid->delete($existing->file['_id']);
  }
  $grid->storeBytes($file, array('filename'=>$fname), array('safe'=>true));
  $conn->close();
}

You will need to set the $MDBVARS global array before running the function. I keep mine in the server’s config.php.

Example:

$MDBVARS=array(
  'cache'=>$_SERVER['DOCUMENT_ROOT'].'/userfiles/',
  'username'=>'username',
  'password'=>'password',
  'dbname'=>'filesdb',
  'dbhost'=>'mdb1.yourmongodbserver.com'
);

Replace the values in the above code with your own values.

You can test this easily. Create a test.php file with the following code:

<?php
require_once 'php/basics.php'; // link to file containing common functions
mdbFileSet('test/file.php', file_get_contents(__FILE__));
?>

The above code will upload a copy of the test.php file you just created, and will store a copy in your cache as well. After loading the file in your browser, you can test this by looking in your cache on the server:

[root@cp3 server]# ls userfiles/test/file.php -l
-rw-r--r-- 1 apache apache 142 Nov  6 10:36 userfiles/test/file.php

And also by logging into the MongoDB server and searching for the file:

> db.fs.files.find({filename:'test/file.php'})
{ "_id" : ObjectId("545b4f1560b99367688b456b"), "filename" : "test/file.php", "uploadDate" : ISODate("2014-11-06T10:36:05.121Z"), "length" : NumberLong(142), "chunkSize" : NumberLong(261120), "md5" : "00397d7306c53cda5ea9446d7bd62594" }

Before going any further, you should go through your code now and edit all your user-file-writing functions so they use the mdbFileSet() function. Everything should still work as before, but now, there will be a copy of each file saved in the MongoDB database as well.
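For example, a typical image-upload handler might change from writing the uploaded file to disk to something like this (a sketch only; the 'photo' form field and the $userId-based path are hypothetical placeholders for whatever your own code uses):

// hypothetical upload handler - adjust the field name and path to match your application
$userId=3184; // placeholder
$filename=$userId.'/user-photos/'.basename($_FILES['photo']['name']);
mdbFileSet($filename, file_get_contents($_FILES['photo']['tmp_name']));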

Reading Files

Okay, so let’s say all your work so far in this article has been done on Server1. You now switch over to Server2 and want to open a record that includes an image uploaded to Server1. The image is obviously not on Server2, so how do we transparently download it to Server2 such that the end-user never needs to know?

For this, we will write a function called mdbFileGet (MongoDB File Get), which retrieves the requested file from the MongoDB server if it is not already cached locally. How it works:

  1. there is one parameter, $fname, which is the filename including the directories.
  2. if the file already exists in the local server’s cache, then return that file’s contents.
  3. otherwise, download the file from GridFS, store a copy in the local cache, and return the file’s contents.

There is an issue to do with the cache, which I’ll explain in a moment, but in the meantime, here is the code for the function:

function mdbFileGet($fname) {
  if (strpos($fname, '..')!==false) { // hack attempt
    return false;
  }
  global $MDBVARS;
  if (file_exists($MDBVARS['cache'].$fname)) { // already in the local cache
    return file_get_contents($MDBVARS['cache'].$fname);
  }
  $conn=new Mongo($MDBVARS['dbhost']);
  $db=$conn->{$MDBVARS['dbname']};
  $db->authenticate($MDBVARS['username'], $MDBVARS['password']);
  $fdata=$db->fs->files->findOne(array('filename'=>$fname));
  if (is_null($fdata)) { // file doesn't exist
    $conn->close();
    return false;
  }
  $grid=$db->getGridFS();
  $file=$grid->findOne(array('filename'=>$fname));
  if (strpos($fname, '/')!==false) { // create the cache sub-directories if needed
    @mkdir($MDBVARS['cache'].preg_replace('/[^\/]*$/', '', $fname), 0755, true);
  }
  $bytes=$file->getBytes();
  file_put_contents($MDBVARS['cache'].$fname, $bytes); // cache locally for next time
  touch($MDBVARS['cache'].$fname, $file->file['uploadDate']->sec);
  $conn->close();
  return $bytes;
}

For an example of this in use, let’s consider an image, /userfiles/1/image.jpg, that was uploaded to Server1. It’s obviously not yet on Server2, so how do we view it there?

When loading the file up (let’s say http://server2.yourcomp.any/userfiles/1/image.jpg), the server looks directly for the image, and doesn’t find it. We need to route the request through a script that makes sure the file is there before sending it back.

To do that in this case, we can use mod_rewrite so that calls to /userfiles/[whatever] are routed to something like /php/file-get.php, which handles the work.

Edit your .htaccess file, and add in something like this:

RewriteEngine on
RewriteRule ^userfiles/.*$ /php/file-get.php [QSA,L]

Now create the file php/file-get.php:

<?php
require_once 'basics.php'; // load common functions and config.php
$fname=preg_replace('/^\/userfiles\/|\?.*/', '', $_SERVER['REQUEST_URI']);
if (strpos($fname, '..')!==false) { // hack attempt
  exit;
}
$ext=strtolower(preg_replace('/.*\./', '', $fname));
switch ($ext) {
  case 'png':
    header('Content-type: image/png');
  break;
  case 'jpg': case 'jpeg':
    header('Content-type: image/jpeg');
  break;
  case 'gif':
    header('Content-type: image/gif');
  break;
  default:
    header('Content-type: ');
}
echo mdbFileGet($fname);
?>

You can see that most of the file’s code is actually just figuring out the mime-type to show. The downloading and showing of the file is done right at the last line.

You can now transparently upload files on one server and view them on another!

In fact, once the file is uploaded, you can remove it completely from all servers, and then when you next need it, just load it up through mdbFileGet() as normal and it will download again.
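You can try this out with the test file from earlier (a quick sketch, assuming the same test/file.php upload and config):

<?php
require_once 'php/basics.php'; // common functions and config.php
@unlink($MDBVARS['cache'].'test/file.php');   // throw away the local cached copy
$content=mdbFileGet('test/file.php');         // transparently re-downloads from GridFS and re-caches it
echo strlen($content)." bytes restored\n";
?>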

The caching issue that I mentioned earlier has to do with cache invalidation. Let’s say we upload image.jpg and it is distributed to a number of servers. After a few hours, we might upload a replacement image – how do we tell the servers that the old image is invalid and it should be downloaded again?

We will start solving that in the next section.

Deleting Files

Deleting files is not as obvious as it sounds. On a one-server system, it’s simply a matter of using unlink() to remove the file, and there’s no more to be said about that.

However, in a multi-server system, we have three steps:

  1. delete the local cached file
  2. delete the database-stored file
  3. find all servers that have a copy of the file and delete the file from those servers.

#1 and #2 can be solved immediately in a very simple function:

function mdbFileRemove($fname) {
  if (strpos($fname, '..')!==false) { // hack attempt
    return false;
  }
  global $MDBVARS;
  @unlink($MDBVARS['cache'].$fname);
  $conn=new Mongo($MDBVARS['dbhost']);
  $db=$conn->{$MDBVARS['dbname']};
  $db->authenticate($MDBVARS['username'], $MDBVARS['password']);
  $grid=$db->getGridFS();
  $existing=$grid->findOne($fname);
  if (!is_null($existing)) {
    $grid->delete($existing->file['_id']);
  }
  $conn->close();
}

The above will delete a file from the local server and from the MongoDB database, but will not clear the file from other server caches.

To delete from the other machines, we need to set up a deletion queue, which we’ll do later in the File Delete Queues section.

Creating File Delete Queues

To delete files from all servers, we need to send a message to those servers to tell them to delete their local copies of the file.

Sending a message to every single server in your network is a waste of resources, as most of the servers may not actually have a copy of the file you are trying to delete.

So, we need to adapt the mdbFileSet and mdbFileGet functions so they add a record to the database telling it exactly what servers have copies of the files. This will then allow us to target just those servers and to know that we’re not wasting time.

Edit the mdbFileSet function and change this line:

$grid->storeBytes($file, array('filename'=>$fname), array('safe'=>true));

to this:

$grid->storeBytes(
  $file,
  array('filename'=>$fname, 'servers'=>array($_SERVER['HTTP_HOST'])),
  array('safe'=>true)
);

As a test, I uploaded an image called 3184/user-photos/3184.jpg, then checked my MongoDB instance:

> db.fs.files.find({filename:'3184/user-photos/3184.jpg'})
{ "_id" : ObjectId("545b68a160b993b86b8b4567"), "filename" : "3184/user-photos/3184.jpg", "servers" : [ "cp3.myserver.com" ], "uploadDate" : ISODate("2014-11-06T12:25:05.344Z"), "length" : NumberLong(37182), "chunkSize" : NumberLong(261120), "md5" : "9def8b14cb1611097e755692d04dcbdd" }

Note the servers field in the result. As part of the file upload, we are initialising an array that records which servers have a copy of that file.

An important thing to note as well is that in GridFS, the file content is stored as a set of chunks, which are standard MongoDB documents, while the file’s metadata is recorded in a separate, normal document. What we look at with db.fs.files.find() is the metadata, not the chunks. It would be uneconomical to store the metadata within the same document(s) as the chunks, as checking something as simple as the creation date, or the list of servers that have the file, would then involve downloading the entire file.
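For example, you can check a file’s metadata from PHP without pulling down any of its chunks (a minimal sketch, reusing the test/file.php upload and the same $MDBVARS config as earlier):

<?php
require_once 'php/basics.php'; // common functions and config.php
$conn=new Mongo($MDBVARS['dbhost']);
$db=$conn->{$MDBVARS['dbname']};
$db->authenticate($MDBVARS['username'], $MDBVARS['password']);
// query fs.files only - this reads the metadata document, not the file chunks
$meta=$db->fs->files->findOne(array('filename'=>'test/file.php'));
if (!is_null($meta)) {
  echo $meta['length'].' bytes, uploaded '.date('Y-m-d H:i:s', $meta['uploadDate']->sec)."\n";
}
$conn->close();
?>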

Next, we need to adapt the mdbFileGet() function. Change the following:

touch($MDBVARS['cache'].$fname, $file->file['uploadDate']->sec);
$conn->close();

to this:

touch($MDBVARS['cache'].$fname, $file->file['uploadDate']->sec);
$db->fs->files->update(
  array('filename'=>$fname),
  array('$push'=>array('servers'=>$_SERVER['HTTP_HOST']))
);
$conn->close();

Here, we update the servers array that we created in mdbFileSet() in place. There is no need to download, change, and re-upload the record; in fact, doing that would create a race condition, as some other server may be doing the same thing at the same time. It is safer to have the MongoDB server apply the update to the document directly.
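As an aside, if you would rather the servers array never contained duplicate entries in the first place, MongoDB’s $addToSet operator appends a value only if it is not already present; a drop-in variant of the same update:

$db->fs->files->update(
  array('filename'=>$fname),
  array('$addToSet'=>array('servers'=>$_SERVER['HTTP_HOST']))
);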

If you then open the image on another server and check the file again on the MongoDB server, you’ll see something like this:

> db.fs.files.find({filename:'3184/user-photos/3184.jpg'})
{ "_id" : ObjectId("545b68a160b993b86b8b4567"), "filename" : "3184/user-photos/3184.jpg", "servers" : [ "cp3.myserver.com", "cp4.myserver.com" ], "uploadDate" : ISODate("2014-11-06T12:25:05.344Z"), "length" : NumberLong(37182), "chunkSize" : NumberLong(261120), "md5" : "9def8b14cb1611097e755692d04dcbdd" }

Note that the servers array has an extra entry in it, but nothing else was touched. Exactly what we want.

Next, we need to adapt the mdbFileRemove function, so it builds the queue of files to delete (and what servers to delete them from).

To do that, change the following:

  if (!is_null($existing)) {
    $grid->delete($existing->file['_id']);
  }

to this:

  if (!is_null($existing)) {
    if (isset($existing->file['servers'])) {
      $servers=array_unique($existing->file['servers']);
      $idx=array_search($_SERVER['HTTP_HOST'], $servers); // the local copy was already unlinked above
      if ($idx!==false) {
        unset($servers[$idx]);
      }
      $list=array_values(array_map(function($server) use ($fname) {
        return array(
          'filename'=>$fname,
          'server'=>$server
        );
      }, $servers));
      if (count($list)) { // only queue deletions if other servers hold a cached copy
        $ret=$db->command(array('insert'=>'deletes', 'documents'=>$list));
      }
    }
    $grid->delete($existing->file['_id']);
  }

This code inserts an entry into the deletes collection on the MongoDB server for every other server that has a cached copy of the file. It removes the reference to the local server before doing so, as that copy was already deleted directly by the unlink() call at the top of the function.

After doing an update of the image on cp3.myserver.com, I then checked the MongoDB deletes collection:

> db.deletes.find()
{ "_id" : ObjectId("545b7f6165f402bccee49573"), "filename" : "3184/user-photos/3184.jpg", "server" : "cp4.myserver.com" }

This means we can now work on the next part: writing a deletion daemon.

Running a File Deletion Queue

We now have a list of the cached files and the servers that have them. But how do we tell those servers to delete those cached files?

A way to do this is to write a cron job that runs every minute, checks the MongoDB deletes collection for cached files that need to be deleted, and then calls those servers to tell them to delete the files.

This script will need to run directly on the MongoDB server, so install PHP on that server. In particular, you will need the command-line version of PHP. On CentOS 7, it is installed like this:

[root@mdb1 ~]# yum install php-cli php-devel php-pear gcc openssl-devel
[root@mdb1 ~]# pecl install mongo
[root@mdb1 ~]# echo "extension=mongo.so" >> /etc/php.ini

On the MongoDB server, create a user called mongo (useradd mongo), and create a file called /home/mongo/checkCaches.php:

<?php
$MDBVARS=array(
  'username'=>'username',
  'password'=>'password',
  'dbname'=>'filesdb',
  'dbhost'=>'mdb1.yourmongodbserver.com',
  'apikey'=>'805f73958de1653e073e6a8c674bb1e8'
);
$conn=new Mongo($MDBVARS['dbhost']);
$db=$conn->{$MDBVARS['dbname']};
$db->authenticate($MDBVARS['username'], $MDBVARS['password']);
$fdata=$db->deletes->find();
while ($d=$fdata->getNext()) {
  $url='http://'.$d['server'].'/php/cacheClear.php';
  $ch=curl_init($url);
  curl_setopt($ch, CURLOPT_POST, true);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  $fname=$d['filename'];
  $now=''.microtime(true);
  $md5=md5($fname.$now.$MDBVARS['apikey']); // sign the request with the shared apikey
  curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    'filename'=>$fname,
    'time'=>$now,
    'md5'=>$md5
  ));
  $ret=curl_exec($ch);
  if ($ret=='ok') { // only remove the queue entry once the server confirms the delete
    $db->deletes->remove(array('_id'=>$d['_id']));
  }
  curl_close($ch);
}
?>

The $MDBVARS array is almost the same as on the application servers. We add one new item, though: apikey, which lets us authenticate requests without needing usernames and passwords. By running the filename, the time, and the apikey through an MD5 function, we create a value that can only reasonably be reproduced by something that knows the same details. So, we send the filename, the time, and the MD5 result to the target server; if the target server can reproduce the MD5 result by hashing the filename, the time, and its own copy of the apikey, that is enough proof that the call is valid.

Make sure to add the apikey entry to all your servers’ $MDBVARS arrays.

On the target server, then, we create the /php/cacheClear.php file:

<?php
require_once 'basics.php';
$fname=$_REQUEST['filename'];
$md5=md5($fname.$_REQUEST['time'].$MDBVARS['apikey']);
if ($md5!=$_REQUEST['md5']) {
  echo 'incorrect API key';
  exit;
}
if (strpos($fname, '..')!==false) { // check for hacks
  exit;
}
@unlink($MDBVARS['cache'].$fname);
echo 'ok';
?>
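As an aside, PHP’s hash_hmac() is purpose-built for this kind of shared-secret signing, and could be swapped in on both sides if you prefer it over the plain MD5 concatenation; a sketch, assuming the same apikey and field names:

// in checkCaches.php, instead of md5($fname.$now.$MDBVARS['apikey']):
$md5=hash_hmac('sha256', $fname.$now, $MDBVARS['apikey']);

// in cacheClear.php, the matching check:
if (hash_hmac('sha256', $_REQUEST['filename'].$_REQUEST['time'], $MDBVARS['apikey'])!=$_REQUEST['md5']) {
  echo 'incorrect API key';
  exit;
}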

As usual, there is a potential flaw to consider. The checkCaches.php script on the MongoDB server goes through every entry in the deletes collection, but what if that takes more than a minute to finish?

If it takes more than a minute, and the script is being called once a minute, then the server will eventually have multiple copies of the script running against overlapping lists of files, and those copies will pile up until the machine is overloaded.

The solution to this is simply to add a timeout to the script, so it runs for 55 seconds (say) and then stops.

In checkCaches.php on the MongoDB server, change the following:

while ($d=$fdata->getNext()) {

to this:

$now2=time();
while ($d=$fdata->getNext()) {
  if (time()-$now2>55) { // bail out after 55 seconds
    exit;
  }

Now it will simply stop after 55 seconds, and carry on with the remaining entries the next time it is called.

Edit cron for the mongo user (su mongo -c "crontab -e"), add this line, then save the file:

* * * * * php /home/mongo/checkCaches.php >/dev/null 2>/dev/null

That’s it! You now have a working distributed filesystem.