Category Archives: databases

multi-tenanted CMS architecture

Last week, I did a talk at the Dublin Google buildings titled “Multi-tenanted CMS Architecture using PHP”.

Here are the slides that I used:

While talking with Google’s Brian Brazil, he explained that it is actually more efficient to use one database and many separate tables, than to separate each installation into a separate database, so one point I made (that KV-WebME uses separate databases per site) will change in the future.

I think the talk went down well, by the number of questions afterwards.

Last year, I gave a similar talk, and made the mistake of including way too much PHP in it – I had assumed that the audience would be composed of PHP developers. This year, there is just one slide of PHP, and that’s just to illustrate one possible way to build a proxy config.

Lesson’s learned for this time:

  • Talk slower. When I’m explaining something, I tend to try to get as much in as possible, so speak very fast. This makes it hard to hear what I’m saying.
  • More pictures, less words!
  • Stats. Some of the questions were around how efficient certain parts of the method were – particularly on the overhead of piping a file through a script as opposed to simply delivering it via Apache. I need to come up with numbers for that.

Overall, I was happy with this presentation.

keeping an admin session active

I had a call from a client who asked why, after logging into a CMS admin area and spending an hour or so writing a document, she was unable to submit it because it claimed she was not logged in.

The answer was that the session had expired.

On busy servers, one method of optimisation is to reduce the session-time. This makes it easier for the server to cope with a large number of visitors, but also has the undesired effect of logging people out if they take their time over anything.

One solution to this is to keep the admin session in a database table, tied to a cookie in the browser. Unfortunately, that means that every time the browser sends the cookie, it must be verified, whereas a session is usually trusted.

The workaround is to use some JavaScript to refresh the session every now and then.

I wrote a simple cyclical script which polled the server every minute to refresh the session.

Here it is, using jQuery to handle the AJAX:

function keep_session_alive(){
  setTimeout(keep_session_alive,60000);
  $.get('/ww.admin/keepalive.php');
}
setTimeout(keep_session_alive,60000);

And the server-side code is this:

<?php
session_start();

Very very simple trick. The polling could be enhanced, if you want, to alert the admin of anything interesting that’s happened on the server.

happy birthday KFM

KFM is 4 years old today:

http://verens.com/2006/05/27/file-manager-for-fckeditor-day-one/

demos of older versions no longer available, but it’s interesting to know that it’s lasted four years!

of particular interest in my own case, is that there has been no major functionality change in the last two years or so. KFM today is basically the same as KFM two years ago; just faster, more secure, and better coded.

CMS Design using PHP and jQuery

I’m happy this week. Last week, I spent some time and organised myself a bit more. In work, things are going smoothly – managed to get over a tricky piece of work and the rest is simply a list of small tasks.

For the last few weeks, I’ve been emailing and messaging Darshana at Packt Publishing, about writing a second book (jQuery 1.3 with PHP is going very well – list of reviews).

I initially wanted to write about file management, to explain how KFM works, and to help force me to improve on it. But there’s just not enough of an interested market in that – it’s too specialised.

So instead, I’ll be writing about CMS design using PHP and jQuery.

We (webworks.ie) have a CMS engine which we’ve written and improved for the last 6 or so years. We’ve open-sourced it a number of times, but never managed to generate much interest in it. We never had the time to spend on publicising it.

The book will not be specifically about that engine, but rather about the concepts that went into creating it – how a CMS works, how to manage plugins, administration, user management, and all the other little bits and pieces that every PHP developer needs to eventually address.

By way of explanation, I will be demonstrating various parts of our CMS, and explaining how and why it was built that way. I will be closely examining the other major CMSes as well, and giving alternative methods where good ones exist.

The proposed chapter list is:

  1. Introduction
  2. CMS core design
  3. User management and access control
  4. Page creation and Navigation
  5. Template Management
  6. Plugins
  7. Form creation
  8. Image Gallery
  9. Panels
  10. Search and Polls
  11. RSS and News
  12. Online Store
  13. Products

I’m really excited about this project!

jQuery 1.3 With PHP: Datatables chapter is online

Packt have put another chapter of my book online. This time, it’s the datatables chapter, which discusses how to use the Datatables plugin to organise very large data tables.

The chapter discusses how to make a large (over two million rows) database usable through an ajaxified table.

The Datatables plugin supports search, sorting and pagination.

jQuery 1.3 With PHP: calendars chapter is online

Packt have put one of the chapters of my book, jQuery 1.3 with PHP, online as an article.

I mentioned this chapter in June, and provided a small demo and the source.

I’ve had a few requests for example code on how to do this using a real database. I’ve been meaning to make the time to do that. I’ll get that done hopefully tomorrow.

In the meantime, though, please enjoy the article.

By the way, the book went to the printer only a few days ago, so if you want to avail of the pre-order price, now’s the time.

KFM2: the beginning

KFM 1.x has reached the end of its development. It has fulfilled its original purpose; to improve on the default file-manager for FCKeditor, and well-exceeded it.

From my records, I can see that KFM has been installed on over 9000 separate domains. That is quite a lot of sites, and says quite loudly how important it is that KFM2 is at least as good as KFM1 (we all know about the KDE4.0 fiasco…).

KFM2 will blow KFM1 well out of the water. It will do everything that KFM1 does, but much quicker, and with a much less monolithic architecture, allowing it to be hugely flexible. I am planning on creating a download section similar to jQuery UI’s, where you can choose the components that you want in the system, and it will be built up and configured for you, ready for download.

See, KFM 2 will be modular. If you just want a simple upload/download facility, then you will only need the core, and plugins to handle uploads and downloads. If you want the whole shebang, you will find plugins for various ways of selecting files and directories, full-on graphical UIs such as the existing KFM1 forces on the user, plugins for hooking into external file systems using FTP or whatever other method people can imagine, plugins for multi-media, file editing, databases, user authentication, search, RSS, and so on.

Basically everything that KFM1 already does, but with the option to easily remove/add bits that you want. Make it as fast or as “bling” as you want.

KFM2 will have a number of new things which are not available in any other online file-systems, that I know of.

new file-system things

The file system will be modular. You will be able to attach symbolic links to any part of the file system, linking to other external systems.

As an example, let’s look at a URL: http://example.com/kfm/get/images/site1/logo.jpg.

One of the new ideas is that this will not necessarily be located on the server named example.com.

If you are running a very busy website, you might prefer to delegate image management to a different machine deeper in your network, so you add a symbolic link /images/ linking to the image-management machine. This causes the server to issue a redirect to the browser, redirecting it to http://images.example.com/kfm/get/site1/logo.jpg.

Now, imagine that the image management server doesn’t host the logo.jpg file on itself. Maybe it’s one of a number of images.example.com servers in a load-balancing cluster, and the file is actually located on a file-server accessible only by FTP. So, we add a symbolic link in the server’s KFM, telling it how to connect to the FTP server. The image server does this, gets the required file, and sends it to the browser.

This sounds a bit complex, but at least it’s possible with the new system. The old system simply would not be able to handle that at all.

API

At the moment, if you want to select a file or directory, you fire up KFM1, wait 15 seconds for it to finish booting itself up, and select the file or directory. That’s way too long. Really, you should only need to load up a tiny widget that’s designed specifically for picking a file.

The new system will have a simple core with an API that can be connected to by little JavaScript widgets. As an example, let’s say you just want to get the file-listing for the foo directory.

At the moment, you need to go through the whole business of booting up KFM’s GUI, navigating to the right directory, then waiting for the server to connect to the database, generate thumbnails and other unnecessary stuff.

You should just have to do something simple like load up http://example.com/kfm/rpc.php?a=get_file_listing&v=/foo through jQuery’s $.get() function. The RPC script does exactly what was asked, and nothing more.

This should allow us to write some incredibly fast and tiny scripts for all the file manipulation things you could possibly want.

Think of it as the “gnu” way of doing KFM – many small tools, each of which is designed to do one task well.

Of course, you will still be able to use the original GUI with this, after it’s been re-engineered to hook into the new RPC system.

Or, you can write your own – the RPC’s API will be well-documented, and will use simple HTTP parameters, so you can write your widgets in whatever language you want – JavaScript is what I will be writing in, but there’s no reason why you’re confined to that – write a GUI in Python, C, Flash – whatever you’re comfortable with.

plugins

I think the new plugin architecture will make this profoundly extensible. I’m writing the core to be as flexible and stable as possible. You will be able to write a plugin to just about anything:

  • authenticate a user before serving a file.
  • change a user’s root directory based on the server-name or the user’s authentication.
  • the entire system is held on a different machine accessible only by FTP? fine – let’s change how files are accessed.
  • record all files and directories in a database, allowing searches to be done.
  • manipulate files online – edit text files, rotate or crop or resize images, move or rename or delete, etc. standard KFM1 stuff, these.
  • create a log plugin, so all access is recorded.
  • create a quota plugin, for denying uploads or downloads when the quota is exceeded.

It was possible for us (Benjamin and myself) to write all of these into KFM1, but as we are only two people, and very busy people at that, it’s never been the absolute top priority to get all these things done.

But, when KFM2 is released, you won’t have to wait for us anymore. You will be able to write plugins, or download plugins created by other people, which do all of the above and I’m sure there will be plugins that I simply have not imagined yet, that inventive people will come up with (Conor, I’m sure you’ll be one of those!)

languages

Languages have always been one of the cool things about KFM. Version 1.3 was released in about 13 languages in total. Unfortunately, the code got too complex in 1.4 to easily add more, but KFM2 is starting from scratch with the learning-curve of KFM1 well behind me. I’ve already made most of the big mistakes I’m likely to make, so from now on, things should be much easier.

How languages will work is that a website will be created which allows people to help out open-source projects, not just KFM by providing translations of words and sentences.

The programmer will create a screen-shot showing the text in use, and will provide the untranslated text in a format such as .po. Translators will translate the text on-line, and the translation will be available to download by the programmer as soon as it’s done.

To ensure good translation, each line will be presented to a number of different translators. The translations which agree the most with each other, will be considered the “right” one.

Using this, it will be possible for KFM to organically add languages with no interaction from myself or any other programmers. When first loaded, a plugin would try to translate messages into the user’s language. Finding that it doesn’t have the language in its database, it will try downloading it from the language translation website. If it’s not available, a request will be added that it be created. When translators that speak that language come to the website, they will be automatically given the KFM text to translate to that language. The next time the plugin checks, the translation might be done, and it will be added to the KFM instance’s local language database.

Of course, people don’t just do this stuff for the hell of it. It’s more fun if there’s a competition or reward. It would be nice if I could get a few paying customers to add their own translation requests to the website – that way I could offer a reward to the best translators (those that do the most work, or are the most consistent).

The competition aspect comes in, where you give people points for every translation they do that is matched by someone else’s translation of the same text.

Google does a similar thing, for example, where it shows images to people and asks them to come up with keywords describing the image – the commonest keywords are considered to be correct. Same trick, different target.

Anyway! Without further ado, I have a core to design, and tests to create. I was going to talk about testing with PHPUnit, automatic upgrades, and the like, but to hell with ye all – I’ve work to do ;-)

a tale of prefixes

I debugged a problem for a friend of mine over the last day. He had installed bbPress on a server which also had WordPress on it, and his WordPress site just suddenly stopped serving comments and pages, and started killing the database server instead.

In the end (after a day of considering this and that and a lot of head-scratching), it turns out that the bbPress installation had been done using the same table prefix as the WordPress one.

See, when a web app is installed, it usually installs a load of database tables. The way to avoid these tables overwriting each other is to “namespace” them. The most common method is to simply place a unique prefix in front of the table names.

For example, WordPress and bbPress both have tables for posts. The default installation is to name them wp_posts and bb_posts, respectively, using wp_ and bb_ as prefixes. KFM does that as well, using kfm_ by default.

In the current case, what happened is that the bbPress installation was done using wp_ as the prefix, and some of the WordPress tables were then changed to match what bbPress expected.

In the case of wp_posts, the field post_status was changed from a varchar holding values such as “publish” or “draft”, to a smallint holding just ’1′ or ’0′. This had the immediate effect of unpublishing over 6000 articles on the blog. (aside: oi, WordPress – why are you not using enums?)

A few hours, spread over a day or so, were spent figuring that out then fixing that.

So, what actually happened? Why was wp_ used as the prefix?

Let’s look at the WordPress installation.

wordpress

When you install a copy of WordPress, the first thing you do is fill in the database details. This is the field for the prefix:

<input id="prefix" type="text" size="25" value="wp_" name="prefix"/>

Notice the name, “prefix”. A lot of browsers these days include auto-fill, which is sometimes useful. When encountering a form, the browser would fill in the details that it already knows, such as email address, etc.

The problem, though, is that sometimes, autofill might fill in details that really should be left blank or at their default states. An example of this is in phpBB, the popular forum software – whenever an admin goes to edit a user’s details, autofill in Firefox tends to fill in the “password” field in the user form, but it really shouldn’t do that, and can potentially break a user’s account.

This is actually quite easy to fix, though, in the installer – the installation program just needs to change to use a different name for its input box:

<input id="prefix" type="text" size="25" value="wp_" name="wp_prefix"/>

Now the value should only be autofilled by values recorded from other WordPress installations, and should not affect future installations of other software.

The thing is, this is not what broke the system I was fixing.

Let’s look at the bbPress installation.

bbpress

When writing bbPress, more thought appears to have gone into the installation. The only fields visible here are database name, user, and password. These are the most commonly changed details, so are shown prominently.

There is an “advanced” checkbox. When clicked, this shows fields that should only be changed if you are sure of what they do:

bbpress2

The fields shown here rarely need to be changed. They include database host, collation, character set and table prefix.

The default character set is UTF8, and I can’t imagine why anyone would need anything other than that (that also covers collation). My own opinion is that these ones shouldn’t even be in advanced – they should be hard-coded.

There are not many reasons why the table prefix would ever need to be changed, but in the case I was troubleshooting, the database host was held on a separate server to the site itself.

So, because the database host needed to be changed, that explains why the admin opened the Advanced section in the first place.

If the prefix input box was named “prefix”, this would be the end of the story – I might have assumed that the problem was autofill (as I did for a while), and suggest the above solution to them.

But, the authors of bbPress have obviously come across the autofill problem themselves, because they named their prefix button “bb_table_prefix”, exactly as I suggest every installer should do (now do it with WordPress, guys!):

<input [...] name="bb_table_prefix" class="text" value="bb_" tabindex="8" />

So, the problem is not the software.

In this case, I believe it’s human error. What follows is my own guess at what happened, in hind-sight and without the benefit of watching the proceedings in person.

I think what happened is that our admin’s techie had left some notes about the settings used for the WordPress installation, including database username, password, etc. Then the admin had gone to install bbPress, in the correct assumption that installation should be straightforward and compatible with WordPress.

However, I believe the problem is that the techie had left a note about table prefix, but had not explained to the admin that all the other settings must be as written but the prefix must not.

The admin had simply filled in what was written, and bbPress did what it does, and changed WordPress’s database tables.

So, who’s at fault?

I believe the problem here was a misunderstanding – to all of us who have done a million installations of a thousand different software applications, it’s very obvious what the table prefix is, and what it does. However, I don’t believe I’ve ever seen it written down, or the dangers of using an existing value.

Let’s look at the bbPress installation again, this time with the ‘?’ expanded:

bbpress3

It says “If you are running multiple bbPress sites in a single database, you will probably want to change this.” While literally correct, this actually doesn’t explain what the prefix is or why it should be changed. And most importantly, it does not say “This value must be unique in your database, or it may overwrite your existing database tables.” The help text in bbPress’s case does not actually explain the danger of using existing values for the prefix.

The techie is also at fault. If leaving notes for someone, always make sure to explain anything which needs explaining. In this case, it should be something like “If installing new software, use these database access details, but make sure that you do not use these database table prefixes.” In fact, the table prefix should not have been in the notes at all – if the value is a default, then don’t write it down.

The admin is also at fault, for following the notes to the letter and filling in the prefix where the default value was actually perfect. I won’t fault him for installing software when the techie wasn’t at hand to point out mistakes, as there really shouldn’t have been any – the prefix error was actually quite easy to make; it just happened to have devastating effects.

However, these are all minor faults. They’re all easy mistakes to make, and it was the combination of all three that caused the error.

Having said all that, I realise that I have some installers to check now to make sure I follow my own advice!

optimise all tables in all mysql dbs on your server

A quick script to do so:

<?php
mysql_connect('localhost','root','YOURPASSWORD');
$alldbs=mysql_query("SHOW DATABASES");
while ($dbs = mysql_fetch_assoc($alldbs)) {
	foreach($dbs as $db_name){
		if($db_name=='information_schema' || $db_name=='mysql')continue;
		mysql_select_db($db_name);
		echo '<strong>database: '.$db_name.'</strong><br />';
		$alltables = mysql_query("SHOW TABLES");
		while ($table = mysql_fetch_assoc($alltables)) {
			foreach ($table as $db => $tablename) {
				echo 'optimising '.$tablename.'<br />';
				mysql_query("OPTIMIZE TABLE `".$tablename."`") or die(mysql_error());
			}
		}
	}
}

Based on this original script for optimising all tables in a specific database.

webme-mu: create your own website

I’ve just finished the first version of the site creator. You can create your own website using this wizard.

This involved a bit of work on my part – I needed to separate personal files (images and other downloadables) from internal files (scripts and other resources), and ensure that personal files could be read from anywhere on the server, not necessarily from a web-accessible directory.

Then, I needed to write the site generator to write config files automatically.

Then, WebME needed to be hacked to load the correct config file, based on the domain name requested.

This has worked. I now have two sites running on that server (http://verens.webme.eu/ and http://webme.eu/), both of which use the same engine instance. The only things that are separate are their databases and their personal files.

The best thing about this was that it was possible to do this without compromising the structure of the downloadable version of WebME. The downloadable version and the multi-user version use exactly the same source code. The only difference is in the config file. In the case of the multi-user version, when the config file is loaded, it acts as a proxy, loading the correct site-specific config file.

After all that work, I need a drink! Please, give the site a chance, and report any problems you come across.