02 Mar

Rebuilding MP3Unsigned #1 – MSSQL to MySQL

Last week, I took on a project for my friend Phil Sneyd, who said he and Dave Kelly were on the verge of completely scrapping their long-term project MP3Unsigned, a website for unsigned musicians to upload their music, allowing the music to be charted monthly, commented on, etc.

I told him I’d have a look, and asked for some access details. I won’t get into the numbers, but when I saw the hosting costs, I did a quick check and found that moving everything to Digital Ocean with Volumes to handle the storage would cost 6 times less. A while later, I realised if I use Spaces instead of Volumes, it would be 22 times cheaper.

It was impossible to simply copy the code over, though. The website was written in an old form of ASP, the source-code of which was long gone, and the database was Microsoft SQL Server, which doesn’t play nice with anything other than Microsoft products.

So, I needed to extract the current data, and rebuild the website. This is a project that has data right back to 2003, so this is not going to be a single article rebuild 😉

This first article is about converting the database from MSSQL to MySQL.

Exporting the Database

The hardest part in this, in my opinion, was in extracting the database data. I could not find an easy way to dump the data in a format that could be converted to MySQL (the DBMS I would use on the new one) and imported.

The old hosting included a copy of myLittleAdmin, which claims on their website to “Export data to XLS, XML, CSV“. I tried this.

The “CSV” export actually exports into a semi-colon delimited format (not commas), and it doesn’t quote strings, so when data itself contains semi-colons, you can’t trust the export.

The “XLS” export is not XLS. It exports an XML file which I assume can, after some manipulation, be converted into an XLSX file, but it’s definitely not XLS. I couldn’t use it.

The “XML” export at least did what it said it would! I finally was able to export the data. But I had to do it one table at a time, and the old system had 94 tables… It would be handy if myLittleAdmin had a single “export all as XML” function, but I can only work with what I have, so 94 exports it is…

There were a lot of tables that I considered irrelevant – tables created for specific competitions in 2006, or tables created for sponsors and then never used. I ignored those.

Converting to MySQL

The XML format that was created looked a little like this:

In that, we have the table name as the root, a “row” object containing row data, and then each field’s data is contained in an item named after it. Makes sense.

To convert that to MySQL, I used PHP’s built-in SimpleXML support to convert the XML file to a navigable structure, then started iterating.

It wasn’t quite as straightforward as that, but that was the gist.

To begin with, I assumed that all fields are text strings, so I set longtext as the field type. Except the first item in each row, which I assumed was an auto_increment integer (the unique primary ID of each piece of data), unless the item was obviously text, as was the case in some tables… And unless the table was the Chart data, because …who knows. When creating a table, a rule of thumb is that you should always enter a primary key. Always.

I found that when I ran the script, there were some crashes where the original export used some characters that SimpleXML complained about. For example, “”, “ ” (yes, there’s a character there…), “”, “”. After checking the context of these characters, I simply removed them. I had tried various methods including UTF8 encoding, MB transliteration. In the end, deleting them was easiest.

Once the table structure was recreated in a vaguely correct MySQL form, I could then import the XML files directly using MySQL’s “load into” method.

Here’s the PHP code I used to automate this (the dbQuery function just runs a query)

$files=new DirectoryIterator('.');
foreach ($files as $f) {
if ($f->isDot() || $f->getFilename()=='table_import.php') {
$tname=strtolower(preg_replace('/mla_export_(TBL)?(.?)../', '\2', $fname));
echo $tname."\n";
dbQuery('drop table if exists '.$tname);
$xmlstr=str_replace(['', '', ' ', '', '', '', '', '', '', '', ''], ' ', $xmlstr);
$xml=simplexml_load_string($xmlstr, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_PARSEHUGE);
if (!$xml || !isset($xml->row) || !count($xml->row)) {
foreach ($xml->row[0] as $k=>$v) {
if ($i==0 && preg_match('/^[0-9]+$/', $v) && $fname!='mla_export_TBLOfficialChart.xml') {
$fields[]=$k.' int auto_increment not null primary key';
else {
$fields[]=$k.' longtext';
$sql='create table '.$tname.' ('.join(', ', $fields).') default charset=utf8 engine=innodb';
echo $sql."\n";
echo "importing data\n";
dbQuery('LOAD XML LOCAL INFILE "'.$f->getPathname().'" INTO TABLE '.$tname);

The import failed on a few of the XML files without error – simply saying “process killed”. I fixed this by increasing the RAM of the machine temporarily. There’s no need for a running production server that’s only serving simply things to have a load of RAM in it, but for the initial setup, it needs enough RAM to completely hold all of the imported data in memory, with plenty to spare while it manipulates it into a shape that can be recorded in MySQL.

The actual import took hours for the server to do with this project. At least 3 hours of non-stop crunching.

Next Steps

After the import, I have a load of data in very simple and inefficient format – inconsistent table names, inconsistent field names, and just primary keys and long-text formats.

I’m going to leave the names alone until after we’ve completed the initial rebuild, so that when I “refresh” the data just before we go live, I don’t have to do a lot of awkward remapping.

The long-texts, I’ll adjust as I rewrite the parts that read/update them.

Tomorrow, the plan is to create a test server that simply outputs the existing data in read-only format. This will act as a low-level site that will be “progressively enhanced” to what we want it to be.