Monthly Archives: March 2019

Rebuilding MP3Unsigned #2 – Basic Site

Yesterday, I finished importing the data from the old database to the new. This is far from finished, because the export didn’t include some vital information as the field types, keys, or that ephemeral quality, what it all means.

The way I’m building the new version, it is in four separate parts:

  1. plain HTML “archive” version of site, perfect for search engines
  2. JavaScript overlay for progressive enhancement of site
  3. CSS themes, which you can switch between
  4. API, for the JavaScript and any external tools to talk to

Today I’m working on the first part, the plain HTML archive. Technically, I should really do the fourth part first so the HTML archive can build itself up using data provided from the API, but I want to get something visual today, so will fix this up yesterday (technical debt!).

On the front page, I want to show a list of all bands. Unfortunately, the old database is …not great… for this basic function. After reading the tables, I found that “bands” are a combination of the table “tblbands” and “subscribers”, such that the band is named after the subscriber’s username.

This is not great because it means subscribers can only be in one band at a time, and they can either use their own name, or the name of the band, but not both. I’ll fix that after phase one of the project (rebuilding the existing site) is done.

I wrote a small “router” that can rip apart the URL and use that to determine what page to display. In the new site, we will use a language code to determine what language to use for user interface strings. This code will always be the first item in the URL.

Long story short, here’s the absolute basics of the new site, in English, Irish, and Japanese.

This stage of the rebuild is not designed to be pleasant to look at. It’s designed to be very quick to load, so search engines can find the content easily.

The next stage is to create the API so that database access is done through only one part of the system, and all requests (from a client, from the base I just built, from third-party tools) go through that API system instead of directly to the database.

Getting to the current stage (visible band names, visible MP3 lists, playable MP3s) was important for morale reasons. Now we can really get building.

Rebuilding MP3Unsigned #1 – MSSQL to MySQL

Last week, I took on a project for my friend Phil Sneyd, who said he and Dave Kelly were on the verge of completely scrapping their long-term project MP3Unsigned, a website for unsigned musicians to upload their music, allowing the music to be charted monthly, commented on, etc.

I told him I’d have a look, and asked for some access details. I won’t get into the numbers, but when I saw the hosting costs, I did a quick check and found that moving everything to Digital Ocean with Volumes to handle the storage would cost 6 times less. A while later, I realised if I use Spaces instead of Volumes, it would be 22 times cheaper.

It was impossible to simply copy the code over, though. The website was written in an old form of ASP, the source-code of which was long gone, and the database was Microsoft SQL Server, which doesn’t play nice with anything other than Microsoft products.

So, I needed to extract the current data, and rebuild the website. This is a project that has data right back to 2003, so this is not going to be a single article rebuild 😉

This first article is about converting the database from MSSQL to MySQL.

Exporting the Database

The hardest part in this, in my opinion, was in extracting the database data. I could not find an easy way to dump the data in a format that could be converted to MySQL (the DBMS I would use on the new one) and imported.

The old hosting included a copy of myLittleAdmin, which claims on their website to “Export data to XLS, XML, CSV“. I tried this.

The “CSV” export actually exports into a semi-colon delimited format (not commas), and it doesn’t quote strings, so when data itself contains semi-colons, you can’t trust the export.

The “XLS” export is not XLS. It exports an XML file which I assume can, after some manipulation, be converted into an XLSX file, but it’s definitely not XLS. I couldn’t use it.

The “XML” export at least did what it said it would! I finally was able to export the data. But I had to do it one table at a time, and the old system had 94 tables… It would be handy if myLittleAdmin had a single “export all as XML” function, but I can only work with what I have, so 94 exports it is…

There were a lot of tables that I considered irrelevant – tables created for specific competitions in 2006, or tables created for sponsors and then never used. I ignored those.

Converting to MySQL

The XML format that was created looked a little like this:

In that, we have the table name as the root, a “row” object containing row data, and then each field’s data is contained in an item named after it. Makes sense.

To convert that to MySQL, I used PHP’s built-in SimpleXML support to convert the XML file to a navigable structure, then started iterating.

It wasn’t quite as straightforward as that, but that was the gist.

To begin with, I assumed that all fields are text strings, so I set longtext as the field type. Except the first item in each row, which I assumed was an auto_increment integer (the unique primary ID of each piece of data), unless the item was obviously text, as was the case in some tables… And unless the table was the Chart data, because …who knows. When creating a table, a rule of thumb is that you should always enter a primary key. Always.

I found that when I ran the script, there were some crashes where the original export used some characters that SimpleXML complained about. For example, “”, “ ” (yes, there’s a character there…), “”, “”. After checking the context of these characters, I simply removed them. I had tried various methods including UTF8 encoding, MB transliteration. In the end, deleting them was easiest.

Once the table structure was recreated in a vaguely correct MySQL form, I could then import the XML files directly using MySQL’s “load into” method.

Here’s the PHP code I used to automate this (the dbQuery function just runs a query)

$files=new DirectoryIterator('.');
foreach ($files as $f) {
if ($f->isDot() || $f->getFilename()=='table_import.php') {
continue;
}
$fname=$f->getFilename();
$tname=strtolower(preg_replace('/mla_export_(TBL)?(.?)../', '\2', $fname));
echo $tname."\n";
dbQuery('drop table if exists '.$tname);
$xmlstr=file_get_contents($fname);
$xmlstr=str_replace(['', '', ' ', '', '', '', '', '', '', '', ''], ' ', $xmlstr);
$xml=simplexml_load_string($xmlstr, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_PARSEHUGE);
if (!$xml || !isset($xml->row) || !count($xml->row)) {
continue;
}
$i=0;
$fields=[];
foreach ($xml->row[0] as $k=>$v) {
if ($i==0 && preg_match('/^[0-9]+$/', $v) && $fname!='mla_export_TBLOfficialChart.xml') {
$fields[]=$k.' int auto_increment not null primary key';
}
else {
$fields[]=$k.' longtext';
}
$i++;
}
$sql='create table '.$tname.' ('.join(', ', $fields).') default charset=utf8 engine=innodb';
echo $sql."\n";
dbQuery($sql);
echo "importing data\n";
dbQuery('LOAD XML LOCAL INFILE "'.$f->getPathname().'" INTO TABLE '.$tname);
}

The import failed on a few of the XML files without error – simply saying “process killed”. I fixed this by increasing the RAM of the machine temporarily. There’s no need for a running production server that’s only serving simply things to have a load of RAM in it, but for the initial setup, it needs enough RAM to completely hold all of the imported data in memory, with plenty to spare while it manipulates it into a shape that can be recorded in MySQL.

The actual import took hours for the server to do with this project. At least 3 hours of non-stop crunching.

Next Steps

After the import, I have a load of data in very simple and inefficient format – inconsistent table names, inconsistent field names, and just primary keys and long-text formats.

I’m going to leave the names alone until after we’ve completed the initial rebuild, so that when I “refresh” the data just before we go live, I don’t have to do a lot of awkward remapping.

The long-texts, I’ll adjust as I rewrite the parts that read/update them.

Tomorrow, the plan is to create a test server that simply outputs the existing data in read-only format. This will act as a low-level site that will be “progressively enhanced” to what we want it to be.