02 Mar

Rebuilding MP3Unsigned #1 – MSSQL to MySQL

Last week, I took on a project for my friend Phil Sneyd, who said he and Dave Kelly were on the verge of completely scrapping their long-term project MP3Unsigned, a website for unsigned musicians to upload their music, allowing the music to be charted monthly, commented on, etc.

I told him I’d have a look, and asked for some access details. I won’t get into the numbers, but when I saw the hosting costs, I did a quick check and found that moving everything to Digital Ocean, with Volumes to handle the storage, would cost about a sixth as much. A while later, I realised that if I used Spaces instead of Volumes, it would be about 22 times cheaper.

It was impossible to simply copy the code over, though. The website was written in an old form of ASP, the source-code of which was long gone, and the database was Microsoft SQL Server, which doesn’t play nice with anything other than Microsoft products.

So, I needed to extract the current data, and rebuild the website. This is a project that has data right back to 2003, so this is not going to be a single article rebuild 😉

This first article is about converting the database from MSSQL to MySQL.

Exporting the Database

The hardest part in this, in my opinion, was in extracting the database data. I could not find an easy way to dump the data in a format that could be converted to MySQL (the DBMS I would use on the new one) and imported.

The old hosting included a copy of myLittleAdmin, which claims on their website to “Export data to XLS, XML, CSV”. I tried this.

The “CSV” export actually exports into a semi-colon delimited format (not commas), and it doesn’t quote strings, so when the data itself contains semi-colons, you can’t trust the export.

The “XLS” export is not XLS. It exports an XML file which I assume can, after some manipulation, be converted into an XLSX file, but it’s definitely not XLS. I couldn’t use it.

The “XML” export at least did what it said it would! I finally was able to export the data. But I had to do it one table at a time, and the old system had 94 tables… It would be handy if myLittleAdmin had a single “export all as XML” function, but I can only work with what I have, so 94 exports it is…

There were a lot of tables that I considered irrelevant – tables created for specific competitions in 2006, or tables created for sponsors and then never used. I ignored those.

Converting to MySQL

The XML format that was created looked a little like this (the table and field names here are just illustrative):
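
<TBLArtists>
  <row>
    <ArtistID>1</ArtistID>
    <ArtistName>Example Band</ArtistName>
    <DateJoined>2003-06-15</DateJoined>
  </row>
  <row>
    <ArtistID>2</ArtistID>
    <ArtistName>Another Band</ArtistName>
    <DateJoined>2003-07-02</DateJoined>
  </row>
</TBLArtists>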

In that, we have the table name as the root, a “row” object containing row data, and then each field’s data is contained in an item named after it. Makes sense.

To convert that to MySQL, I used PHP’s built-in SimpleXML support to convert the XML file to a navigable structure, then started iterating.

It wasn’t quite as straightforward as that, but that was the gist.

To begin with, I assumed that all fields were text strings, so I set longtext as the field type. Except the first item in each row, which I assumed was an auto_increment integer (the unique primary ID of each piece of data), unless the item was obviously text, as was the case in some tables… And unless the table was the Chart data, because …who knows. When creating a table, a rule of thumb is that you should always define a primary key. Always.

I found that when I ran the script, there were some crashes where the original export used characters that SimpleXML complained about – mostly invisible control characters, plus a non-breaking space (yes, there’s a character there…). After checking the context of these characters, I simply removed them. I had tried various other approaches, including UTF-8 re-encoding and multibyte transliteration, but in the end, deleting them was easiest.

Once the table structure was recreated in a vaguely correct MySQL form, I could then import the XML files directly using MySQL’s LOAD XML INFILE statement.

Here’s the PHP code I used to automate this (the dbQuery function just runs a query):

$files=new DirectoryIterator('.');
foreach ($files as $f) {
    if ($f->isDot() || $f->getFilename()=='table_import.php') {
        continue;
    }
    $fname=$f->getFilename();
    // e.g. "mla_export_TBLOfficialChart.xml" => "officialchart"
    $tname=strtolower(preg_replace('/mla_export_(TBL)?(.*)\.xml/', '\2', $fname));
    echo $tname."\n";
    dbQuery('drop table if exists '.$tname);
    $xmlstr=file_get_contents($fname);
    // strip the characters SimpleXML chokes on (the original listed the literal
    // control characters and non-breaking spaces here; they don't survive
    // copy/paste, so this strips those classes of character in general)
    $xmlstr=preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]|\xC2\xA0/', ' ', $xmlstr);
    $xml=simplexml_load_string($xmlstr, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_PARSEHUGE);
    if (!$xml || !isset($xml->row) || !count($xml->row)) {
        continue;
    }
    // guess the column types from the first row: the first column becomes the
    // auto_increment primary key if it looks numeric, everything else is longtext
    $i=0;
    $fields=[];
    foreach ($xml->row[0] as $k=>$v) {
        if ($i==0 && preg_match('/^[0-9]+$/', $v) && $fname!='mla_export_TBLOfficialChart.xml') {
            $fields[]=$k.' int auto_increment not null primary key';
        }
        else {
            $fields[]=$k.' longtext';
        }
        $i++;
    }
    $sql='create table '.$tname.' ('.join(', ', $fields).') default charset=utf8 engine=innodb';
    echo $sql."\n";
    dbQuery($sql);
    echo "importing data\n";
    dbQuery('LOAD XML LOCAL INFILE "'.$f->getPathname().'" INTO TABLE '.$tname);
}

The import failed on a few of the XML files without a useful error – simply saying “process killed”. I fixed this by temporarily increasing the RAM of the machine. There’s no need for a running production server that’s only serving simple things to have a load of RAM in it, but for the initial setup, it needs enough RAM to hold all of the imported data in memory, with plenty to spare while it manipulates it into a shape that can be recorded in MySQL.

The actual import took the server hours to complete for this project – at least 3 hours of non-stop crunching.

Next Steps

After the import, I have a load of data in a very simple and inefficient format – inconsistent table names, inconsistent field names, and nothing but primary keys and longtext fields.

I’m going to leave the names alone until after we’ve completed the initial rebuild, so that when I “refresh” the data just before we go live, I don’t have to do a lot of awkward remapping.

The long-texts, I’ll adjust as I rewrite the parts that read/update them.

Tomorrow, the plan is to create a test server that simply outputs the existing data in read-only format. This will act as a low-level site that will be “progressively enhanced” to what we want it to be.

23 Feb

Optimising a WordPress website #2 – MotoWitch

In the last post, I had improved Kojii’s website from a PageSpeed score of 6 to a score of 28.

I just checked, and it’s at 29 now. The PageSpeed site may give slightly different values depending on network traffic, etc, so I don’t think the site has gotten 1 point better just through waiting.

I hadn’t finished optimising all images, so I worked further on those, using the PageSpeed results as a guide.

I optimised a few of them, but they’re mostly okay – the savings are tiny compared to the effort, and some I had already optimised and Google was claiming it could get even more compression on them (but didn’t say how – Google: how?)

Images are now number 2 in the PageSpeed recommendations. The new #1 is Server Response Time.

Server Response can be affected by a number of things, including network, hard-drive speeds, computer load. In this particular case, though, I think it is the number of database queries that the website has to run in order to generate its pages.

A plugin called “WP Super Cache” fixes this – whenever a page is generated, its resulting HTML is stored in a cache file. When the page is next requested, the cached version is delivered, instead of generating a new one. Donncha O Caoimh from the People’s Republic of Cork wrote this, and it may just be the smartest thing to come out of that dismal place!

After activating the plugin, go into the settings and enable it (it’s off, by default!)
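
As far as I know, WP Super Cache also needs the WP_CACHE constant set in wp-config.php – it normally adds that line itself when you enable caching, but if it can’t write to the file, you can add it manually:

define('WP_CACHE', true);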


Ok – let’s check PageSpeed again and see what that did. I open up a different browser (if you’re logged in, caching is disabled, so you need to use a different browser, log out, or use anonymous mode) and load up the front page to prime it.

Hmm… no change in the score? It’s still at 27, but after the slight optimisations to the images, they’re at number 4 now instead of 2. But the Server Response is still at #1.

Note that the number is actually higher (1.75s now vs 1.66s before). Again, I think this is just a blip in the various interactions between server speed and network speed. To be sure, I waited a few minutes then ran PageSpeed again.

Ah, that’s better 🙂 I think my Firefox priming didn’t work, so Google’s first retrieval of the page after enabling WP Super Cache did the priming for us, and the second retrieval was quicker.

Now, what was #1 (server response time) isn’t even in the list of issues:

Next, we tackle render-blocking resources. When you include CSS, JavaScript, or other resources in your website, the browser “blocks” further loading until those resources are downloaded. This is in case something later in the page relies on those resources.

For example, if you have a script that uses jQuery to do its stuff, then you need jQuery to load first. In this case, jQuery is a blocking resource.

Since the browser doesn’t know what relies on what, it treats every resource as a blocker.

To solve this, you can use the HTML “defer” attribute, which tells the browser that the script it’s attached to is safe to load later, when the browser isn’t busy.

The WordPress plugin “Async JavaScript” can do this, giving you the option to specify which scripts are deferred and which are important to load immediately. It’s written by the same guy that wrote the plugin Autoptimize I talked about in the last blog post, and is designed to work with it.

After installing and activating, I enable it, but only for non-logged-in users. Since the vast majority of visitors to a website will be anonymous, these are the users you want to optimise for. Users that actually log in should probably get the full blast of scripts, etc, in case there’s something they use that might be affected by the optimising tricks:

And then click the Apply Defer button. The difference between “defer” and “async” is that “defer” downloads the scripts in the background and then runs them, in the same order they were requested, once the page has been parsed. Async, though, loads everything in the background and runs each script as soon as it arrives. Async loading can cause a “race condition”, where a script ends up running before the scripts it relies on have loaded.
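
To make that concrete, here’s roughly what the two attributes look like on a script tag (the filenames are just examples):

<!-- defer: downloads in the background, runs in document order once the page is parsed -->
<script src="/js/jquery.js" defer></script>
<script src="/js/uses-jquery.js" defer></script>

<!-- async: downloads in the background, runs as soon as it arrives - order not guaranteed -->
<script src="/js/analytics.js" async></script>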

After all that, run PageSpeed again (twice, a minute apart). I see that we’re up to 36 now, and Eliminate Blocking Resources is still #1.

Unfortunately, to do more on this, I think I would need to start removing things from the page, and I really don’t want to do that.

I don’t have root access to the machine, and it uses a theme I am unfamiliar with and don’t want to start hacking, so I think I’m stuck at this for now.

As a last test, I went into Autoptimize and enabled a few more things:

I was unable to verify that the image optimisation worked. I assume that what’s meant to happen is that images are loaded through a CDN that optimises them. But I didn’t see any sign of that (probably because the theme puts most images as CSS backgrounds?).

However, there was a minor improvement in score which I’ll take.

At the end of this blog post, the scores are now at 38 in Mobile, up from 28:

And at 82 in Desktop:

09 Feb

Optimising a WordPress website #1 – MotoWitch

I recently offered to optimise friends’ websites for free, in the interest of trying out a few methods of improving responsiveness and improving the Google PageSpeed score, which Google uses as one of the indicators in its search ranking.

MotoWitch.com is run by Kojii Helnwein, a friend from back when we were teenagers (a long time ago now!) – it’s a website for female bikers to get together and share stories from all over the world.

Kojii asked me to go easy on her – she built the website herself and her skills lie more on the visual side than the technical guts. That’s okay – that’s what I’m here for!

So, the first thing I did was to check out what kind of score the website was getting already.

6%! To be honest, I’m very impressed that it’s this low, as the website works well on a subjective test on good broadband, but obviously it has a long way to go to be good on a phone, and the percentage of people using mobile for Internet access is always increasing.

6% score on Google’s PageSpeed tool

The “Opportunities” section of the PageSpeed results gives some hints as to what’s wrong.

Opportunities

Okay – it’s basically saying that the images are a really big problem. Clicking the first row of results will expand it to give some more details:

More Detail

The list of images actually goes on for another page, but we can already see the first issues to address.

Those images are HUGE. When multiplied by the number of images on the front page, it’s obvious that this will cause a problem to mobile users.

I see three immediate tasks:
1. apply some compression to those images – there really is no reason they should be so large if they’re on a website. As a test, I opened the first image in Gimp, and saved it as an 86% compression .jpg file. I could not see the difference visually, but the filesize was 6 times smaller. That’s a start.
2. add some “lazy loading” to the page so that images are only loaded when page scrolls bring the images into view. WordPress has built-in tools for this within its JetPack plugin.
3. add some coding to allow the page to load different versions of the images depending on what size the screen actually is – no need to load a 2000px wide image if you’re using a screen that’s 320px wide.
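
For point 3, the usual trick is the srcset/sizes attributes – something like this (filenames and widths are just examples). WordPress generates these automatically for images it inserts into post content, but images hard-coded into the theme are another matter:

<img src="/images/photo-800.jpg"
     srcset="/images/photo-320.jpg 320w, /images/photo-800.jpg 800w, /images/photo-2000.jpg 2000w"
     sizes="100vw"
     alt="motorbike photo">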

The JetPack plugin comes as standard with most WordPress sites. Clicking into it, the first edit is to enable WordPress’s Site Accelerator and Lazy Loading.

enable all of these

The “site accelerator” changes the site code so that some of the site’s images and static files (like JavaScript and CSS) come through WordPress’s global CDN servers (content delivery network), which reduces stress on your own server, helping speed up the website in a number of subtle ways.

Lazy Loading means that images are only loaded when they are about to become visible on the page. JetPack’s Lazy Loading does not apply to all images, so we may need to manually edit some later.

With just this change, we increase the PageSpeed from 6 to 8, with the opportunities changing to this:

pagespeed after enabling lazy loading and site accelerator

The largest changes are that items 1 and 3 in the image above were 43.8s and 43.5s (estimated download times on mobile networks), and are now 7.05s and 3.78s. That’s roughly 6 and 11 times faster, respectively.

The next step is to go through the list of large images, and optimise them. Here’s what PageSpeed now says are the worst offenders:

images with potential savings

The PageSpeed interface does not give an obvious link to the original image, but if you right-click the little thumbnails on the left and click “open in new tab”, that lets you then download the originals and optimise them.

The simplest method is to use an auto-optimising tool to handle it for you. I used imagecompressor.com’s tool. It does a good job by default, but sometimes you can get a much higher compression without compromising visible details by tweaking the compression slider.
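
If you’d rather script it than use a website, ImageMagick does roughly the same job from the command line (filenames are just examples):

convert original.jpg -quality 85 optimised.jpg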

After editing a lot of the images (I couldn’t find where some of them are used, so will address them later), I re-ran PageSpeed. It was still at 8%, but the opportunities list looked better again:

Item 3 there was 7.05s (originally 43.8s) and is now 1.95s.

The “Eliminate render-blocking resources” one is tricky to correct in WordPress sites. Almost every plugin you add to WordPress includes CSS in some way, and they rarely if ever include “defer” keywords to let the browser download them later.

I opened the list:


In total about 30 items are in the list. How to address this?

There are plugins that collect the CSS and JS files that WordPress emits, and combine them before they are output. An article I read recommended Fast Velocity Minify; however, I found that it mangled some of the CSS of the website. Autoptimize did a much better job here.

After installing, activating, and then going into the plugin and enabling all items but the CDN on the front page, I primed the front page by loading it in a browser, and then checked PageSpeed again:

getting there…

The list of opportunities still has blocking resources as the first priority, but it is reduced from 30 items to only 14, and they’re all on external domain names:

But I notice something – the c0.wp.com references are from CSS/JS files that were offloaded to CDN by the JetPack plugin. Let’s turn that off…

Then prime the website by loading the front page, and check PageSpeed. The score is increased again to 23, and the number of blocking files reduced from 14 to 8:

Now I notice that half of those are Google Fonts references.

Autoptimize has some advanced settings that let you handle Google Fonts in various ways. I chose to load the fonts asynchronously through webfont.js:

The PageSpeed score is increased yet again, this time to 28, and the number of render-blocking resources is down to just 4:

I think that’s about as much optimisation as I can do easily here, without needing to manually edit the CSS files to reduce their size.

I’ll leave it at that for today and carry on tomorrow.

14 Feb

rewriting the trading algorithm

I have the script now running at a realistic 1.14% per day return, taking into account the average number of times that I’ve had to pay a trading fee, the average offset of what price a trade happens vs what price the algorithm wanted it to happen, and other unpredictable things.

The script as it is, is very hard to train in any way other than an almost brute-force fashion. Because most of the variables (length of EMA long tail, number of bars to measure for ATR, etc) are integers, I can’t hone in on the right values using a momentum-based trainer, so I need to do it using long, laborious loops.

There is one improvement that I could do, which would probably double or more the return, without drastically changing the script. At the moment, there is a lot of “down time”, where the script has some cash sitting in the wallet, and is waiting for a good opportunity to jump on a deal. If the script were to consider other currencies at the same time, then it would be able to use that down time to buy in those currencies.

On second thought, I think the returns would be much higher, because when the script currently makes a BUY trade, it’s usually sold again within about 30 minutes. That means that it could potentially make 48 BUY trades in a day, each with an average of maybe a .5% return. That’s about a 27% return ((1 + 0.5/100)^48 ≈ 1.27).

That would be nice, but it’s impossible with the GDAX market, because the GDAX market only caters to four cryptocurrencies at the moment. Also, while the money aspect is nice, I’m actually doing this more for the puzzle-solving aspect. I get a real thrill out of adding in the real-life toils and troubles (fees, unpredictable trades, etc) and coming up with solutions that return interest despite those. So, I’m not going to refine the hell out of the script. Instead, I’m going to start working on another version.

Before I started on the current version, I had made a naive attempt at market prediction using neural networks. While some of the results looked very realistic, none of them stood up to scrutiny. I failed. But, I also learned a lot.

I’m going to make another neural-net-based attempt, using a different approach.

The idea I have is that instead of trying to predict the upcoming market values, I’ll simply try to predict whether I should buy or sell. This reduces the complexity a lot, because instead of trying to predict an arbitrary number, I’m only outputting a 1 (buy), 0 (nothing), or -1 (sell). This can then be checked by running the market from day 1 of the test data using each iteration of the network, and then adjusting the weights of the network based on whether the end result was higher or lower than the last time it was run.

I noticed that the best tunings from my current script only made a very few trades, on the very few days where the market basically dropped like a stone and then rebounded. But I can’t assume those will happen very often, so I like to see a few trades happen every day. With the neural network, I can increase the number of trades by simply adjusting the output function that decides whether a result is -1, 0, or 1 (this is usually done by converting the inputs into a value between -1 and 1, then rounding to an integer based on whether the value is above/below 0.7/-0.7. That value can be adjusted easily to provide more 1/-1 hits).
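
As a rough sketch of that output function (assuming the network’s raw output has already been squashed into the -1 to 1 range, and with 0.7 as the adjustable cutoff):

function tradeSignal(float $netOutput, float $cutoff = 0.7): int {
    if ($netOutput >= $cutoff) {
        return 1;   // buy
    }
    if ($netOutput <= -$cutoff) {
        return -1;  // sell
    }
    return 0;       // do nothing
}

Lowering $cutoff is the easy way to make the network trade more often.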

The current approach also involves a lot of math to measure ATR (average true range), running EMA (exponential moving average), etc. With the neural network approach, I will just need to squish the numbers down to form a pattern between -1 and 1 (where 0 is the current market price) and run directly against those numbers.
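
By “squish”, I mean something along these lines – dividing by the largest deviation is just one possible choice of scale:

function squish(array $prices, float $current): array {
    // express each price as its distance from the current price...
    $diffs = array_map(function ($p) use ($current) {
        return $p - $current;
    }, $prices);
    // ...then scale so everything fits between -1 and 1, with 0 = the current price
    $max = max(array_map('abs', $diffs));
    if ($max == 0) {
        return array_fill(0, count($diffs), 0.0);
    }
    return array_map(function ($d) use ($max) {
        return $d / $max;
    }, $diffs);
}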

Because neural networks are a lot more “touchy-feely” than the current EMA/stop-gain/stop-loss approach I’m using, I will be able to “hone in” on good values – something I cannot do at the moment because of the step-like nature of integers.

I won’t be using any off-the-shelf networks, as I can’t imagine how to write a good trainer for them. Instead, I’ll write my own, using a cascade-correlation approach.

Cascade-correlation is an approach to neural networks which allows a network to “grow” and gradually learn features of the input data. You train a network layer against every single input, until the output stops improving. You then “freeze” that network layer so it is not adjusted anymore. You then create a second layer which trains against every single input and the previously trained network. You continue adding layers until there is no more noticeable improvement.
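
In pseudo-PHP, the growing process looks something like this (trainUnit() and evaluate() are stand-ins for the real training and scoring code, not actual functions):

function growCascade(array $inputs, array $targets, float $minGain = 0.001): array {
    $frozen = [];                                    // layers already trained and locked
    $best   = evaluate($frozen, $inputs, $targets);  // score of the network so far
    while (true) {
        // the new layer sees the raw inputs plus the outputs of every frozen layer
        $layer = trainUnit($inputs, $frozen, $targets);
        $score = evaluate(array_merge($frozen, [$layer]), $inputs, $targets);
        if ($score - $best < $minGain) {
            break;                                   // no noticeable improvement - stop growing
        }
        $frozen[] = $layer;                          // freeze it and go again
        $best     = $score;
    }
    return $frozen;
}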

The point of this is that the first training layer will notice a feature in the data that produces a good result, and will train itself to recognise that feature very well. The second layer will then be able to ignore that feature (because it’s already being checked for), and find another one that improves the results. It’s like how you decide what animal is in a picture – does it have 6 legs (level one), is it red (level two), does it have a stinger (level three) – it’s a scorpion! Instead of trying to learn all the features of a successful sale at once, the algorithm picks up a new one at each level.

Around Christmas, I was playing with the FANN version of cascade correlation, and I think it’s very limited. It appears to create its new levels based on all inputs, but only the last feature-detection layer. Using the above example, this would make it difficult to recognise a black scorpion, as it would not be red. I believe that ideally, each feature layer should be treated as a separate new input, letting the end output make decisions based on multiple parallel features, not a single linear feature decision.

29 Jan

using LIMIT trades to reduce fees

I made two enhancements to my Litecoin market script over the weekend.

The first one was based on something I did last week, where it occurred to me that if you buy when the ATR is low, you will probably lose money through fees, because the market value will not shift enough for there to be a profit.

For example, if the value is 200 when you buy and 201 when you sell, then you will lose money. Let’s say you buy €10 of coins. It will increase in value to €10.05 by the time you sell, but you will pay just over €0.06 in fees.

An ATR limit large enough to discourage the script from buying when the margin is so small would stop this issue.

However, I realised that a limit that worked at around a value of 200 would not be effective at 400.

The solution was to make a new limit which is inversely proportional to the market value. Let’s say the number is 50. Then it would look for an ATR of 0.25 if the market value was 200, and an ATR of 0.125 if the value was 400.
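
In code, the check is tiny – something like this, with 50 being the tunable number mentioned above:

function volatileEnough(float $atr, float $marketValue, float $factor = 50): bool {
    // factor 50 => needs an ATR of at least 0.25 at a price of 200,
    // or at least 0.125 at a price of 400
    return $atr >= $factor / $marketValue;
}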

This made a remarkable difference in my simulation. It now estimates a 3.15% return per day based on the configuration figures I found.

Last week’s version ended up with about 14 buys in a 50 day period, which meant that there was only about one buy every 4 days, making it look like it wasn’t doing anything.

Now, it has what looks like about 32 events per day. A lot of them are repeats, where a sell signal might pop up from a chandelier exit followed by another from exponential moving average or simple moving average, but it’s still a lot more, making the script feel a lot more alive.

This is helped by me changing the trade method.

I had it on MARKET trades, which are virtually guaranteed sales/buys, but also are guaranteed 0.3% fees.

I’ve changed that to LIMIT trades that work in a way that might not trade at exactly what was requested, but should not trigger a fee at all (at least, I haven’t had a single fee yet in my tests!).

How it works: let’s say the market value is 200 right now. The script will check the order books, which might currently have an “ask” of 200.01 and a “bid” of 199.99 (for example). If we are trying to sell, then we add a sell/ask order at 200.02 (current + 0.01). If the market value goes down, then we cancel that order and create a new one based on whatever the new value is.

And vice versa with buys/bids.

This means we probably won’t get exactly what we want, because we are relying on market jitter to make the trade happen. But at least we won’t have a fee to worry about!
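
Sketched out, the sell side looks something like this – getOrderBook(), placeLimitOrder(), cancelOrder() and orderFilled() are hypothetical stand-ins for the real GDAX API wrappers, not actual function names:

// keep a sell order one cent above the current ask, repositioning it if the market drops
function maintainSellOrder(float $amount) {
    $orderId = null;
    $lastAsk = null;
    while (true) {
        $book = getOrderBook();           // hypothetical: returns ['ask' => ..., 'bid' => ...]
        if ($lastAsk === null || $book['ask'] < $lastAsk) {
            if ($orderId !== null) {
                cancelOrder($orderId);    // market moved down - reposition the order
            }
            $lastAsk = $book['ask'];
            $orderId = placeLimitOrder('sell', $amount, $lastAsk + 0.01);
        }
        if (orderFilled($orderId)) {
            return;
        }
        sleep(5);                         // wait a bit before checking again
    }
}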

25 Jan

using ATR to restrict market purchases

After re-checking the simulations I’d done recently, I realised I had the wrong broker fee marked in, which meant that even though the simulation said I was making a 1% return every day, I was actually losing money in reality.

The script was originally making LIMIT buys and sells, which my experience showed rarely caused a fee, so I’d marked that as a 0.1% fee in the simulation vs the normal 0.3% that a MARKET trade has. But the simulation didn’t take into account that LIMIT trades sometimes don’t get filled. The market might be at 147, for example, so you try to sell your 4 Litecoin (eg) for 147, but by the time you get the order uploaded, the market might have shifted down. Now you’re stuck with an unfulfilled order.

To solve this, I changed the script to only use MARKET trades, which are guaranteed to sell/buy, but also are guaranteed to incur a fee of 0.3%.

When I plugged 0.3% into my simulation, I was suddenly not making any money at all. Instead, I was losing money. In fact, the simulation showed I was losing money so badly that if I started with €1,000,000 46 days ago, I would be down to €14 today. That’s bad.

Looking at a 6hr chart of the Litecoin market, we can see where the money vanished.

On the 6hr chart, the highlighted area looks very flat, like nothing is happening there. So you would not expect any trade to happen. But if you zoom into the area, you can see that trades are still happening, even though the value isn’t changing very much in absolute terms.

Let’s zoom in even further, to the area that’s highlighted:

You can see that the value is rising and falling vigorously, but this is an illusion. If you look at the figures on the right, you see that the LTC is oscillating between about €142 and €150 per coin. That’s about a 5% range. Remember that the fee we’re trying to avoid is 0.3%, so this /might/ be okay to trade with.

But the script trades minute by minute, not hour by hour. So let’s look at what the market does in that range by zooming in even further:

That’s one hour of data – it’s what was contained in the single bar all the way on the right of the preceding image. In it, the price of Litecoin rises steadily from €144 to €146.

Worth buying?

144 to 146 is less than 1.4% of a rise. If you buy €100 worth of LTC at the beginning of this rise, and sell it at the end, it will cost you €0.30 to buy (so you’ll actually spend €100.30), the value will increase from €100 to €101.39, and then when you sell, there will be a fee of €0.31.

So even though the price went up, from €100 to €101.39, you will only have made €0.78.

This is still a profit, but if the rise was less, it might have been a loss.

For example, let’s say it went from 200 to 201.

You invest €100 and pay your €0.30 fee. The value increases to €100.50. You then cash out. Now you’ve just lost money, because the fees add up to €0.61, but the price increase was only €0.50.

Buying when the increase is so small can be dangerous, because if the increase isn’t enough, then you will lose money.

So how to solve this?

The problem has to do with the volatility of the market. For the last few days in Litecoin land, there pretty-much hasn’t been any!

After realising the above, I made a small adjustment to my script so that it refused to allow any trades at all if the volatility was too small. I did this by measuring the ATR (average true range), and if it was below a certain threshold (1.5 or so), then even if the script signalled a buy or sell, it was stopped in its tracks.

This had an immediate and amazing effect on the returns.

Beforehand, with a 0.1% fee, I was getting a 1.25% return daily in my simulations, but now I’m getting a 2.8% return daily on a 0.3% fee. That’s HUGE! In money terms, if I invested €20 46 days ago on the 9th of December (chosen by finding the oldest data point which had a value higher than the most recent data point), then that €20 would now be worth €73.80.

You can check these figures for yourself.

Here’s the list of buys and sells that it came up with for that time. Compare that with the GDAX charts to see where it was making its decisions:

Date | Total Holdings | Euro | Litecoin | Decision
2017-12-09 06:16:00 | 20 | 20 | 0 | sell (EMA)
2017-12-09 18:19:00 | 19.94063602112 | 0.15264306112 | 0.1467952 | buy (SMA)
2017-12-12 03:31:00 | 29.367424251546 | 29.068528237546 | 0.0014974 | sell (EMA)
2017-12-12 03:48:00 | 29.304203063656 | 0.221864459656 | 0.1352604 | buy (EMA)
2017-12-12 15:14:00 | 38.521751703256 | 38.129916903256 | 0.0013797 | sell (EMA)
2017-12-12 15:31:00 | 38.458243719256 | 0.291011719256 | 0.1192726 | buy (EMA)
2017-12-12 16:15:00 | 35.489678406936 | 35.129576972936 | 0.0012166 | sell (EMA)
2017-12-13 19:09:00 | 35.340416940661 | 0.268105949661 | 0.1354091 | buy (SMA)
2017-12-19 21:14:00 | 39.434991448454 | 39.034291516454 | 0.0013812 | sell (Chandelier exit)
2017-12-20 07:04:00 | 39.299779309244 | 0.297920919244 | 0.1412599 | buy (SMA)
2017-12-20 13:56:00 | 40.357168121734 | 39.947332934734 | 0.0014409 | sell (SMA)
2017-12-22 04:12:00 | 40.175830888054 | 0.304884745054 | 0.1655289 | buy (SMA)
2017-12-22 05:22:00 | 38.593556557054 | 38.201847757054 | 0.0016884 | sell (EMA)
2017-12-22 07:54:00 | 38.430341158372 | 0.291552148372 | 0.188349 | buy (SMA)
2017-12-22 08:14:00 | 35.965964610874 | 35.600994246874 | 0.0019212 | sell (EMA)
2017-12-22 08:37:00 | 35.881484581174 | 0.271701481174 | 0.1771631 | buy (EMA)
2017-12-22 11:24:00 | 39.402103180134 | 39.001776317134 | 0.0018071 | sell (EMA)
2017-12-22 15:15:00 | 39.223938974334 | 0.297663674334 | 0.2081619 | buy (SMA)
2017-12-22 23:36:00 | 49.655726432778 | 49.150763226778 | 0.0021233 | sell (EMA)
2017-12-24 03:46:00 | 49.523277677268 | 0.375125812268 | 0.2013031 | buy (SMA)
2017-12-24 03:53:00 | 48.586558079894 | 48.093334886894 | 0.0020533 | sell (EMA)
2017-12-24 12:22:00 | 48.422843305894 | 0.367057305894 | 0.2089382 | buy (SMA)
2017-12-25 02:34:00 | 51.421578583214 | 50.899264087214 | 0.0021312 | sell (Chandelier exit)
2017-12-28 19:25:00 | 51.208545412331 | 0.388455939331 | 0.2352673 | buy (SMA)
2018-01-06 05:27:00 | 53.762217368481 | 53.216166876481 | 0.0023998 | sell (SMA)
2018-01-16 22:55:00 | 53.395286595271 | 0.406147617271 | 0.3772543 | buy (SMA)
2018-01-17 01:04:00 | 58.597851304252 | 58.002527224252 | 0.003848 | sell (EMA)
2018-01-17 16:11:00 | 58.311595018118 | 0.442668760118 | 0.4627293 | buy (SMA)
2018-01-17 22:56:00 | 73.798156976318 | 73.047692876318 | 0.0047199 | sell (EMA)

Notice that the script has not made any decisions in the last few days (today’s the 25th). That’s because the last few days, there’s been nothing interesting happening in the market so it’s holding on to what it has.

Here’s a pretty picture showing the above in line format

21 Jan

automated script for the Litecoin market

Over Christmas, I started looking into how stock markets work and decided to give it a shot. The simplest way I found to start off on it was actually the cryptocoin market. I chose Coinbase’s GDAX market as the one I’d work on.

At first, I had a naive idea that I just needed to watch the numbers on the market for times when the latest figure is the lowest in a long time. Then you buy. And then the opposite: if the latest bar is the highest in a long time, sell.

It turns out that doesn’t work. I wrote a testing application that downloaded 6 months of per-minute data about the LTC-EUR market and ran simulations against it to figure out what would happen if I was to trade based on my algorithms. The first one (above) sucked.

So I started looking a bit further into how traders actually do it themselves.

It turns out it’s pretty simple, if you’re willing to put the testing time in and come up with some good configuration numbers.

The first thing I checked out was called “MACD” (Moving Average Convergence Divergence). That uses a simple moving average (SMA) of the market value to generate two lines – a “long” average based on 26 figures, and a “short” average based on 12 figures. When the short average crosses over the long, it signals an action. For example, if the current short value is higher than the current long, and the last calculations were the opposite (short under long), then that indicates you should Buy, because it looks like there is an upwards trend. The opposite happens when the crossover shows the short going under the long. Sell.
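
Here’s the gist of that crossover check, as a rough PHP sketch (12 and 26 being the traditional lengths):

function sma(array $values, int $length): float {
    // simple average of the last $length values
    $slice = array_slice($values, -$length);
    return array_sum($slice) / count($slice);
}

// $closes is the list of per-minute close prices, oldest first
function smaCrossoverSignal(array $closes, int $short = 12, int $long = 26): int {
    $prev      = array_slice($closes, 0, -1);
    $shortNow  = sma($closes, $short);
    $longNow   = sma($closes, $long);
    $shortPrev = sma($prev, $short);
    $longPrev  = sma($prev, $long);
    if ($shortNow > $longNow && $shortPrev <= $longPrev) {
        return 1;   // short average just crossed above the long one: buy
    }
    if ($shortNow < $longNow && $shortPrev >= $longPrev) {
        return -1;  // just crossed under: sell
    }
    return 0;       // no crossover: do nothing
}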

The 12 and 26 figures are traditional. You could work based on them, but my tests showed that there are different figures that can give you better results. I guess it depends on the market. My current settings here are 25/43 instead of 12/26.

The next thing I worked on was a “Chandelier Exit”. This is a strategy for cutting your losses when the market suddenly drops more than usual. To do this, you measure the “ATR” (average true range) for the last n periods (traditionally 22). You then multiply the ATR by a volatility value (traditionally 3), subtract that from the current High value, and if the current market value is below that, Sell. My current values for this are a volatility of 5.59 based on an ATR of 18 bars.
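
A sketch of that, following the description above (22 and 3 are the traditional values; my tuned values are 18 and 5.59):

// average true range over the last $periods bars;
// each bar is ['high' => ..., 'low' => ..., 'close' => ...], oldest first
function atr(array $bars, int $periods = 22): float {
    $bars = array_slice($bars, -($periods + 1));
    $trs  = [];
    for ($i = 1; $i < count($bars); $i++) {
        $trs[] = max(
            $bars[$i]['high'] - $bars[$i]['low'],
            abs($bars[$i]['high'] - $bars[$i - 1]['close']),
            abs($bars[$i]['low'] - $bars[$i - 1]['close'])
        );
    }
    return array_sum($trs) / count($trs);
}

function chandelierSell(array $bars, float $currentPrice, float $volatility = 3, int $periods = 22): bool {
    $high = end($bars)['high'];   // the current High mentioned above
    return $currentPrice < $high - ($volatility * atr($bars, $periods));
}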

I then looked at exponential moving average based MACD. The standard moving average is a straightforward average of n numbers, but the EMA puts more weight on the more recent numbers, so it reacts quicker to changes in the market.
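
The EMA calculation itself is simple – each new value is blended into the running average, with the weight set by the period length. A rough sketch:

function ema(array $values, int $length): float {
    $k   = 2 / ($length + 1);                   // standard smoothing factor
    $ema = $values[0];
    foreach (array_slice($values, 1) as $v) {
        $ema = ($v * $k) + ($ema * (1 - $k));   // recent values count for more
    }
    return $ema;
}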

After trying to tune the EMA for a while, I found that if I use EMA instead of SMA, then I get worse results, because the script would buy quickly when it saw an upward trend, but that might turn out to be just jitter, and you’d lose it all immediately afterwards. It’s safe to sell when the market drops, but it’s not safe to buy when the market looks like it’s just starting to rise. It’s better to take your time.

So, I added a switch to my code so that I could decide whether to use SMA or EMA for buys and sells, etc.

I found that the combination that gives the best results uses only SMA for buys, but then uses all of SMA, EMA and Chandelier exits to signal a sell. Oh – EMA of 40 and 80.

Doing this, I’ve been able to come up with a configuration of my script that gives an average return of about 1.1%. This means that if you were to invest €5000, then there would be “interest” of about €55 per day. Or if you can keep your money in the game, it starts to grow. €50 invested for 365 days at 1.1% interest per day is €2711.

If you’re interested in giving the script a shot, you can download it from here.

I keep on having more ideas on how to improve this.

29 Oct

salepredict3: automated test results

Based on a suggestion by Ché Lucero (LinkedIn), I wrote a test to see exactly how accurate this machine is.

I had 41 domains already entered into the engine and categorised as Sale or Fail, so the test was based on those.

For each of the domains, the test:

  1. changed the domain’s type from sale/fail to prospect
  2. retrained the neural net using the rest of the domains as its reference data
  3. calculated how much of a match the domain was to a sale using that neural net
  4. if the calculation indicated correctly a sale or a fail, then that counted as a correct test
  5. finally, clean up – reset the domain’s type back to sale/fail, ready for the next test
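
The loop itself is straightforward – something like this, where setDomainType(), retrainNetwork() and scoreDomain() stand in for the real engine calls (they’re not the actual function names, and the 0.5 cut-off is just for illustration):

// $domains: each element has an 'id' and a 'type' ('sale' or 'fail')
$correct = 0;
foreach ($domains as $d) {
    $originalType = $d['type'];
    setDomainType($d['id'], 'prospect');            // hide the answer from the engine
    retrainNetwork();                               // train on the remaining domains
    $predictedSale = scoreDomain($d['id']) >= 0.5;  // does the net think it's a sale?
    if ($predictedSale == ($originalType == 'sale')) {
        $correct++;
    }
    setDomainType($d['id'], $originalType);         // put it back for the next round
}
echo $correct.' / '.count($domains)." correct\n";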

After 41 tests, it got 27 correct – an accuracy of 65.85%. That’s much more than chance (50%).

I’m going to get some more data now, but I expect it will only improve the value, not decrease it.

What does this mean for your own business?

Well, let’s say you have 100 companies you can potentially sell to, and you expect that 50% of them might end up being a waste of time, but you still need to spend about 2 hours on each in order to find that out.

Without using my engine, after 100 hours of selling, you will have made 25 sales. (100 hours is 50 companies. 50% success rate so 50/2 = 25).

With my engine, after 100 hours of selling, you will have made 33 sales, because it will have pre-ordered the companies with about 66% accuracy, so the first 50 companies in the ordered list will contain about 66% of all the successful sales.

29 Oct

salepredict2: we are live

I’ve finished the base engine of the Sale Predict project. If you go to SalePredict.com and fill in some of your sales and fails, then it will be able to predict the chance of success for any prospective jobs you have.

For example, let’s say you have successfully sold to 25 companies before, and 25 other companies have turned you down. Let’s say you also have a list of a further 25 companies that you want to approach, but because each of these takes a few hours of research and negotiation, you would prefer to work on them in order of which are most like companies that you have already sold to.

All you need to do is put in the 25 sales and fails, and a neural network will be automatically trained up based on that data, which will then be able to analyse the 25 prospects that you have.

The engine currently accepts logins via LinkedIn and Facebook. I will add more.

You are given 100 free credits as soon as you login. This lets you test it out to see if it works for you. I will add a payment method shortly for increasing the number of available credits. 100 should be enough for anyone to realise how effective this is.

I’m working on an automated test at the moment to figure out exactly how successful the engine actually is. The test works by taking a list of sales and fails, then doing a round of tests on each of those websites, temporarily changing the website to a “prospect” so the system does not know if it was a sale or a fail, then retraining the network on the other domains, and seeing if it accurately predicts the original value (sale or fail) for the test domain. This will take a while to run, so I’ll post the results in the next article.

27 Oct

salepredict1: the Sale Predict project

Around April of this year, I had an idea that I wanted to pursue. I felt it would make an important difference to our business (FieldMotion). The itch was so strong that one weekend, I set up a server and wrote a prototype of the idea in my own time. It worked perfectly. But, it solved a problem that we weren’t interested in anymore, so the work I did on it was mostly wasted. A by-product of it became some useful information, but not the main part of it.

Okay – let’s look at the problem.

Let’s say you’re a business that is trying to expand. You get your work by contacting other businesses that you think may need your product, and trying to get them to work with you. Cold-calling, or trying to arrange a meeting through mutual friends, etc.

The old way to do this would be to get a phone directory, find a list of companies in an industry that you think is right, and just start calling, working each number one by one until you find one that sticks.

But this is usually a waste of time. Either the prospects already have a solution, have no interest, or are too dissimilar to those you’ve sold to before so you can’t establish a common ground.

The problem, shortened, is this: How can you take a long list of potential clients, and order them so that those most likely to buy from you are first in the list?

A solution to this came to me earlier this year. You need to find companies that are similar to those that you have already signed with, but that are not similar to companies you failed to sign with. This is a top-level description, obviously. The technical details of how to measure similarity are beyond this article.

To do this, I wrote a program that takes three lists of domain names:

  1. domain names of companies that you have done business with.
  2. domain names of companies that you cannot do business with (either they’re too unsuitable for your work, or they just said No for any reason).
  3. a long list of domain names of companies that you want to figure out what order in which to call them.

The program reads the front page and all pages linked to the front page of all mentioned domain names, extracts words and “n-grams” (groups of words), and figures out using a neural network what kind of language is used by companies that you usually sell to.
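
The word/n-gram extraction part boils down to something like this (here just for 1- and 2-word grams):

function extractNgrams(string $text, int $maxN = 2): array {
    // strip tags and punctuation, lower-case it, and split into words
    $text  = strtolower(preg_replace('/[^a-z0-9 ]+/i', ' ', strip_tags($text)));
    $words = preg_split('/\s+/', trim($text));
    $grams = [];
    for ($n = 1; $n <= $maxN; $n++) {
        for ($i = 0; $i + $n <= count($words); $i++) {
            $gram = join(' ', array_slice($words, $i, $n));
            $grams[$gram] = ($grams[$gram] ?? 0) + 1;   // count occurrences
        }
    }
    return $grams;
}

So “we build custom software” produces counts for “we”, “build”, “custom”, “software”, plus the 2-grams “we build”, “build custom” and “custom software”. Those counts are the kind of thing the neural network then trains on.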

After this, it can then come up with individual numerical scores of how suitable each prospective company is.

I ran this on a list of about 50,000 companies as a test back in April, to see what it would say about my own company’s chances with those prospects. In the top 10, it named a company that we had actually talked to a few years before and that had said they would go with us except we were too young at the time. In the bottom 10, it listed a charity shop, which is totally not our target audience. The thing worked!

But, we don’t work in that way anymore, so it turns out that the list generated by the machine was never used. Oh well.

This week, I’ve decided to revive it and make it generally available. So this weekend, I will work on a simple website to make it possible to generate your own domain lists. It will allow a list of, say, 50 domains free, but anything beyond that will cost. Hosting costs money, and this uses a lot of heavy computation.