07 Nov

word->html thing

If you use a rich-text editor to allow your clients to post stuff to the net, then you’ve probably come across the annoying problem of people pasting content from Word into their sites, and then complaining that their Internet Explorer crashes when they view it.

This is because when content is pasted from Word, it includes crap such as <o:p></o:p>, namespaced pseudo-elements which Internet Explorer does not like.

One very small regexp to clean up bits like that is:

$body=preg_replace('#</?([ovw]|st1):[^>]*>#','',$body);

That’s cleaned up one site I’m working on anyway. There may be more pseudo-elements out there still, but I haven’t noticed them yet.
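For reuse, the one-liner above can be wrapped in a small function. This is only a rough sketch: the function name strip_word_tags() and the sample string are made up for illustration, and the "i" flag is my own addition on the assumption that some pastes may carry the tags in upper case. Note that only the tags themselves are removed; any text between them is kept.

```php
<?php
// Hypothetical helper: wraps the regexp from this post in a function.
function strip_word_tags(string $body): string
{
    // Remove opening and closing tags in the o:, v:, w: and st1: namespaces.
    // The "i" flag is an assumption added here, to also catch <O:P> and friends.
    $cleaned = preg_replace('#</?(?:[ovw]|st1):[^>]*>#i', '', $body);

    // preg_replace() returns null on error; fall back to the untouched input.
    return $cleaned ?? $body;
}

// Made-up sample of the kind of markup a Word paste can leave behind.
$pasted = '<p class="MsoNormal">Hello <st1:place w:st="on">Dublin</st1:place><o:p></o:p></p>';
echo strip_word_tags($pasted), "\n";
// prints: <p class="MsoNormal">Hello Dublin</p>
```

Ordinary HTML such as the <p> above is left alone, because the pattern only matches tag names that start with one of those four prefixes followed by a colon.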