Converting HTML to PDF in PHP

I have a client who asked us to generate PDF reports that he can then send out to his own clients.

The approach we are settling on (after a long, arduous and twisty path!) is to generate HTML versions of the report, which can then be “tweaked” in FCKeditor before being finalised as PDF reports.

When converting the final HTML report to PDF, I started off using HTML_ToPDF (huh? why not “HTML_To_PDF” or “HTMLtoPDF”?).

The API was very simple to use, and the conversion was almost perfect, except that it ignored the CSS our designer had put in. The most obvious example: tables were missing their solid black borders.

So, I went searching for other APIs that might render the CSS correctly.

I tried DOMPDF, which claims to be CSS 2.1 compliant, but it failed to render anything. It kept falling over with obscure errors such as “Frame not found in cellmap”. What? I don’t use frames, so the error makes no sense to me. I /guess/ that “cellmap” refers to the table cells, but there’s no problem with my HTML code, dammit!

Then I tried HTML2FPDF, whose API is very similar to HTML_ToPDF’s. It didn’t render the borders either.

Finally, I tried changing where the CSS lived: instead of putting it in a <style> block in the head of the document, I placed it inline on each element, such as <table style="border:black 1px solid">.

That didn’t work in HTML2FPDF, but /did/ work in HTML_ToPDF.

Long story short? Write your CSS inline if you want to convert to PDF. As a side effect, writing the CSS inline also made it render in FCKeditor.
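If you would rather keep the designer’s <style> block and inline it automatically just before conversion, the idea can be sketched as below. This is a hypothetical helper, not part of HTML_ToPDF or any library mentioned here, and it is written in Python purely for brevity (the same approach ports to PHP with preg_replace_callback). It assumes bare element selectors only (table, td and so on), ignores classes, IDs, specificity, inheritance and CSS comments, and uses regular expressions instead of a real HTML parser; `inline_css` and `parse_rules` are made-up names.

```python
import re

# "selector-list { declarations }" pairs inside a <style> block.
RULE_RE = re.compile(r"([^{}]+)\{([^}]*)\}")
STYLE_BLOCK_RE = re.compile(r"<style[^>]*>(.*?)</style>", re.IGNORECASE | re.DOTALL)
# An opening or self-closing tag: name, optional attributes, optional "/".
START_TAG_RE = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)(\s[^<>]*?)?(/?)>")
STYLE_ATTR_RE = re.compile(r'\sstyle="([^"]*)"', re.IGNORECASE)


def parse_rules(css):
    """Map each bare element selector (e.g. 'table') to its declarations."""
    rules = {}
    for selectors, body in RULE_RE.findall(css):
        decl = body.strip().rstrip(";").strip()
        if not decl:
            continue
        for sel in selectors.split(","):
            sel = sel.strip().lower()
            # Assumption: only plain element selectors are supported.
            if re.fullmatch(r"[a-z][a-z0-9]*", sel):
                rules[sel] = (rules[sel] + "; " + decl) if sel in rules else decl
    return rules


def inline_css(html):
    """Copy element rules from <style> blocks into style="" attributes."""
    rules = parse_rules("\n".join(STYLE_BLOCK_RE.findall(html)))
    html = STYLE_BLOCK_RE.sub("", html)  # the block is no longer needed

    def add_style(match):
        tag, attrs, slash = match.group(1), match.group(2) or "", match.group(3)
        decl = rules.get(tag.lower())
        if not decl:
            return match.group(0)
        existing = STYLE_ATTR_RE.search(attrs)
        if existing:
            # Keep the element's own declarations last so they still win.
            merged = decl + "; " + existing.group(1)
            attrs = (attrs[:existing.start()] + ' style="' + merged + '"'
                     + attrs[existing.end():])
        else:
            attrs += ' style="' + decl + '"'
        return "<" + tag + attrs + slash + ">"

    return START_TAG_RE.sub(add_style, html)
```

For example, a head containing `table, td { border: 1px solid black; }` turns every `<table>` and `<td>` into one carrying `style="border: 1px solid black"`, which is the form HTML_ToPDF honoured.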

4 Comments.

  1. Mmm, I don’t think it is good practice to write inline CSS; at best it is a workaround for the PDF problem.
    As for FCKeditor, just use FCKConfig.EditorAreaCSS to point it at your CSS file (yes, an external file, not one embedded in the head), and it will render it.

    I understand that this is a very specific report you are talking about, but the last sentence could be read as general web-development advice.

  2. True. I would never advocate inline CSS unless it is a very specific case. External CSS is preferable, followed by CSS in a <style> tag if you cannot use an external sheet.
    This, though, is a very special case, and the inline CSS was very much a hack.

  3. So, the logical next step would be…

    …writing some javascript to dynamically parse the external and head CSS and apply all styles to your new document as inline styles.

    That would certainly be an undertaking. At the most basic level, parsing CSS and applying inline styles wouldn’t be hard at all, but eventually you must deal with inheritance, the DOM hierarchy, specificity and so on.

    At some point the scripting nature of JavaScript may start to become cumbersome, and as the number of page elements and CSS styles increases, so does the processing time.

    But Kae, I think you could do it. 🙂

    Man, what a toolbox that would be. But it still raises the question… would the only usage scenario be with HTML_ToPDF? Or is there a larger interest? Could it be coupled with something else to appeal to a wider audience?

    Well…

    An interesting addition to this idea would be to write it in a server-side language such as PHP (you would almost have to), and also convert all externally referenced images to “data:” URIs (RFC 2397, supported by Mozilla). Inline styles + embedded images = a fully self-contained rich document, all in one file. And it would be semi-standard.

    Just some early morning thoughts…

  4. Bradley, I came across another HTML-to-PDF converter that does almost what you describe: html2ps/html2pdf. It’s virtually a browser written in PHP, all unto itself. It’s a massive piece of work, but unfortunately it falls down in PHP5. I spent almost an entire day trying to fix it, as I could see from the code that this was “The One”.

    There is definitely more than one application for this kind of thing. Instead of just using the code for PDF conversion, it could, with a tiny bit of work, become a server-side web page renderer, complete with optional quirks to emulate some of the *ahem* less-correct browsers. That would let you measure the differences between browsers virtually unattended!

    Not that I have time for such an undertaking right now (thanks for the vote of confidence, though!). At the moment I’m writing a search engine for my KFM project, and I don’t want to get distracted by other projects, no matter how rewarding they might turn out to be.

    But… once KFM 1.0 is released (a few months off at least!), I might revisit the idea.
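The “data:” idea from the third comment (embedding images so the document is fully self-contained) can be sketched in a few lines. This is an illustrative Python sketch, not part of any converter mentioned above: `embed_images` and `to_data_uri` are hypothetical helpers, images are assumed to be local files referenced by double-quoted, relative `src` paths, and the MIME type is guessed from the file extension. Remote URLs and existing data: URIs are left untouched.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Captures the part before the src value, the value itself, and the closing quote.
IMG_SRC_RE = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]+)(")', re.IGNORECASE)


def to_data_uri(path):
    """Read a local file and encode it as a base64 data: URI (RFC 2397)."""
    mime = mimetypes.guess_type(str(path))[0] or "application/octet-stream"
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def embed_images(html, base_dir):
    """Replace relative <img src="..."> paths with inline data: URIs."""
    def replace(match):
        src = match.group(2)
        if src.startswith("data:") or "://" in src:
            return match.group(0)  # already embedded, or a remote image
        return match.group(1) + to_data_uri(Path(base_dir) / src) + match.group(3)

    return IMG_SRC_RE.sub(replace, html)
```

Combined with inlined styles, the output is a single HTML file that carries everything it needs, at the cost of roughly a one-third size increase for each base64-encoded image.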
