A while ago I promised a how-to on converting a CHM file to a PDF in a decent way. I didn’t write it right away, but better late than never. The basic requirements for this guide are a CHM file, a UNIX-like OS, and either a package manager or the source code of the tools listed below. Also, this guide is not a thorough how-to on chm2pdf; it only covers the basics of using it.
This is the first guide I have written using a default schema that I hope will be good enough to use from now on. If you read this and find it hard to follow in any way, please leave a comment or write an email so I can correct it. Feedback is greatly appreciated.
- (Required) chm2pdf. chm2pdf is a Python script that uses the Python bindings for chmlib together with htmldoc. The script decompiles a CHM document, strips it of the elements that are useless in a PDF, and uses htmldoc for the heavy document processing. Download chm2pdf here or install it using a package manager.
- (Required) A CHM file that you want to convert.
- (Optional) The book’s cover artwork.
- (Optional) Any PDF printer to easily create the book’s cover page.
- (Optional) Adobe Acrobat. Acrobat will be used only to correct the bookmarks table and add the book’s cover. Acrobat is the PDF editor I know how to use, but any PDF editor that can edit bookmarks will do the trick.
- Install chm2pdf using the preferred method for your platform.
- On OS X, there is a port available on MacPorts that depends on Python 2.6. Issuing:
sudo port install py26-chm2pdf
will install version 0.9.1 and all its dependencies.
- On Ubuntu, issuing:
sudo apt-get install chm2pdf
will install version 0.9, which can be upgraded (highly recommended) to version 0.9.1 from source.
- On other systems, chm2pdf can be installed by compiling chmlib, pychm and htmldoc, followed by issuing:
sudo python setup.py install
- chm2pdf accepts a number of parameters that are passed to htmldoc. For example,
chm2pdf-2.6 --book <input chm>
will try to decompile the given file and output a PDF file laid out as a book (which is most probably what you want), creating its bookmarks and printing the pages continuously. If chm2pdf outputs a PDF file, the conversion process ends here. Unfortunately, this will fail for most CHM files (for more information, read the next section).
- If htmldoc couldn’t output a PDF file, the CHM file you’re trying to convert is not properly structured. Two things can be done: restructure the CHM file (not discussed in this guide) or use the --continuous option introduced in version 0.9.1. Issuing:
chm2pdf-2.6 --continuous <input chm>
will successfully output a PDF with two problems to be corrected: a badly structured bookmarks tree and a generated table of contents that can’t be read.
- Open the PDF file using Acrobat or your editor of choice. Reorganize the bookmarks by dragging them under their proper headings. Be careful when doing it, as you may find bookmarks that are either repeated or useless.
- If your file does not even have a disorganized bookmarks tree to work from, the generated table of contents contains links that can be followed to build the bookmarks tree manually.
- Delete the generated table of contents pages: most of the time they won’t be properly formatted, and they are unnecessary once you have a properly structured bookmarks tree.
- Print the book’s cover artwork to PDF at its original dimensions with margins set to 0. Add it as the first page and save your finished file.
Congratulations. You have successfully converted your CHM file into a PDF file. You may have noticed that chm2pdf/htmldoc generates a document using default settings. In order to further format the file, read the next section.
As stated, the --book option will most probably fail, because htmldoc depends on the HTML files inside the CHM being correctly structured. A correctly structured HTML file is one that is both well written and well formed. Most probably, the incorrect structure comes from misuse of heading tags (<h1>, <h2>, …).
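One way to spot the kind of heading misuse mentioned above is to look for skipped heading levels in the HTML files extracted from the CHM. The following is a minimal sketch using only Python’s standard library. The names `HeadingScanner` and `heading_jumps` are hypothetical, and treating a skipped level as the culprit is an assumption, since this guide does not cover exactly which structure htmldoc expects.

```python
from html.parser import HTMLParser


class HeadingScanner(HTMLParser):
    """Collect the level (1-6) of every heading tag, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives here as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_jumps(html_text):
    """Return (previous_level, level) pairs where a heading level is skipped.

    Assumption: going from <h1> straight to <h3> (or starting a file with
    <h2>) is the kind of "bad use of heading tags" worth checking first.
    """
    scanner = HeadingScanner()
    scanner.feed(html_text)
    jumps = []
    previous = 0  # 0 means "no heading seen yet"
    for level in scanner.levels:
        if level > previous + 1:
            jumps.append((previous, level))
        previous = level
    return jumps
```

An empty list from `heading_jumps` only means that no level was skipped; it does not guarantee that --book will succeed.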
The --continuous option introduced in version 0.9.1 works around badly structured files at the expense of a badly structured bookmarks tree and formatting errors. Another solution introduced with this version is the beautifulsoup package, which attempts to correct the files. So far, I have only tested it on a few files, without success.
Using the --continuous option is the safest way, as it will (almost) always produce the expected output.
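The "try --book first, then fall back to --continuous" sequence can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: the function name `convert` is hypothetical, the executable name varies by platform (`chm2pdf-2.6` on MacPorts, `chm2pdf` on Ubuntu), and passing the output path as a second positional argument is an assumption you should check against your version’s help output. Success is judged by whether a PDF appears, as in the steps above, rather than by the exit code.

```python
import subprocess
from pathlib import Path


def convert(chm_path, pdf_path, executable="chm2pdf", run=subprocess.run):
    """Try --book first, then fall back to --continuous.

    Returns the option that produced a PDF, or None if both attempts failed.
    """
    pdf = Path(pdf_path)
    for mode in ("--book", "--continuous"):
        # Remove stale output so an old file is not mistaken for success.
        if pdf.exists():
            pdf.unlink()
        # Assumption: the output path is accepted as a second positional argument.
        run([executable, mode, str(chm_path), str(pdf)], check=False)
        # A non-empty PDF on disk is treated as success.
        if pdf.exists() and pdf.stat().st_size > 0:
            return mode
    return None
```

For example, `convert("manual.chm", "manual.pdf", executable="chm2pdf-2.6")` would return `"--continuous"` for a file that only converts with the fallback option. The `run` parameter exists only so the function can be exercised without chm2pdf installed.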
Another thing worth mentioning is the final page format. chm2pdf uses the built-in default format if no further arguments are passed. A longer and more specific way of calling chm2pdf is:
chm2pdf-2.6 --continuous --no-duplex --size 17.78x23.34cm --no-numbered --top 1.9cm --bottom 2cm --left 2.5cm --right 2.5cm --header ... --footer ... --no-toc --fontsize 10 --textfont times <input chm>
which is the approximate configuration of an O’Reilly book, for example.
For further format options, read chm2pdf’s man page.
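If you convert several files with the same layout, it may help to keep the long option list in one place. The sketch below assembles the command quoted above as a Python argument list. The function name `build_command` and its parameters are hypothetical, and the --header and --footer values are deliberately left out because they are elided in the command above; pass your own through `extra` if you need them.

```python
def build_command(chm_path, executable="chm2pdf", margins=None, extra=()):
    """Return the argument list for the book-like layout quoted in this guide."""
    # Margin values are copied from the command line above.
    margins = margins or {
        "top": "1.9cm",
        "bottom": "2cm",
        "left": "2.5cm",
        "right": "2.5cm",
    }
    cmd = [executable, "--continuous", "--no-duplex",
           "--size", "17.78x23.34cm", "--no-numbered"]
    for side in ("top", "bottom", "left", "right"):
        cmd += ["--" + side, margins[side]]
    # Caller-supplied options (for example header and footer settings) go here.
    cmd += list(extra)
    cmd += ["--no-toc", "--fontsize", "10", "--textfont", "times", str(chm_path)]
    return cmd
```

The resulting list can be handed to `subprocess.run(build_command("manual.chm", executable="chm2pdf-2.6"))`, which avoids shell quoting issues with file names that contain spaces.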
- chm2pdf Google Code page
- Chris Karakas’ (author) explanation on version 0.9
- chmlib page
- htmldoc page
- pychm page

