Sunday, February 20, 2011

DoggyBook


DoggyBook is an open project to digitize books, pamphlets and papers for use on e-readers like the Kindle and Nook.

I have recently gotten very interested in the availability of classic texts in digital format. The reason? Dad just bought himself an Amazon Kindle 3 e-reader, and he lets me use it as well. Most of my reading list is older technical manuals, and many of them are in PDF format. The text is "frozen", and will not wrap to fit a small page. Many of these PDFs are actually scanned as images, which makes them very inflexible: you can view a page "fit to screen" (very small) or full size (hard to navigate). So I have been converting the Handbook of Biomass Downdraft Gasifier Engine Systems to a more "e-friendly" format. This is a government publication in the public domain.

After looking around, I discovered that the process of digitizing an old book consists of three major steps.

1) Take a snapshot of the physical page, using a scanner or similar device. This can be exported directly for "snapshot" PDFs.

2) "Read" the text on the page using Optical Character Recognition (OCR), which leaves you with an unformatted mass of misspelled text - but text nonetheless. All images, tables and equations must be skipped by the OCR, as they only produce garbled nonsense. 

3) Spell-check the text (mostly by hand), edit it back to the original formatting as much as possible, and add the images and tables back in as graphics. This is the human element, and this is why so many older texts are only available as PDF snapshots. Once formatted properly, the book can be saved out to an e-book format like EPUB or MOBI.
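Part of the cleanup in step 3 is mechanical and can be scripted before the hand editing starts. Here is a minimal Python sketch, not the tool I actually used: it rejoins words that were hyphenated across line breaks and unwraps the hard line breaks that OCR output usually carries over from the printed page. The function name tidy_ocr_text is made up for illustration, and the sketch assumes paragraphs in the raw text are separated by blank lines.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Rejoin hyphenated line-end words and unwrap hard-wrapped lines.

    Assumes paragraphs are separated by at least one blank line.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # "gasi-\nfier" -> "gasifier". This can wrongly merge genuine
    # compounds that happen to break at the hyphen, so the result
    # still needs a human proofread.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Split into paragraphs on blank lines, then collapse every run of
    # whitespace inside a paragraph (including line breaks) to one space.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]

    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)
```

It does nothing about misspellings, tables or equations; those still need a person.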

It is a lot of work, especially if you are not in the business of doing it, but I have found some free tools to help with the job. First is Adobe Reader; most versions will export a PDF to a text file. The only catch is that there has to be some text to export, and "snapshot" PDFs are only images. For those we need FreeOCR, reportedly the heart of the Google Books OCR software. It is plain and unadorned: you see a panel with the PDF page, draw a box around the text area, and hit "Convert". Depending on the amount of text, each chunk takes about a minute to process, and the plain text output then appears in the other panel. Copy this to a Word document, then clear the cache and do some more. I did a 140-page book in about 2 hours. The result is one massive .doc file. Books this size are difficult to work on with a slow PC like mine, so breaking the file into chapters makes sense.
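If the OCR output is saved as plain text rather than a .doc, the chapter split can be scripted. The Python sketch below is only an illustration under one big assumption: that every chapter starts with a line beginning "Chapter" followed by a number. The pattern CHAPTER_HEADING and the function split_into_chapters are hypothetical names, and the pattern would need adjusting to match the headings of the actual book.

```python
import re
from typing import List, Tuple

# Assumed heading style: a line starting with "Chapter <number>".
# This is a guess for illustration; adapt it to the real book.
CHAPTER_HEADING = re.compile(r"^Chapter\s+\d+\b.*$", re.MULTILINE | re.IGNORECASE)


def split_into_chapters(text: str) -> List[Tuple[str, str]]:
    """Return (heading, body) pairs, one per chapter found in the text.

    Anything before the first heading is returned as "Front matter".
    """
    matches = list(CHAPTER_HEADING.finditer(text))
    chapters = []

    # Text before the first heading (title page, preface, and so on).
    front = text[:matches[0].start()] if matches else text
    if front.strip():
        chapters.append(("Front matter", front.strip()))

    # Each chapter body runs from the end of its heading line to the
    # start of the next heading, or to the end of the text.
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(0).strip(), text[match.end():end].strip()))

    return chapters
```

Each pair can then be written to its own file or pasted onto its own wiki page. A body line that happens to start with "Chapter 3 covers..." would be mistaken for a heading, so it is worth checking the list of headings by eye.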

Then I decided to move the whole project out into the open, so that more eyes can help spot the typos. I created a wiki called DoggyBook, and the whole text is posted on various pages there. I will continue to edit it online as I can, and maybe interested folks will join me and speed this thing along. Periodically I will compile the whole thing as a .doc file and post it for download on the front page. Eventually, all the typos will get smoothed out, and images will be inserted in the right places.

This is only worth the effort for a special book. In the area of biomass gasification, this is certainly one of the classic texts. In fact, it is still in print - the Biomass Energy Foundation will sell you a spiral-bound photocopy for $35. But nobody has it available for e-readers yet. I aim to fix that.

Wednesday, February 2, 2011

Cheese grating day

As we phased out industrial prepared foods from the menu, one of the most important ingredients turned out to be cheddar cheese. It is delicious, inexpensive, and goes on lots of our favorite meals. There are nine eaters at our table, meaning everything is done in large quantity. We just bought 30 lbs of sharp cheddar from Sam's Club last week, in the form of 5 lb blocks. (At the same time we bought 36 lbs of butter, but that's another story.)



Most of this cheese will end up grated. Grating large quantities of cheese calls for sturdier tools than you might expect. An ordinary food processor balks at the soft, high-friction material; we have burned out a couple of them already. Nowadays we use a King Kutter, a hand-cranked rotary grater that you might expect to see in a Lehman's catalog. It is very sturdy, and the only motor to burn out is your right arm. Grating 25 lbs of cheddar is about my limit. We slice the blocks lengthwise using a guitar string, and each two-and-a-half-pound slab, when grated, fills a gallon Ziploc bag.




The only drawback of the King Kutter is that it leaves a 1/8-inch strip of ungrated cheese the length of the block; this is a result of the design and cannot be avoided. We just break these strips up and save them for nibbling. It's a small price to pay for avoiding grated knuckles. Cleanup is a snap: wash the bowl, cone, handle and suction base. We also grate parmesan with a finer-toothed cone; one five-pounder lasts a long time. We usually grate cheese once every six weeks or so.