I am engaged in a project with the Naval History and Heritage Command, converting documents to HTML for them. I acquire the documents and run them through an OCR program. Then they get translated to HTML and are proofread by volunteers. A lot of this material is WWII related, but NHC is currently on a push to add over ten thousand pages of reports from the Secretary of the Navy to the President, which were part of POTUS's State of the Union report to Congress. The years covered run from 1821 to 1861. You can tell by the number of pages that this is a lot of material, and most of it has never been available to any but the most dedicated researchers until now.

So if you want to help out with this project, and read some rather interesting documents while checking for errors, we are sorely in need of help. The documents run from ~20 pages to over 800, and the pace of proofing is up to you. NHC has a set of criteria that we should adhere to, but even for a government agency it's not too irrational.

We're also looking for people to format the output from the OCR program according to the above-mentioned criteria. I admit to being a total yutz with HTML, so help there would be greatly appreciated as well. It would mostly be finding "flags" I put in the output, such as [move], [block], etc., and handling centering, right aligning, that kind of thing. NHC is interested in keeping the files as "trim" as possible because people at sea will be downloading these documents for professional development purposes, and bandwidth at sea is limited.

If you'd like to see what we're doing, I can give you a URL for the work pending right now so you can get an idea of what can be available in the future. And thanks in advance for any offers of help.
It's all "lost history" for the most part. I keep doing it because there are so many things I didn't know about that period.
Bump-a-lump. And thanks to the folks who volunteered already. Glenn is thrilled with the additional minds. BTW, HTML people are needed. You'd be checking and fixing the code on items that have been OCR'd and/or proofread. The NHC's emphasis is on "tight" code, with no extraneous baggage. The reason for this is the limited bandwidth on ships at sea. Every byte counts when you're using your month's allotment of downloads to pull in reference works.
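
For anyone wondering what the flag hunting might look like in practice, here is a rough sketch in Python. It is only an illustration, not the NHC's tooling: it assumes the flags are lowercase words in square brackets (like [move] and [block]), the function names `find_flags` and `size_in_bytes` are made up for this example, and the sample text is invented. It lists where each flag sits so a formatter knows which lines need attention, and reports a file's size in bytes since that is what matters for downloads at sea.

```python
import re

# Assumption: flags are lowercase words in square brackets, e.g. [move], [block].
FLAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def find_flags(text):
    """Return a list of (line_number, flag_name) for every bracketed flag."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in FLAG_PATTERN.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


def size_in_bytes(html):
    """Size of the text once encoded as UTF-8, i.e. what gets downloaded."""
    return len(html.encode("utf-8"))


# Invented sample, standing in for a few lines of OCR output.
sample = "Some heading text [block]\nA date line [move]\nBody text with no flags"

for line_number, flag in find_flags(sample):
    print(f"line {line_number}: [{flag}]")
print("bytes:", size_in_bytes(sample))
```

What each flag should turn into (centering, right aligning, and so on) depends on the NHC criteria, so the sketch deliberately stops at finding them.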