Re: OCR Software..?!

Andrew Kuchling (amk@magnet.com)
Wed, 12 Nov 1997 19:17:28 -0500 (EST)

Random thoughts on OCR:

An OCR program would require a user interface and a recognition
engine. You can see a screen shot of my interface at the previously
mentioned URL; my idea was that scanned data would wind up in a Tk
text editing box, with possible errors (where the confidence value of
the recognition is low) highlighted in red. You would click a 'Next
error' button to move to the next highlighted problem area, edit the
text so it's correct, and continue onward; a Save button would write
the edited data out to a file.
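
A rough sketch of the bookkeeping behind the 'Next error' button, in
Python. It assumes the recognition engine hands back one confidence
value per character; the function names and the 0.8 threshold are
made up for illustration and are not taken from my interface code.

```python
def low_confidence_spans(confidences, threshold=0.8):
    """Return (start, end) pairs covering runs of characters whose
    confidence is below threshold. The end index is exclusive."""
    spans = []
    start = None
    for i, conf in enumerate(confidences):
        if conf < threshold:
            if start is None:
                start = i          # a new doubtful run begins here
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:          # run reaches the end of the text
        spans.append((start, len(confidences)))
    return spans


def next_error(spans, cursor):
    """Return the first span that starts after the cursor position,
    wrapping around to the first span. None if there are no spans."""
    for span in spans:
        if span[0] > cursor:
            return span
    return spans[0] if spans else None
```

In a Tk text widget each span could become a tag range, for example
text.tag_add("error", "1.0+2c", "1.0+4c") after
text.tag_configure("error", background="red").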

Recognition is the complicated part, of course. First you scan the
image, which is then usually converted from grey-scale to 2-level
black-and-white. Documents are often not perfectly aligned when
they're scanned, so the angle at which they're tilted (called the
"skew angle") has to be measured and compensated for. Then the image
has to be segmented into words, and words into letters; each letter is
then recognized, and usually a confidence value is attached to each
letter. Often there's a post-processing step which uses a language
dictionary to correct errors; for example, if you're scanning English
text, 'rn' might be a misreading of 'm'.
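
Three of those steps can be sketched in a few lines of Python:
thresholding to black-and-white, splitting a line of text wherever
there is a blank column, and the dictionary fix-up. This is only an
illustration under simple assumptions (a fixed threshold of 128, one
line of text with no skew, letters that never touch); the confusion
table is hypothetical, and skew measurement is left out entirely.

```python
def binarize(gray, threshold=128):
    """Convert grey-scale rows (0-255) to 1 for ink, 0 for paper."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(bitmap):
    """Split a one-line bitmap into (start, end) column ranges that
    are separated by blank columns. The end index is exclusive."""
    if not bitmap:
        return []
    width = len(bitmap[0])
    ink = [any(row[x] for row in bitmap) for x in range(width)]
    segments, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Hypothetical table of (misread, intended) pairs.
CONFUSIONS = [("rn", "m"), ("cl", "d"), ("1", "l")]


def correct_word(word, dictionary):
    """Try one substitution at a time; keep the first candidate that
    is in the dictionary, otherwise return the word unchanged."""
    if word in dictionary:
        return word
    for wrong, right in CONFUSIONS:
        pos = word.find(wrong)
        while pos != -1:
            candidate = word[:pos] + right + word[pos + len(wrong):]
            if candidate in dictionary:
                return candidate
            pos = word.find(wrong, pos + 1)
    return word
```

For instance, correct_word("rnodern", {"modern"}) gives "modern",
while a word with no dictionary match is passed through untouched.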

The two major techniques for recognizing letters seem to be either
neural networks, or making a vector from easily measured
characteristics of the bitmap containing a letter; for example, xocr
takes a histogram of the letter at 128 different angles. The
feature-vector technique dates back at least to the 1970s, but neural
networks seem to be what all modern systems use.
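
To make the feature-vector idea concrete, here is a small Python
sketch. It is not xocr's method: instead of 128 angles it uses only
two projections (row and column ink counts), and it picks the nearest
stored template by Euclidean distance. The confidence formula is an
arbitrary choice of mine, and all bitmaps are assumed to be already
scaled to the same size.

```python
import math


def features(bitmap):
    """Row and column ink counts, scaled by the total amount of ink."""
    rows = [sum(row) for row in bitmap]
    cols = [sum(col) for col in zip(*bitmap)]
    total = sum(rows) or 1         # avoid dividing by zero on blanks
    return [v / total for v in rows + cols]


def classify(bitmap, templates):
    """templates maps a label to its feature vector. Returns the
    nearest label and a confidence in (0, 1]; 1.0 is an exact match."""
    vec = features(bitmap)
    scored = sorted((math.dist(vec, tvec), label)
                    for label, tvec in templates.items())
    best_dist, best_label = scored[0]
    return best_label, 1.0 / (1.0 + best_dist)


# Two tiny 3x3 example letters.
LETTER_I = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
LETTER_L = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
TEMPLATES = {"I": features(LETTER_I), "L": features(LETTER_L)}
```

Calling classify() on a slightly noisy 'I' should still return "I",
but with a confidence below 1.0, which is the kind of value a user
interface could use to decide what to highlight.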

There's a very helpful volume of IEEE reprints entitled "Document
Image Analysis" edited by Lawrence O'Gorman and Rangachar Kasturi:
ISBN 0-8186-6547-5.

Hey, I just noticed that Stuart Inglis' page at
http://www.cs.waikato.ac.nz/~singlis/ocr/ has been updated as of
Oct. 31. Inglis' name was forwarded to me by the FSF, and he seems to
know what he's doing (he has lots of machine learning expertise), but
the C/C++ code still isn't available from that page. We should
approach him, and get a freeware-OCR mailing list set up.

Andrew Kuchling
amk@magnet.com
http://starship.skyport.net/crew/amk/

--
Source code, list archive, and docs: http://www.mostang.com/sane/
To unsubscribe: echo unsubscribe sane-devel | mail majordomo@mostang.com