Recognition of text in arbitrary real-world images is a largely unsolved problem in Computer Vision and Machine Learning. Contrast this with document-based OCR of scanned text, which is solved to a near-human degree of accuracy.
Image text recognition includes all of the difficulties of traditional OCR, with some additional challenges. It breaks down into two problems:
- The first problem I will refer to as "text detection." Text detection is the problem of finding the bounding boxes of possible text in the sample image. It is essentially a binary classification problem where the goal is to determine whether or not a given region of an image may contain letters or words. Humans are very good at this problem. Even when looking at a language we don't understand or when words are obscured or too distant to recognize entirely, we can still determine the presence of written language.
- The second problem I will refer to as "word recognition." This is the challenge of taking an image that contains text and outputting the text as a string. This is what OCR engines do, but they are designed for the very basic case of black text on a white background, with uniform font face, size, distortion, color, angle, lighting, lexicon, etc. When recognizing text in an arbitrary real-world image, all of these variables are unconstrained.
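To make the "binary classification over image regions" framing of text detection concrete, here is a minimal sliding-window sketch. It is only an illustration, not the detector I will end up training: the function names, the window size, the stride, and the threshold are all assumptions, and `edge_density` is a crude hand-written stand-in for a learned classifier.

```python
def edge_density(patch):
    """Stand-in scorer (an assumption, not a trained model):
    mean absolute horizontal gradient of the patch, scaled to [0, 1]."""
    total = 0
    count = 0
    for row in patch:
        for left, right in zip(row, row[1:]):
            total += abs(right - left)
            count += 1
    return total / (255.0 * count) if count else 0.0


def detect_text_regions(image, window=(16, 32), stride=8,
                        threshold=0.2, score_fn=edge_density):
    """Slide a window over a grayscale image (a list of rows of 0-255 ints)
    and return (x, y, w, h) boxes whose score is at least `threshold`."""
    win_h, win_w = window
    height = len(image)
    width = len(image[0]) if height else 0
    boxes = []
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            # Cut out the candidate region and ask the binary question:
            # "might this region contain text?"
            patch = [row[x:x + win_w] for row in image[y:y + win_h]]
            if score_fn(patch) >= threshold:
                boxes.append((x, y, win_w, win_h))
    return boxes


# Synthetic example: a blank image with one high-contrast striped block.
image = [[0] * 128 for _ in range(64)]
for row in range(16, 32):
    for col in range(32, 64, 2):
        image[row][col] = 255

candidate_boxes = detect_text_regions(image)
```

A real detector would replace `edge_density` with a trained classifier and would scan at several scales, but the structure stays the same: score every candidate region and keep the ones that might contain text.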
My goal is to implement a text-detection and word-recognition algorithm and apply it to Google Street View as demonstrated in the following slides:
Work To Be Done
- Implement and train text detection algorithm
- Implement word recognition algorithm
- Collect Google Street View data (screenshots)
- Examine the results of correlating metadata from the map with text recognized in images
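As a rough idea of what the last item could look like, the sketch below fuzzily matches one recognized word against a list of nearby place names. Everything here is an assumption for illustration: the function name `best_metadata_match`, the 0.6 cutoff, and the idea that the map metadata arrives as a plain list of names. The place names are made up. It uses `difflib.SequenceMatcher` from the Python standard library, so a slightly misread word can still match.

```python
from difflib import SequenceMatcher


def best_metadata_match(recognized_word, place_names, min_ratio=0.6):
    """Return (place_name, similarity) for the place name containing the
    token most similar to `recognized_word`, or (None, similarity) if the
    best similarity is below `min_ratio`."""
    word = recognized_word.lower()
    best_name, best_ratio = None, 0.0
    for name in place_names:
        # Compare against each word of the place name separately, since a
        # sign crop will usually contain only one word.
        for token in name.lower().split():
            ratio = SequenceMatcher(None, word, token).ratio()
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
    if best_ratio >= min_ratio:
        return best_name, best_ratio
    return None, best_ratio


# Hypothetical metadata for one map location (invented names).
nearby_places = ["Starbucks Coffee", "Main Street Bakery"]

exact = best_metadata_match("STARBUCKS", nearby_places)
misread = best_metadata_match("BAKFRY", nearby_places)  # one wrong letter
unrelated = best_metadata_match("zzzz", nearby_places)
```

Matching per token rather than against the whole name is a design choice that suits single-word crops; multi-word signs would need a different comparison.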
I won't be training my detector with Google Street View (GSV) images; I will instead use pre-made data sets designed for this kind of work. There are a few here which I'll be using primarily to train my algorithm:
A Few Links to Work I'll Be Building Off