Wednesday, January 13, 2010

Introduction and Project Outline

Theory

Recognizing text in arbitrary real-world images is a largely unsolved problem in computer vision and machine learning. Contrast this with document-based OCR of scanned text, which is solved to a near-human degree of accuracy.

Image Text Recognition includes all of the difficulties of traditional OCR methods with some additional challenges.

  • The first problem I will refer to as "text detection." Text detection is the problem of finding the bounding boxes of possible text in the sample image. It is essentially a binary classification problem where the goal is to determine whether or not a given region of an image may contain letters or words. Humans are very good at this problem. Even when looking at a language we don't understand or when words are obscured or too distant to recognize entirely, we can still determine the presence of written language.
  • The second problem I will refer to as "word recognition." This is the challenge of taking an image that contains text and outputting the text as a string. This is what OCR engines do, but they are designed for the very basic case of black text on a white background, where font face, size, distortion, color, angle, lighting, and lexicon are all uniform. When recognizing text in an arbitrary real-world image, all of these variables are unconstrained.
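
To make the binary-classification framing of text detection concrete, here is a minimal sliding-window sketch in Python. It is only an illustration under stated assumptions, not the detector this project will use: the image is assumed to be a plain list of grayscale rows, and `toy_contrast_classifier` is a hypothetical stand-in for a trained text/non-text classifier.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Box = Tuple[int, int, int, int]    # (x, y, width, height)
Image = Sequence[Sequence[float]]  # assumed format: rows of grayscale values in [0, 1]


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> Iterator[Box]:
    """Yield every win_w x win_h window that fits fully inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def crop(image: Image, box: Box) -> List[List[float]]:
    """Copy the pixels covered by a box out of the image."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def toy_contrast_classifier(patch: Image, threshold: float = 0.25) -> bool:
    """Hypothetical placeholder: calls a patch 'text' if its contrast is high.

    A real detector would replace this with a trained classifier.
    """
    values = [v for row in patch for v in row]
    if not values:
        return False
    return (max(values) - min(values)) >= threshold


def detect_text(image: Image, is_text: Callable[[Image], bool],
                win_w: int, win_h: int, stride: int) -> List[Box]:
    """Return the boxes of all windows the classifier labels as text."""
    img_h = len(image)
    img_w = len(image[0]) if img_h else 0
    return [box
            for box in sliding_windows(img_w, img_h, win_w, win_h, stride)
            if is_text(crop(image, box))]
```

Calling `detect_text(image, toy_contrast_classifier, 2, 2, 2)` on a 4x4 image returns the 2x2 boxes that the placeholder flags. Swapping in a better `is_text` function changes the quality of the detector without changing the search loop.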

Application

My goal is to implement a text-detection and word-recognition algorithm and apply it to Google Street View as demonstrated in the following slides:

http://dl.dropbox.com/u/3301354/190ppt.ppt

Work To Be Done

  • Implement and train text detection algorithm
  • Implement word recognition algorithm
  • Collect Google Street View data (screenshots)
  • Examine the results of correlating metadata from the map with text recognized in images
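
One way the last item could work is to treat the map metadata as a lexicon and snap noisy recognizer output to the closest entry. The sketch below assumes the metadata includes a list of nearby business names; the post does not say which metadata fields are available, so that is an assumption, and `match_to_lexicon` is a hypothetical helper. It uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib
from typing import List, Optional


def match_to_lexicon(recognized: str, lexicon: List[str],
                     cutoff: float = 0.6) -> Optional[str]:
    """Return the lexicon entry most similar to the recognized string.

    Comparison is case-insensitive. Returns None when no entry reaches
    the similarity cutoff (a value between 0 and 1).
    """
    # Map lowercase forms back to the original spelling from the metadata.
    by_lower = {name.lower(): name for name in lexicon}
    matches = difflib.get_close_matches(
        recognized.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None
```

For example, a recognizer that misreads one character and outputs "STARBUOKS" would still be matched to "Starbucks" if that name appears in the nearby-business list, while an unrelated string returns `None`.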

If things move on schedule, one addition I would like to make to my text recognition algorithm is to use different angles of the same text to boost recognition. This would require registering the same text across views and combining the recognition results from each image. It would help provide invariance to view angle, lighting angle, and partial occlusion of the text.
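
As a rough illustration of the combining step, the sketch below uses confidence-weighted voting over whole-word hypotheses. This is just one simple option, not a committed design. It assumes registration has already grouped the views of the same text, and that the recognizer returns a confidence score with each string; `combine_views` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def combine_views(hypotheses: Iterable[Tuple[str, float]]) -> Optional[str]:
    """Pick the word with the highest total confidence across views.

    hypotheses: one (recognized_text, confidence) pair per view of the
    same piece of text. Strings are compared case-insensitively.
    Returns None if there are no hypotheses.
    """
    scores: Dict[str, float] = defaultdict(float)
    for text, confidence in hypotheses:
        scores[text.strip().lower()] += confidence
    if not scores:
        return None
    return max(scores, key=scores.get)
```

With views reporting ("STARBUCKS", 0.4), ("STARBUOKS", 0.5), and ("Starbucks", 0.3), the two agreeing views total 0.7 and outvote the single misreading, even though that misreading has the highest individual confidence.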

Training Data

I won't be training my detector on Google Street View images. Instead, I will use existing data sets designed for this kind of work. The ones I'll primarily be using to train my algorithm are here:

http://algoval.essex.ac.uk/icdar/Datasets.html#Text%20Locating

A Few Links to Work I'll Be Building Off

http://www.comp.nus.edu.sg/~cs4243/projects2008/text_natural_scene.pdf
http://www.yaroslavvb.com/papers/lucas-icdar2003.pdf
http://www.tu-chemnitz.de/etit/proaut/paperdb/download/lowe99.pdf
http://dtpapers.googlecode.com/files/hog_cvpr2005.pdf

