Wednesday, January 27, 2010

Week 4 Update

Change in Course

I had planned to do this project in Matlab with a multilayer perceptron as my classifier. However, I have since decided to work in C++/OpenCV and use an SVM instead.

Reason for C++:
I've been working with Kai Wang, a graduate student in the CSE dept. He is also working on text detection, and his project is implemented in C++ with OpenCV. I figured it would be much easier to make use of his help if the two projects were on the same platform.

Reason for SVM:
In my experience, ANNs can be very effective classifiers but can take a very long time to train properly. Since this project will only be active for six more weeks, I think an SVM is the safer bet.

---

First feature set I will be trying: integral image box features

By taking various per-pixel features, such as brightness, gradient magnitude, and gradient orientation, and then computing the sum and standard deviation of each feature over various boxes within a region, we can obtain a reasonable feature set for detecting text.

(Image from Chen & Yuille)

Once the features have been computed for a region of an image, the resulting feature vector can be sent to the classifier to determine whether or not that region contains text. This is repeated for every candidate region.
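
As a rough illustration of the box features described above, here is a minimal C++ sketch of an integral image (summed-area table) that returns the sum, mean, and standard deviation of any axis-aligned box in constant time. It is only a sketch, not the project's implementation and not code from Chen & Yuille: the IntegralImage class, its method names, and the row-major std::vector<double> input are all assumptions made for the example, and it uses plain C++ rather than OpenCV so that it stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: summed-area tables over one per-pixel feature map
// (e.g. brightness or gradient magnitude), stored row-major.
class IntegralImage {
public:
    IntegralImage(const std::vector<double>& pixels, int width, int height)
        : width_(width), height_(height) {
        assert(width > 0 && height > 0);
        assert(pixels.size() == static_cast<std::size_t>(width) * height);
        // Tables have one extra row and column of zeros on the top/left.
        const std::size_t cells =
            static_cast<std::size_t>(width + 1) * (height + 1);
        sum_.assign(cells, 0.0);
        sqsum_.assign(cells, 0.0);
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                const double v = pixels[static_cast<std::size_t>(y) * width + x];
                sum_[at(x + 1, y + 1)] = v + sum_[at(x, y + 1)]
                                           + sum_[at(x + 1, y)] - sum_[at(x, y)];
                sqsum_[at(x + 1, y + 1)] = v * v + sqsum_[at(x, y + 1)]
                                                 + sqsum_[at(x + 1, y)] - sqsum_[at(x, y)];
            }
        }
    }

    // Sum of the feature over the box [x, x+w) x [y, y+h).
    double boxSum(int x, int y, int w, int h) const {
        return lookup(sum_, x, y, w, h);
    }

    double boxMean(int x, int y, int w, int h) const {
        return boxSum(x, y, w, h) / (static_cast<double>(w) * h);
    }

    // Population standard deviation over the box, from E[v^2] - E[v]^2.
    double boxStd(int x, int y, int w, int h) const {
        const double n = static_cast<double>(w) * h;
        const double mean = boxSum(x, y, w, h) / n;
        const double var = lookup(sqsum_, x, y, w, h) / n - mean * mean;
        return std::sqrt(std::max(0.0, var));  // clamp tiny negative round-off
    }

private:
    std::size_t at(int x, int y) const {
        return static_cast<std::size_t>(y) * (width_ + 1) + x;
    }

    // Four table lookups give the total over any box.
    double lookup(const std::vector<double>& t, int x, int y, int w, int h) const {
        assert(x >= 0 && y >= 0 && w > 0 && h > 0);
        assert(x + w <= width_ && y + h <= height_);
        return t[at(x + w, y + h)] - t[at(x, y + h)]
             - t[at(x + w, y)] + t[at(x, y)];
    }

    int width_;
    int height_;
    std::vector<double> sum_;
    std::vector<double> sqsum_;
};
```

For a 3x3 map holding the values 1 through 9 in row-major order, boxSum(1, 1, 2, 2) covers 5, 6, 8, and 9 and returns 28. One table would be built per feature map and then queried for as many boxes as the feature set needs.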

To give everyone an idea of what kind of image I'll be training on, here is an example from the training set:

TODO for next week:

Implement the full pipeline: load image -> compute feature set -> pass into the training algorithm for text detection. I want to get a basic pipeline running first, then debug it and add more interesting features for detection.

Wednesday, January 13, 2010

Introduction and Project Outline

Theory

Recognition of text in arbitrary real-world images is a largely unsolved problem in Computer Vision and Machine Learning. Contrast this with document-based OCR of scanned text, which is solved to a near-human degree of accuracy.

Image Text Recognition includes all of the difficulties of traditional OCR methods with some additional challenges.

  • The first problem I will refer to as "text detection." Text detection is the problem of finding the bounding boxes of possible text in the sample image. It is essentially a binary classification problem where the goal is to determine whether or not a given region of an image may contain letters or words. Humans are very good at this problem. Even when looking at a language we don't understand or when words are obscured or too distant to recognize entirely, we can still determine the presence of written language.
  • The second problem I will refer to as "word recognition." This is the challenge of taking an image that contains text and outputting the text as a string. This is what OCR engines do, but they are designed for the very basic case of black text on a white background, with uniform font face, size, distortion, color, angle, lighting, lexicon, etc. When recognizing text in an arbitrary real-world image, all of these variables are unconstrained.
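
To make the "text detection" framing from the first bullet concrete, here is a minimal C++ sketch that treats detection as a yes/no decision on each candidate region. It assumes a fixed-size sliding window, which is just one common way to enumerate regions and is not something this outline commits to. The Box struct, the slideWindows function, and the isText callback (standing in for a trained classifier) are all hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical axis-aligned region: top-left corner plus size, in pixels.
struct Box {
    int x;
    int y;
    int w;
    int h;
};

// Scan fixed-size windows over an image of the given size and keep every
// window that the binary classifier labels as text. The classifier is passed
// in as a callback so this sketch does not depend on any particular model.
std::vector<Box> slideWindows(int imageW, int imageH,
                              int winW, int winH, int stride,
                              const std::function<bool(const Box&)>& isText) {
    assert(winW > 0 && winH > 0 && stride > 0);
    std::vector<Box> hits;
    for (int y = 0; y + winH <= imageH; y += stride) {
        for (int x = 0; x + winW <= imageW; x += stride) {
            const Box candidate{x, y, winW, winH};
            if (isText(candidate)) {
                hits.push_back(candidate);
            }
        }
    }
    return hits;
}
```

The returned boxes are the candidate bounding boxes that would be handed on to word recognition. A real detector would likely also vary the window size and merge overlapping hits, which this sketch leaves out.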

Application

My goal is to implement a text-detection and word-recognition algorithm and apply it to Google Street View as demonstrated in the following slides:

http://dl.dropbox.com/u/3301354/190ppt.ppt

Work To Be Done
  • Implement and train text detection algorithm
  • Implement word recognition algorithm
  • Collect Google Street View data (screenshots)
  • Examine the results of correlating metadata from the map with text recognized in images

One addition I would like to make to my text recognition algorithm, if things move on schedule, is to use different views of the same text to boost recognition. This would require registering the same text across views taken from different angles and combining the recognition results from each image. This would help provide invariance to view angle, lighting angle, and occlusion of parts of the text.

Training Data

I won't be training my detector with Google Street View (GSV) images; instead, I will use pre-made data sets designed for this kind of work. There are a few here, which I'll use as my primary training data:

http://algoval.essex.ac.uk/icdar/Datasets.html#Text%20Locating

A Few Links to Work I'll Be Building Off

http://www.comp.nus.edu.sg/~cs4243/projects2008/text_natural_scene.pdf
http://www.yaroslavvb.com/papers/lucas-icdar2003.pdf
http://www.tu-chemnitz.de/etit/proaut/paperdb/download/lowe99.pdf
http://dtpapers.googlecode.com/files/hog_cvpr2005.pdf