Wednesday, February 17, 2010

Week 7 Update

Current progress:

  • Image loading and feature extraction are working. I am computing the mean and standard deviation of grayscale brightness over adjacent rectangular regions. These values are my features for training.
  • I have processed the training data I have available, and I'm currently working with roughly 12,000 negative samples (no text in image or text not centered) and 6,000 positive samples (text completely fills window).
  • I'm constructing a giant training matrix labeled with 0 or 1 and passing it into OpenCV's SVM class.
  • I am having issues with training -- I believe the SVM is overfitting the data. I'm reading more about training SVMs to see if I can prepare my training data better or set the parameters to better initial values. Right now my results are terrible: either everything is classified as text (all false positives) or nothing is (all false negatives).
  • In addition to the training code I have written the testing code. This will take an arbitrary image and pass a 48x24 window over it at various resolutions in order to detect text.
  • Essentially the only thing preventing a demo at this point is that my classifier is not working!

To do:

  • Need to incorporate gradients into feature analysis. The trouble with mean and standard deviation is that they don't take into account spatial relationships within the regions I'm summing up. In other words, I could randomly permute all of the pixels within each region and still get exactly the same results. Using gradient magnitude and angle would create some sort of metric of the texture in the region rather than just average brightness. This should be simple to include in my existing pipeline.
  • Clean up code a little bit. Everything has been written really quickly in prototype mode, but my code is getting to the point (375+ lines) where I need to organize it a little better to keep working.
  • Once my detector is reasonably accurate, perform OCR on the results. I've looked into Tesseract as an OCR engine I can add at the end of my pipeline. I'm not going to look into optimizing the results at this stage; I'm going to hope it will "just work."
  • Lastly, I need to work in Google Street View data and find a way to test results.

Schedule:

  • Week 7/8: Debug classifier, add gradient angle/magnitude as features, clean up code
  • Week 8/9: Incorporate Tesseract & GSV data into pipeline
  • Week 9/10: Freeze any implementations of new features, focus on getting optimal results from what I have, and work on final report
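
As a rough illustration of the gradient idea from the to-do list, here is a minimal Python sketch (not the project's actual code). It assumes central differences for the gradient and a 4-bin, magnitude-weighted histogram of unsigned orientation; the function names, the bin count, and the two toy 8x8 images are all made up for the example. Both toy images are half black and half white, so their mean and standard deviation are identical, but the orientation histograms differ.

```python
import math

def gradients(image):
    """Central-difference gradient magnitude and angle for interior pixels."""
    height, width = len(image), len(image[0])
    magnitudes, angles = [], []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            magnitudes.append(math.hypot(gx, gy))
            angles.append(math.atan2(gy, gx))
    return magnitudes, angles

def orientation_histogram(magnitudes, angles, bins=4):
    """Magnitude-weighted histogram of unsigned orientation in [0, pi).

    The bin count is an arbitrary choice for this sketch.
    """
    hist = [0.0] * bins
    for mag, ang in zip(magnitudes, angles):
        folded = ang % math.pi  # treat opposite directions as the same edge
        index = min(int(folded / math.pi * bins), bins - 1)
        hist[index] += mag
    return hist

# Two toy images with identical brightness statistics but different structure.
vertical_edge = [[0] * 4 + [255] * 4 for _ in range(8)]
horizontal_edge = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]

hist_v = orientation_histogram(*gradients(vertical_edge))
hist_h = orientation_histogram(*gradients(horizontal_edge))
print(hist_v)
print(hist_h)
```

The vertical edge puts all of its gradient energy in the first bin (horizontal gradient direction), while the horizontal edge puts it in the third, which is exactly the kind of spatial information the mean and standard deviation cannot see.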
CONCEPT: (excuse my crude drawings)

Feature extraction:
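
A minimal Python sketch of the mean and standard deviation features, under some assumptions: the window is taken to be 48 pixels wide by 24 tall, and the 2x4 grid of regions is a hypothetical layout chosen for the example, not necessarily the one used in the project. The second half shuffles the pixels inside each region and recomputes the features, which shows the weakness mentioned above: the features do not change.

```python
import math
import random

def region_stats(image, top, left, height, width):
    """Return (mean, std) of grayscale brightness in one rectangular region."""
    pixels = [image[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    n = len(pixels)
    total = sum(pixels)
    total_sq = sum(p * p for p in pixels)
    # Integer arithmetic until the final division, so the result does not
    # depend on the order in which pixels are visited.
    variance = (n * total_sq - total * total) / (n * n)
    return total / n, math.sqrt(variance)

def window_features(image, rows=2, cols=4):
    """Concatenate (mean, std) for each cell of a rows x cols grid.

    The grid layout is an assumption made for this sketch.
    """
    cell_h = len(image) // rows
    cell_w = len(image[0]) // cols
    features = []
    for i in range(rows):
        for j in range(cols):
            features.extend(
                region_stats(image, i * cell_h, j * cell_w, cell_h, cell_w))
    return features

def shuffle_within_cells(image, rows=2, cols=4, seed=1):
    """Return a copy of the image with pixels permuted inside each cell."""
    rng = random.Random(seed)
    out = [row[:] for row in image]
    cell_h = len(image) // rows
    cell_w = len(image[0]) // cols
    for i in range(rows):
        for j in range(cols):
            coords = [(r, c)
                      for r in range(i * cell_h, (i + 1) * cell_h)
                      for c in range(j * cell_w, (j + 1) * cell_w)]
            values = [image[r][c] for r, c in coords]
            rng.shuffle(values)
            for (r, c), value in zip(coords, values):
                out[r][c] = value
    return out

# Synthetic 24-row by 48-column window of random brightness values.
random.seed(0)
window = [[random.randint(0, 255) for _ in range(48)] for _ in range(24)]
features = window_features(window)
shuffled_features = window_features(shuffle_within_cells(window))
print(features == shuffled_features)
```

Each window yields 16 numbers here (8 regions, 2 statistics each); one such row per sample, plus its 0/1 label, is what would go into the training matrix.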

Sliding window to detect text (with background scaled)
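
The scan can be sketched in a few lines of Python. This is only an illustration under assumptions: the 48x24 window size comes from the post, but the stride of 8 pixels, the scale step of 1.25, and the function name are hypothetical. The window stays fixed while the image is conceptually shrunk at each step, and every window position is mapped back to a box in the original image.

```python
def sliding_windows(img_w, img_h, win_w=48, win_h=24, stride=8, scale_step=1.25):
    """Yield (x, y, w, h) boxes in original-image coordinates.

    stride and scale_step are illustrative values, not tuned settings.
    """
    scale = 1.0
    # Keep shrinking the image until the window no longer fits inside it.
    while img_w / scale >= win_w and img_h / scale >= win_h:
        scaled_w = int(img_w / scale)
        scaled_h = int(img_h / scale)
        for y in range(0, scaled_h - win_h + 1, stride):
            for x in range(0, scaled_w - win_w + 1, stride):
                # Map the window in the scaled image back to the original.
                yield (int(x * scale), int(y * scale),
                       int(win_w * scale), int(win_h * scale))
        scale *= scale_step

boxes = list(sliding_windows(96, 48))
for box in boxes[:3]:
    print(box)
```

In a full detector, each box would be cropped, resized to 48x24, turned into a feature vector, and passed to the classifier; overlapping positive boxes would then need to be merged.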