Benchmark OCR

Choose a dataset

The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. Outdoor street-level imagery has two notable characteristics: image text often comes from business signage, and business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses.

Overview

Files              | Tag(s)
/svt/img/00_00.jpg | DOLL, HUT
/svt/img/00_01.jpg | ASTORIA, BEST, INN, SUITES, VALUE
/svt/img/00_08.jpg | MARBLE, YARD, ORION, TILE
/svt/img/00_12.jpg | SUBWAY
/svt/img/00_14.jpg | HOUSE, ORIGINAL, PANCAKE
/svt/img/01_02.jpg | SOL
/svt/img/01_09.jpg | NICK
/svt/img/01_10.jpg | PAYLESS, SHOE, SOURCE
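
The source does not say how a word counts as "detected", so the following is only a minimal sketch under one plausible assumption: a tagged word is detected when it appears, case-insensitively, as a whole token in the text an engine returns for that image. The function names (`tokenize`, `detection_rate`) and the sample OCR strings are hypothetical; only the file paths and tags come from the table above.

```python
import re

def tokenize(text):
    """Split raw OCR output into a set of uppercase alphanumeric tokens."""
    return set(re.findall(r"[A-Z0-9]+", text.upper()))

def detection_rate(ground_truth, ocr_outputs):
    """Return the percentage of tagged words found in the OCR output.

    ground_truth: dict mapping image path -> list of tagged words
    ocr_outputs:  dict mapping image path -> raw text returned by an engine
    """
    total = 0
    found = 0
    for path, tags in ground_truth.items():
        tokens = tokenize(ocr_outputs.get(path, ""))
        total += len(tags)
        found += sum(1 for tag in tags if tag.upper() in tokens)
    return 100.0 * found / total if total else 0.0

# Tags taken from the overview table.
ground_truth = {
    "/svt/img/00_00.jpg": ["DOLL", "HUT"],
    "/svt/img/00_12.jpg": ["SUBWAY"],
    "/svt/img/01_10.jpg": ["PAYLESS", "SHOE", "SOURCE"],
}

# Hypothetical engine output, invented for illustration only.
ocr_outputs = {
    "/svt/img/00_00.jpg": "Doll Hut",
    "/svt/img/00_12.jpg": "SUBWAY eat fresh",
    "/svt/img/01_10.jpg": "Payless ShoeSource",
}

rate = detection_rate(ground_truth, ocr_outputs)
print(f"{rate:.2f}% of words detected")
```

Under whole-token matching, "ShoeSource" matches neither SHOE nor SOURCE. A substring-based rule would score this case differently, so the matching rule should be fixed before comparing engines.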

Results

Metric                           | Google Cloud     | AWS              | Microsoft         | Tesseract
Percentage of words detected     | 89.88326848      | 69.64980545      | 38.1322957        | 26.61478599
Average execution time (seconds) | 1.72686089105058 | 1.32396451361868 | 0.194967273743017 | 0.37840094
Price per image ($)              | 0.0015           | 0.0015           | 0.001             | Free
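
The source does not describe how the average execution time was measured. One way to collect such a figure, shown here as a sketch only, is to wrap each engine call with a wall-clock timer and divide by the number of images. `benchmark` and `fake_engine` are hypothetical names; `fake_engine` stands in for a real call to a cloud API or to Tesseract.

```python
import time

def benchmark(recognize, image_paths):
    """Run `recognize` on every path.

    Returns (outputs, average wall-clock seconds per image).
    """
    outputs = {}
    elapsed = 0.0
    for path in image_paths:
        start = time.perf_counter()
        outputs[path] = recognize(path)
        elapsed += time.perf_counter() - start
    average = elapsed / len(image_paths) if image_paths else 0.0
    return outputs, average

def fake_engine(path):
    # Hypothetical stand-in for a real OCR call; always returns the same text.
    return "SUBWAY"

outputs, avg_seconds = benchmark(
    fake_engine, ["/svt/img/00_12.jpg", "/svt/img/01_02.jpg"]
)
print(f"{avg_seconds:.6f} s per image")
```

For the hosted services, a timer placed like this also includes network round-trip time, so the result depends on where the benchmark runs as well as on the engine.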