Wednesday, August 28, 2013

Tesseract is an open source OCR engine, currently maintained by Google. It can be deployed on a server, and it can also be deployed on Android. This article concentrates on how to deploy Tesseract on Android.

Overview: Tesseract is written in C/C++, so you need the Android NDK to build it. Instead of developing from scratch, we will use existing code written by Robert Theis (https://github.com/rmtheis).

Installation / setup prerequisites:
This article assumes the reader is familiar with Android development, so the following should already be configured before using Tesseract on Android: JDK 1.7, the Android SDK and NDK, the latest Eclipse, and the ADT Eclipse plugin.

Source / configuration:
1.       Download https://github.com/rmtheis/tess-two
tess-two is the Tesseract library project for Android; its native code is written in C/C++.

2.       Import the project into Eclipse.
3.       Prepare the tess-two project for compilation.
Neither Linux nor Windows with Cygwin is required to compile Tesseract. However, you need a recent NDK build tool, so make sure you have downloaded the latest Android NDK.
a.       Configure Eclipse to use the NDK (similar to the SDK configuration).
Window -> Preferences -> Android -> NDK. Enter your NDK installation directory.

b.      Go to the project properties of the tess-two project and add the Android NDK builder as follows:



4.       Compile the tess-two project. This may take some time.



Once compiled, you should see the native libraries in the libs folder.
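
As a rough way to confirm that the native build produced something, the sketch below lists the .so files found one level below a libs folder. It assumes the usual NDK output layout of one subfolder per ABI (for example libs/armeabi). The class name NativeLibCheck, the method findNativeLibs, and the default path tess-two/libs are illustrative names chosen here, not part of tess-two.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of tess-two: lists native libraries
// produced by the NDK build so the result of step 4 can be checked.
public class NativeLibCheck {

    /** Returns sorted "abi/name.so" entries found one level below libsDir. */
    public static List<String> findNativeLibs(File libsDir) {
        List<String> found = new ArrayList<String>();
        File[] abiDirs = libsDir.listFiles();
        if (abiDirs == null) {
            return found; // folder is missing or is not a directory
        }
        for (File abiDir : abiDirs) {
            if (!abiDir.isDirectory()) {
                continue; // skip plain files such as jars placed directly in libs
            }
            File[] files = abiDir.listFiles();
            if (files == null) {
                continue;
            }
            for (File f : files) {
                if (f.isFile() && f.getName().endsWith(".so")) {
                    found.add(abiDir.getName() + "/" + f.getName());
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        // Assumed default location; pass the real libs path as the first argument.
        File libsDir = new File(args.length > 0 ? args[0] : "tess-two/libs");
        List<String> libs = findNativeLibs(libsDir);
        if (libs.isEmpty()) {
            System.out.println("No native libraries found in " + libsDir
                    + "; the NDK build may not have run.");
        } else {
            for (String lib : libs) {
                System.out.println(lib);
            }
        }
    }
}
```

If the list comes back empty, the NDK builder from step 3 is the first thing to re-check.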
 


5.       Now download the android-ocr project from https://github.com/rmtheis/android-ocr
android-ocr is a test client project that interacts with the Tesseract OCR library.

6.       Add the project to the workspace.
7.       Ensure that you have added the tess-two project as a library reference in the android-ocr project.


8.       Now build the project and run the app on your Android phone. The android-ocr app should start and run without issues.
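
If the build in step 8 cannot find the Tesseract classes, the library reference from step 7 is worth double-checking. With ADT, a library reference is stored in the client project's project.properties file as an android.library.reference.N entry. The sketch below reads those entries; the class name LibraryReferenceCheck, the method libraryReferences, and the sample file content (including the ../tess-two path and the target line) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: extracts ADT library references from the
// contents of a project.properties file.
public class LibraryReferenceCheck {

    /** Returns the paths of android.library.reference.1, .2, ... in order. */
    public static List<String> libraryReferences(Reader projectProperties)
            throws IOException {
        Properties props = new Properties();
        props.load(projectProperties);
        List<String> refs = new ArrayList<String>();
        // References are numbered from 1; stop at the first missing index.
        for (int i = 1; ; i++) {
            String path = props.getProperty("android.library.reference." + i);
            if (path == null) {
                break;
            }
            refs.add(path.trim());
        }
        return refs;
    }

    public static void main(String[] args) throws IOException {
        // Sample content only; a real check would read android-ocr's project.properties.
        String sample = "target=android-8\n"
                + "android.library.reference.1=../tess-two\n";
        System.out.println(libraryReferences(new StringReader(sample)));
    }
}
```

For the sample content above, the program prints [../tess-two]. An empty list for the real file would suggest that the tess-two reference has not been added yet.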