ViewerDebugging tesseract-ocr on Ubuntu 16.04

References:
https://github.com/tesseract-ocr/tesseract/wiki/Compiling
https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging


The following components are required to run the viewer:
  • Java runtime
  • piccolo2d-core-3.0.jar
  • piccolo2d-extras-3.0.jar
  • ScrollView.jar, built from the source in tesseract/java 
Commands to install the Java runtime (and the JDK, which is needed to build ScrollView.jar) on Ubuntu:
sudo apt-get install default-jre
sudo apt-get install default-jdk

 
Download piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar into tesseract/java; they are needed to build ScrollView.jar.
Commands to build ScrollView.jar (run from the tesseract source directory):
cd java
make ScrollView.jar
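
Before moving on, it can help to confirm that all three jars ended up in the same directory. Here is a minimal sketch; missing_viewer_jars is a hypothetical helper of mine, not part of tesseract, and it only checks that the files exist, not that they are valid.

```shell
#!/bin/sh
# missing_viewer_jars: hypothetical helper, not part of tesseract.
# Prints the name of every viewer jar that is absent from the given
# directory (default: ./java). Prints nothing when all three are present.
missing_viewer_jars() {
    dir="${1:-java}"
    for jar in piccolo2d-core-3.0.jar piccolo2d-extras-3.0.jar ScrollView.jar; do
        [ -f "$dir/$jar" ] || printf '%s\n' "$jar"
    done
}
```

For example, after sourcing the snippet, missing_viewer_jars ~/tesseract/java should print nothing if the build succeeded.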

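!!!Important!!!
Set SCROLLVIEW_PATH to the directory that contains ScrollView.jar before running tesseract:
export SCROLLVIEW_PATH=~/tesseract/java
Otherwise tesseract looks for the jar in the current directory and fails with errors like: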
ScrollView: Waiting for server...
Error: Unable to access jarfile ./ScrollView.jar
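
Note that export only affects the current shell session, so the variable has to be set again in every new terminal (or added to a startup file such as ~/.bashrc). A small sketch to check the setting before running tesseract; scrollview_path_ok is a hypothetical helper name of mine, not something tesseract provides.

```shell
#!/bin/sh
# scrollview_path_ok: hypothetical helper, not part of tesseract.
# Succeeds (exit status 0) only if SCROLLVIEW_PATH is set and the
# directory it names contains ScrollView.jar.
scrollview_path_ok() {
    [ -n "${SCROLLVIEW_PATH:-}" ] && [ -f "$SCROLLVIEW_PATH/ScrollView.jar" ]
}
```

For example: scrollview_path_ok || echo "SCROLLVIEW_PATH is not set correctly"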

  
Add the api.SetVariable lines below to api/tesseractmain.cpp, just before the call to api.ProcessPages, to display some internal results:
...
  api.SetVariable("tessedit_dump_pageseg_images", "true");            // dump intermediate page-segmentation images (lines removed, images removed)
  api.SetVariable("textord_show_blobs", "true");                      // show blobs
  api.SetVariable("textord_show_boxes", "true");                      // show the blobs' bounding boxes
  api.SetVariable("textord_tabfind_show_blocks", "true");             // show final block bounds
  api.SetVariable("textord_tabfind_show_reject_blobs", "true");       // show rejected blobs
  api.SetVariable("textord_tabfind_show_initial_partitions", "true"); // show initial partitions
  api.SetVariable("textord_tabfind_show_partitions", "1");            // show final partitions (integer variable)
  api.SetVariable("textord_tabfind_show_initialtabs", "true");        // show initial (candidate) tab-stops
  api.SetVariable("textord_tabfind_show_finaltabs", "true");          // show final tab vectors
  api.SetVariable("textord_tabfind_show_images", "true");             // show image blobs


  if (!renderers.empty()) {
    if (banner) PrintBanner();
    bool succeed = api.ProcessPages(image, NULL, 0, renderers[0]);

...



Then rebuild tesseract.

Now run tesseract:
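
As a possible alternative to editing the source, the tesseract command line accepts -c VAR=VALUE options. The sketch below prints one -c option for each variable set in the C++ snippet. I have not verified that every one of these variables takes effect when set this way in this particular build, so treat that as an assumption; debug_flags is a hypothetical helper name.

```shell
#!/bin/sh
# debug_flags: hypothetical helper, not part of tesseract.
# Prints one "-c name=value" option per line, mirroring the
# api.SetVariable calls in the C++ snippet.
debug_flags() {
    for setting in \
        tessedit_dump_pageseg_images=true \
        textord_show_blobs=true \
        textord_show_boxes=true \
        textord_tabfind_show_blocks=true \
        textord_tabfind_show_reject_blobs=true \
        textord_tabfind_show_initial_partitions=true \
        textord_tabfind_show_partitions=1 \
        textord_tabfind_show_initialtabs=true \
        textord_tabfind_show_finaltabs=true \
        textord_tabfind_show_images=true
    do
        # "-c" is passed as an argument so the format string never starts with a dash.
        printf '%s %s\n' -c "$setting"
    done
}
```

Usage, relying on the shell's word splitting of the unquoted substitution: tesseract ~/rotate.png ~/out -l chi_sim+eng $(debug_flags)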
tesseract  ~/rotate.png  ~/out -l chi_sim+eng
Info in bmfCreate: Generating pixa of bitmap fonts from string
Info in bmfCreate: Generating pixa of bitmap fonts from string
Tesseract Open Source OCR Engine v4.00.00alpha-337-g7c27088 with Leptonica
Starting sh -c "trap 'kill %1' 0 1 2 ; java -Xms1024m -Xmx2048m -jar /home/lenger/tesseract/java/ScrollView.jar & wait"
ScrollView: Waiting for server...
Socket started on port 8461
Client connected


The input image is:

The intermediate results, shown step by step:

And the final output is:
E Creeks a member of a
Native American people, many of whom now live in
the US state of Oklahoma 3% M % A. ( 36 435, men
Rerum fiche Bri )

noun 1 (8r6) a narrow area of water where
the sea flows into the land AW; /\ W GZID inlet
2 (WAmE, Australé, NZB) a small iver or stream VM;
mig

IED up the "creek (without a 'paddle) (nforma) in a
difficult or bad situation # F (M ( WRM ) : I was
really up the creek without my car. 18 T RLB M F 2B
无 方 俪

creel /kril/ noun a masket for holding fish that have
Just been caught. Ct dB ) fae

creeplkrtpl verb, noun

wverb (crept. crept /kropt/) REG in the phrasal verb
ereep sh out, creeped is used for the past tense and
past participle. fla ll shid creep sb out "b, creep Bit
表 式 和 过 去 分 词 地 为 ceeped。 1 [1 (+ (of
people or animals A # #1 #) to move slowly, quietly
and carefully, because you do not want to be seen or
heard #i X0 F lts " erept up the
stairs, trying not to wake my parents. J; T RRR EM
©. E TMB. 2 ll (+ adv./prep.) (Name) to
move. with your body. close to the ground; to. move
slowly on your hands and. knees. 信 智 行 连 ; 爬 行
GRD crawl 3 ( (+ adv./prep.) to move or develop very
slowly 4 # H D6 Frit ANE t.. WMH DL:. Her
arms crept around his neck. ti 8.97 t W xs 8 ( 7 10

"NEF... o A slight feeling of suspicion crept over me. %%
War ts T -# Ned, -A Il (+ adv./prep.) (of plants tlt)



In summary, the result is not very good.
The Latin letters are recognized well, but the Chinese characters are wrong.
The tesseract-ocr engine also cannot read any of the phonetic symbols.

I will try training tesseract-ocr to recognize the phonetic symbols later.


Keep learning.