https://github.com/tesseract-ocr/tesseract/wiki/Compiling
https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging
The following components are required to run the viewer:
- Java runtime
- piccolo2d-core-3.0.jar
- piccolo2d-extras-3.0.jar
- ScrollView.jar, built from the source in tesseract/java
sudo apt-get install default-jre
sudo apt-get install default-jdk
Download piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar into tesseract/java; both are needed to build ScrollView.jar.
Commands to build ScrollView.jar:
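Before building, it can help to confirm that both jars actually landed in the right directory. The following is only a sketch: missing_jars is a hypothetical helper name, and the ~/tesseract/java location is assumed from the paths used elsewhere in this post.

```shell
# Hypothetical helper: print the name of each required Piccolo2D jar
# that is NOT present in the given directory.
missing_jars() {
    dir="$1"
    for jar in piccolo2d-core-3.0.jar piccolo2d-extras-3.0.jar; do
        [ -f "$dir/$jar" ] || echo "$jar"
    done
}

# Assumption: the tesseract checkout lives in ~/tesseract.
missing=$(missing_jars ~/tesseract/java)
if [ -n "$missing" ]; then
    echo "Missing from ~/tesseract/java:" $missing >&2
fi
```

If nothing is printed, both jars are in place and the build can go ahead.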
cd java
make ScrollView.jar
Important: set SCROLLVIEW_PATH to the directory that contains ScrollView.jar:
export SCROLLVIEW_PATH=~/tesseract/java
Otherwise you will get errors like:
ScrollView: Waiting for server...
Error: Unable to access jarfile ./ScrollView.jar
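The "Unable to access jarfile" error can be caught before running tesseract by checking that the jar is where SCROLLVIEW_PATH says it is. This is a minimal sketch; scrollview_jar_ok is a hypothetical helper, not part of tesseract.

```shell
# Hypothetical helper: succeed only if the given directory
# (default: $SCROLLVIEW_PATH) contains ScrollView.jar.
scrollview_jar_ok() {
    dir="${1:-${SCROLLVIEW_PATH:-}}"
    [ -n "$dir" ] && [ -f "$dir/ScrollView.jar" ]
}

if scrollview_jar_ok; then
    echo "ScrollView.jar found in $SCROLLVIEW_PATH"
else
    echo "ScrollView.jar not found; check SCROLLVIEW_PATH" >&2
fi
```

Note that the export only lasts for the current shell session, so it has to be repeated (or added to a shell startup file) in every new terminal.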
Add the api.SetVariable calls below to api/tesseractmain.cpp, just before the existing call to api.ProcessPages, to dump some internal results:
...
api.SetVariable("tessedit_dump_pageseg_images", "true"); //dump intermediate page segmentation images
api.SetVariable("textord_show_blobs", "true"); //show blobs result
api.SetVariable("textord_show_boxes", "true"); //show blobs' bounding boxes
api.SetVariable("textord_tabfind_show_blocks", "true"); //show final block bounds
api.SetVariable("textord_tabfind_show_reject_blobs", "true"); //show rejected blobs
api.SetVariable("textord_tabfind_show_initial_partitions", "true"); //show initial partitions
api.SetVariable("textord_tabfind_show_partitions", "1"); //show final partitions
api.SetVariable("textord_tabfind_show_initialtabs", "true"); //show initial tab-stops
api.SetVariable("textord_tabfind_show_finaltabs", "true"); //show final tab vectors
api.SetVariable("textord_tabfind_show_images", "true"); //show image blobs
if (!renderers.empty()) {
if (banner) PrintBanner();
bool succeed = api.ProcessPages(image, NULL, 0, renderers[0]);
...
Then rebuild tesseract.
OK, let's run tesseract now.
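As an alternative to editing the source and recompiling, the tesseract command line has a -c VAR=VALUE option for setting variables, and several -c arguments may be given. I have not verified that every debug variable above is accepted this way in this alpha build, so treat the following as a sketch. debug_args is a hypothetical helper that only assembles the arguments, and the command is printed rather than run so it can be checked first.

```shell
# Hypothetical helper: turn "name=value" settings into repeated
# "-c name=value" arguments for the tesseract command line.
debug_args() {
    for setting in "$@"; do
        printf '%s %s ' -c "$setting"
    done
}

# Print the command for inspection instead of executing it.
echo tesseract ~/rotate.png ~/out -l chi_sim+eng \
    $(debug_args textord_show_blobs=true textord_tabfind_show_partitions=1)
```

The settings are passed as name=value pairs because textord_tabfind_show_partitions is set with "1" above, unlike the other variables, which are set with "true".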
tesseract ~/rotate.png ~/out -l chi_sim+eng
Info in bmfCreate: Generating pixa of bitmap fonts from string
Info in bmfCreate: Generating pixa of bitmap fonts from string
Tesseract Open Source OCR Engine v4.00.00alpha-337-g7c27088 with Leptonica
Starting sh -c "trap 'kill %1' 0 1 2 ; java -Xms1024m -Xmx2048m -jar /home/lenger/tesseract/java/ScrollView.jar & wait"
ScrollView: Waiting for server...
Socket started on port 8461
Client connected
The image to handle is:
The intermediate results, shown step by step:
And the final output is:
E Creeks a member of a
Native American people, many of whom now live in
the US state of Oklahoma 3% M % A. ( 36 435, men
Rerum fiche Bri )
noun 1 (8r6) a narrow area of water where
the sea flows into the land AW; /\ W GZID inlet
2 (WAmE, Australé, NZB) a small iver or stream VM;
mig
IED up the "creek (without a 'paddle) (nforma) in a
difficult or bad situation # F (M ( WRM ) : I was
really up the creek without my car. 18 T RLB M F 2B
无 方 俪
creel /kril/ noun a masket for holding fish that have
Just been caught. Ct dB ) fae
creeplkrtpl verb, noun
wverb (crept. crept /kropt/) REG in the phrasal verb
ereep sh out, creeped is used for the past tense and
past participle. fla ll shid creep sb out "b, creep Bit
表 式 和 过 去 分 词 地 为 ceeped。 1 [1 (+ (of
people or animals A # #1 #) to move slowly, quietly
and carefully, because you do not want to be seen or
heard #i X0 F lts " erept up the
stairs, trying not to wake my parents. J; T RRR EM
©. E TMB. 2 ll (+ adv./prep.) (Name) to
move. with your body. close to the ground; to. move
slowly on your hands and. knees. 信 智 行 连 ; 爬 行
GRD crawl 3 ( (+ adv./prep.) to move or develop very
slowly 4 # H D6 Frit ANE t.. WMH DL:. Her
arms crept around his neck. ti 8.97 t W xs 8 ( 7 10
"NEF... o A slight feeling of suspicion crept over me. %%
War ts T -# Ned, -A Il (+ adv./prep.) (of plants tlt)
芸
In summary, the result is not that good.
The English letters are recognized well, but the Chinese characters are wrong.
The tesseract-ocr engine also can't read any of the phonetic symbols.
I will try training tesseract-ocr to recognize the phonetic symbols later.
Keep learning.