generate the .lstmf training files (-c preserve_interword_spaces=1 to remove unnecessary spaces)
for %i in (*.tif) do tesseract %i %~ni --oem 1 --psm 6 -c preserve_interword_spaces=1 -l jpn lstm.train
dir /b *.lstmf > listfile.txt
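For machines without cmd, the listing step can be approximated in Python. This is only a sketch: the function name write_listfile is made up for illustration, and it assumes the .lstmf files all sit in one folder. Like dir /b, it writes bare file names, so lstmtraining would have to be run from that same folder.

```python
from pathlib import Path


def write_listfile(train_dir, listfile_name="listfile.txt"):
    """Write the names of all .lstmf files in train_dir to a list file, one per line."""
    train_dir = Path(train_dir)
    # Names only (no directory part), sorted so the order is stable between runs.
    names = sorted(p.name for p in train_dir.glob("*.lstmf"))
    listfile = train_dir / listfile_name
    listfile.write_text("".join(name + "\n" for name in names), encoding="utf-8")
    return names
```

Calling write_listfile(".") from the folder that holds the .tif and .lstmf files gives a listfile.txt in the same place.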
extract the LSTM model (jpn.lstm) from the original model
combine_tessdata -e jpn.traineddata jpn.lstm
maybe don't need to do that and can just use the same jpn.traineddata instead of the .lstm?
lstmtraining --model_output output --continue_from jpn.lstm --traineddata jpn.traineddata --max_iterations 500 --train_listfile listfile.txt
resume from generated checkpoint
lstmtraining --model_output output --continue_from output_checkpoint --traineddata jpn.traineddata --max_iterations 400 --train_listfile listfile.txt
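evaluate the checkpoint (lstmeval needs --traineddata when --model is a training checkpoint)
lstmeval --model output_model.checkpoint --traineddata jpn.traineddata --eval_listfile listfile.txt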
add --stop_training on step 4 to convert the checkpoint into the final model
lstmtraining --model_output output --continue_from output_checkpoint --traineddata jpn.traineddata --stop_training --max_iterations 400 --train_listfile listfile.txt
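The lstmtraining commands in these notes differ only in --continue_from, --max_iterations and whether --stop_training is present. A small Python sketch makes that explicit. The helper name lstmtraining_args is hypothetical, and the file names (output, jpn.traineddata, listfile.txt) are just the ones used in these notes. Only flags that already appear above are used.

```python
def lstmtraining_args(continue_from, max_iterations, stop_training=False):
    """Build the lstmtraining argument list used in these notes."""
    args = [
        "lstmtraining",
        "--model_output", "output",
        "--continue_from", continue_from,
        "--traineddata", "jpn.traineddata",
    ]
    if stop_training:
        # Same flags as the resume command, plus --stop_training.
        args.append("--stop_training")
    args += [
        "--max_iterations", str(max_iterations),
        "--train_listfile", "listfile.txt",
    ]
    return args
```

For example, subprocess.run(lstmtraining_args("jpn.lstm", 500), check=True) would start the first training run, and lstmtraining_args("output_checkpoint", 400, stop_training=True) gives the arguments of the final conversion command.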
(rename the output file to <name>.traineddata, put that in tessdata, then specify -l <name> in the command)
tesseract <image> output -l <traineddata-name>
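The rename-and-copy step could also be scripted. A minimal sketch, assuming the finished model file and the tessdata folder are both known. install_model is a made-up helper name, and the paths in the usage line below are placeholders, not real locations.

```python
import shutil
from pathlib import Path


def install_model(model_file, tessdata_dir, name):
    """Copy a finished model into tessdata as <name>.traineddata and return the new path."""
    target = Path(tessdata_dir) / f"{name}.traineddata"
    # Copy rather than move, so the original training output stays in place.
    shutil.copyfile(model_file, target)
    return target
```

Usage would look like install_model("output", "<path-to-tessdata>", "<name>"), after which -l <name> should pick up the new model.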