pdf java algorithm

Parsing hyphenated words in PDF

While developing a program that parses elements of PDF files, my task was to implement “dehyphenation”: a function that merges hyphenated words. The naive approach is to look for a hyphen in the last position of each line on a page and match it with the first word of the next line that contains text. This approach not only misses the case where the hyphen is the last character on a page, but also fails for multi-column page layouts, where the matching candidate can be positioned above the text object.
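To make the naive approach concrete, here is a minimal Java sketch. It assumes the page text has already been extracted into a list of lines in reading order. The class name `NaiveDehyphenator` and the method `dehyphenate` are hypothetical names chosen for illustration, not part of any PDF library. It has the limitations described above: it knows nothing about page boundaries or column layout. It also cannot tell a line-break hyphen from a genuine compound-word hyphen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the naive line-based approach.
public class NaiveDehyphenator {

    // Joins a word split by a trailing hyphen with the first word of the
    // next non-blank line. Lines are assumed to be in reading order.
    public static List<String> dehyphenate(List<String> lines) {
        List<String> result = new ArrayList<>();
        String pending = null; // line that ended with a hyphen, hyphen removed

        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                // Blank lines are kept, unless we are still searching for
                // the continuation of a hyphenated word.
                if (pending == null) {
                    result.add("");
                }
                continue;
            }
            if (pending != null) {
                // Move the first word of this line up to complete the word.
                int space = line.indexOf(' ');
                String head = space < 0 ? line : line.substring(0, space);
                String tail = space < 0 ? "" : line.substring(space + 1).trim();
                result.add(pending + head);
                pending = null;
                if (tail.isEmpty()) {
                    continue;
                }
                line = tail;
            }
            if (line.length() > 1 && line.endsWith("-")) {
                pending = line.substring(0, line.length() - 1);
            } else {
                result.add(line);
            }
        }
        if (pending != null) {
            // No continuation found (e.g. hyphen at the end of the page):
            // restore the hyphen and leave the line unchanged.
            result.add(pending + "-");
        }
        return result;
    }
}
```

Calling `NaiveDehyphenator.dehyphenate` on the two lines `The dehyphen-` and `ation feature works` yields `The dehyphenation` and `feature works`. A line ending in a hyphen with no following text is returned as it was, which is exactly the end-of-page case the naive approach cannot resolve.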

Taggings:

pdf java algorithm

Submitted by Stefan Puhalo on Sat, 11/17/2018 - 22:20
