Parsing hyphenated words in PDF

While developing a program that parses elements of PDF files, my task was to implement “dehyphenation”: a function that merges hyphenated words. The naive approach is to search for a hyphen in the last position of each line on a page and match it with the first word of the next line that contains text. This approach fails not only when the hyphen is the last character of a page, but also for multi-column page layouts, where the matching candidate can be positioned above the text object.
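To make the naive approach concrete, here is a minimal Java sketch. It assumes the page text has already been extracted into a list of lines in reading order; the class name `NaiveDehyphenator` and the method `dehyphenate` are hypothetical names chosen for illustration, not part of any PDF library. It shows only the naive line-based matching, so it carries both limitations mentioned above: a hyphen on the last line of a page is left unresolved, and column layouts are not handled because no position information is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the naive, line-based approach only.
final class NaiveDehyphenator {

    /**
     * Joins a word that ends a line with a hyphen to the first word of the
     * next line containing text. Blank lines are dropped from the result.
     */
    static List<String> dehyphenate(List<String> lines) {
        // Keep only lines that contain text, trimmed of surrounding whitespace.
        List<String> work = new ArrayList<>();
        for (String raw : lines) {
            String trimmed = raw.trim();
            if (!trimmed.isEmpty()) {
                work.add(trimmed);
            }
        }

        List<String> result = new ArrayList<>();
        for (int i = 0; i < work.size(); i++) {
            String line = work.get(i);
            // Merge while the line ends with a hyphen and a following line exists.
            while (line.length() > 1 && line.endsWith("-") && i + 1 < work.size()) {
                String next = work.get(i + 1);
                int space = next.indexOf(' ');
                String firstWord = space < 0 ? next : next.substring(0, space);
                String rest = space < 0 ? "" : next.substring(space + 1).trim();

                // Drop the trailing hyphen and append the continuation.
                line = line.substring(0, line.length() - 1) + firstWord;

                if (rest.isEmpty()) {
                    work.remove(i + 1);   // next line fully consumed, keep checking
                } else {
                    work.set(i + 1, rest);
                    break;                // only one merge per following line
                }
            }
            // A hyphen on the very last line has no candidate and stays as-is.
            result.add(line);
        }
        return result;
    }
}
```

Note that this sketch also merges genuine compounds that happen to break at their hyphen (for example "well-" followed by "known" becomes "wellknown"), which a real implementation would need a dictionary or similar check to avoid.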

Taggings:

pdf java algorithm

Submitted by Stefan Puhalo on Sat, 11/17/2018 - 22:20

1 answer

I honestly did not understand your solution. Aren't there any third-party libraries that can help in this regard?

Submitted by Michael Zronek on Fri, 12/07/2018 - 16:24

Taggings:

algorithm

Comments

I also have trouble understanding your solution. Maybe you should make it clearer.

Daria Piacun - Mon, 12/10/2018 - 09:29