
Display a PDF document in a more pleasant way

I had a PDF of about 1,000 pages consisting only of tables. Navigating this document and finding something in it wasn't easy, which led me to develop a web application that presents the data in a more pleasant way.

First I had to analyze the PDF document. There are several different PDF versions and standards, so the version is one thing to check; the second is the structure of the document. For example, the document could consist only of images, which would make the task much harder, since there would be no text to extract. In my case the structure was straightforward and consisted of text boxes. The parser was based on the PDFClown library, which is written in C#. Once the parser was written and able to parse the document, the data was stored in an SQL database. Umbraco, a CMS based on ASP.NET, served as the basis of the web frontend.
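The parser itself isn't shown here, so the following is only a minimal sketch of the step between text extraction and storage. It is written in Python rather than C#, and it uses SQLite in place of whatever SQL database was actually used. It assumes the extractor yields text boxes with a page number, an x position and a y position measured from the top of the page (raw PDF coordinates grow upward, so real values may need flipping). The `TextBox` class, the `group_into_rows` and `store_rows` functions, the 2-point row tolerance and the `cells` table are all hypothetical. Boxes whose vertical positions lie within the tolerance are treated as one table row, and the cells of each row are ordered left to right.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """Hypothetical text box, as a PDF text extractor might report it."""
    page: int
    x: float   # left edge
    y: float   # assumed: distance from the top of the page
    text: str


def group_into_rows(boxes: List[TextBox], tolerance: float = 2.0) -> List[List[str]]:
    """Group boxes on the same page whose y values are within `tolerance`
    of the first box in the row; return rows top to bottom, cells left to right."""
    rows: List[List[TextBox]] = []
    for box in sorted(boxes, key=lambda b: (b.page, b.y, b.x)):
        last = rows[-1] if rows else None
        if last and last[0].page == box.page and abs(last[0].y - box.y) <= tolerance:
            last.append(box)      # same line of the table
        else:
            rows.append([box])    # start a new table row
    return [[b.text for b in sorted(row, key=lambda b: b.x)] for row in rows]


def store_rows(conn: sqlite3.Connection, rows: List[List[str]]) -> None:
    """Store every cell as (row_index, col_index, value) in a hypothetical
    `cells` table, so tables of any width fit the same schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        " row_index INTEGER NOT NULL,"
        " col_index INTEGER NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (row_index, col_index))"
    )
    conn.executemany(
        "INSERT INTO cells (row_index, col_index, value) VALUES (?, ?, ?)",
        [(r, c, v) for r, row in enumerate(rows) for c, v in enumerate(row)],
    )
    conn.commit()


if __name__ == "__main__":
    boxes = [
        TextBox(page=1, x=200.0, y=100.5, text="Price"),
        TextBox(page=1, x=50.0, y=100.0, text="Item"),
        TextBox(page=1, x=50.0, y=120.0, text="Widget"),
        TextBox(page=1, x=200.0, y=120.8, text="9.99"),
    ]
    rows = group_into_rows(boxes)
    print(rows)  # [['Item', 'Price'], ['Widget', '9.99']]

    conn = sqlite3.connect(":memory:")
    store_rows(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM cells").fetchone()[0])  # 4
```

Storing one database row per cell keeps the schema independent of how many columns each table has. If every table in the document shares the same header, a table with one column per header field would be simpler to query from the frontend.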