Reading webpages in Java applications

Many web applications fall into the category of "mash-ups", meaning that they collect information from different sources (often other web applications) and combine it in a single, coherent user interface that offers additional value to the user. Often this additional value is enough to justify the mash-up as a service of its own. In order to write such an application, one must first be able to access the web pages in question from the Java code of one's own application.
2 answers

Accessing webpages with JTidy

The Java library "JTidy" can parse a webpage, including malformed HTML, into a DOM tree. The page itself is fetched from its URL with the standard java.net classes.

First, create a new Tidy object and open an InputStream in on the URL connection. Then call the method parseDOM() with that stream to get a DOM Document for the webpage.

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.w3c.tidy.Tidy;

Tidy tidy = new Tidy();
String urlString = "URL of the desired webpage";
URL url = new URL(urlString);
URLConnection uc = url.openConnection();
InputStream in = uc.getInputStream();
org.w3c.dom.Document node = tidy.parseDOM(in, null); // null: no tidied output is written
in.close();

The API documentation is available online, see http://jtidy.sourceforge.net/apidocs/index.html
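
Once parseDOM() has returned a Document, the standard org.w3c.dom calls can be used to pull data out of the page. The following is a minimal sketch of that idea, not part of JTidy itself: the class LinkExtractor and the method extractLinks are hypothetical names, and the code assumes that element names are reported in lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of JTidy: collects link targets from a parsed page.
final class LinkExtractor {

    private LinkExtractor() {
    }

    // Returns the href value of every <a> element that has a non-empty href,
    // in document order.
    static List<String> extractLinks(Document document) {
        List<String> links = new ArrayList<>();
        // Assumption: anchors are reported under the lowercase name "a".
        NodeList anchors = document.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            String href = anchor.getAttribute("href");
            if (href != null && !href.isEmpty()) {
                links.add(href);
            }
        }
        return links;
    }
}
```

With the Document from the snippet above, this could be called as LinkExtractor.extractLinks(node). Because the helper only depends on the org.w3c.dom interfaces, it should work with any parser that produces a DOM Document.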


Accessing webpages with Java

Java offers capabilities to read webpages from URLs.

First, store the URL of the webpage in a String. Create an instance of the built-in Java class URL, passing the string to the constructor. Call the method openConnection() to get a new URLConnection object. Wrap that object's input stream in a BufferedReader and read the webpage line by line.


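import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

String urlString = "URL of the desired webpage";
URL url = new URL(urlString);
URLConnection uc = url.openConnection();
StringBuilder html = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()))) {
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        // readLine() strips the line terminator, so add it back
        html.append(inputLine).append('\n');
    }
}
String htmlText = html.toString();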
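
The same steps can be wrapped in a small reusable helper. This is only a sketch under stated assumptions: the class name PageReader, the method names, and the timeout values are hypothetical choices. Compared with the snippet above, it sets connect and read timeouts so a slow server cannot block the application indefinitely, and it takes the character set as a parameter instead of relying on the platform default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;

// Hypothetical helper class; the name and the timeout values are illustrative.
final class PageReader {

    private PageReader() {
    }

    // Reads everything from the reader, line by line, and closes it afterwards.
    // Each line is terminated with '\n' in the result.
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Opens the URL with timeouts and decodes the response with the given charset.
    static String fetch(String urlString, Charset charset) throws IOException {
        URLConnection connection = new URL(urlString).openConnection();
        connection.setConnectTimeout(5_000); // milliseconds, assumed value
        connection.setReadTimeout(10_000);   // milliseconds, assumed value
        return readAll(new InputStreamReader(connection.getInputStream(), charset));
    }
}
```

A call might look like PageReader.fetch(urlString, StandardCharsets.UTF_8), assuming the page is encoded in UTF-8. Keeping readAll separate from fetch makes the reading logic usable with any Reader, for example a StringReader in a unit test.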
