NY Times Community API only returns 25 comments per request

The Community API from The New York Times lets you retrieve user comments on articles. I want to collect all comments from a given date and analyze their content by finding the most common words and displaying them in a tag cloud. The API lets me search comments by date, but each request returns only 25 comments. On some days roughly 5000 comments are submitted, and the API gives me no option to fetch them all at once. The only way to get more comments is the "offset" parameter, which sets the starting point of the result set and must be a multiple of 25; for example, offset=25 returns comments 26-50. Fetching all comments this way takes too long, because each request takes about 1.5 seconds, so 5000 comments need about 200 sequential requests, or roughly five minutes. I present the results in a web interface and collect the data on a server using Java.
1 answer

Make multiple requests by using asynchronous threads

You can issue several requests at the same time by running them on multiple threads.

To manage the concurrency, you can use ExecutorService:

http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Execut...

Example:

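ExecutorService executor = Executors.newFixedThreadPool(size);
// Each task fetches one page of 25 comments, so consecutive tasks use consecutive offsets.
Future<InputStream> response1 = executor.submit(new GetComments(offset));
Future<InputStream> response2 = executor.submit(new GetComments(offset + 25));
.....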

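import java.io.InputStream;
import java.net.URL;
import java.util.concurrent.Callable;

// One task per page of results; the type parameter is what call() returns.
class GetComments implements Callable<InputStream> {

    private final String url;

    // The constructor name must match the class name.
    public GetComments(int offset) {
        this.url = ....
    }

    @Override
    public InputStream call() throws Exception {
        // Opens the connection and returns the response body as a stream.
        return new URL(url).openStream();
    }
}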

The submit method extends the base method Executor.execute(java.lang.Runnable) by creating and returning a Future that can be used to cancel execution and/or wait for completion.
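To illustrate waiting on and cancelling a Future, here is a minimal sketch, assuming Java 8 or later. The class FutureDemo and the helper getOrCancel are hypothetical names introduced only for this example, and the tasks are simple stand-ins rather than real API requests.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FutureDemo {

    // Hypothetical helper: waits up to timeoutMillis for the task to finish;
    // if it is too slow, cancels it and returns the fallback value instead.
    static <T> T getOrCancel(ExecutorService executor, Callable<T> task,
                             long timeoutMillis, T fallback) throws Exception {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread if the task is running
            return fallback;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            // Stand-in tasks: one returns immediately, one sleeps past its timeout.
            String fast = getOrCancel(executor, () -> "done", 1000, "timed out");
            String slow = getOrCancel(executor,
                    () -> { Thread.sleep(5000); return "done"; }, 100, "timed out");
            System.out.println(fast + " / " + slow);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In a real fetch you would probably pick a timeout well above the ~1.5 seconds a single request takes, so that only requests that are genuinely stuck get cancelled.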

The class GetComments implements Callable, which is similar to Runnable, but it can return a result and throw a checked exception.
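Putting the pieces together, the following is a rough sketch of fetching every page in parallel, assuming Java 8 or later. CommentPager, fetchAll and the fetchPage parameter are hypothetical names for this example; fetchPage stands in for the actual HTTP request (for instance the GetComments logic above), so the sketch runs without network access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

final class CommentPager {

    static final int PAGE_SIZE = 25; // the API returns 25 comments per request

    // fetchPage is a hypothetical stand-in for the HTTP call:
    // given an offset, it returns that page of results.
    static <T> List<T> fetchAll(int totalComments, int threads, IntFunction<T> fetchPage)
            throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // One task per page: offsets 0, 25, 50, ...
            List<Callable<T>> tasks = new ArrayList<>();
            for (int offset = 0; offset < totalComments; offset += PAGE_SIZE) {
                final int pageOffset = offset;
                tasks.add(() -> fetchPage.apply(pageOffset));
            }
            List<T> pages = new ArrayList<>();
            // invokeAll blocks until every task has finished and returns the
            // futures in the same order as the task list.
            for (Future<T> future : executor.invokeAll(tasks)) {
                pages.add(future.get()); // a failed task surfaces as ExecutionException
            }
            return pages;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake fetcher so the example runs offline; 60 comments need 3 pages.
        List<String> pages = fetchAll(60, 4, offset -> "page@" + offset);
        System.out.println(pages); // prints [page@0, page@25, page@50]
    }
}
```

With a pool of N threads, the total time should drop to roughly 1/N of the sequential time, as long as the server answers parallel requests as quickly as sequential ones.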
