rome-fetcher results in "too many open files"

Description

I was running a Java batch program (using rome-feedfetcher) from the command
line (call it the client) that sends about 10,000 terms to an RSS-producing
service (call it server A), which in turn proxies to another RSS-producing
service (call it server B) for part of its data, also using rome-feedfetcher.

The client instantiates a single HttpClientFeedFetcher and calls server A with
fetcher.retrieveFeed(URL). Server A is a Spring app with a single
HttpClientFeedFetcher set in its application context. Server A then sends a
request to server B with its own fetcher.retrieveFeed(URL).

Occasionally, server A returns a 500 error to the client. The stack trace
indicates that too many sockets are open on server A ("too many open files").
If I stop the client and restart it a few minutes later, it keeps going and
then stops again with a 500 error after a couple of hundred terms. Waiting
longer between restarts increases the number of terms processed.

Looking at the source, I see that HttpClientFeedFetcher instantiates an
HttpClient object for every invocation of retrieveFeed(URL). My theory is that
each HttpClient grabs a socket that is only released when the object is
garbage collected, but my client is asking for sockets faster than server A
can garbage collect them.

According to this page:
http://hc.apache.org/httpclient-3.x/performance.html

it is advisable to have a single HttpClient reused across multiple invocations.
The attached patch does this. In addition (because it has to work within a
server environment), I am using the MultiThreadedHttpConnectionManager (default
pool size 20), and have added a poolSize member with getPoolSize() and
setPoolSize().
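
To illustrate why a shared, bounded pool avoids the descriptor exhaustion, here
is a toy model that uses only JDK classes. It is not the attached patch and
does not call HttpClient: PoolSketch, SocketCounter and both method names are
hypothetical, and a Semaphore permit stands in for a pooled connection. The
thread count of 50 is an arbitrary stand-in for concurrent server requests.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolSketch {
    /** Tracks simulated sockets currently in use and the highest count seen. */
    static final class SocketCounter {
        private final AtomicInteger open = new AtomicInteger();
        private final AtomicInteger peak = new AtomicInteger();
        void opened() { peak.accumulateAndGet(open.incrementAndGet(), Math::max); }
        void closed() { open.decrementAndGet(); }
        int peak() { return peak.get(); }
    }

    /** Models the unpatched pattern: a new client per call, never released promptly. */
    static int peakWithClientPerCall(int requests) {
        SocketCounter counter = new SocketCounter();
        for (int i = 0; i < requests; i++) {
            counter.opened(); // socket stays open until "garbage collection"
        }
        return counter.peak();
    }

    /** Models the patched pattern: one shared pool capped at poolSize connections. */
    static int peakWithSharedPool(int requests, int poolSize) {
        SocketCounter counter = new SocketCounter();
        Semaphore pool = new Semaphore(poolSize);
        ExecutorService workers = Executors.newFixedThreadPool(50);
        for (int i = 0; i < requests; i++) {
            workers.submit(() -> {
                pool.acquireUninterruptibly(); // wait for a free pooled connection
                try {
                    counter.opened();
                    // the feed would be fetched here
                } finally {
                    counter.closed();
                    pool.release(); // hand the connection back for reuse
                }
            });
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.peak();
    }

    public static void main(String[] args) {
        System.out.println("per-call peak:    " + peakWithClientPerCall(1000));
        System.out.println("shared-pool peak: " + peakWithSharedPool(1000, 20));
    }
}
```

In this model the per-call peak grows with the number of requests, while the
shared-pool peak can never exceed the pool size, however many requests arrive.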

Using the patched JAR allowed me to run through all the terms in one shot
without the need for a restart.

Environment

None

Status

Assignee

ROME Jira Lead

Reporter

sujitpal

Labels

None

Participants

None

Affects versions

current

Priority

Major