Saturday, November 11, 2006

java.net.URL.equals and hashCode make (blocking) Internet connections....

Sometimes simple calls have unexpected side effects. I wanted to update some plugins, but the update manager was hanging my UI. Looking at the stack trace reveals:

at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source)
at java.net.InetAddress.getAddressFromNameService(Unknown Source)
at java.net.InetAddress.getAllByName0(Unknown Source)
at java.net.InetAddress.getAllByName0(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at java.net.InetAddress.getByName(Unknown Source)
at java.net.URLStreamHandler.getHostAddress(Unknown Source)
- locked <0x15ce1280> (a sun.net.www.protocol.http.Handler)
at java.net.URLStreamHandler.hashCode(Unknown Source)
at java.net.URL.hashCode(Unknown Source)
- locked <0x1a3100d0> (a java.net.URL)


Hmm, I must say that it is very dangerous that java.net.URL.hashCode (and URL.equals) makes an Internet connection. java.net.URL has the worst equals/hashCode implementation I have ever seen: equality depends on the state of the Internet. Well, the javadoc of URL.equals does say: "Since hosts comparison requires name resolution, this operation is a blocking operation.", but who reads the documentation of equals? There is a general contract around equals. Joshua Bloch writes in Effective Java: "Don't write an equals that relies on unreliable resources" (Chapter 3, page 34). Hey Sun, as far as I know, the Internet is not reliable ;-)

Do not put java.net.URL into collections unless you can live with the fact that comparing makes calls to the Internet. Use java.net.URI instead.
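
As a minimal sketch of that advice (the class and method names UriKeys, keyFor and distinct are made up for illustration, they are not from the update manager code): convert to java.net.URI once at the boundary and use the URI as the collection key. URI.equals and URI.hashCode only compare the parsed components, so as far as I can tell nothing here touches the network.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class UriKeys {

    /** Converts a URL to a URI so it can be used as a key; toURI() parses the string form only. */
    static URI keyFor(URL url) {
        try {
            return url.toURI();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    /** Collects distinct locations; equality is decided by the URI components, not by DNS. */
    static Set<URI> distinct(List<String> locations) {
        Set<URI> result = new LinkedHashSet<>();
        for (String location : locations) {
            result.add(URI.create(location));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<URI> sites = distinct(Arrays.asList(
                "http://foo.example.com/",
                "http://www.example.com/",
                "http://www.example.com/"));
        // The repeated string collapses; the two different host names stay distinct.
        System.out.println(sites.size());
    }
}
```

If a java.net.URL is really needed (for example to open a connection), it can be created from the URI at the point of use with uri.toURL().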

URL is an aggressive beast that can slow down and hang your application by generating unexpected network traffic...


I wonder if other people find this behaviour as shocking as I do....

27 comments:

Bananeweizen said...

This is also the reason why the update manager hangs Eclipse if you try to find updates in a network with some aggressive proxy, that will not answer your DNS requests if you don't give user/password to the proxy.
Happens all the time for me at work with some Microsoft proxy software and hangs Eclipse for half an hour.

Ciao, Michael.

AlBlue said...

Actually, both the URL and URI classes in Java are pretty diabolical, but the URL equality takes the biscuit.

You've also got to be careful of using 'instanceof' in equals (i.e., don't use it) since it's non-reflexive and thus can break transitivity. Bloch gets that wrong in Effective Java, but at least Eclipse generates the right implementation.

David Williams said...

I'm not all that shocked, but it's certainly an important tidbit to know for expert programmers. I'm going to search our WTP code, I bet we do this a lot. Plus ... I hope you opened a bug on UpdateManager ... with a patch? :)

murphee said...

WTP and the internet:
There's a huge problem in WTP, where some XML handling code looks up some DTD (or something). Nothing sinister... except it does it on the EDT... which is evil in itself. Even more so if you happen to be offline: in this case the whole GUI will stop working until the timeout dropkicks the code to continue. I only happened to have Eclipse/WTP under the debugger, where a quick suspend showed that the GUI thread was blocking in a DNS lookup method...
(Yes, I believe there is a bug for that in the Bugzilla already).

Anonymous said...

java.net.URL is legacy and dates back to JDK 1.0. Its spec clearly documents that both equals and hashCode can block. The newer java.net.URI is well designed and written (I don't know what alblue is talking about).

Greg Vaughn said...

It's great to find this! I've spent the last day trying to figure out why Eclipse update is so slow on our corporate network and had just come to the same conclusion. It's good to know we're not the only ones affected by it. Our corporate DNS only resolves corporate server names, and we have the MS proxy the first poster mentioned. I'll try to get involved in the Eclipse bug report you started.

Bill Pugh said...

We just added a check for this to FindBugs. In Eclipse 3.2.1, we found 29 places where hashCode or equals is called on a URL, and 6 places where a Map or Set of URLs is used.

Contact me for the full details. I didn't feel that listing all of them on the blog was particularly useful.

gernot eger said...

I posted a bug a year ago at https://bugs.eclipse.org/bugs/show_bug.cgi?id=121201.

Btw: I stumbled over a similar behaviour in InetAddress.getByName() when it is given a literal IP address. This seems to be a bug, even if Sun refused to accept it as one.

Mike Samuel said...

java.net.URI has some problems too.

The multi-String constructor seems to decide whether to escape arguments by looking at its inputs.

which means that for a simple uri like
http://foo.com/?qmark=%3f&ersand=%26
neither
uri.equals(
new URI(uri.getScheme(),
uri.getRawAuthority(),
uri.getRawPath(),
uri.getRawQuery(),
uri.getRawFragment())
);
nor
uri.equals(
new URI(uri.getScheme(),
uri.getAuthority(),
uri.getPath(),
uri.getQuery(),
uri.getFragment())
);
are true since the former yields
http://domain.com/?qmark=%253f&ersand=%2526
and the latter yields the malformed
http://domain.com/?qmark=?&ersand=&

The latter also demonstrates that URI::getQuery is quite useless.

cheers,
mike

Anonymous said...

Sorry, my last post got mangled:

java.net.URI has some problems too.

The multi-String constructor seems to decide whether to escape arguments by looking at its inputs.

which means that for a simple uri like
  http://foo.com/?qmark=%3f&ersand=%26
neither
  uri.equals(
    new URI(uri.getScheme(),
            uri.getRawAuthority(),
            uri.getRawPath(),
            uri.getRawQuery(),
            uri.getRawFragment())
    );
nor
  uri.equals(
    new URI(uri.getScheme(),
            uri.getAuthority(),
            uri.getPath(),
            uri.getQuery(),
            uri.getFragment())
    );
are true since the former yields
    http://domain.com/?qmark=%253f&ersand=%2526
and the latter yields the malformed
    http://domain.com/?qmark=?&ersand=&

cheers,
mike
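
The round trip Mike describes can be reproduced with a small sketch like the following, using the http://foo.com example URI from his comment. The class and method names (UriRoundTrip, rebuildFromRaw, rebuildFromDecoded, rebuildFromString) are made up for illustration. The multi-argument constructors always quote the percent character, which is why the raw components end up double-escaped; rebuilding from the full string form appears to be the only lossless variant of the three.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, for illustration only.
final class UriRoundTrip {

    /** Rebuilds from the raw (still escaped) components; '%' gets quoted again as %25. */
    static URI rebuildFromRaw(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getRawAuthority(), uri.getRawPath(),
                    uri.getRawQuery(), uri.getRawFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Rebuilds from the decoded components; the decoded '?' and '&' are legal and stay unescaped. */
    static URI rebuildFromDecoded(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), uri.getPath(),
                    uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Rebuilds from the complete string form, which the single-argument parser takes as-is. */
    static URI rebuildFromString(URI uri) {
        return URI.create(uri.toString());
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://foo.com/?qmark=%3f&ersand=%26");
        System.out.println(rebuildFromRaw(uri));      // http://foo.com/?qmark=%253f&ersand=%2526
        System.out.println(rebuildFromDecoded(uri));  // http://foo.com/?qmark=?&ersand=&
        System.out.println(uri.equals(rebuildFromString(uri))); // true
    }
}
```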

Anonymous said...

"Do not put java.net.URL into collections unless you can live with the fact that comparing makes calls to the Internet."

... and you can also live with different hostnames on the same IP being considered "equal" -- when the vast majority of the web doesn't work that way. Why would anyone ever want this behavior? WTF was sun smoking?

annerose said...

These comments have been invaluable to me as is this whole site. I thank you for your comment.

Jon said...

Thanks to the various posters here who mentioned Eclipse. I had a strange problem where I'd just added "proxy" to the proxy page in Eclipse, and this seemed to allow a connection to the main Eclipse site, then hung when contacting the mirror. I switched this to "username:password@proxy" and it started to work fine - thanks! :)

Munsey said...

Not only do you have the issue of equals returning true for "http://foo.example.com" and "http://www.example.com" when it should return false, the opposite also occurs. If you are doing load balancing with round-robin DNS, there is a good chance that "http://www.mysite.com" and "http://www.mysite.com" compare as unequal when they should be equal.

Himanshu said...

java.net.URI is nice but must be used carefully. Its single-argument constructor and multi-argument constructors encode the URI components in different ways, which is quite confusing.

Himanshu said...

See how java.net.URI can be confusing too:

http://blog.limewire.org/?p=261

Ayesha Sadiq said...

You have done a marvelous job! I am really inspired with your work.

Javin @ FIX Protocol Tutorial said...

Nice article, but I believe it's important to understand the consequences of not following this contract as well, and for that it's important to understand how hashCode is used in collection classes, e.g. how HashMap works in Java and how the key's hashCode() is used to insert and retrieve objects from a HashMap.

Javin

Anderson said...

I just couldn’t leave your website before saying that I really enjoyed the quality information you offer to your visitors...

JP@ java heap memory said...

learned a lot on this site , thank you very much for putting effort and providing such a quality content.

Thanks
eclipse remote debugging tutorial

pranav said...

Yes, I agree that this is the reason for the hang.

Boeffi said...

"Do not put java.net.URL into collections unless you can live with the fact that comparing makes calls to the Internet."

unbelievable side effect :-(

CU
Boeffi - http://boeffi.net

Java Experience said...

The lesson is that you mess with hashcode and get yourself messed.