Saturday, November 11, 2006

java.net.URL.equals and hashCode make (blocking) Internet connections....

Sometimes simple calls have unexpected side effects. I wanted to update some plugins, but the update manager was hanging my UI. Looking at the stack trace reveals:

at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source)
at java.net.InetAddress.getAddressFromNameService(Unknown Source)
at java.net.InetAddress.getAllByName0(Unknown Source)
at java.net.InetAddress.getAllByName0(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at java.net.InetAddress.getByName(Unknown Source)
at java.net.URLStreamHandler.getHostAddress(Unknown Source)
- locked <0x15ce1280> (a sun.net.www.protocol.http.Handler)
at java.net.URLStreamHandler.hashCode(Unknown Source)
at java.net.URL.hashCode(Unknown Source)
- locked <0x1a3100d0> (a java.net.URL)


Hmm, I must say that it is very dangerous that java.net.URL.hashCode (and URL.equals) makes a blocking network call (a host name lookup, as the stack trace shows). java.net.URL has the worst equals/hashCode implementation I have ever seen: equality depends on the state of the Internet. Granted, the javadoc of URL.equals says: "Since hosts comparison requires name resolution, this operation is a blocking operation.", but who reads the documentation of equals? There is a general contract around equals. Joshua Bloch writes in Effective Java: "Don't write an equals that relies on unreliable resources" (Chapter 3, page 34). Hey Sun, as far as I know, the Internet is not reliable ;-)

Do not put java.net.URL into collections unless you can live with the fact that comparing makes calls to the Internet. Use java.net.URI instead.
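
A minimal sketch of that advice, assuming made-up example.com addresses and a hypothetical helper called uniqueSites (neither comes from the update manager code). It keeps the addresses in a Set of java.net.URI, whose equals and hashCode compare the parsed components and never resolve the host name:

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class UriSetExample {

    // Hypothetical helper: collects addresses as URIs so that the set's
    // hashCode/equals calls stay purely in memory (no DNS lookup).
    static Set<URI> uniqueSites(String... addresses) {
        Set<URI> sites = new HashSet<URI>();
        for (String address : addresses) {
            sites.add(URI.create(address));
        }
        return sites;
    }

    public static void main(String[] args) {
        Set<URI> sites = uniqueSites(
                "http://update.example.com/site",
                "http://update.example.com/site",   // duplicate, dropped by the set
                "http://mirror.example.com/site");
        System.out.println(sites.size());           // two distinct URIs remain
    }
}
```

If a java.net.URL is needed later to open a connection, it can be created from the URI at that point with uri.toURL(), so the blocking work happens where you expect it.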

URL is an aggressive beast that can slow down and hang your application by making unexpected network traffic.....


I wonder if other people find this behaviour as shocking as I do....

25 comments:

  1. This is also the reason why the update manager hangs Eclipse if you try to find updates in a network with some aggressive proxy, that will not answer your DNS requests if you don't give user/password to the proxy.
    Happens all the time for me at work with some Microsoft proxy software and hangs Eclipse for half an hour.

    Ciao, Michael.

  2. Actually, both the URL and URI classes in Java are pretty diabolical, but the URL equality takes the biscuit.

    You've also got to be careful of using 'instanceof' in equals (i.e., don't use it) since it is asymmetric across subclasses and thus can break symmetry and transitivity. Bloch gets that wrong in Effective Java, but at least Eclipse generates the right implementation.

  3. I'm not all that shocked, but it's certainly an important tidbit to know for expert programmers. I'm going to search our WTP code, I bet we do this a lot. Plus ... I hope you opened a bug on UpdateManager ... with a patch? :)

  4. WTP and the internet:
    There's a huge problem in WTP, where some XML handling code looks up some DTD (or something). Nothing sinister... except it does it on the EDT... which is evil in itself. Even more so if you happen to be offline: in that case the whole GUI stops working until the timeout lets the code continue. I only happened to have Eclipse/WTP under the debugger, where a quick suspend showed that the GUI thread was blocking in a DNS lookup method...
    (Yes, I believe there is a bug for that in the Bugzilla already).

  5. java.net.URL is legacy and dates back to JDK1.0. Its spec clearly documents that both equals and hashCode can block. The newer java.net.URI is well designed and written (I don't know what alblue is talking about).

  6. It's great to find this! I've spent the last day trying to figure out why Eclipse update is so slow on our corporate network and had just come to the same conclusion. It's good to know we're not the only ones affected by it. Our corporate DNS only resolves corporate server names, and we have the MS proxy the first poster mentioned. I'll try to get involved in the Eclipse bug report you started.

  7. We just added a check for this to FindBugs. In Eclipse 3.2.1, we found 29 places where hashCode or equals is called on a URL, and 6 places where a Map or Set of URLs is used.

    Contact me for the full details. I didn't feel that listing all of them on the blog was particularly useful.

  8. I posted a bug a year ago at https://bugs.eclipse.org/bugs/show_bug.cgi?id=121201.

    Btw: I stumbled over a similar behaviour in InetAddress.getByName() when it is given a literal IP address. This seems to be a bug, even if Sun refused to accept it as one.

  9. java.net.URI has some problems too.

    The multi-String constructor seems to decide whether to escape arguments by looking at its inputs.

    which means that for a simple uri like
    http://foo.com/?qmark=%3f&ampersand=%26
    neither
    uri.equals(
    new URI(uri.getScheme(),
    uri.getRawAuthority(),
    uri.getRawPath(),
    uri.getRawQuery(),
    uri.getRawFragment())
    );
    nor
    uri.equals(
    new URI(uri.getScheme(),
    uri.getAuthority(),
    uri.getPath(),
    uri.getQuery(),
    uri.getFragment())
    );
    are true since the former yields
    http://domain.com/?qmark=%253f&ampersand=%2526
    and the latter yields the malformed
    http://domain.com/?qmark=?&ampersand=&

    The latter also demonstrates that URI::getQuery is quite useless.

    cheers,
    mike

  10. Sorry, my last post got mangled:

    java.net.URI has some problems too.

    The multi-String constructor seems to decide whether to escape arguments by looking at its inputs.

    which means that for a simple uri like
      http://foo.com/?qmark=%3f&ampersand=%26
    neither
      uri.equals(
        new URI(uri.getScheme(),
                uri.getRawAuthority(),
                uri.getRawPath(),
                uri.getRawQuery(),
                uri.getRawFragment())
        );
    nor
      uri.equals(
        new URI(uri.getScheme(),
                uri.getAuthority(),
                uri.getPath(),
                uri.getQuery(),
                uri.getFragment())
        );
    are true since the former yields
        http://domain.com/?qmark=%253f&ampersand=%2526
    and the latter yields the malformed
        http://domain.com/?qmark=?&ampersand=&

    cheers,
    mike

  11. "Do not put java.net.URL into collections unless you can live with the fact that comparing makes calls to the Internet."

    ... and you can also live with different hostnames on the same IP being considered "equal", when the vast majority of the web doesn't work that way. Why would anyone ever want this behavior? WTF was Sun smoking?

  12. Hey guys, there's another English person about, :)
    I'm a new on michaelscharf.blogspot.com
    looking forward to speaking to you guys soon

  13. These comments have been invaluable to me as is this whole site. I thank you for your comment.

  14. Thanks to the various posters here who mentioned Eclipse. I had a strange problem where I'd just added "proxy" to the proxy page in Eclipse, and this seemed to allow a connection to the main Eclipse site, then hung when contacting the mirror. I switched this to "username:password@proxy" and it started to work fine - thanks! :)

  15. Not only do you have the issue of equals returning true for "http://foo.example.com" and "http://www.example.com" when it should return false, the opposite also occurs. If you are doing load balancing with round-robin DNS, there is a good chance that equals returns false for "http://www.mysite.com" and "http://www.mysite.com" when it should return true.

  16. java.net.URI is nice but must be used carefully. Its single-argument constructor and its multi-argument constructors encode the URI components in different ways, which is quite confusing.

  17. See how java.net.URI can be confusing too:

    http://blog.limewire.org/?p=261

  18. Nice article, but I believe it's important to understand the consequences of not following this contract as well, and for that it's important to understand how hashCode is used in the collection classes, e.g. how HashMap works in Java and how the hashCode() of a key is used to insert and retrieve objects from a HashMap.

    Javin

  19. Yes, I agree that this is the reason for the hang.

  20. "Do not put java.net.URL into collections unless you can live with the fact that comparing makes calls to the Internet."

    unbelievable side effect :-(

    CU
    Boeffi - http://boeffi.net

  21. The lesson is that if you mess with hashCode, you get yourself into a mess.

  22. Can't believe this is a Blocker Rule in Sonar (via Findbugs).
    Title should be
    "java.net.URL.equals and hashCode make (blocking) Internet connections _once_ per Instance..."

    So this is _not_ a problem for long-lived (unchanged) URL instances.
    The result of the dns resolution is cached in URL.hostAddress.

    Behavior of equals() is documented correctly since 1.4 (http://docs.oracle.com/javase/1.4.2/docs/api/java/net/URL.html#equals%28java.lang.Object%29): "Two hosts are considered equivalent if both host names can be resolved into the same IP addresses"

  23. Have you also checked out
    muhammadkhojaye.blogspot.com/2010/02/java-hashing.html

  24. Great articles and great layout. Your blog post deserves all of the positive feedback it’s been getting.
    algorithm代写

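
The java.net.URI constructor behaviour reported in comments 9 and 10 can be reproduced without any network access. This is a minimal sketch, not a fix: the class name UriRoundTrip and the helpers fromRawParts and fromDecodedParts are hypothetical names chosen for illustration, and the helpers wrap the checked URISyntaxException only to keep the example short.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriRoundTrip {

    // Rebuilds a URI from its raw (still percent-encoded) components.
    // The multi-argument constructor always quotes '%', so existing
    // escapes such as %3f end up double-encoded as %253f.
    static URI fromRawParts(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getRawAuthority(), uri.getRawPath(),
                    uri.getRawQuery(), uri.getRawFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Rebuilds a URI from its decoded components. '?' and '&' are legal
    // in a query, so they are not re-escaped and the original escapes are lost.
    static URI fromDecodedParts(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), uri.getPath(),
                    uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI original = URI.create("http://foo.com/?qmark=%3f&ampersand=%26");

        System.out.println(fromRawParts(original));      // query is double-encoded
        System.out.println(fromDecodedParts(original));  // query escapes are gone
        System.out.println(original.equals(fromRawParts(original)));
        System.out.println(original.equals(fromDecodedParts(original)));

        // Round-tripping through the single-String form keeps the URI intact.
        System.out.println(original.equals(URI.create(original.toString())));
    }
}
```

Neither rebuilt URI equals the original, which matches what the commenter describes; rebuilding from the complete string with URI.create(original.toString()) is the round trip that preserves equality.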