Friday, October 10, 2008

ConcurrentModificationException: Why do Java collections not have robust iterators?

Ever got a ConcurrentModificationException? It just hit me. How does this happen? It happens when one method iterates over a collection while another method (one that is called, directly or indirectly, from inside the for loop) modifies the collection.

Note: A ConcurrentModificationException may have nothing to do with threading! (My original claim that it has nothing to do with threading at all was a bit too strong, as Rafael points out in a comment: it might occur due to a threading race condition, but in such cases you probably have other problems as well.) Here I am talking about the case that is not related to threading.

Here's a simple example to show the problem. We have a class that can register new listeners and when fireChange is called it calls changed on the listeners. So the test listener here removes itself on the change call and boom, we get the ConcurrentModificationException:

import java.util.ArrayList;
import java.util.Collection;

public class IteratorTest {
    final Collection<Listener> fListeners;

    public IteratorTest(Collection<Listener> listeners) {
        fListeners = listeners;
    }

    static class Listener {
        public void changed(IteratorTest subject) {
            // Modifies the collection that fireChange() is iterating over
            subject.removeListener(this);
        }
    }

    public void addListener(Listener listener) {
        fListeners.add(listener);
    }

    public void removeListener(Listener listener) {
        fListeners.remove(listener);
    }

    public void fireChange() {
        for (Listener listener : fListeners) {
            listener.changed(this);
        }
    }

    static void test(Collection<Listener> coll) {
        IteratorTest t = new IteratorTest(coll);
        t.addListener(new Listener());
        t.fireChange(); // throws ConcurrentModificationException
    }

    public static void main(String[] args) {
        test(new ArrayList<Listener>());
    }
}


The problem can happen if you call out to "other code" (code someone else has written) and "other code" can change the collection while you are iterating. One solution is to iterate over a copy of the collection:

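public void fireChange() {
    // Iterate over a copy. Note: toArray() without an argument returns Object[],
    // and casting that to Listener[] fails with a ClassCastException.
    Listener[] listeners = fListeners.toArray(new Listener[0]);
    for (Listener listener : listeners) {
        listener.changed(this);
    }
}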

That helps. But there is a lot of code out there that iterates over a collection and calls "other code" and there is always a chance that the "other code" calls back to modify your collection and you get a ConcurrentModificationException....
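If copying by hand at every call site feels error-prone, one alternative in the JDK is java.util.concurrent.CopyOnWriteArrayList, whose iterators work on a snapshot of the list and never throw ConcurrentModificationException. The sketch below is only an illustration: SnapshotSubject and its methods are made-up names, not part of the example above, and the copy happens on every add/remove instead of on every fireChange.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical variation of IteratorTest; all names are assumptions.
class SnapshotSubject {
    interface Listener {
        void changed(SnapshotSubject subject);
    }

    // Every add/remove copies the backing array; iterators see the old snapshot.
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    void removeListener(Listener listener) {
        listeners.remove(listener);
    }

    int listenerCount() {
        return listeners.size();
    }

    // Notifies all listeners in the snapshot and returns how many were called.
    int fireChange() {
        int notified = 0;
        for (Listener listener : listeners) {
            listener.changed(this); // may add or remove listeners safely
            notified++;
        }
        return notified;
    }

    public static void main(String[] args) {
        SnapshotSubject subject = new SnapshotSubject();
        subject.addListener(new Listener() {
            @Override
            public void changed(SnapshotSubject s) {
                s.removeListener(this); // removes itself during iteration
            }
        });
        subject.addListener(s -> { });
        System.out.println("notified: " + subject.fireChange());
        System.out.println("remaining: " + subject.listenerCount());
    }
}
```

Both listeners are notified and one remains registered afterwards. Note that a snapshot iterator is not the same thing as a robust iterator: a listener removed during the loop may still be called once.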

The good news is: Unlike threading race conditions it happens deterministically. The bad news is: if you are the client it is often not easy to find a way out.

15 years ago, ET++ (the cool framework created by Erich Gamma and André Weinand in the 80s) suffered from missing robust iterators. "Robust iterators" means robust to changes of the underlying collection. At that time, making a copy of a collection seemed an unacceptable overhead. So, Thomas Kofler added robust iterators to ET++ (the PDF has the pages in reverse order -- here is a readable version). The implementations are efficient and robust.

Robust iterators are so fundamental, I am really surprised that Java does not have them....
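To make the idea concrete, here is a minimal sketch of what a robust iterator could look like in Java. It is not the ET++ implementation; RobustList and everything in it are hypothetical names, and the approach (the collection keeps track of its running iterators and adjusts their position on removal) is just one possible way to do it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch, not thread-safe, and not the ET++ design.
class RobustList<E> implements Iterable<E> {
    private final List<E> items = new ArrayList<>();
    // Iterators still in progress; they are told about removals.
    // Caveat: an iterator abandoned early (e.g. via break) stays registered.
    private final List<RobustIterator> active = new ArrayList<>();

    void add(E item) {
        items.add(item); // appended items are simply seen by running iterators
    }

    boolean remove(E item) {
        int index = items.indexOf(item);
        if (index < 0) {
            return false;
        }
        items.remove(index);
        for (RobustIterator it : active) {
            it.elementRemoved(index);
        }
        return true;
    }

    int size() {
        return items.size();
    }

    @Override
    public Iterator<E> iterator() {
        RobustIterator it = new RobustIterator();
        active.add(it);
        return it;
    }

    private final class RobustIterator implements Iterator<E> {
        private int cursor = 0; // index of the next element to return

        void elementRemoved(int index) {
            if (index < cursor) {
                cursor--; // keep pointing at the same "next" element
            }
        }

        @Override
        public boolean hasNext() {
            boolean more = cursor < items.size();
            if (!more) {
                active.remove(this); // finished: stop tracking this iterator
            }
            return more;
        }

        @Override
        public E next() {
            if (cursor >= items.size()) {
                throw new NoSuchElementException();
            }
            return items.get(cursor++);
        }
    }

    public static void main(String[] args) {
        RobustList<String> list = new RobustList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        for (String s : list) {
            System.out.println("visiting " + s);
            if (s.equals("a")) {
                list.remove("a"); // modifying while iterating is fine here
            }
        }
    }
}
```

The price is bookkeeping on every removal and the need to unregister iterators, which is probably part of the reason a general-purpose library would hesitate to do this.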


The bridge between Interaction Design (IxD) and Domain Driven Design (DDD)

Some time ago I read Alan Cooper's book About Face 3. I am currently reading his book "The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity". He makes a strong point that interaction design has to be based on research and that it has to be done by interaction designers and not engineers. Engineers think too technically and therefore create overly complicated solutions. Engineers don't know how "normal users" think. And he is right! But I think it's not enough to have some "interaction designers" design the interactions and then let the programmers write the code. That might be important. But it is equally important, if not more important, that the developers deeply understand the problem domain and the goals the users have.

There is a great one hour talk I really enjoyed: Martin Fowler and Dan North Point Out a Yawning Crevasse of Doom (I had to look up the words Yawning, Crevasse and Doom -- those British guys speak hard to understand English ;-)). Their point is that there has to be a bridge between developers and users in order to communicate, as opposed to a ferry, where information is transported from one side to the other by someone like the interaction designer, marketing person or analyst.

The solution requires what Eric Evans describes in his book as Domain Driven Design (DDD) (BTW: Domain Driven Design Quickly is a 100 page book available online describing the essentials). One of the central ideas is to have a dialog between the developers and users to come up with a "ubiquitous language" to describe the "core domain". That is: use the same language when talking to the users as when talking about the code. Create classes and methods that model the domain using the same terminology used by the users when they talk about the problem domain.

To get better software we have to apply the techniques of interaction design and we have to have a dialog with our users to be able to create models that match their mind sets.

Yes, I am guilty myself of not doing this. So, this post is a reminder for myself....

P.S.: After writing this post, I did a search for "DDD and IxD" and found out that about a year ago I started a discussion on this topic on the IxD mailing list. I wish I had a better memory....

Friday, August 01, 2008

Emfatic download and update site available...

I like Emfatic a lot. It is a very cool textual representation of EMF ecore files. Emfatic has a nice text editor for .emf files, which is much easier to use than the graphical ecore editor. It was originally created by Chris Daly and made available at IBM alphaworks under a restrictive license. Some time ago it became an eclipse project.

Unfortunately, I could not find a download or update site for the open source emfatic (which has quite some enhancements over the alphaworks version). Only some build instructions. I built it following the "instructions" (which are a simple screen-shot of the projects in cvs, enough for "experts"). Then I realized it also needs org.antlr-2.7.7, which cannot be hosted at eclipse.org because of IP (copyright) issues. I could not find the org.antlr-2.7.7 plugin. I had to create it.

I am sure I am not the only one who simply wants to download emfatic. So, I decided to make my build available.

Here is my emfatic download site. It has a zip with emfatic and antlr that can be dropped into a p2 drop-in folder or used as an extension location (it already contains the .eclipseextension file).

And for those who like to use the update manager, I created an update site with emfatic and antlr: http://scharf.gr/eclipse/emfatic/update/

Note: the emfatic plugins require Java 1.5!

Saturday, June 07, 2008

Looking for a JavaVM with the weakest possible memory model

The Java Memory Model gives some minimal guarantees about what happens if two threads are accessing the same variables. Brian Goetz describes this in chapter 7 of Java Concurrency in Practice. And here is a good summary of the new (1.5) Memory Model. Doug Lea describes the old memory model. If you really want to dig into it read some more formal specs.

The essence of the Java memory model is: if one thread modifies a variable, another thread may not see the change unless "some synchronization" happens (a synchronized block, a shared lock etc). But most real Java VMs behave much more strictly than the model requires: if one thread modifies a variable, the other thread sees it even without synchronization. And that is the problem. It is almost impossible to find those threading problems without a Java VM that implements only the minimal memory model guarantees.

My favorite "theoretical threading bug" is NullProgressMonitor.cancelled should be volatile. If a Java VM implemented only the minimal memory model, the code would not work, but as John points out: "This is true in theory, but never happens in practice. In practice, the thread calling isCanceled may obtain a stale result for a short period of time, but the thread cache is soon synchronized. It is common practice to omit synchronization in cases where obtaining a stale value is acceptable.". This is unfortunately true. At least I could not construct an example where one thread would not see the changes made by another thread.
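For illustration, here is a minimal sketch of the pattern in question. CancelFlagDemo and Monitor are hypothetical names, not the real NullProgressMonitor. With the volatile keyword the memory model guarantees that the worker eventually sees the write; without it, the model would allow the worker to keep reading a stale value, although whether a particular VM ever does so is exactly the open question of this post.

```java
// Hypothetical names; a sketch of a cancel flag shared between two threads.
class CancelFlagDemo {
    static class Monitor {
        // volatile: a write by one thread is guaranteed to become visible
        // to other threads that read the field afterwards.
        private volatile boolean cancelled;

        void setCanceled(boolean value) {
            cancelled = value;
        }

        boolean isCanceled() {
            return cancelled;
        }
    }

    // Starts a worker that spins until it sees the cancel flag.
    // Returns true if the worker terminated within the timeout.
    static boolean workerStopsAfterCancel(long timeoutMillis) {
        Monitor monitor = new Monitor();
        Thread worker = new Thread(() -> {
            while (!monitor.isCanceled()) {
                // busy work would go here
            }
        });
        worker.setDaemon(true); // do not keep the VM alive if it never stops
        worker.start();
        monitor.setCanceled(true);
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsAfterCancel(5000));
    }
}
```

Removing volatile from the field turns this into the "theoretical bug": the program is then allowed to hang, but it may well run fine on the VM you happen to test with.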

Getting threading right is extremely hard. Deadlocks often occur only if you have a bad day. You can get away with obviously wrong code just because Java VMs are so gracious.

I wonder if there is a Java VM that implements only the minimal memory model. It would be cool if it were possible to force the VM to behave like the worst possible memory model. For example: it would not make changes visible to other threads unless the data is synchronized. This would be extremely helpful for testing and debugging purposes. I wonder how well eclipse would behave on such a "minimal memory model VM"....

Thursday, April 17, 2008

Is OSGi the enemy of JUnit tests?

I wrote a set of JUnit (3.8.1) tests for the terminal. Originally those were normal unit tests. To be able to test non public methods and non public classes, I put the tests into the same package but into a separate test plugin. I also added the test plugin as friend plugin of the packages I want to test. This works fine if I run the tests as normal JUnit tests. But if I run the same tests as JUnit Plug-in Tests I get java.lang.IllegalAccessError. The reason is simple: each OSGi bundle has its own class loader and therefore the classes appear not to be in the same package.

There are different solutions:

  • Make all methods and classes you want to test public (really bad idea)
  • Put the unit tests into the same plugin as your code and make the dependency to JUnit optional (not a good separation of concerns).
  • Only test public classes and methods (I think this is too restrictive and often too coarse grained)
  • Make your test plugin a fragment. One problem is that other plugins cannot access classes defined in fragments (as Patrick Paulin points out in a more detailed discussion about fragments in unit tests). Another problem is that plugin.xml in a fragment is ignored. And therefore your test plugin cannot contribute extensions.


I tried turning a plugin into a fragment. It is as simple as adding the following line to your MANIFEST.MF
Fragment-Host: org.eclipse.the.plugin.you.want.to.test
and removing the plugin you want to test from the required plugins. As long as your test plugin is not part of a bigger test case and it does not need to contribute extensions (Patrick Paulin describes a solution for that using reflection), fragments are a good solution.

A good way to avoid having to use extensions (plugin.xml) in your test plugin is to use dependency injection for your classes.
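What I mean by that, as a small sketch: the class under test gets its collaborators handed in through the constructor instead of looking them up in the extension registry. All names below (ConnectorSource, ConnectorPicker) are made up for this illustration and are not taken from the terminal code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction over "whatever the extension registry would provide".
interface ConnectorSource {
    List<String> connectorIds();
}

// Hypothetical class under test: it never touches plugin.xml or the registry itself.
class ConnectorPicker {
    private final ConnectorSource source;

    ConnectorPicker(ConnectorSource source) {
        this.source = source; // injected dependency
    }

    // Returns the first available connector id, or null if there is none.
    String defaultConnector() {
        List<String> ids = source.connectorIds();
        return ids.isEmpty() ? null : ids.get(0);
    }

    public static void main(String[] args) {
        // In a test, a fake source replaces the registry-backed one.
        ConnectorPicker picker = new ConnectorPicker(() -> Arrays.asList("serial", "telnet"));
        System.out.println(picker.defaultConnector());
    }
}
```

In the running product, an implementation of ConnectorSource would read the extension point; the test fragment only needs the fake and therefore no plugin.xml.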

But I think there should be a better way to write a test plugin that can access non public classes and members. I understand why the security concept of OSGi introduces those problems, but I am still looking for a solution for my JUnit tests.

Any ideas?

Tuesday, March 18, 2008

Databinding Tutorial and Sample Projects

I promised I'd add the sample projects and the slides of our "Understanding JFace Data Binding" tutorial. I added them yesterday to the EclipseCon web page, but it took a day for them to show up. I also added a zip file with the presentation and the sample projects here.

As Wayne suggested, the projects also show how to do it the "hard way". The best way to follow the tutorial is to use "compare with each other" between the projects. The projects are numbered and the "nodatabinging" projects show it the "hard way". But in the end I gave up doing it the "hard way" because it simply was not trivial. I'd be happy if someone could do the master detail example the hard way (without binding), so we could add it to the sample projects...

Monday, September 10, 2007

A month ago, my blog was hijacked by spammers for a few days

About a month ago (just before my vacation), several people notified me that my blog had been hijacked by spammers. Wassim Melhem even created a bugzilla entry (which was a great idea!).

Spammers replaced my blog with their stuff including a new pink layout. If you want to see it: scharf.gr/hijacked_blog (I don't link it from here, but you can paste the link into your browser).

Google/blogger.com recognized that my blog was spam and they locked my account (I was still able to log in but I could not add new blog entries). All my old blog entries were gone. I sent some mail to Google and after a day or so they restored my old blog and apologized.

But how could they get into my blog? I don't think they were able to guess my username/password, because it was pretty safe. Otherwise, I guess, they would have changed my password. Looking at the spam blog, I just realized that the same content is there twice (it looks as if they pasted it two times). I think they just replaced the entire blog template with their crap. I see two possibilities how they could have done it:
  • They hacked into the blogger server and did it from inside. In this case other blogs would have been hijacked too.
  • They used some clever javascript that navigated to my blogger template site and they dumped their stuff. Because my blogger account is my google account and I was lazy logging out, a script could possibly have done that. Changing the template is much simpler than creating a new blog entry, because there is no "enter the text from the scrambled image" type of verification needed.

Do you have other theories how this could happen?

There is an interesting new type of spammer attack: they use pieces of real blogs in their spam blogs to make their spam blog appear "real". Marko Schulz has sent me a link, but that spam blog is fortunately gone....

Talking about spam: for about a week now I have been getting much less e-mail spam (on some of my accounts) than I used to get (tens instead of hundreds per day). Maybe spammers started thinking about spam efficiency: to get the most out of their spam bot nets they concentrate on Internet newbies. Therefore it pays off for them to eliminate e-mail addresses that have been in their lists for years. The probability that someone who is new to the Internet believes the spam is much higher than for experienced users. If they mostly attacked new e-mail addresses, they would also get out of the focus of the experts who are fighting spam and therefore have a much higher success rate in delivering spam. In addition, if spam is sent in low volume, spam defense would probably miss new variations of spam. Or is there another explanation for this decrease in spam?

Thursday, July 05, 2007

Eclipse is worse than any commercial IDE (in an ideal world)...

At his keynote at EclipseCon 2007, Robert Lefkowitz made the following provocative statement: "Eclipse is probably worse than any commercial IDE!". Why? Because it's free! In an ideal (capitalistic!?) world you don't pay for anything that you can get for free. Any tool that is worse than a free tool would just go out of business (if both tools cover the same market). Well, and we have seen quite some commercial IDEs die since eclipse became available.

One implication of this observation is that when eclipse increases its quality, competing commercial products have to increase their quality to survive. But wait! Instead of improving the quality of your commercial product, you could just prevent eclipse from becoming better. I know, this is very provocative. Fortunately, making (and saving) money is not the only force in our world. And fortunately, the way eclipse is organized, this will not happen:

Although most of the work on eclipse is sponsored by companies that sell products based on eclipse, the work is done by individual committers. And the human factor is very important. Everybody can see what committers do. It's not anonymous hackers that do the work, because unlike wikipedia, anonymous users cannot change eclipse. Eclipse is a social community with a focus on individually responsible committers. This is very important for the survival of eclipse. That's why new committers have to be voted into projects after they have shown credibility. That's why new projects must have committers from multiple companies. Eclipse committers are humans with emotions. Nobody wants to be accused of being destructive.

It is an honor to be an eclipse committer.

Saturday, June 30, 2007

My favorite 3.3 feature: Quick Access

I just looked into Eclipse 3.3 - New and Noteworthy. Lots of cool new things! My favourite is Quick Access. This essentially allows you to type commands instead of searching for the command in the menus. If you have customized your perspective to get rid of annoying tool-bar buttons, you still have access to commands that are not visible in any menu or tool-bar!

I have bound Quick Access to ESC-X (like in emacs) and it feels a bit like the emacs minibuffer.... Very cool!

BTW: I wish eclipse would allow links into the "New and Noteworthy" pages (see bug 194993), that's why I created my own copy of "New and Noteworthy"...

Monday, June 18, 2007

I don't like XML! But what are the alternatives?

I think XML is one of the worst formats for data. It is extremely ambiguous. There are so many ways to put a simple data structure into XML. For example:
<book 
title="The Return of the King"
author="J.R.R. Tolkien"/>
or
<book>
<title>The Return of the King</title>
<author>J.R.R. Tolkien</author>
</book>
And in the second case, can there be more than one author? And more titles?
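The ambiguity shows up immediately in the code that has to read the data. The following sketch uses the DOM parser that ships with the JDK; BookXmlDemo and titleOf are names I made up for this illustration. The reader has to know, or guess, which of the two encodings the writer picked.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper that copes with both encodings of the same book.
class BookXmlDemo {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Returns the title, whether it was written as an attribute or as a child element.
    static String titleOf(String xml) {
        Element book = parseRoot(xml);
        if (book.hasAttribute("title")) {
            return book.getAttribute("title"); // first variant
        }
        NodeList titles = book.getElementsByTagName("title");
        // Second variant: silently takes the first <title> if there are several.
        return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
    }

    public static void main(String[] args) {
        String asAttributes =
                "<book title=\"The Return of the King\" author=\"J.R.R. Tolkien\"/>";
        String asElements =
                "<book><title>The Return of the King</title>"
                + "<author>J.R.R. Tolkien</author></book>";
        System.out.println(titleOf(asAttributes));
        System.out.println(titleOf(asElements));
    }
}
```

Both calls return the same title, but only because the reader handles both cases explicitly, and the question of multiple titles is simply ignored.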

XML is not self describing. Look at the XML below. It is very ambiguous (if you don't use one of the many schema descriptions (DTD, XML-Schema, XMI, DSD, ...)).
<data
x="null"
y="true"
z="42">
<a>NULL</a>
<a>false</a>
<b>TRUE</b>
</data>
What is a String? What is a Boolean? What is a number? Is "a" a list? You simply can't infer it from the XML. When you want to write the data, which fields are written as tags and which are written as attributes?

There is also a huge problem with IDs and references. From a plain XML file there's no way to figure out what the id of an object is and what the references are. A good example is plugin.xml files. They are full of IDs and references but it is so hard to know which string refers to which other XML element. Control-click does not work. Why? Because references are difficult to resolve!

Are there any good alternatives?

JSON is much simpler, self-describing, less verbose and much better suited for data storage and exchange. But it has no notion of IDs and references. And it does not name object (record) types (it has only lists, maps, strings, numbers, booleans and null).

What else is out there? I want something like JSON + an ID/Reference model + named Records...