Thursday, June 11, 2009

ConcurrentMultimap please

This blog seems to be a good place to tackle hard problems. It helps that you can modify and delete posts in case you're wrong. That's certainly preferable to making an ass of yourself on a mailing list.

I just signed Google Collections issue 135 and additionally shared one of my use-cases: session management.

Solving the problem of ConcurrentMultimap.remove(K,V) is a cat and mouse game of "where are the holes" in cascading visible CAS changes. If any code deserves to be formally proven, concurrency problems like this do. (tho I'm not equal to the challenge tonight)

UPDATE: I've found 2 additional data races since making this post. Will update when I've sussed out the conditions under which they can occur.

Because all of the mutating operations guarantee visibility the path to reach a genuine data race is quite twisted. At one point I thought CopyOnWriteArrayList's visibility saved me, but then reconsidered.

In add and remove methods I have indicated points where visibility to other threads is critical. Here's the code:


private final ConcurrentMap<GUID, Collection<SessionHandle>> sessions = ...; // mapmaker make me a map!

...

public boolean addSession(SessionHandle handle) {
Collection<SessionHandle> sessionsForGUID = sessions.get(handle.guid()); // A
if(sessionsForGUID != null) {
sessionsForGUID.add(handle); // B
return true;
}

// put if absent double check to avoid extra list creation
sessionsForGUID = new CopyOnWriteArrayList<SessionHandle>();
sessionsForGUID.add(handle);
Collection<SessionHandle> old = sessions.putIfAbsent(handle.guid(), sessionsForGUID); // C
if(old != null) old.add(handle); // D
return true;
}

public boolean removeSession(SessionHandle handle) {
// 0
Collection<SessionHandle> sessionsForGUID = sessions.get(handle.guid());
if(sessionsForGUID == null) return false;

// O(n) remove is the least of my worries
// 1
if( ! sessionsForGUID.remove(handle)) return false;

// 2
if( ! sessionsForGUID.isEmpty()) return true;

// double-check remove
// 3
if( ! sessions.remove(handle.guid(),sessionsForGUID)) return true; // already removed! (yikes)

// 4
if(sessionsForGUID.isEmpty()) return true;

// another entry was added!
// 5
Collection<SessionHandle> old = sessions.putIfAbsent(handle.guid(), sessionsForGUID);

if(old == null) return true;

// 6
old.addAll(sessionsForGUID);
return true;
}



What follows is the exchange of four adds and two removes which cause both methods to reach their very last return statement.

Given 3 threads: T1, T2, T3.
T1 calls add and reaches statement A and the list is not present so T1 does not reach statement B.
T2 calls add and reaches statement A, skips B as did T1 and reaches statement C then returns. The map is now populated with list alpha.
T2 calls add again and reaches statement A and B then returns.
T1 reaches statement C but sees list alpha present.
List alpha now has two members and the map's total size is 2 tho T1's element is still in-flight.
T2 calls remove and reaches statement 0.
T3 calls remove and reaches statements 0 thru 3 and removes list alpha from the map.
T2 reaches statements 1 thru 3 and returns since it sees list alpha was removed from the map.
T1 reaches statement D and adds its in-flight element to list alpha which T3 successfully removed with a CAS. T1 returns not knowing its element is on the orphaned list alpha.
T3 reaches statement 4 and sees that list alpha which it just removed is no longer empty!
T2 calls add again just to be difficult and completes the operation at statement C creating list beta on what appeared to be an empty map.
T3 reaches statement 5 and replaces list alpha on the map, but then sees list beta prevented the CAS put.
T3 reaches statement 6 and adds all elements in list alpha in list beta and finally returns.

In a table the seemingly improbable narrative looks as follows:
T1 T2 T3 size
0
A 0
A 0
C 1
ret 1
A 1
B 2
ret 2
C 2
0 2
0 2
1 1
2 1
3 1
1 0
2 0
3 0
ret 0
D 0 T1's add is on a removed list
ret 0
4 0
A 0
C 1
ret 1
5 1
6 2
ret


If you cling to the idea that this sequence of events is impossible, please check out the JavaOne 2009 session by Cliff Click and Brian Goetz - "This Is Not Your Father's Von Neumann Machine; How Modern Architecture Impacts Your Java™ Apps". The session content should be available a few weeks from now.

A ConcurrentMultimap could share its region locking with its constituent lists and prevent the catastrophic rollback you see here. Until then I will use the GCollections synchronized Multimap wrapper. The hit in performance is a small price to pay for confidence in the implementation.