CLJ-DS is worth checking out

I like Clojure. I really like it. I think it’s the best language available on the JVM. Why would anyone want to use Clojure? One word: Concurrency. Clojure is largely designed to address the needs of dealing with concurrency without resorting to primitive constructs such as locks.

Unfortunately, using Clojure on projects isn’t always feasible, because many projects are locked into the existing Java paradigm technically, culturally and economically. I often end up writing concurrent code in Java, so having good data structures that minimize locking is a must. Most code I inherit that leans heavily on “synchronized” can be rewritten with far fewer locks thanks to java.util.concurrent and java.util.concurrent.atomic. The part I always missed was an immutable data structure that could be returned to the calling code. This could be achieved by returning defensive copies of every mutable data structure, but there is a slicker way.
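To make the defensive-copy option concrete, here is a minimal JDK-only sketch. `TagRegistry` and its method names are hypothetical, made up for illustration; the only point is that every read pays for a full copy.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class for illustration; not taken from any real code base.
class TagRegistry {
    // Concurrent set view from java.util.concurrent: add() needs no synchronized block.
    private final Set<String> tags = ConcurrentHashMap.newKeySet();

    void add(String tag) {
        tags.add(tag);
    }

    // Defensive copy: the caller gets a private, read-only snapshot,
    // at the price of an O(n) copy on every call.
    Set<String> snapshot() {
        return Collections.unmodifiableSet(new HashSet<>(tags));
    }
}
```

The snapshot is safe to hand out, but the copying cost grows with the size of the set and with how often callers ask for it.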

CLJ-DS is another solution to the problem. It’s a library of Clojure’s persistent data structures ported back to Java, with nice generic type signatures and convenience static methods.

Here’s a typical example of code I often inherit. All the business logic and business variable names have been removed.

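    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import java.util.UUID;
    
    class FooManager {
        private Map<UUID, Set<String>> idToSet = Collections.synchronizedMap(new HashMap<>());
    
        // throwIfNull is a null-check helper that is not shown here.
        public synchronized void addElement(UUID uuid, String element) {
            throwIfNull(uuid);
            throwIfNull(element);
    
            Set<String> set = idToSet.get(uuid);
    
            if(set == null) {
                set = new HashSet<>();
                set.add(element);
                idToSet.put(uuid,set);
            } else {
                set.add(element);
            }
        }
    
        public Set<String> getElements(UUID uuid) {
            Set<String> results = idToSet.get(uuid);
            // Wraps either the live internal set or a throwaway empty set.
            return Collections.synchronizedSet(results != null ? results : new HashSet<String>());
        }
    }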

Some obvious problems with this code:

  • Since getElements returns a mutable set, there is no guarantee that some code outside of FooManager won’t call .clear() on the returned set or mutate it in some other way.

  • The code behaves subtly differently depending on whether uuid exists in idToSet. When results is null, a caller might expect the returned empty set to be referenced by idToSet, just as the returned set is in the non-null case. It is not.

  • Once the calling code gets a handle on the synchronized results from getElements, nothing guarantees that access is safe: in the non-null case addElement writes to the underlying set while holding a different lock (the FooManager instance, not the synchronized wrapper).
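The first two problems are easy to reproduce deterministically. The sketch below is a trimmed copy of the class above, renamed `LeakyFooManager` so it stands alone; the `<UUID, Set<String>>` type parameters and the `LeakDemo` driver are my additions for illustration, and the null checks are left out for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Trimmed stand-in for the inherited FooManager (hypothetical name).
class LeakyFooManager {
    private final Map<UUID, Set<String>> idToSet =
            Collections.synchronizedMap(new HashMap<UUID, Set<String>>());

    public synchronized void addElement(UUID uuid, String element) {
        Set<String> set = idToSet.get(uuid);
        if (set == null) {
            set = new HashSet<>();
            idToSet.put(uuid, set);
        }
        set.add(element);
    }

    public Set<String> getElements(UUID uuid) {
        Set<String> results = idToSet.get(uuid);
        return Collections.synchronizedSet(results != null ? results : new HashSet<String>());
    }
}

class LeakDemo {
    public static void main(String[] args) {
        LeakyFooManager manager = new LeakyFooManager();

        // Problem 1: the returned set is a live view of internal state.
        UUID known = UUID.randomUUID();
        manager.addElement(known, "a");
        manager.getElements(known).clear();
        System.out.println(manager.getElements(known).isEmpty()); // true: "a" is gone

        // Problem 2: for an unknown uuid the returned set is detached,
        // so the very same call pattern silently loses the write.
        UUID unknown = UUID.randomUUID();
        manager.getElements(unknown).add("b");
        System.out.println(manager.getElements(unknown).isEmpty()); // true: "b" never arrived
    }
}
```

The third problem is a data race, so it will not show up reliably in a small demo, which is part of what makes it dangerous.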

There’s a better way using CLJ-DS and Java’s java.util.concurrent.atomic package:

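    import java.util.UUID;
    import java.util.concurrent.atomic.AtomicReference;
    
    // CLJ-DS imports; the package name is assumed from the clj-ds project.
    import com.github.krukow.clj_ds.PersistentMap;
    import com.github.krukow.clj_ds.PersistentSet;
    import com.github.krukow.clj_ds.Persistents;
    
    class FooManager {
        private final AtomicReference<PersistentMap<UUID, PersistentSet<String>>> idToSetState = 
                new AtomicReference<>(Persistents.hashMap());
    
        public PersistentSet<String> addElement(UUID uuid, String element) {
            throwIfNull(uuid);
            throwIfNull(element);
            
            // Optimistic retry loop: build the new state, then try to swap it in.
            for(;;) {
    
                PersistentMap<UUID, PersistentSet<String>> state = idToSetState.get();
                
                PersistentSet<String> oldSet = state.get(uuid);
                
                PersistentSet<String> newSet = oldSet != null ? oldSet.plus(element) : Persistents.hashSet(element);
    
                // Succeeds only if no other thread replaced the state in the meantime.
                if( idToSetState.compareAndSet(state, state.plus(uuid,newSet)) )  {
                    return newSet;
                }
            }
        }
    
        public PersistentSet<String> getElements(UUID uuid) {
            PersistentSet<String> results = idToSetState.get().get(uuid);
            return results != null ? results : Persistents.hashSet();
        }
    }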

No subtle mutations of state, no explicit locks, and so no lock contention: a writer that loses the compareAndSet race simply retries against the fresh state. I consider the revised version a lot easier to reason about, in no small part because of the CLJ-DS library. PersistentMap and PersistentSet implement java.util.Map and java.util.Set respectively, so calling code can keep working against the standard interfaces.
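If adding a dependency is off the table, the same compare-and-set loop can be approximated with the JDK alone. The sketch below is my own stand-in, not CLJ-DS: `CopyOnWriteFooManager` is a hypothetical name, and it swaps persistent structures for plain copies wrapped in unmodifiable views. That keeps the reasoning identical but costs an O(n) copy per write, whereas Clojure-style persistent structures share most of their structure between versions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical JDK-only variant: copy-on-write instead of persistent structures.
class CopyOnWriteFooManager {
    private final AtomicReference<Map<UUID, Set<String>>> state =
            new AtomicReference<>(Collections.<UUID, Set<String>>emptyMap());

    public Set<String> addElement(UUID uuid, String element) {
        for (;;) {
            Map<UUID, Set<String>> current = state.get();

            // Copy the old set (if any) and add the element to the copy.
            Set<String> copy = new HashSet<>();
            Set<String> oldSet = current.get(uuid);
            if (oldSet != null) {
                copy.addAll(oldSet);
            }
            copy.add(element);
            Set<String> newSet = Collections.unmodifiableSet(copy);

            // Copy the map and point uuid at the new set.
            Map<UUID, Set<String>> next = new HashMap<>(current);
            next.put(uuid, newSet);

            // Publish only if nobody else published first; otherwise retry.
            if (state.compareAndSet(current, Collections.unmodifiableMap(next))) {
                return newSet;
            }
        }
    }

    public Set<String> getElements(UUID uuid) {
        Set<String> results = state.get().get(uuid);
        return results != null ? results : Collections.<String>emptySet();
    }
}
```

Every set handed to a caller is a frozen snapshot, so the three problems listed earlier do not arise; the trade-off is purely the copying cost on writes.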