From Bubbles to List
Where we learnt that interpretability matters more than transparency.
Role: Lead Design / Lead UX Research / Product
Collaborated with: Product / UX Research / Design / ML Research
The Challenge of M&A Due Diligence
Due diligence for merger and acquisition (M&A) transactions is like looking for a needle in a haystack. Among the hundreds to thousands of contracts to review, lawyers need to find the ones that are problematic and could put the deal at risk.
Traditionally, the work of delegating and prioritizing documents is often done without much rhyme or reason, especially when the documents don’t arrive in any organized form. To make things worse, there are usually time and resource constraints.
A Genius Idea Was Born
I lied. It was not that genius (as we would soon learn). And it was not my idea.
Our machine learning researcher at the time, being a research scientist, thought that document clustering using doc2vec might be a good way to group documents by their similarity, and that, by definition, it would weed out the “outliers” (the anomalies, the risky things), since they were so different from the rest of their kind.
All we needed, then, was a way to visualize the clusters, and that would be the end of all the legal problems lawyers would ever face.
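For the curious, here is a minimal sketch of what that idea looks like in code, assuming gensim’s Doc2Vec and scikit-learn’s DBSCAN. The corpus, parameters, and choice of clustering algorithm are illustrative only and not the pipeline we actually built.

```python
# Minimal sketch of the doc2vec + clustering idea (illustrative only).
# Assumes gensim and scikit-learn are installed.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import DBSCAN

# Hypothetical corpus of contract texts, one string per document.
contracts = [
    "this lease agreement is made between the landlord and the tenant ...",
    "this lease agreement is made between the landlord and the subtenant ...",
    "change of control provisions shall apply upon any merger or sale ...",
]

# 1. Embed each document as a fixed-length vector.
tagged = [TaggedDocument(words=text.lower().split(), tags=[i])
          for i, text in enumerate(contracts)]
model = Doc2Vec(tagged, vector_size=100, min_count=1, epochs=40)
vectors = np.array([model.dv[i] for i in range(len(contracts))])

# 2. Cluster the vectors. DBSCAN labels points that fit no cluster as -1,
#    which is what we (naively) treated as the risky "outliers".
labels = DBSCAN(eps=0.4, min_samples=2, metric="cosine").fit_predict(vectors)
outliers = [i for i, label in enumerate(labels) if label == -1]
print("cluster labels:", labels, "outlier documents:", outliers)
```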
Bubbles
There really wasn’t much to say about “design” for this initial prototype of document clustering, other than that we highlighted the “outlier” group in red, signifying danger, as we assumed outliers were more likely to be risky documents.
We also assumed that reviewing similar documents together would make document review faster.
Needless to say, the fact that we presented this visualization also meant we assumed an abstract visualization like this was desirable.
The first living prototype that groups documents based on their similarity. Red bubble denotes risky outliers.
We presented this prototype to a few interested users, hoping they would be thrilled that we had solved their biggest problem. Instead, they all appeared to be very confused by the feature.
Undoubtedly, machine learning (and therefore unsupervised learning) was still not well known in the legal world (and probably still isn’t today, though at least it was less hyped in 2015), which could have contributed to the confusion. In particular, people were very puzzled by how the documents within a group were supposed to be similar.
We decided to get to the bottom of things.
For the following months, we went through 3 major design iterations and tested 10+ prototypes with 27 unique users, which ultimately led us to a completely different design solution than the one we started with.
Research Questions
Throughout the research and design of document clustering, we formed the following research questions:
What is the most appropriate representation for document level clustering to help users make sense of the collection, while achieving their goals in the due diligence context?
What does it mean for a user to “trust” the clustering algorithm? Should we focus on transparency of information, or a representation that meets their expectations?
What are our users’ mental models when it comes to “similar documents”? How do we design an experience to match their mental models?
How would users’ expectations of the clustering results and their inclination to negotiate with the clusters be influenced by their particular roles in a due diligence project?
First Round of Testing
For the first round of testing, we explored different visualizations to understand what would be most helpful and appropriate for lawyers trying to make sense of document collections.
We hypothesized that:
Dissimilar documents would be prioritized for review, as we assumed they were anomalous documents (i.e., “outliers”).
Abstract visualizations would be desirable.
Similar documents would be assigned to the same reviewers for faster review.
What We Learnt
The learnings that came out of this round of testing were:
Abstract visualization is not desired. This is because lawyers—who are more prone to think logically and rationally—are more comfortable with structured and hierarchical visualizations. Additionally, the main goal at this point is not to “explore” the collection but to get to the relevant information faster.
Users won’t intuitively trust a “black box” algorithm, and they would want some kind of explanation (what they call “transparency”).
What our users mean by “outliers” is completely different from what “outlier” means as a technical term. To our users, an “outlier” is something that doesn’t belong, which most likely makes it irrelevant for review purposes.
This feature may be best for project management purposes where the amount of work can be estimated, unnecessary work can be eliminated (e.g., revealing exact or near duplicate documents to be deleted), and similar documents can be assigned to the same reviewer to be more efficient.
“The Coca-Cola bubbles are more trouble than its worth, I like percentage, number and name but no bubbles.”
“This format isn’t helpful since circles are misleading. Bars are more simple and straightforward.”
“I don’t know what the computer is looking at, I’m hesitant to trust it...I don’t know how it’s grouping.”
“There’s gotta be a punchy way of explaining why something is an outlier.”
Some earlier design prototypes that we tested given what we’ve learnt from the first round of testing. The design is still largely “visual” as we sought to use more structured and hierarchical visualizations.
Some more prototypes being tested — this time we focused on visualizing groups of documents as decks of cards with more metadata about the documents to help users make sense of why certain documents are grouped together.
Refining the Design
One major change we made in this phase of design, apart from abandoning any abstraction in the visualization, was adopting the users’ own jargon for the ways documents can be similar, such as “Near Duplicates” (documents that differ by only a couple of words, or different versions of the same document), “Based on a Form” (documents likely drafted from the same template), and so on.
Technically, we added rough thresholds on the similarity scores to separate the categories of similarity. Still, at this stage we assumed that accuracy in the grouping was essential to establishing trust and the likelihood of this feature being used, and that users would be unlikely to want to correct the algorithm by moving documents from one group to another.
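As a rough illustration of what that thresholding might look like: the top two labels come from our users’ jargon, but the numeric cut-offs and the lower-tier labels below are invented for this sketch and are not the values we actually shipped.

```python
def similarity_category(score: float) -> str:
    """Map a pairwise document-similarity score in [0, 1] to a user-facing label.

    The top labels mirror the users' own jargon; the numeric cut-offs are
    placeholders for illustration, not the tuned thresholds from the product.
    """
    if score >= 0.95:
        return "Near Duplicates"   # differ by a few words, or another version of the same document
    if score >= 0.80:
        return "Based on a Form"   # likely drafted from the same template
    if score >= 0.60:
        return "Similar"           # loosely related content (hypothetical label)
    return "Not Similar"           # hypothetical catch-all


# Example: bucket a document's similarity to the rest of its cluster.
print(similarity_category(0.97))  # -> "Near Duplicates"
```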
We kept the metadata of the documents (e.g., titles, parties, dates) as a sense-making guide.
What We Learnt
Doubt about why documents were grouped together (in other words, trust) was not an issue during this round of testing, as the representation, labeling, and metadata all helped users make sense of the algorithm.
The tasks to be performed for each category are likely to be different, which further reinforced the idea that this feature is suitable for project management related tasks.
As such, being able to quickly toggle from a given cluster to a document list view is crucial, something this visual representation did not support.
Final (-ish) Design
Based on our findings in this phase, we moved forward with three major changes. The first was to merge the “Near Duplicates” and “Based on a Form” categories into a new “Similar Structure and Content” category. The second was to provide a list view to allow users to view-at-a-glance all documents and document metadata for a cluster, which had the intended goal of reducing sensemaking cost. Finally, we added the ability to move documents between groups, create groups, and delete groups (aka “negotiation” with the algorithm).
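To make the “negotiation” part concrete, here is a minimal sketch of editable clusters; the class and method names are hypothetical and only illustrate the move, create, and delete operations described above.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A named group of document IDs, seeded by the algorithm but user-editable."""
    name: str
    doc_ids: set = field(default_factory=set)


class ClusterWorkspace:
    """Holds the algorithm's clusters and lets users 'negotiate' with them."""

    def __init__(self, clusters):
        self.clusters = {c.name: c for c in clusters}

    def move_document(self, doc_id, source, target):
        """Move a document from one cluster to another."""
        self.clusters[source].doc_ids.discard(doc_id)
        self.clusters[target].doc_ids.add(doc_id)

    def create_cluster(self, name):
        self.clusters[name] = Cluster(name)

    def delete_cluster(self, name):
        self.clusters.pop(name, None)


# Example usage (hypothetical cluster names and document IDs):
ws = ClusterWorkspace([Cluster("Leases", {"doc-1", "doc-2"}),
                       Cluster("Employment", {"doc-3"})])
ws.create_cluster("Needs Manual Review")
ws.move_document("doc-2", "Leases", "Needs Manual Review")
```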
What We Learnt
Surprisingly, users saw the ability to negotiate with the algorithm when it is not perfectly accurate as a positive. This may be because clusters can act as “starting points” for assigning reviewers: starting with an “almost correct” cluster and negotiating may result in a net gain, as they are able to assign documents to reviewers in a more efficient manner. It may also mean that once trust is established, people are more likely to negotiate. This is probably not too difficult to understand, as we are more likely to tolerate a person’s mistakes when trust is there.
We also noticed a trend of wanting to assign the same reviewer to documents that are not necessarily similar but are related (e.g., amendments, signed versions). While at a high level these documents may be similar (e.g., referring to the same contract or deal), textually they look and feel different. This may be a future direction to explore.
Transparency…? Interpretability.
The biggest takeaway from this project is that when designing an ML-driven feature, the goal is to achieve interpretability by giving users something they are familiar with and can somewhat predict, instead of trying to achieve transparency (i.e., revealing the technical details) or over-abstracting the concepts.
This is certainly easier said than done, and the key to designing ML-driven features successfully is to be aware of the users’ tasks and contexts. In this case, the design changed as our understanding changed of who this feature benefits, in what circumstances, and for what purposes.
Design + ML Research
It is increasingly common for the design team to collaborate closely with the ML research team when exploring solutions for ML-driven features, whether those solutions are purely design-based or technical. The ML research team can help designers understand the feasibility of a design solution, but can also be the one driving the “magical” solutions, since they know the state-of-the-art technologies. They can also be extremely helpful in checking the scientific rigour of our user research methods.
In this kind of collaboration, it is always helpful for designers to learn the basics of machine learning, so that in the end design becomes the translation layer that bridges the gap between user needs and a technology that can be ambiguous and hard to explain.
Recognition
The short paper on this research and design was published at CHIIR (ACM SIGIR Conference on Human Information Interaction and Retrieval) 2019.