Multidimensional Data Reversion: Harnessing Analytics and Visualization

On this second episode of Ropes & Gray’s Insights Lab’s four-part Multidimensional Data Reversion podcast series, Shannon Capone Kirk, managing principal and global head of Ropes & Gray’s advanced e-discovery and AI strategy group, is joined by David Yanofsky, director of data insights, analytics and visualization at the R&G Insights Lab. Together, they discuss the importance of using analytics to organize and analyze large datasets, such as emails and documents, to uncover key concepts and trends. The episode also highlights the power of data visualization in storytelling, compliance monitoring, and project management within the legal context. Discover how interactive and well-designed visualizations can drive understanding, engagement, and decision-making, and learn why building and maintaining robust data systems is essential for long-term success.

Transcript:

David Yanofsky: Hello, and welcome to Multidimensional Data Reversion, a show where we are digging into where data analysis intersects with the law. I’m David Yanofsky, director of data insights, analytics and visualization in the R&G Insights Lab.

Shannon Capone Kirk: And I’m Shannon Capone Kirk, managing principal and global head of Ropes & Gray’s advanced e-discovery and AI strategy group.

David Yanofsky: Shannon, last we spoke, we went deep on e-discovery, how we got here, what’s changing, why we do it in the way in which we do it, and what companies can do now to set themselves up better to interact with the legal system, be it a government regulator, be it a civil lawsuit, be it an internal investigation. You brought up a couple of things that I want to go deeper on, and the first is analytics. When you think about the analytics of discovery and the analytics that businesses can be having, what are the things that you see as working?

Shannon Capone Kirk: In our last episode, I mentioned that it is critical that we use terminology correctly or at least use a term as we define it for application in the law. This is one of those terms where I’ve seen people get tripped up, and they either use it as an overgeneralized term, or they think that it is something beyond their realm of knowledge, so they shut down. Again, like machine learning as applied to litigation—as we discussed in episode one—for me, it’s actually pretty simple in concept.

When I say “analytics” in the discovery realm, in application to litigation or investigations, I actually mean something fairly specific, and that is conceptual clustering of documents in a way that we can take, for example, all of my Gmail and an analytics tool will help me to organize it in an automated fashion around concepts that are found within all of my Gmail or I direct it to create—concepts around holidays, family events, my son’s college application process when we went through that—or I could ask a tool such as Brainspace to take 1 million documents that we’ve collected and build me some conceptual clusters or build them all together around certain concepts that I’m interested in investigating for a given case. That is how we primarily conduct investigations now for our clients. We can take the data and essentially take our hands, shove them inside the computer, and build clusters of data around concepts so that we’re not just fishing around in search terms. You can also, obviously, beyond just concepts, look at analytics tools that help us build timelines around certain events or create visual maps of who’s talking to whom and when about what. All of that is analytics in the context of discovery. Now, the question to you: Do you define it differently?
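
The idea of grouping messages around concepts you direct a tool to look for can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: it scores plain word overlap (cosine similarity of word counts) against hand-written seed vocabularies, which is far simpler than real conceptual analytics and is not a description of how Brainspace or any other product works. The concept names, seed words, sample emails, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical seed vocabularies; a real analytics tool would derive or refine concepts itself.
CONCEPTS = {
    "holidays": "holiday thanksgiving travel gifts dinner",
    "college": "college application essay admissions campus tour",
}

# Made-up sample messages, for illustration only.
EMAILS = [
    "Reminder: the college application essay is due Friday",
    "Who is hosting Thanksgiving dinner this holiday?",
    "Campus tour confirmed with the admissions office",
    "Your package has shipped",
]


def word_counts(text):
    """Lowercase the text and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_concept(emails, concepts, threshold=0.1):
    """Assign each email to its most similar concept, or to 'unassigned'."""
    seeds = {name: word_counts(words) for name, words in concepts.items()}
    clusters = {name: [] for name in concepts}
    clusters["unassigned"] = []
    for email in emails:
        counts = word_counts(email)
        best = max(seeds, key=lambda name: cosine(counts, seeds[name]))
        score = cosine(counts, seeds[best])
        clusters[best if score >= threshold else "unassigned"].append(email)
    return clusters


CLUSTERS = cluster_by_concept(EMAILS, CONCEPTS)
```

Calling `cluster_by_concept` on a larger collection works the same way; the point of the sketch is only that documents get organized around concepts rather than retrieved one search term at a time.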

David Yanofsky: I define it more broadly than what you just did but basically the same, in which I would say analytics is using data, software, and visualization to answer specific questions. And that question may be, “What emails in this group are about my son’s college application?” That question may be, “What kind of off-label marketing have we been doing with this product?” It could be, “How does the spend that our salespeople are out in the world entertaining our clients with compare to a year ago?” Analytics is providing the tools to answer the business and legal questions that someone has based on the data that they also have.

Shannon Capone Kirk: My focus has been on this specific investigation and, “How am I going to use analytics to answer this investigation’s question?” Why don’t you tell me how yours would complement or expand on my focus on that one investigation?

David Yanofsky: The genesis from an investigation is a good one to talk about. Say it’s a pharmaceutical company that we do an investigation for, and that pharmaceutical company has found that some salespeople in some jurisdiction are logging their expenses wrong, or they say they’re splitting expenses. No one’s supposed to split their expenses. The most senior employee at a meal is supposed to pay, but that’s not happening. This investigation has found that there is a culture of people taking two corporate cards, putting them down at a dinner, and splitting the bill, so that in the systems the meal looks like it was half as expensive as it actually was. Those meals are being approved by different approvers, and so none of the standard controls, from people just following the rules to a manager approving that expense and seeing that it follows the rules, catches that. But it came up in the investigation because we looked much more specifically into these transactions. We were looking for that very specific behavior, and we were able to take a step back and look at lots more data to see where else this is happening. Now, there may be some discipline or some remediation that needs to happen in that one investigation, but the company wants to stop this from happening in the future.

So, how do we monitor that behavior over time? That is one piece of analytics. In this case, how can we automate and display our success in preventing salespeople from splitting meals? We do that by looking at all the data at regular intervals and generating reports, either in real time or at an interval that helps answer the question. Along with that, we can say, “We sent out an all-company email reiterating the policy. We had new training of certain people.” And we can see these on a timeline: Where did these things happen? And how is that changing the flags? Now, those analytics are not going to be as precise as looking at that one investigation. When we do an investigation, humans are looking at the receipts. Humans are interviewing people at the meal. With analytics, you have to be a little more comfortable with some squishiness. We can say, “If we look at this trend over time, sure, there are going to be some false positives in there, but, on the whole, we can still track this.” The thing that I think about with analytics is the ability to monitor, the ability to use the same analysis over and over and over again on data that is new and constantly updating to provide answers to these questions.
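
As a minimal sketch of that kind of repeatable check, the Python below flags meals where two or more employees charged the same merchant on the same day, then counts flags per month so the trend can be tracked. The record layout, names, and amounts are invented for illustration, and the matching rule is an assumption rather than the method used in any actual engagement. It will produce false positives (for example, colleagues who happened to eat separately at the same restaurant), which is the "squishiness" described above.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical expense records: (employee, merchant, day, amount).
EXPENSES = [
    ("ana", "Bistro 9", date(2024, 1, 12), 210.00),
    ("raj", "Bistro 9", date(2024, 1, 12), 205.00),
    ("ana", "Cafe Uno", date(2024, 1, 20), 48.00),
    ("lee", "Grill House", date(2024, 2, 3), 150.00),
    ("kim", "Grill House", date(2024, 2, 3), 150.00),
    ("raj", "Grill House", date(2024, 2, 3), 90.00),
    ("lee", "Cafe Uno", date(2024, 3, 8), 60.00),
]


def flag_possible_splits(expenses):
    """Flag (merchant, day) groups that were charged by two or more different employees."""
    groups = defaultdict(list)
    for employee, merchant, day, amount in expenses:
        groups[(merchant, day)].append((employee, amount))
    flags = []
    for (merchant, day), charges in sorted(groups.items()):
        employees = sorted({employee for employee, _ in charges})
        if len(employees) >= 2:
            flags.append({
                "merchant": merchant,
                "day": day,
                "employees": employees,
                # The combined total is what the meal actually cost.
                "combined_total": sum(amount for _, amount in charges),
            })
    return flags


def flags_per_month(flags):
    """Count flags by calendar month (YYYY-MM), in chronological order."""
    return dict(sorted(Counter(f["day"].strftime("%Y-%m") for f in flags).items()))


FLAGS = flag_possible_splits(EXPENSES)
MONTHLY = flags_per_month(FLAGS)
```

Rerunning the same two functions on each new batch of expense data gives a monthly series that can be set against the dates of policy emails or training sessions.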

Shannon Capone Kirk: Let’s talk about visualization. I have had the benefit of sitting through meetings with you and seeing what you mean by visualization, and it is powerful. This is an audio recording, a podcast, so when we talk about visualizing the data, it’s a little difficult, but I am going to ask you why it is important to be able to visualize the data, especially over time when we are talking about summary statistics. That’s part A. And not to make this an entire diss track on static Excel spreadsheets, maybe explain what the difference is.

David Yanofsky: It’s funny you say that because I didn’t realize it until just now that, when I see folks out in the world doing complex analysis in Excel, I think, “They’re not like us.” The diss track of the day just fits so well. But to your question about visualizing the data, at its very basic level, it is an extremely effective storytelling technique: how did something change over time? We are used to seeing that as a line chart or a bar chart: up and to the right is a good thing; down and to the right is a bad thing. When I go to companies and I talk to compliance teams or otherwise about how you tell your story with your data, it’s all about up and to the right. The sales team has a very easy-to-understand chart that goes up and to the right. The marketing teams have a very easy chart to understand that goes up and to the right. Even R&D teams can show their success in a chart that is up and to the right. Legal and compliance teams struggle to come up with the chart that shows their efforts moving the company up and to the right, and so, just visualizing the data will put you on par with the other functions at your organization. They are communicating that way—you need to communicate that way, and it is a way that tells a story very quickly and very efficiently.

When it comes to summary statistics, my favorite way to describe it is this: it is a fact of our lives in our data. Data visualization expert Alberto Cairo created a data set called the “Datasaurus.” The Datasaurus is a data set that has very specific summary statistics. You would look at those statistics and think, “This is just, like, a blob.” You would expect a plot of this data to look like a cluster, with no trend in any direction. However, when you actually plot this data, it makes a picture of a dinosaur. In a similar way, you can create data sets that have the exact same summary statistics as that dinosaur that look like an oval, that look like an X, that look like a star, that look like multiple slanted lines, that look like the six points on a die, six clusters of dots. But if you were only looking at the summary statistics, you wouldn’t realize that. You wouldn’t realize, “These summary statistics, I shouldn’t be looking at them on the whole. I should be looking at them for each one of these six clusters of dots. I should be more interested in the fact that there are five diagonal lines, and I should look at each of those diagonal lines individually to see why they are separated in the way that they are.” So, just as a matter of understanding what you’re looking at, you’ve got to visualize the data, and that’s before you even get to trying to communicate what the data actually says. This is sense checking, this is understanding your data, and you just cannot understand your data if you can’t visualize it properly.
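
The point about identical summary statistics hiding very different shapes can be reproduced with a small Python sketch. To be clear about the assumptions: this is not the Datasaurus data and not the method used to generate it. It is a much simpler construction that relies on symmetry. Three shapes (a circle, an X, and four clusters) are each rescaled to mean 0 and standard deviation 1 on both axes, and each has zero correlation by symmetry, so their summary statistics match even though a plot of each would look nothing alike. All function and variable names are illustrative.

```python
import math


def moments(points):
    """Return (mean_x, mean_y, sd_x, sd_y, correlation), using population formulas."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in points) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in points) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return mean_x, mean_y, sd_x, sd_y, cov / (sd_x * sd_y)


def standardize(points):
    """Shift and scale the points so each axis has mean 0 and standard deviation 1."""
    mean_x, mean_y, sd_x, sd_y, _ = moments(points)
    return [((x - mean_x) / sd_x, (y - mean_y) / sd_y) for x, y in points]


def summary(points):
    """Summary statistics rounded to 6 places; '+ 0.0' turns -0.0 into 0.0."""
    return tuple(round(value, 6) + 0.0 for value in moments(points))


# Shape 1: 40 points evenly spaced around a circle.
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]

# Shape 2: an X made of two crossing diagonals.
ts = [-1 + 2 * k / 19 for k in range(20)]
x_shape = [(t, t) for t in ts] + [(t, -t) for t in ts]

# Shape 3: four tight clusters, one near each corner of a square.
offsets = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
four_clusters = [(cx + dx, cy + dy) for cx in (-1, 1) for cy in (-1, 1) for dx, dy in offsets]

SHAPES = {
    "circle": standardize(circle),
    "x_shape": standardize(x_shape),
    "four_clusters": standardize(four_clusters),
}
STATS = {name: summary(points) for name, points in SHAPES.items()}
```

Every entry in `STATS` comes out the same, so a table of means, standard deviations, and correlations could not tell these three data sets apart; only plotting `SHAPES` would.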

Shannon Capone Kirk: That’s right. Having seen some of your visualizations over massive amounts of data, you definitely more quickly get folks who aren’t living in the muck and the mire of the data to get to where you are so that critical decisions can be made if you have it visualized.

David Yanofsky: Another really important thing about visualizing your data—or at least a really powerful thing that you can do—is to make your displays interactive. People get fixated on visualizing the data and showing a single point in time or a single view into the story that you’re trying to tell. Interactivity is an extremely powerful way to do that, and the user experience of that interactivity is really important to driving understanding because it means that it gets rid of gatekeepers (in a good way) and it allows people to explore their curiosities. One of the issues that I constantly see with clients and that I am constantly being asked about with clients is, “How do I get my stakeholders engaged in the information that I’m telling them? I’m telling a great story about it, but they just don’t care about our work.” Being able to have well-designed, good user experience, interactive versions of the story that you’re telling, even if they never touch it, allows you to show it to them in a way that they don’t expect and in a way that they are usually excited by.

Shannon Capone Kirk: I totally agree. The interactivity of it is critical to me, especially when we’re in a compliance, litigation, or investigation world because the reality is—and this is why a lot of the project management tools have not worked for me—we don’t live in a world where things are static. We live in a world where things change, sometimes on the hour. We need to be able to accommodate these changing worlds that we live in—the data’s changing; the realities and rules are changing—but we also need to be able to change what we’re viewing, the visualization, to say, “But what if we look at it from this angle? And what if this element were different, or we change this element? How can I interact with this data in a way that helps, if we’re talking about litigation, shape how I build the defense of this case?” To bring it back to our diss track on Excel spreadsheets that are static, you can’t really do that with your static Excel spreadsheet so much. Like any educational tool, it’s always better when you have a visualization. A basic example here is trying to teach a group of grade schoolers about the different states and the different capitals in the United States. If you did that by just speaking through it or reading words on a page, you would probably have a longer road to go than if you just threw up a map of the United States and walked through it with them.

David Yanofsky: The other aspect of maps that is really powerful is that, even if you’re visualizing data on a map or you’re talking about information about the map, because we are visual people, we have memories of shapes and we have associations with shapes. When we think about the idea of California, I think most people in America can see the shape of California in their head. Now, maybe they can’t discern the difference between, say, Colorado and Wyoming, but people know what Texas looks like. A mnemonic device, but visual, is helpful to both understand the information you’re looking at and communicate that information to someone else.

Shannon Capone Kirk: I’m one of those. I remember by color, and I know a lot of your visualizations employ color. Isn’t that also right?

David Yanofsky: Yes. Especially when we’re talking about creating analytics and analytics dashboards for monitoring, to not get fixated on a specific number, using color in those situations to categorize the status of things is really powerful. People are trying to get from red to yellow, or from yellow to green, or whatever the color ramp is or whatever the shape symbol is that we’re using to show status over time, and you start being able to very easily see really powerful stories about progress or regression on the metrics that matter and the questions that you’re trying to find answers to.

Shannon Capone Kirk: Do you find that that kind of color use can also be not just a decision or a decision prompt, but also a behavioral prompt? In other words, people are motivated to change that color from red to green. Do you find that that’s helpful psychologically?

David Yanofsky: A significant share of the male population is red-green colorblind, and so, that’s why I talk about symbols and talk about very specific color choices and different color ramps: when we’re telling these stories, or when we need people to understand these stories, we need to make sure that these stories are accessible. But yes, there is a huge motivating factor in that, and it is a conversation that we have very early in the process around the colors that we’re using and the language that we use to describe them, to be able to motivate the target audience properly. There are some situations where, if we express our concern for the status of some aspect of the company’s business in the wrong way, that company may no longer want to engage in the process of improving because we have made something red and it has revealed a difference of opinion that is too stark to bridge. The end goal is to get people to improve their policies and programs, to get the thing that they’re monitoring to a better place, and you need to make sure that they’re willing to come on the journey with you. Color can be a very big motivator, but can also be a big deterrent, and so, making sure that everyone’s on the same page about the language that’s being used and what these things mean, and what it takes to get from status to status is a really important conversation to have.
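
One way to make that early conversation concrete is to write the status scheme down as data, so the label, the shape, the color, and the threshold for each status are agreed in one place. The Python below is a sketch under assumptions: the labels, the 0 to 100 score scale, and the thresholds are hypothetical, and the hex values are taken from the Okabe-Ito palette, which is commonly suggested for viewers with color-vision deficiencies. Each status carries a distinct shape so that the display never depends on color alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    label: str        # wording agreed with the audience up front
    shape: str        # redundant cue so status never relies on color alone
    color_hex: str    # Okabe-Ito palette value
    min_score: float  # lowest score (0-100) that earns this status


# Ordered from best to worst; thresholds are illustrative assumptions.
STATUSES = [
    Status("on track", "circle", "#0072B2", 80.0),
    Status("in progress", "triangle", "#E69F00", 50.0),
    Status("needs attention", "square", "#D55E00", 0.0),
]


def status_for(score):
    """Return the first (best) status whose threshold the score meets."""
    for status in STATUSES:
        if score >= status.min_score:
            return status
    raise ValueError(f"score {score} is below every threshold")
```

Because the thresholds live alongside the labels, "what it takes to get from status to status" is visible to everyone reading the dashboard definition, not just to whoever built it.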

Shannon Capone Kirk: It’s really interesting, and I’m glad you walked through that. You could see how some might just assume, “We’re going to use colors, and where something is negative, we’re going to use red.” But it’s really important to have that balance and that conversation. That was really helpful.

David Yanofsky: I’m talking about this work in a situation where we’re dealing with an investor client who has done an analysis of one of its investments. We’re engaged with an investor to help them understand another company, and this dynamic comes up when we start having indirect involvement with these companies. In that situation, we also have investors who are having us look at multiple investments at the same time, and so, we want to be able to show, “Sure, in an absolute sense, we think you really need to improve this policy. But if you look at all of the other investments that your investor has made across its portfolio, you’re actually outperforming. If we look at the whole industry, the whole world, you have a lot of room for improvement, but in terms of pressure from your investor right now, you’re leading the portfolio even though you’re under-performing in the world.” To be able to tell a complex story like that is also really important to get buy-in and tell the complete story with your visualizations.

Shannon Capone Kirk: Glad you touched on a portfolio of investments and comparing that and making sure the right context is available and visual. I want to also talk about a different goal with respect to visualizing a lot of data, and that is project management in the litigation space. Everything you just said about context around multiple investments, and visualizing and congealing a lot of data can also apply to civil litigation involving multiple jurisdictions, multiple lawsuits and helping lead attorneys in those situations distribute and triage resources in a really effective way, rather than feeling like you’re living in chaos and everybody has an emergency every single day. Can you talk through how this same concept of visualizing for purposes of compliance could also apply in that context?

David Yanofsky: When we think about a portfolio of companies, we have one investor and a portfolio of companies under it—that isn’t so different than one company with a portfolio of litigation in front of it. You have these atomic units of cases that, in the aggregate, you want to understand. You want to understand how many are in pre-litigation. How many of them are in discovery? How many of them are currently being argued? How many of them have motions going back and forth? How many are on appeal? You want to understand the severity of them. What is your downside risk of these cases? And then, being able to aggregate all of that up to a portfolio-wide level so you can say, “Our docket right now is leaning really risky. All of our cases are in discovery. We need to devote resources to that.” There are also specific questions that each client is going to have that are somewhat unique to them, and being able to assign and capture the data that you need to answer those questions is the most important thing that you can do because the only way in which you’re going to answer the question is if you are capturing and collecting the data on each individual atomic level so that you can then roll it up and have a portfolio-wide view.
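
The roll-up from individual cases to a portfolio-wide view can be sketched as follows. This is a minimal Python illustration with assumed field names (`state`, `stage`, `downside_usd`) and invented case records; a real docket would capture whatever fields answer the client's own questions. The key design point is the one made above: the aggregate view is only possible if each field is captured on every individual case.

```python
from collections import Counter, defaultdict

# Hypothetical case records, one per "atomic unit" in the docket.
CASES = [
    {"id": "C-1", "state": "TX", "stage": "discovery", "downside_usd": 2_000_000},
    {"id": "C-2", "state": "TX", "stage": "discovery", "downside_usd": 500_000},
    {"id": "C-3", "state": "CA", "stage": "pre-litigation", "downside_usd": 250_000},
    {"id": "C-4", "state": "NY", "stage": "appeal", "downside_usd": 1_000_000},
    {"id": "C-5", "state": "CA", "stage": "discovery", "downside_usd": 750_000},
]


def roll_up(cases):
    """Aggregate per-case fields into a portfolio-wide summary."""
    cases_by_stage = Counter(case["stage"] for case in cases)
    downside_by_stage = defaultdict(int)
    downside_by_state = defaultdict(int)
    for case in cases:
        downside_by_stage[case["stage"]] += case["downside_usd"]
        downside_by_state[case["state"]] += case["downside_usd"]
    return {
        "cases_by_stage": dict(cases_by_stage),
        "downside_by_stage": dict(downside_by_stage),
        "downside_by_state": dict(downside_by_state),
        "total_downside": sum(case["downside_usd"] for case in cases),
        "share_in_discovery": cases_by_stage["discovery"] / len(cases),
    }


PORTFOLIO = roll_up(CASES)
```

A summary like `PORTFOLIO` is what would feed a map or dashboard; with this sample data it shows most cases and most of the downside sitting in discovery, and the largest exposure in Texas.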

Shannon Capone Kirk: You and I were talking—we were analogizing the importance of this so that you aren’t making mistakes or forgetting, for example, that you actually do have a triage issue down in Texas, and it is important to have that visualization available to you. But we also talked about how the data is there, and there are programs. There are folks like you and groups like you that can help us from the technology end, but we do have to have, and place an importance on, building the garden and continually tending the garden. Your dynamic, visualized map of the United States of all of this litigation for corporation X, Y, Z could be great, but you’ve got to build it, you’ve got to define the rules for it, and you’ve got to assign a person who lives with it and tends that garden.

David Yanofsky: Absolutely. You reap what you sow. If you want tomatoes, put the seeds in the ground, because otherwise, you’re going to get to summer, and you’re not going to have tomatoes. There’s a reason why there are so many business metaphors about agriculture and gardening, like “the fruits of your labor”: you can only pick this fruit if you grow the tree. Trees by their very nature are slow to grow. They can take years to be productive. But once they are productive, you have a bounty of the fruit of your labor. You can make things from it. You can understand things from it. You can use it to enrich your understanding of things and make sure that you can sustain your business through these efforts.

Shannon Capone Kirk: You won’t see the ROI from day one, but you will over time. We both agree that all of this data that we’re under can be crushing, but we can use it if we visualize it correctly and, again, from day one, put an emphasis on building and tending that garden, and that emphasis has to be in human capital.

David Yanofsky: That’s going to be it for Multidimensional Data Reversion for today. On our next episode, we will be talking about data visualization, so be sure to subscribe wherever you get your podcasts. I’m David Yanofsky.

Shannon Capone Kirk: And I’m Shannon Capone Kirk. Thank you for listening.