How To Get the Most Bang for your AI Buck
"AI ROI" can be found almost anywhere...if you look hard enough.
The Path #22
Written by Nate Buchanan, COO & Co-Founder, Pathfindr
Those of you who are avid readers of this esteemed publication may recall that way back in Edition #4 we talked about why it’s so difficult to calculate value from AI. This week we’re going to show you how to overcome those difficulties and put together a value framework that will help your team decide where to invest in AI capabilities and how to maximize the return on that investment.
We begin by establishing a simple foundation, one that is likely familiar to anyone who has had to create a business case or track metrics for a project: the difference between quantitative value and qualitative value.
Note that there are some aspects of value that could fall into both categories, such as quality.
It’s possible to measure quality in a quantitative fashion - for example, you could calculate the number of defects found in a production application relative to how many were found during development and testing - but that doesn’t tell the whole story. As any moviegoer or hip-hop connoisseur will tell you, quality is also a qualitative metric that is heavily dependent on an individual’s point of view and past experience. For example, if I were to posit that the majority of hip-hop released after 2010 is of low quality, that is a qualitative, not quantitative, statement (however true it might be).
When it comes to AI, there are many different ways to calculate the benefits that your team is getting. However, the best way is to isolate the process you are looking to improve in a test environment or “sandbox” that mirrors the real world as closely as possible. Once you’ve done that, you can introduce the AI enhancements that you’ve developed to improve the process, ask a seasoned practitioner to execute the process with and without the help of AI, and compare the results.
Suppose you supervise a team of 10 that is responsible for taking invoices received via email in PDF format and manually converting them into line items in a table in your accounting system. A simple experiment to calculate value could consist of splitting the team into two groups of five, giving each group the same stack of 100 invoices, and allowing one group (but not the other) to use a new AI application you have developed that lets the user upload a PDF invoice and converts it into a table entry that can then be uploaded into the accounting system. More than likely, the group using AI will be able to process the invoices in far less time - let’s say 2 hours less. By multiplying those 2 hours by the average hourly rate of the team, you can arrive at an estimate of cost savings per 100 invoices processed. Divide the number of invoices processed in a typical year by 100, multiply that by the cost savings number you just calculated, and you’ve got an estimate of annual cost savings from an AI-powered invoice processing solution.
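The arithmetic in the invoice example can be sketched in a few lines of Python. This is only an illustration: the function name, the hourly rate of 40, and the annual volume of 12,000 invoices are assumptions made up for the example, and only the 2 hours saved per 100 invoices comes from the scenario above.

```python
def annual_savings(hours_saved_per_batch: float,
                   hourly_rate: float,
                   invoices_per_year: int,
                   batch_size: int = 100) -> float:
    """Estimate yearly cost savings from time saved on a batch of invoices.

    hours_saved_per_batch: time saved by the AI-assisted group on one batch
    hourly_rate:           average hourly rate of the team (assumed input)
    invoices_per_year:     typical annual invoice volume (assumed input)
    batch_size:            number of invoices in the experiment's stack
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Cost savings per batch of invoices processed
    savings_per_batch = hours_saved_per_batch * hourly_rate
    # How many such batches the team handles in a year
    batches_per_year = invoices_per_year / batch_size
    return savings_per_batch * batches_per_year


# Hypothetical inputs: 2 hours saved per 100 invoices (from the example),
# an assumed rate of 40 per hour, and an assumed 12,000 invoices per year.
example_savings = annual_savings(2, 40.0, 12_000)
print(f"Estimated annual savings: {example_savings:,.2f}")
```

Swapping in your own measured time savings, hourly rate, and invoice volume gives a first-pass estimate that you can refine as you run more experiments.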
For processes with discrete inputs and outputs - such as invoice processing, document translation, customer call summarization, and so on - this is a fairly straightforward way to calculate quantitative value. Other tasks may lean more towards qualitative value, usually because they are inconsistent or subjective in some way. These include:
Writing emails - while Microsoft Copilot, Google Gemini, and ChatGPT can be used to write emails for you, the length of an individual email varies widely depending on who’s writing it and the topic being discussed. While writing emails often takes up a significant portion of my day, I’d be hard-pressed to put a number on it because it changes often.
Content creation - creating content with AI, whether it’s a presentation, an image, or a video, can be a time saver but is also highly unpredictable. For use cases where the specifics of the content don’t matter so much - like a stock photo accompanying a marketing email blast, for example - AI-generated images may suffice, but you wouldn’t want to rely on AI to create a client-facing presentation for you.
Coaching or advisory - LLMs can be great at helping you think through multiple angles to a problem or suggest ideas you might not have otherwise thought of, but it’s difficult to calculate value from this. Looking at outcomes in the aggregate - for example, the long-term success of an advertising campaign that used AI to help brainstorm ideas - is one way to show benefit from this type of application.
This is not to say that you can’t calculate both types of value for these use cases - you can - just that each one is different and your team needs to be able to understand value in different contexts. Of course, value is only one side of the ROI equation. The other side is cost, and cost can be quite tricky to calculate when you’re working with LLMs because of the nature of token-based pricing.
Let’s set aside considerations such as the build and maintenance of the AI application itself - those costs are not insignificant, but they are fixed (more or less) while each call to the LLM may vary in complexity and, therefore, cost. So any calculation of AI ROI needs to factor this in. There are many free calculators online to help you estimate the cost of an individual LLM call - here is one example, with a screenshot provided below (credit to DocsBot for making this available).
Suppose I want to build a custom application that uses an OpenAI service to automatically craft responses to customer complaints received via email. Let’s further suppose that the quality of the response is more important to me than speed, so I want to ensure I’m using the best model available for high-order analysis and reasoning. If I use GPT-4, and an average incoming complaint is 200 words (or about 270 tokens), and I estimate that a response would be more than double that at 500 words once we have accounted for profuse apologies and offers to “make it right”, it’ll cost me $4.80 per response. A good estimate for the per-minute wage of a typical customer service representative in Australia is about 60 cents, so in order to make “cents” (get it?) it would need to take a human at least 8 minutes to craft a customer complaint response manually. That might be unrealistic…but what if we were able to get satisfactory performance out of GPT-4 Turbo or even GPT-4o? Then the ROI changes significantly. Using GPT-4o, the threshold drops from 8 minutes to 2 minutes, which is much more reasonable.
There’s also the opportunity cost to factor in - a person trained to respond to customer complaints could spend time on the phone ironing out more pressing issues directly with human-to-human contact instead of responding to emails. It’s hard to quantify that in the short term, but if the number of customer complaints drops in the long term, you know you’re doing something right.
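The break-even reasoning above reduces to a single division, sketched here in Python. The function names are hypothetical, and the inputs are simply the figures quoted in this edition ($4.80 per GPT-4 response, a wage of 60 cents per minute), taken as given rather than looked up from a price list. Amounts are kept in cents to avoid floating-point rounding noise.

```python
def break_even_minutes(llm_cost_cents: float,
                       wage_cents_per_minute: float) -> float:
    """Minutes of human effort whose wage cost equals one LLM response.

    If a person needs longer than this to write the response by hand,
    the LLM call is the cheaper option on direct cost alone.
    """
    if wage_cents_per_minute <= 0:
        raise ValueError("wage_cents_per_minute must be positive")
    return llm_cost_cents / wage_cents_per_minute


def implied_cost_cents(threshold_minutes: float,
                       wage_cents_per_minute: float) -> float:
    """Per-response LLM cost implied by a given break-even threshold."""
    return threshold_minutes * wage_cents_per_minute


WAGE_CENTS_PER_MINUTE = 60   # figure quoted in the article
GPT4_COST_CENTS = 480        # $4.80 per response, as quoted in the article

gpt4_threshold = break_even_minutes(GPT4_COST_CENTS, WAGE_CENTS_PER_MINUTE)
print(f"GPT-4 break-even: {gpt4_threshold} minutes")

# The article gives a 2-minute threshold for GPT-4o; the per-response cost
# below is only what that threshold implies, not a quoted price.
gpt4o_cost = implied_cost_cents(2, WAGE_CENTS_PER_MINUTE)
print(f"Implied GPT-4o cost per response: {gpt4o_cost} cents")
```

This only captures direct cost per response. The opportunity cost and long-term effects discussed here would need to be tracked separately.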
There’s a lot to unpack here, and we will be revisiting this topic in future editions. In the meantime, I encourage you to join our Practical AI for Leaders webinar next Thursday (20 June) at 2 PM AEST - we’ll be addressing where you can find pockets of AI value in your organization and a whole lot more.
Until next week!
Nate