Generative AI has the potential to drive a once-in-a-generation step-change in business performance and productivity, but a recent, first-of-its-kind scientific experiment demonstrates that generative AI can also be a double-edged sword.
When used correctly for appropriate tasks, it can be a powerful enabler of competitive advantage. However, when used in the wrong ways or for the wrong kinds of tasks, generative AI will diminish, rather than boost, performance.
This Thursday, November 30th, will mark the one-year anniversary of OpenAI’s public release of ChatGPT, the generative AI application based on the company’s GPT large language model. For the past year, generative AI has been the hottest topic in marketing and one of the most widely discussed developments in the business world.
Several surveys conducted this year have consistently shown that most marketers are using – or at least experimenting with – generative AI. For example, in the latest B2B content marketing survey by the Content Marketing Institute and MarketingProfs, 72% of the respondents said they use generative AI tools.
The capabilities of large language models have been evolving at a breakneck pace, and it now seems clear that generative AI will have a profound impact on all aspects of business, including marketing. Some business leaders and financial market participants argue that generative AI is the most significant development for business since the internet.
Given this importance, it’s not surprising that generative AI is becoming the focus of scholarly research. One of the most fascinating studies I’ve seen was conducted by the Boston Consulting Group (BCG) and a group of scholars from the Harvard Business School, the MIT Sloan School of Management, the Wharton School at the University of Pennsylvania, and the University of Warwick.
Study Overview
This study consisted of two related experiments designed to capture the impact of generative AI on the performance of highly skilled professional workers when doing complex knowledge work.
More than 750 BCG strategy consultants took part in the study, with approximately half participating in each experiment. The generative AI tool used in the experiments was based on OpenAI’s GPT-4 language model.
In both experiments, participants performed a set of tasks relating to a type of project BCG consultants frequently encounter. In one experiment, the tasks were designed to be within the capabilities of GPT-4. The tasks in the second experiment were designed to be difficult for generative AI to perform correctly without extensive human guidance.
In both experiments, participants were placed into one of three groups. One group performed the assigned tasks without using generative AI, and one used the generative AI tool when performing the tasks. The participants in the third group also used generative AI when performing the tasks, but they were given training on the use of the AI tool.
The “Creative Product Innovation” Experiment
Participants in this experiment were instructed to assume they were working for a footwear company. Their primary task was to generate ideas for a new shoe that would be aimed at an underserved market segment. Participants were also required to develop a list of the steps needed to launch the product, create a marketing slogan for each market segment, and write a marketing press release for the product.
The participants who completed these tasks using generative AI outperformed those who didn’t use the AI tool by 40%. The results also showed that participants who accepted and used the output from the generative AI tool outperformed those who modified the generative AI output.
The “Business Problem Solving” Experiment
In this experiment, participants were instructed to assume they were working for the CEO of a fictitious company that has three brands. The CEO wants to better understand the performance of the company’s brands and which of the brands offers the greatest growth potential.
The researchers provided participants with a spreadsheet containing financial performance data for each of the brands and transcripts of interviews with company insiders.
The primary task of the participants was to identify which brand the company should focus on and invest in to optimize revenue growth. Participants were also required to provide the rationale for their views and support their views with data and/or quotations from the insider interviews.
Importantly, the researchers intentionally designed this experiment to have a “right” answer, and participants’ performance was measured by the “correctness” of their recommendations.
Given the design of this experiment, it should not be surprising that the participants who used generative AI to perform the assigned tasks underperformed those who did not by 23%. The results also showed that those participants who performed poorly when using generative AI tended to (in the words of the researchers) “blindly adopt its output and interrogate it less.”
The results of this experiment also raise questions about whether training can alleviate this type of underperformance. As I noted earlier, some of the participants in this experiment were given training on how to best use generative AI for the tasks they were about to perform.
These participants were also told about the pitfalls of using generative AI for problem-solving tasks, and they were cautioned against relying on generative AI for such tasks. Yet, participants who received this training performed worse than those who did not receive the training.
The Takeaway
The most important takeaway from this study is that generative AI (as it existed in the first half of 2023) can be a double-edged sword. One key to reaping the benefits of generative AI, while also avoiding its potential downsides, is knowing when to use it.
Unfortunately, it’s not always easy to determine what kinds of tasks are a fit for generative AI . . . and what kinds aren’t. In the words of the researchers:
“The advantages of AI, while substantial, are similarly unclear to users. It performs well at some jobs and fails in other circumstances in ways that are difficult to predict in advance . . . This creates a ‘jagged Frontier’ where tasks that appear to be of similar difficulty may either be performed better or worse by humans using AI.”
Under these circumstances, business and marketing leaders should exercise a significant amount of caution when using generative AI, especially for tasks that will have a major impact on their organization.
(Note: This post has provided a brief and necessarily incomplete description of the study and its findings. Boston Consulting Group has published an article describing the study in greater detail. In addition, the study leaders have written an unpublished academic “working paper” that provides an even more detailed and technical discussion of the study. I encourage you to read both of these resources.)