AI Generates Complex Text, Challenging Many Readers

Jakob Nielsen

Jul 6, 2023 · 12 min read

Summary: Generative AI tools often produce text at high reading levels, posing difficulty for low-literacy users. Specifying lower reading levels in the prompt can go wrong, making a copy review imperative before publishing.

“Overwhelming documents” by Midjourney.

The content generated by prevalent AI tools (ChatGPT, Bard, Bing Chat, etc.) is typically formulated at a 12th-grade reading level or beyond, posing a significant challenge to low-literacy users. To ensure that your content is universally comprehensible, I recommend that you request a lower reading level in your prompt.

Based on usability research with low-literacy users, the long-standing recommendation is to write online text at an 8th-grade reading level to reach a broad consumer audience. Let’s see how well generative AI follows this guideline and how to work around its shortcomings.

Hold your horses, folks, before you get all "Do as I say, not as I do." I concede that the present article gleefully flaunts a 12th-grade reading level, commensurate with my usual writing style. (For example, AI Vastly Improves Productivity for Business Users and Reduces Skill Gaps employs a 12th-grade reading level, and The Articulation Barrier: Prompt-Driven AI UX Hurts Usability ironically reaches the 13th-grade mark, despite discussing the hurdles less-literate users face in formulating articulate AI prompts.)

However, my articles are not intended for readers with limited literacy skills. I intentionally cater to a highly educated, intellectually refined audience of user experience professionals and discerning business leaders keen to enhance the quality of their digital presence. Maybe not all my readers have college degrees, but their IQs are surely so high that their reading skills sit at the far right end of the distribution, meaning they guzzle sophomore-year texts like their morning coffee. As for the average consumer, my articles are not their cup of tea.

But if your company does business with that broad consumer audience, you can’t churn out digital copy that only the top 20% of literate individuals can decipher. Although the term "12th grade" may imply a high school graduate, the stark reality is that many people scrape through high school without acquiring the necessary reading skills. Only about a third of high school graduates in the United States can read at grade level.


Current AI-Generated Text is Excessively Complex

For a brief copywriting example, I prompted ChatGPT and Bing Chat to “explain the strategic implications of the series of fortresses constructed by the Viking king Harald Bluetooth in Denmark.”

Bing Chat offered a response suited for a university freshman, reaching a 13th-grade reading level:

----------

The series of fortresses constructed by the Viking king Harald Bluetooth in Denmark are known as Viking ring fortresses or trelleborgs. These fortresses were built during a time when Harald Bluetooth tried to strengthen royal power in Denmark. They form an outstanding and innovative example of Viking architecture, engineering, technology and a demonstration of force. They are manifestations of new centralized powers of a Viking king and illustrate a significant stage in North European human history.

----------

ChatGPT 4 replied at an eye-popping 16th-grade level. Only top-of-the-class college grads would find this text easy. Owing to the length of the answer, I am only sharing the initial two paragraphs to give you a taste of the text:

----------

Harald Bluetooth, the Viking King of Denmark from 958 to 986, initiated an extensive construction project of fortresses, known as Trelleborg fortresses, during his reign. These structures hold strategic importance in many aspects and understanding them gives us insights into the military, political, and social landscape of that time.

1. Military Strategy: Harald's fortresses were predominantly a military innovation. These ring-shaped fortresses were positioned strategically across the kingdom, creating a network of defensive structures that provided security against potential invaders. Each fortress was within a day's march of another, ensuring rapid deployment of troops and quick reinforcement. This reflected a significant change from the typical Viking strategy of quick raiding parties to a more organized, defensive military structure.

----------

(Admirably, ChatGPT follows my 1997 advice for web writing, making the content more scannable by highlighting keywords. The complete answer discussed 6 points, each frontloaded with a highlighted key term.)

What insights into Harald Bluetooth can ChatGPT provide? A surprising amount, assuming we're part of the top 10% literate segment of the population. (And no, he didn’t invent the famous communication protocol that bears his name. Image generated by Midjourney.)

Rewrite for Targeted Reading Level, But Check the Output

Fortunately, we can generate simplified text by stipulating a targeted reading level in the prompt. Regrettably, current AI tools don’t always perfectly comply with these directives, necessitating a post-generation review.

In another example, published by Microsoft, ChatGPT 4 was asked to condense a scientific paper for a 7th-grader. (Prompt: “Can you summarize the paper for a 7th grader, perhaps with additional background that explains the important issues at stake that these results read on?”) Although the ChatGPT output is well articulated and undoubtedly superior to the original academic writing, the resulting 3-paragraph summary has a Flesch-Kincaid readability score corresponding to the 9th grade. Because many students read below their current grade level, it would be prudent to target the summary at a 6th-grade reading level so that most 7th-graders can understand it. Either way, a 9th-grade reading level is unquestionably too advanced for 7th-graders, save for the 5% highest-IQ students.
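The Flesch-Kincaid grade level is a simple formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Longer sentences and longer words both push the grade up. The Python below is a minimal sketch of that formula, not the implementation used by Word or any online tool. Its sentence splitter and vowel-group syllable counter are crude heuristics chosen for brevity, so expect its numbers to differ somewhat from those tools.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a rough heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A silent trailing 'e' ("make") usually adds no syllable; "-le" ("table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level, using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Running `flesch_kincaid_grade` on the Bing Chat excerpt about Harald Bluetooth and on a plain sentence such as “The fort kept the king safe.” shows the gap: the excerpt’s long sentences and multi-syllable words drive its score far above the short sentence’s.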

For a final example, I tasked Google's Bard and ChatGPT 4 with composing 500-word biographies of yours truly. The complete ChatGPT biography is reproduced at the end of this article to avoid interrupting the main narrative with extraneous content. Bard's bio contained an abundance of inaccuracies that I'd rather not propagate on the Internet, where they would become training data for future AI services. (Among other falsehoods, Bard claimed that my Ph.D. thesis was about eyetracking. While I did write a book about eyetracking, this occurred more than two decades after my Ph.D. That’s a bit too much time travel by Bard for my taste.)

Bard’s bio was written at a 12th-grade reading level, and ChatGPT’s bio at a 14th-grade reading level. ChatGPT’s bio was more accurate but more challenging to read.

I prompted ChatGPT again, incorporating the directive to “write at an 8th-grade reading level.” Regrettably, this produced text written at a 10th-grade reading level, two levels higher than requested.

I managed to elicit from ChatGPT a biography written at an 8th-grade reading level only by adjusting the prompt to request it to “write at a 6th-grade reading level.” This text is also shared at the end of this article, allowing you to assess the contrast in comprehension ease between text written at a high reading level versus text written at a lower reading level.

It’s a shame that current generative AI frequently churns out text that overshoots the requested complexity. As long as this remains the case, it’s incumbent upon you to subject the copy to a readability test before publishing. You’ll often discover that you need to revise the prompt to request a lower reading level than you intend to target. (Checking reading levels is relatively straightforward, with numerous online tools available, and Microsoft Word even offering a built-in readability score as part of the “document stats” generated by the “Editor” feature.)
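The workflow just described (request the target level, measure the result, and request a lower level if the text overshoots) can be automated as a loop. The Python below is a sketch under stated assumptions: `generate` is a hypothetical stand-in for whatever AI tool you call (no real API is implied), `grade_of` is whatever readability scorer you trust, and both the prompt phrasing and the floor of grade 3 are arbitrary choices.

```python
from typing import Callable, Optional, Tuple

def generate_at_level(
    topic_prompt: str,
    target_grade: int,
    generate: Callable[[str], str],    # hypothetical wrapper around your AI tool
    grade_of: Callable[[str], float],  # any readability scorer, e.g. Flesch-Kincaid
    lowest_request: int = 3,           # arbitrary floor for how low to ask
) -> Tuple[Optional[str], Optional[int]]:
    """Request the target grade first; while the measured grade overshoots,
    ask for one grade lower. Returns (text, grade_requested), or (None, None)
    if even the lowest request still overshoots the target."""
    requested = target_grade
    while requested >= lowest_request:
        # Assumed prompt wording; adapt it to however you phrase your prompts.
        prompt = f"{topic_prompt} Target reading level: grade {requested}."
        text = generate(prompt)
        if grade_of(text) <= target_grade:
            return text, requested
        requested -= 1
    return None, None
```

If the tool overshoots by two grades, as ChatGPT did when asked for an 8th-grade bio, a call with `target_grade=8` ends by requesting grade 6. Passing this check says nothing about tone, though: text can score at grade 8 and still sound childish, so a human editor still has to read it.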

(Update July 29, 2023: Several readers commented that they have been successful at getting copy generated at a desired reading level by using prompts like "write at a Kincaid 8th grade reading level." I tried this trick in both Claude and ChatGPT, but without success. Claude generated 10th-grade text and ChatGPT generated 15th-grade text, despite being asked for an 8th-grade reading level.)

A shortcoming of merely specifying a low reading level in the prompt is that the AI tools do not differentiate between the skills acquired by an adult who reads at a particular grade level and the mindset of a child attending school in that grade.

Take, for example, the ChatGPT biography of myself that employs an 8th-grade reading level after I prompted it to write at a 6th-grade reading level. Here’s a short quote: “People all over the world know about Nielsen's work. In 2000, he was listed as one of the "Top 100 Most Influential People in IT." That's a big deal!” Yes, the vocabulary and sentence structure make this text easy to read. But the tone of voice is decidedly too juvenile.

For a broad consumer audience, confine yourself to uncomplicated, familiar words and employ simple grammatical structures, with succinct sentences, predominantly written in the active voice, while largely steering clear of complex sentence structures, such as subordinate clauses that necessitate more cognitive processing to decode. In other words, refrain from sentences like the previous one.

But don't belittle your adult audience by treating them like children just because they aren't academic heavyweights. They are grown-ups and appreciate a mature tone of voice.

You should write differently for these two readers, even if ChatGPT doesn’t. (“12-year-old reading” and “30-year-old reading” images by Midjourney.)

Thus, we have two guidelines for online copy targeting a broad consumer audience:

Aim for an 8th-grade reading level — simple, succinct, and straightforward.
Do not adopt a tone of voice suited to 8th-grade children; write for adults.

Until generative AI can hold these two thoughts in its little mind simultaneously, you must do this work for it during the editing process. (You do have an editing step, don’t you? Never post the unprocessed output from current AI tools. That’s too terrifying even to contemplate.)

If you target a higher-IQ audience, you can publish copy written at the 12th-grade reading level that current AI tools tend to generate by default. Unfortunately, existing AI tools seem to have a mischievous streak: without warning, they can crank up the difficulty dial to a staggering 16th-grade level.

Such elevated prose solely befits individuals whose reading prowess resides within the top echelons of the populace, the elite 5–10% who have ascended to the intellectual stratosphere. In light of these revelations, even when courting the illustrious cohort of erudite minds, you must not overlook the crucial step of assessing the readability during a meticulous editing process. (Don’t write like this: I got this paragraph straight from ChatGPT.)

The risk of getting text at a 16th-grade level or higher means that even when targeting a high-literacy audience, you must check for readability during editing.

Let me conclude with ChatGPT’s wise words: “So heed my words, my esteemed interlocutor, and engrave this truth deep within your writer's soul. In the pursuit of linguistic excellence, readability is the unwavering companion that leads you to triumph, regardless of the path you choose.” [Here’s Jakob again because I can’t really let a computer have the last word: reading levels matter, regardless of your audience, and you should usually shoot for either 8 or 12, for broad and elite markets, respectively.]

Quiz: Check Your Understanding of This Article

“Take the Test” image by Midjourney.

I made ChatGPT give me 5 questions for you to check your understanding of this article. The correct answers are given after the appendix.

Q1: What is the recommended reading level for online text targeting a wider consumer demographic?

a) 12th-grade reading level

b) 8th-grade reading level

c) 16th-grade reading level

d) College-level reading level

Q2: What are the challenges faced by low-literacy users when interacting with AI-generated content?

a) Difficulty in formulating articulate AI prompts

b) Limited access to AI tools

c) High cost of AI-generated content

d) Lack of interest in AI technology

Q3: How does the reading level of ChatGPT's response about the Viking fortresses compare to the desired target reading level?

a) It is below the desired target reading level.

b) It matches the desired target reading level.

c) It is two levels higher than the desired target reading level.

d) It is four levels higher than the desired target reading level.

Q4: Why is it important to subject AI-generated text to a readability test before publishing?

a) To ensure accuracy of the information

b) To verify the source of the text

c) To assess the comprehensibility for the target audience

d) To meet regulatory requirements

Q5: What are the two guidelines suggested for online copy targeting a broad consumer audience?

a) Aim for a 12th-grade reading level and use complex sentence structures

b) Aim for an 8th-grade reading level and use a tone of voice suited for children

c) Aim for a 12th-grade reading level and use simple, straightforward language

d) Aim for an 8th-grade reading level and compose for adults using a mature tone

By the way, Q3 is a terrible question for testing the reader’s understanding of this content. The question only checks your memory of that example, which is completely unimportant. I kept the question to illustrate the need for editorial review, even for a quiz.

Appendix: A Tale of Two Bios

I asked ChatGPT 4 to “write a 500-word biography of Dr. Jakob Nielsen, the UX expert.” Here’s the result, which is written at a 14th-grade reading level.

----------

Dr. Jakob Nielsen is a distinguished figure in the field of User Experience (UX) design and web usability, whose work has shaped the digital world as we know it today.

Born on October 5, 1957, in Denmark, Nielsen obtained a Ph.D. in human-computer interaction from the Technical University of Denmark in 1988. His education laid the foundation for his pioneering work in user experience and usability.

Nielsen began his career at the IBM User Interface Institute in the late 1980s. His tenure at IBM allowed him to work on various usability issues, marking the beginnings of his influence in the field.

In 1998, Nielsen co-founded the Nielsen Norman Group with Don Norman, another renowned figure in UX design. The Nielsen Norman Group quickly gained recognition for its work in research, consulting, and training in the field of user experience. It was here that Nielsen developed his widely respected "10 Usability Heuristics," which many designers consider to be a cornerstone of the field.

Nielsen's work is characterized by a deep commitment to the user. His focus on user-centric design, coupled with an insistence on rigorous testing, has pushed the field of UX design forward. His ideas, such as discount usability testing and heuristic evaluation, have become standard practice in the industry.

His efforts to spread knowledge and awareness about usability have also been commendable. He is the author of several influential books on the subject, including "Usability Engineering" (1994), "Designing Web Usability: The Practice of Simplicity" (1999), and "Prioritizing Web Usability" (2006). Nielsen also publishes a bi-weekly column, "Alertbox," on website usability.

Nielsen's contributions have not gone unrecognized. In 2000, he was listed in the "Top 100 Most Influential People in IT" by the Uptime Institute. Additionally, in 2010, Bloomberg Businessweek included him in their "27 Most Influential Designers" list.

However, his influence extends beyond accolades. Nielsen has been a strong advocate for a user-centered approach to design, consistently arguing that usability is not an optional extra but a core component of successful products. His "10 Usability Heuristics" are regarded as a seminal text in the field, providing designers with a set of guiding principles to ensure the creation of intuitive, accessible, and effective interfaces.

Dr. Nielsen's impact on the digital world is undeniable. His tireless dedication to improving the human-computer interface has made him a central figure in the development of the UX design field. His work has not only improved the way we interact with technology today but also set the stage for the development of more user-friendly technologies in the future. In essence, Nielsen's work underscores his belief that at the heart of every technological innovation should be the user. And it is this vision that continues to guide the field of UX design.

----------

To get copy written at the required 8th-grade reading level for targeting a broad consumer audience, I changed my prompt to ChatGPT 4 to request text written at a 6th-grade reading level. (So I had to ask for two levels below the desired level to get ChatGPT to produce what I wanted.) Here is what it produced. Note how much easier this copy is to read, but also note how unfortunate it is that the tone of voice has become too childish for adult readers.

----------

Dr. Jakob Nielsen is a well-known person who works in an exciting field called User Experience (UX) design. He also works on something called web usability. He was born on October 5, 1957, in a country called Denmark. He got a special degree called a Ph.D. in something called human-computer interaction from the Technical University of Denmark in 1988. This degree helped him do his important work in user experience and usability.

In the late 1980s, Dr. Nielsen started working at a big computer company called IBM. Here, he worked on making computers easier for people to use. This work was very important in the computer world.

In 1998, Nielsen started his own company with a friend named Don Norman. They called their company the Nielsen Norman Group. This company quickly became famous for their work in user experience. At this company, Nielsen made a list of "10 Usability Heuristics." Heuristics is just a fancy word for rules or guidelines. These are 10 rules that many people who design websites or apps use to make sure their designs are easy for people to use.

Nielsen has always cared a lot about making things easier for people who use computers and websites. He believes in testing products many times to make sure they work well for users. He also shares his ideas with others. He has written several important books and writes a column every two weeks about how to make websites easier to use.

People all over the world know about Nielsen's work. In 2000, he was listed as one of the "Top 100 Most Influential People in IT." That's a big deal! And in 2010, Bloomberg Businessweek said he was one of the "27 Most Influential Designers." These are just a few of the ways people have recognized his hard work.

But Nielsen's biggest achievement is how he has helped make computers and websites easier for people to use. He thinks that the most important thing about a product is that it works well for the user. His "10 Usability Heuristics" are rules that many designers use to create things that are easy and fun to use.

In the end, Dr. Nielsen has done a lot to make the digital world a better place. He's worked really hard to make our computers and other devices easier to use. His work has helped make technology better for us today, and his ideas will continue to shape how we use technology in the future. He always reminds us that the user - the person using the technology - should always be the most important thing to consider. This is what guides people who work in UX design today.

----------

Quiz Answers

Don’t read this until you have attempted the quiz (above the appendix) yourself.

Here are the correct answers:

Q1: What is the recommended reading level for online text targeting a wider consumer demographic?

Correct Answer: b) 8th-grade reading level

Q2: What are the challenges faced by low-literacy users when interacting with AI-generated content?

Correct Answer: a) Difficulty in formulating articulate AI prompts

Q3: How does the reading level of ChatGPT's response about the Viking fortresses compare to the desired target reading level?

Correct Answer: c) It is two levels higher than the desired target reading level.

Q4: Why is it important to subject AI-generated text to a readability test before publishing?

Correct Answer: c) To assess the comprehensibility for the target audience

Q5: What are the two guidelines suggested for online copy targeting a broad consumer audience?

Correct Answer: d) Aim for an 8th-grade reading level and compose for adults using a mature tone
