You can load long-form (3,000+ word) sections of text and statistics into ChatGPT. Then, with the right prompts, you can use ChatGPT-4 to query that text in natural language.
This is a great way to get insights from large bodies of text (e.g., whitepapers, reports, surveys), but it comes with some caveats.
In this blog post, we show you the following:
How to load long-form text into ChatGPT-4 for analysis.
Nine prompts you can use to gain insights.
A practical example demonstrating why you need to be extremely careful when it comes to the outputs.
You can’t just upload a 20-page whitepaper into ChatGPT-4 and start asking questions like “How much worse was this year compared to last year for endpoint attacks?” or “What percentage of respondents to this survey felt more secure after installing EDR?”
This is because ChatGPT-4 has a token input limit: it can only analyze so much text (measured in tokens) at once. In our experience, the limit for a single ChatGPT-4 input is around 3,500 tokens (roughly 3,000 words).
Note: OpenAI states that ChatGPT-4 can process 32,000 tokens, but that figure is the upper limit for an entire conversation, not a single input.
The number of tokens you can paste into ChatGPT in one message is far lower. Also, remember that one token is roughly 0.75 words.
For example, when you paste a long text (more than about 3,000 words) into ChatGPT-4, it asks you to resubmit something shorter.
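Token counts are not visible in the ChatGPT interface, so a rough word-based estimate is usually enough to decide whether a paste will fit. Here is a minimal Python sketch using the ~0.75 words-per-token rule of thumb (the function names are our own hypothetical helpers, and the estimate is approximate — punctuation-heavy or stat-heavy text uses more tokens per word):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate, assuming one token is about 0.75 English words."""
    words = len(text.split())
    return round(words / 0.75)

def needs_splitting(text: str, token_limit: int = 3500) -> bool:
    """True if the text likely exceeds a single-input token limit."""
    return estimate_tokens(text) > token_limit
```

For example, a 3,000-word paste estimates to about 4,000 tokens, which is over the 3,500-token limit we observed, so it would need splitting.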
So how do you analyze longer texts, or texts full of stats and figures that eat up tokens as well?
Answer: you split your text into multiple sections, or “chunks.” That way, ChatGPT-4 can take in a large document piece by piece.
The easiest way to do this is with the free ChatGPT Splitter tool. It lets you copy-paste text and split it into sections of a defined character limit. It also gives you a pre-built prompt that tells ChatGPT-4 how to handle the text.
But in case this tool ever goes offline or you cannot access it, here is a process, including a prompt, that you can use to analyze long-form text with ChatGPT:
Find out your text’s word count by copy-pasting it into a Word or Google doc and using the word count function.
If your text (plus your prompt) is over 3,000 words (even slightly), you need to split it manually. Do this by dividing it into two chunks of text.
If your text is more than 6,000 words, you will need to split it into at least three chunks. You can divide text by copy-pasting it into a doc and then creating a break in the doc to separate it.
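If you would rather script the splitting than eyeball it in a doc, a simple whitespace-based splitter does the job. This is a hypothetical helper of our own, not the ChatGPT Splitter tool mentioned above:

```python
def split_into_chunks(text: str, max_words: int = 3000) -> list[str]:
    """Split text into chunks of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Note that splitting on a raw word count can cut mid-sentence or mid-table; for survey reports, it is usually better to nudge the break points to section boundaries afterwards.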
After you have separated your text into chunks, paste this prompt into the ChatGPT-4 window: “Act like a document/text loader until you load and remember content of the next text/s or document/s. There might be multiple files, each file is marked by name in the format ### DOCUMENT NAME.”
Press enter and then paste this prompt in the box: “[START CHUNK 1/2] ### file.txt ###”
Then, paste your first chunk of text.
At the end of the text, copy-paste the prompt “[END CHUNK 1/2] Reply with OK: [CHUNK x/TOTAL], don't reply to anything else, don't explain the text!”
ChatGPT-4 should respond with “OK: [CHUNK 1/2].”
For your next chunk:
Before your text, paste the prompt “[START CHUNK 2/2].”
At the end of your text, paste “[END CHUNK 2/2] Reply with OK: [CHUNK x/TOTAL], don't reply to anything else, don't explain the text!”
ChatGPT-4 should respond with “OK: [CHUNK 2/2].”
Now your text is loaded into ChatGPT-4, and you can begin to query it.
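The chunk-wrapping steps above are mechanical, so you can also generate the prompts programmatically and paste them one by one. A sketch, using our own hypothetical helper; the prompt strings mirror the ones quoted above:

```python
def wrap_chunks(chunks: list[str], filename: str = "file.txt") -> list[str]:
    """Wrap each chunk in the START/END prompt markers described above."""
    total = len(chunks)
    wrapped = []
    for i, chunk in enumerate(chunks, start=1):
        header = f"[START CHUNK {i}/{total}] ### {filename} ###"
        # "[CHUNK x/TOTAL]" is left verbatim, as in the original prompt --
        # ChatGPT is expected to fill in the numbers when it replies.
        footer = (
            f"[END CHUNK {i}/{total}] Reply with OK: [CHUNK x/TOTAL], "
            "don't reply to anything else, don't explain the text!"
        )
        wrapped.append(f"{header}\n{chunk}\n{footer}")
    return wrapped
```

Each returned string is one complete message: paste it, wait for the “OK” acknowledgment, then paste the next.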
Careful prompt design can help you pull insights from large volumes of text. This is useful for brainstorming new ideas for content assets (e.g., turning a whitepaper into a series of blog posts) or for finding data to back up claims in your existing content (for example, you could load survey data from two or more years and compare them).
But how do you know what questions to ask? Here are nine prompts:
“The text provided above contains responses to a survey about [subject of the survey]. Rank the most important findings in the context of [subject of the survey] and explain why they have been ranked this way.”
“Using bullet points, list the people, groups, organizations, job titles, and countries that responded to the survey questions in the text provided.”
“This text contains survey data from two years, (x) and (y). Identify the most noticeable differences between the findings for these two years. Rank these differences in ascending order of magnitude of change, and show the relevant figures for each year in your response.”
“Please look at the text and identify any cause-and-effect findings it reports.”
“Tell me about the actors in this text and group them into types. Actors are people, groups, organizations, and states.”
“The text discusses the challenges/opportunities in the field of ( ). Rank the three biggest challenges/opportunities in terms of their severity/potential benefit, and explain why they have been ranked this way.”
“Can you provide a list of the narratives contained within the text? Describe each narrative in a two-to-three-sentence paragraph and pitch it to me.”
“Summarize the text in 250 words.”
“Use newspaper headlines to describe the information within the text.”
ChatGPT-4 is not a sentient being. It does not actually understand the text you give it, so be very careful when querying it for information.
To test this, we fed ChatGPT-4 a cybersecurity market report featuring statistics from 2020, 2021, and 2022. We wanted to know how attitudes to risk, as surveyed in the report, had changed.
We asked questions like:
How have CISOs’ perceptions of risk changed in the past three years?
What are the biggest drivers of cybersecurity spending, and how have these changed over the years?
ChatGPT-4 then gave us some answers, including: “CISOs have become more conscious of risk as a result of factors like digital transformation…” and “Regulations have encouraged spending…”
This helped us gain rapid insight into the document.
But ChatGPT-4 will not always give you true answers, even when you provide a body of text to analyze.
To validate ChatGPT-4’s output, we asked it to give us statistics to support its claims. Sometimes these were real statistics from the text we provided (we double-checked by searching the report manually), but other times they were made up.
When we asked ChatGPT-4 where the fake statistics came from, it apologized. Then it would give us another fake statistic…
If we had not checked ChatGPT-4’s claims by a) asking it which statistics backed them up and b) manually verifying those statistics, we would never have known whether it was right or wrong.
So the moral here is “Never trust, always verify.”
The above outlines one limitation of ChatGPT-4: reliability. But when using this tool, you also need to ask yourself: is ChatGPT-4 giving me real insights into this text?
Think “interpretive sparring partner,” not “magic machine.”
ChatGPT-4 does not replace the need to read long-form content and form your own opinions about it. Use it as a shortcut at your peril.
But it is a way to get another pair of eyes on something that might prompt you to think differently about the content before you or even help you generate new ideas for repurposing it.
We use ChatGPT-4 to augment our own insights and challenge our assumptions.
Need help with your content? Contact us today.
Written by Robert Galvin