AI Tips #7: Best Practices for Effectively Analysing Documents with AI Tools
Explore how to more effectively prompt and utilise AI tools to summarise, analyse and extract information from documents

Among the many capabilities that Large Language Models (LLMs) have endowed us with is the ability to summarise, analyse, and extract information from documents, whether they be in the form of PDFs, Word files, or spreadsheets.
Meanwhile, most organisations are bursting at the seams with documentation. The average knowledge worker spends 5-15% of their time reading and reviewing information, and a whopping 30% of their workday simply searching for it. Little wonder, then, that this is the most frequently requested topic among clients and participants of my AI workshops.
Yet without understanding how AI models work with documents, and without the right techniques, it can be difficult to make the most of these capabilities. That’s what this instalment of AI Tips sets out to address.
By way of a heads-up, this is a fairly long article so please feel free to skip ahead to those sections that interest you:
Landscape of AI Tools for Document Analysis: overview of the types of AI tools that a typical, non-technical user is likely to employ when analysing documents.
Technical Approaches for Document Analysis: overview of approaches for document analysis, which has implications for document analysis best practices.
Best Practices for Document Analysis: general best practices, as well as best practices for specific systems.
Current Limitations of AI-Enabled Document Analysis: specific tasks that AI systems struggle with in relation to document analysis.
Landscape of AI Tools for Document Analysis
Let’s start with the types of tools that a typical, non-technical user is likely to employ when analysing documents.
#1 Large Language Models (LLMs)
LLM chatbots such as OpenAI’s ChatGPT (specifically the GPT-4 and GPT-4o models), Anthropic’s Claude, and Google’s Gemini, allow users to upload documents for summary and analysis through simple drag and drop or document upload interfaces.
Most of these LLM chatbots are also now largely multimodal, meaning that they can interpret both text and images such as graphs and charts. There is, however, currently a limit to this capability (explained in more detail below), and most models today struggle to accurately analyse complex charts and graphs.
Google’s Gemini supports integration with Google Drive, while the paid versions of ChatGPT support integration with both Google Drive and Microsoft OneDrive, allowing these LLMs to access files directly without users having to upload them.
#2 Document Chatbots
There are a multitude of document chatbots that specialise in summarising and analysing documents, most commonly PDF files. Most of these are “GPT-wrappers” meaning that they are powered by OpenAI’s GPT models.
Examples include ChatPDF, docAnalyzer, UPDF AI, PDFGPT.IO, ChatDOC, and PDFgear. Most employ a freemium model, with free plans typically allowing 2-3 document uploads and 10-20 questions per day.
PDFgear in particular stands out because it is entirely free to use and strong on data security: rather than storing your personal information or data in the cloud, it runs as a local application, and your data is erased as soon as you exit the program.
#3 Knowledge Assistants
Going beyond Document Chatbots, Knowledge Assistants not only allow users to interact with one or more documents simultaneously but also serve as a repository for an individual’s, team’s, or organisation’s documents.
Examples include Sana AI, Humata AI, Petal, and Quivr. The likes of Quivr also support integration with websites, while Sana AI goes several steps further to also connect with other enterprise applications such as Slack, the Microsoft Office suite, Notion, Salesforce etc.
Most Knowledge Assistants tend to be targeted at teams and organisations rather than individuals, catering to their need for knowledge management and access solutions.
Technical Approaches for Document Analysis
LLMs that support document uploads (e.g., GPT-4o, Claude) generally adopt an approach to working with documents that is distinct from that of Document Chatbots and Knowledge Assistants, and this has implications for best practices when interrogating documents. Let’s explore how these approaches work.
#1 Chunking & Summarisation Approach
LLMs adopt an approach known as Chunking & Summarisation for working with documents. We’ll use the analogy of a student reading a book and taking notes to describe how this works (a short code sketch after the list makes the pattern concrete):
Document segmentation / Breaking a book into chapters: Think of the book as being divided into chapters. Each chapter is of a manageable size that the student (or in this case, the LLM) can read without feeling overwhelmed. This is like dividing a long document into smaller, bite-sized chunks.
Tokenisation / Reading each chapter: As she reads each chapter, she takes notes on the important parts. These notes are akin to tokens for an LLM, breaking down the chapter into understandable pieces.
Sequential Processing / Understanding the plot: The student goes through each chapter one by one, making sure she understands the plot and main points. Each chapter builds on the last, so she keeps the overall story in mind as she goes along.
Contextual Embedding / Connecting the Dots: While reading a new chapter, she recalls what has happened in previous chapters in the form of a mini mental summary. This helps her connect the dots and understand how the current chapter fits into the overall story.
Summarisation / Summarising Chapters: After finishing each chapter, she writes a brief summary of it. This summary captures the main events and key information from the chapter.
Aggregation / Combining Summaries: Once she has summarised all the chapters, she combines these summaries into a single, cohesive overview. This gives her a clear picture of the entire book without having to re-read every chapter.
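To make the Chunking & Summarisation pattern concrete, below is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular chatbot is implemented: `call_llm()` is a hypothetical helper standing in for whichever model API you use, and the open-source tiktoken library is used only to measure chunk sizes in tokens.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokeniser used by GPT-4-class models

def chunk_text(text: str, max_tokens: int = 2000) -> list[str]:
    """Document segmentation: split a long text into bite-sized chunks."""
    tokens = enc.encode(text)  # tokenisation
    return [enc.decode(tokens[i : i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your LLM of choice and return its reply."""
    raise NotImplementedError("Wire this up to your preferred LLM API.")

def summarise_document(text: str) -> str:
    """Sequential processing with contextual carry-over, then aggregation."""
    running_summary = ""
    for chunk in chunk_text(text):
        # Each chunk is read in the context of a running summary of what came
        # before, much like the student recalling earlier chapters.
        running_summary = call_llm(
            f"Summary so far:\n{running_summary}\n\n"
            f"New section:\n{chunk}\n\n"
            "Update the summary to incorporate the new section."
        )
    return running_summary  # a single, cohesive overview of the whole document
```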
#2 Retrieval Augmented Generation (RAG) Approach
Most Document Chatbots and Knowledge Assistants adopt an approach known as Retrieval Augmented Generation (RAG) for working with documents. We’ll use the analogy of a journalist trying to answer a specific question by conducting research in a library to illustrate how this works (a short code sketch after the list shows the core retrieval step):
Document indexing / Creating a research database: The journalist creates a detailed research database of all the books, articles, and resources available in the library. This database helps him quickly locate where specific topics are discussed.
Query processing / Formulating the question: He formulates the specific question he needs to answer for his article, and translates this question into key search terms and concepts that can be used with the research database.
Retrieval / Searching the library: The journalist searches the library’s research database using the key terms and concepts from his question, identifying the most relevant books, articles, and sections that are likely to contain the answers.
Reranking / Prioritising sources: He prioritises the sources he’s found, determining which books and articles are most relevant and reliable, and sorts through the top sources to decide which ones to read first.
Contextual embedding / Contextual reading: The journalist reads through the top-ranked sources, paying attention to the context around the information relevant to his question, and takes notes on how different pieces of information relate to each other.
Generation / Writing the article: Using the information from the best sources, the journalist writes a detailed and coherent article that answers the specific question. He synthesises the information from multiple sources to provide a comprehensive answer.
Response delivery / Citing sources: He may include citations in his article, referencing the specific books and articles he used to gather information. This provides transparency and allows his readers to verify the information.
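The core retrieval step can likewise be sketched in a few lines of Python. Again, this is a minimal illustration under stated assumptions, not any vendor’s implementation: `embed()` is a hypothetical helper standing in for an embedding model or API, and `call_llm()` is the hypothetical helper from the earlier sketch. Production systems add vector indexes and dedicated reranking models on top of this basic cosine-similarity search.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical helper: return an embedding vector from your embedding API."""
    raise NotImplementedError("Wire this up to your preferred embedding API.")

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    chunk_vectors = [embed(c) for c in chunks]      # document indexing
    query_vector = embed(question)                  # query processing
    scores = [cosine_similarity(query_vector, v) for v in chunk_vectors]
    ranked = sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]   # retrieval and (naive) reranking

def answer(question: str, chunks: list[str]) -> str:
    # call_llm() is the hypothetical helper defined in the earlier sketch.
    context = "\n\n".join(retrieve(question, chunks))
    # Generation: retrieved passages are placed in the prompt so the model
    # answers from the document rather than from memory alone.
    return call_llm(
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```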
Best Practices for Document Analysis
Below are some best practices that apply regardless of whether you are using the document upload functionality in an LLM (which employs the Chunking & Summarisation approach) or Document Chatbots and Knowledge Assistants (most of which employ RAG). We’ll illustrate these best practices using the example of a student analysing a hypothetical research paper titled “The Effects of Climate Change on Coastal Ecosystems”:
Contextualisation: Provide context within your prompts, which helps the AI to understand the scope and focus of your query. Examples:
Contextually-poor prompt: "How does climate change affect communities?"
Contextually-rich prompt: "To inform my economic analysis of climate change for my Masters thesis, can you tell me about how climate change economically impacts coastal communities?"
Clarity and specificity: Ensure your prompts are clear and specific, which makes it easier for the system to understand and respond accurately. Examples:
Unclear and ambiguous prompt: "Tell me about marine life."
Clear and specific prompt: "What does the document say about the impact of climate change on marine life, specifically focusing on fish species?"
Structured information requests: Request responses in a structured format (e.g., tables, JSON), which makes it easier for you to organise, understand, and utilise the information provided (a short parsing sketch follows the examples). Examples:
Table request: "Summarise the key impacts of climate change on coastal ecosystems in a table."
JSON request: "Provide statistics on sea level rise projections in JSON format."
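One advantage of asking for JSON is that the response can then be processed programmatically. Here is a minimal sketch, assuming the model complies with the requested format; the figures shown are placeholders rather than real projections, and real code should handle malformed output:

```python
import json

# Hypothetical reply to the prompt above; the actual figures depend on your document.
reply = '{"projections": [{"year": 2050, "rise_cm": 25}, {"year": 2100, "rise_cm": 70}]}'

try:
    data = json.loads(reply)
    for p in data["projections"]:
        print(f'{p["year"]}: {p["rise_cm"]} cm')
except json.JSONDecodeError:
    print("Model did not return valid JSON; consider re-prompting.")
```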
Document structure: Provide references to the document’s structure (e.g., section or chapter names, page numbers) which can help improve accuracy and contextualisation. This is more critical for document uploads to an LLM as these systems are not able to semantically search across an entire document (as RAG-based approaches do) but instead work by analysing and summarising chunks of text one-by-one. Examples:
Chapter reference: “Summarise the section on climate change impacts on coral reefs in Chapter 3”
Page reference: "Refer to page 45 for detailed statistics on sea level rise projections."
Iterative approach: Be patient if you don’t get what you need right away. Adopt an iterative approach that allows for refining questions based on responses, progressively narrowing down to the most pertinent information. Examples:
Initial prompt: "Summarise the impacts of climate change on coral reefs based on what has been written in this document."
Follow-up prompt: "Can you provide more details on the causes of coral bleaching mentioned in the document?"
Confirmatory prompt: "How do rising sea temperatures specifically contribute to coral bleaching?"
Break up large documents: For very large documents, for instance those in the hundreds or thousands of pages, it may make sense to break them up into smaller chunks using a tool such as Adobe's Acrobat PDF Splitter. This will help to improve the quality, relevance, accuracy, and speed of analysis.
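If you prefer a scriptable alternative to the same end, here is a minimal sketch using the open-source pypdf library; the file name and the 100-page part size are arbitrary assumptions you can adjust:

```python
from pypdf import PdfReader, PdfWriter

def split_pdf(path: str, pages_per_part: int = 100) -> None:
    """Split a large PDF into sequentially numbered smaller files."""
    reader = PdfReader(path)
    total = len(reader.pages)
    for part, start in enumerate(range(0, total, pages_per_part), start=1):
        writer = PdfWriter()
        for i in range(start, min(start + pages_per_part, total)):
            writer.add_page(reader.pages[i])
        with open(f"{path.removesuffix('.pdf')}_part{part}.pdf", "wb") as f:
            writer.write(f)

split_pdf("long_report.pdf")  # hypothetical file name
```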
When it comes to working with document uploads to LLMs, these specific practices can be especially helpful:
Gradual deep dive: Start with broad questions or summaries to build a foundation, before asking specific questions, or questions that seek out comparisons and relationships across topics. This allows the model to build sequential understanding by progressively adding new information and linking it to previously discussed topics. Each response builds on prior answers, ensuring a logical progression of information. Examples:
Initial prompt: "Summarise the main points of the document from the introduction regarding the impact of climate change on coastal ecosystems."
Follow-up prompt: "What specific effects does climate change have on coral reefs as mentioned in the document?"
Connection prompt: "How does the document relate the impact on coral reefs to the overall health of marine biodiversity?"
Contextual reminders: Provide clear cues that help the model retain and integrate context, such as reminding it of key themes or points covered earlier. Examples:
Chapter reminder: "In the section on socio-economic impacts, it was mentioned that tourism is affected. Can you elaborate on how changes in coastal ecosystems influence tourism?"
Reference reminder: "You mentioned that coral reefs are significantly affected by climate change. How does this impact the fish species that depend on these reefs?"
Establish confidence levels: Encourage LLMs to indicate their level of certainty or to provide qualifiers about their responses. This helps to overcome their tendency to present everything as gospel truth, and the fact that the Chunking & Summarisation approach employed by LLMs might not always capture nuances from across the entire document. Examples:
Highlighting uncertainty: "If you're not certain about any part of your response, please indicate that."
Qualifying responses: "How confident are you in the information provided about the impact of rising sea levels on coastal ecosystems?"
For document analysis with RAG-based systems, consider the following practices to improve the quality and relevance of responses:
Keywords and related terms: Include a variety of keywords, synonyms, and related terms to help the system understand the broader context and nuances of the query. This is helpful because RAG approaches rely on semantic search to retrieve relevant sections of a document, and documents are sometimes written using terminology and phrasing that may not be picked up by a single keyword alone. Examples:
Synonyms: “Tell me how climate change causes coral bleaching, using key words such as 'coral whitening' and 'reef bleaching'.”
Related terms: “How does climate change cause coral bleaching? Consider related terms such as 'coral health,' 'reef degradation,' and 'coral ecosystem decline'."
References and citations: Since RAG involves retrieving and synthesising information from relevant sections across entire documents, most systems can also provide specific evidence and citations from documents. This can help a user to judge the reliability and accuracy of a response and to deep dive into specific references within the document to conduct further research. Examples:
Evidence: "Provide me with information about the extent of rising sea levels, including evidence quoted in the document where these projections are discussed, to ensure the information's accuracy and reliability."
Citations: "Find data on greenhouse gas emissions, including specific sources and citations from the document to verify the information and enable a deeper dive into the sources."
Many of the tips here are derived from prompt engineering best practices. If you are keen to explore prompt engineering further, please feel free to refer to this article that discusses how to improve prompts by adopting a step-by-step approach, and this post that explores how to prompt for creative brainstorming.
Current Limitations of AI-Enabled Document Analysis
While AI tools have made our lives significantly easier in relation to analysing documents, it is crucial to acknowledge the current limitations that can impact the effectiveness and accuracy of these systems.
There are still areas where AI struggles, particularly when dealing with complex queries and nuances, intricate data visualisations, very long documents, and analysis across multiple documents.
#1 Deep Nuances and Complex Queries
While AI models are excellent at processing straightforward information, they often falter when the subject matter requires deep domain expertise, and an understanding of subtle distinctions or intricate details.
For instance, I have asked ChatGPT to analyse work contracts on multiple occasions, and my partner, a lawyer by training, often points out that the model may be adept at surface-level summaries and analyses but often misses the deeper implications of specific legal terms or the nuanced interplay between different sections.
She also notes that ChatGPT may default to US legal advice in the absence of the user defining the legal jurisdiction in question, which is something those of us without legal training would not know to specify. Simply put, ChatGPT will probably not be replacing my partner’s legal advice anytime soon!
#2 Complex Charts and Graphs
Extraction of data from simple bar charts and tables is usually not an issue, and AI can handle these with reasonable accuracy. However, the technology still has some way to go in interpreting complex charts and graphs (e.g., multi-variable scatter plots, intricate network diagrams, layered pie charts).
Indeed, many humans find complex visual representations challenging, so it’s no surprise that AI continues to struggle. The problem lies in the inherent complexity of these visualisations, which often require contextual understanding and the ability to discern patterns that are not immediately obvious.
That being said, the technology is progressing rapidly in this regard, and I expect to see reliable solutions in the coming months.
#3 Very Long Documents
AI also encounters significant challenges when tasked with analysing very long documents. Both the Chunking & Summarisation method employed by LLMs and the RAG method used by other systems can only process a limited amount of information at any one time (a problem that we too face as humans!).
This can result in contextual disconnects and fragmented answers, as the AI struggles to integrate diverse pieces of information into a comprehensive narrative, an issue which is exacerbated with longer documents.
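A quick way to anticipate this problem is to count a document’s tokens before uploading it. Here is a small sketch using the open-source tiktoken library; the 128,000-token window is roughly that of GPT-4o at the time of writing, so check your model’s actual limit:

```python
import tiktoken

def fits_in_context(text: str, context_window: int = 128_000) -> bool:
    """Estimate whether a document fits within a model's context window."""
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(text))
    print(f"Document is ~{n_tokens:,} tokens "
          f"({n_tokens / context_window:.0%} of a {context_window:,}-token window).")
    return n_tokens <= context_window
```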
#4 Multiple Document Analysis
AI models can also struggle with synthesising information across multiple documents. This limitation is particularly problematic in research scenarios where comprehensive understanding often requires drawing connections between multiple studies, reports, or articles.
Consider a scenario where a researcher needs to compile findings from several scientific papers to draw a comprehensive conclusion about a new medical treatment. AI might extract key points from each paper but could struggle to synthesise these points into a coherent narrative that accounts for varying methodologies, results, and interpretations across the sources.
Conclusions
Effectively analysing documents is essential for productivity and informed decision-making in today’s information-driven world. AI tools, such as LLMs, Document Chatbots, and Knowledge Assistants, offer innovative solutions to streamline and enhance this process.
Understanding their technical approaches, like Chunking & Summarisation for LLMs and Retrieval-Augmented Generation (RAG) for Document Chatbots and Knowledge Assistants, is key to utilising these tools effectively. Implementing best practices, such as providing clear prompts, referencing document structures, requesting structured information, and using iterative approaches, can significantly improve the outcomes of AI-enabled document analysis.
However, it is crucial to recognise the current limitations of these tools, so as to set realistic expectations and help users navigate potential challenges. AI may struggle with deep nuances, complex queries, intricate data visualisations, lengthy documents, and synthesising information across multiple sources. Despite these limitations, the technology is progressing extremely quickly, and I have little doubt that many of these challenges will be overcome in the coming months or years.
If you haven’t already, dive in and start exploring these AI tools! By embracing them with a balanced understanding of their capabilities and constraints, you can revolutionise your document analysis processes, save time, and enhance the quality of insights derived from your documents.
Justin Tan is passionate about supporting organisations and teams to navigate disruptive change towards sustainable and robust growth. He founded Evolutio Consulting in 2021 to help senior leaders upskill and accelerate adoption of AI within their organisations through AI literacy and proficiency training, and also works with his clients to design and build bespoke AI solutions that drive growth and productivity for their businesses. If you're pondering how to harness these technologies in your business, or simply fancy a chat about the latest developments in AI, why not reach out?