What is Document AI and how does it work?

What is Document AI?

Document AI is an AI technology that extracts meaningful data from unstructured documents such as invoices, receipts, scanned documents and images. It enables computers to understand and analyze the contents of documents much as humans do, and it outputs structured information that both people and other computer programs can use for further processing. Document AI is especially useful for organizations that want to automate document-intensive processes.

What is the difference between OCR and Document AI?

In a nutshell, OCR (Optical Character Recognition) extracts unstructured text from scanned documents and images, whereas Document AI extracts meaningful information from them in structured form.

OCR is a technology that identifies and extracts text, word by word or line by line, from images or scanned documents. Document AI, on the other hand, combines several technologies, including OCR and machine learning, to understand the context of a document. It goes beyond OCR by applying machine learning algorithms to recognize patterns and elements within documents, such as tables and paragraphs, and to extract meaningful information that can be processed further by a person or by another computer program. This lets organizations draw insights from documents for decision making and automate document-based workflows.
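To make the difference concrete, here is a toy illustration, not a real OCR engine or Document AI service. It assumes OCR has already produced words with bounding boxes (the `Word` class, the helper names and the sample invoice values are hypothetical) and contrasts a flat OCR-style string with a structured record. A simple hand-written spatial rule pairs each `label:` word with the nearest word to its right on the same line; a real Document AI system would learn such relationships with a machine learning model instead.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with its bounding box (hypothetical OCR output format)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def ocr_text(words):
    """OCR-style output: all words joined into one unstructured string."""
    return " ".join(w.text for w in words)


def extract_key_values(words, max_gap=200.0):
    """Pair each word ending in ':' with the closest non-label word to its right on the same line."""
    record = {}
    for label in words:
        if not label.text.endswith(":"):
            continue
        mid_y = (label.y0 + label.y1) / 2  # vertical center of the label
        candidates = [
            w for w in words
            if not w.text.endswith(":")
            and w.y0 <= mid_y <= w.y1              # same text line
            and 0 <= w.x0 - label.x1 <= max_gap    # to the right, within range
        ]
        if candidates:
            value = min(candidates, key=lambda w: w.x0 - label.x1)
            record[label.text.rstrip(":")] = value.text
    return record


words = [
    Word("Invoice:", 10, 10, 60, 22), Word("INV-1042", 70, 10, 140, 22),
    Word("Date:", 10, 30, 45, 42), Word("2024-03-01", 55, 30, 130, 42),
    Word("Total:", 10, 50, 50, 62), Word("$310.00", 60, 50, 120, 62),
]
print(ocr_text(words))            # one flat string, no structure
print(extract_key_values(words))  # {'Invoice': 'INV-1042', 'Date': '2024-03-01', 'Total': '$310.00'}
```

The OCR-style string loses which value belongs to which field, while the structured record can be passed directly to another program.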

Document AI is therefore a strong fit for processing unstructured documents and automating document-based workflows.

How does Document AI work?

Document AI uses AI and machine learning to understand the context of documents much as a human would. It can perform tasks such as document image classification, document layout analysis and table detection within a document.

What is Document image classification?

When we scan a document or take a photo of it, the result is called a “document image”. Document AI can determine what type of document an image shows: an invoice, a receipt, a contract or another type of document. It does this by taking the image as input and classifying it with an AI model. The model may first need to be trained on a variety of document types before it can perform this classification.
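A production classifier would typically be a deep model, such as a CNN or DiT, trained on labeled document images. The sketch below swaps in a much simpler technique, a nearest-centroid classifier over hand-crafted features, purely to show the train-then-classify flow. The toy pages, the ink-density features and the `invoice`/`contract` labels are all hypothetical.

```python
import math


def page(ink_per_row, width=10, rows_per_band=2):
    """Build a toy binary page image (rows of 0/1 pixels), one band per entry in ink_per_row."""
    image = []
    for ink in ink_per_row:
        for _ in range(rows_per_band):
            image.append([1] * ink + [0] * (width - ink))
    return image


def features(image, bands=4):
    """Fraction of inked pixels in each horizontal band, from top to bottom."""
    h = len(image)
    feats = []
    for b in range(bands):
        rows = image[b * h // bands:(b + 1) * h // bands]
        pixels = [p for row in rows for p in row]
        feats.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return feats


def train(examples):
    """Return the mean feature vector (centroid) for each label in (image, label) pairs."""
    sums, counts = {}, {}
    for image, label in examples:
        f = features(image)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(model, image):
    """Predict the label whose centroid is closest (Euclidean distance) to the image's features."""
    f = features(image)
    return min(model, key=lambda label: math.dist(f, model[label]))


training_set = [
    (page([8, 3, 9, 9]), "invoice"), (page([7, 2, 8, 9]), "invoice"),
    (page([6, 6, 6, 6]), "contract"), (page([5, 6, 6, 5]), "contract"),
]
model = train(training_set)
print(classify(model, page([8, 2, 9, 8])))  # invoice
print(classify(model, page([6, 5, 6, 6])))  # contract
```

The structure mirrors the text: the model must first see labeled examples of each document type before it can classify a new image.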

What is Document layout analysis?

Document layout analysis is a technique for analyzing the visual layout of a document, identifying the individual building blocks that form it and separating them for further processing. It does this by understanding the relationships between different elements, such as text, images, tables and graphs, within a specific document type.

Building blocks, or segments, of a document can be text segments, headers, paragraphs, tables, graphs, etc. When processing an invoice, for example, the vendor information can be one segment and the line items another. Similarly, in a contract, the payment terms can be a segment.

Document layout analysis can be done with different AI models, such as LayoutLMv3 or DiT (Document Image Transformer). For this task, both models are typically used as the backbone of a Mask R-CNN-style object-detection framework that locates the elements within a document.

These models identify the different segments (building blocks) of a document and output a set of segmentation masks or bounding boxes, each with a class label such as title, text, list, table or figure.
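The raw output of a layout model usually needs some post-processing before the segments are used downstream. The following minimal sketch assumes a hypothetical detection format (a dict with `label`, `score` and an `(x0, y0, x1, y1)` `box`) rather than any particular library's output. It drops low-confidence detections, removes near-duplicate boxes with greedy, class-agnostic non-maximum suppression, and orders what remains top to bottom, then left to right, as a rough reading order. Detection frameworks often apply their own non-maximum suppression, so treat this as an illustration of the idea.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def clean_layout(detections, min_score=0.5, max_overlap=0.7):
    """Filter, de-duplicate and order layout detections."""
    kept = []
    # Greedy non-maximum suppression: the highest-scoring box of an overlapping group wins.
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if det["score"] < min_score:
            continue  # too uncertain to keep
        if all(iou(det["box"], k["box"]) <= max_overlap for k in kept):
            kept.append(det)
    # Rough reading order: top to bottom, then left to right.
    kept.sort(key=lambda d: (d["box"][1], d["box"][0]))
    return kept


detections = [
    {"label": "title", "score": 0.97, "box": (50, 20, 550, 60)},
    {"label": "text", "score": 0.92, "box": (50, 80, 550, 300)},
    {"label": "text", "score": 0.61, "box": (52, 82, 548, 298)},  # near-duplicate
    {"label": "table", "score": 0.88, "box": (50, 320, 550, 600)},
    {"label": "figure", "score": 0.30, "box": (400, 620, 550, 700)},  # low confidence
    {"label": "text", "score": 0.75, "box": (50, 620, 380, 700)},
]
for det in clean_layout(detections):
    print(det["label"], det["box"])
```

Sorting by the top edge alone is a crude reading order; pages with several columns would need the columns grouped first.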

What is Table detection in a document?

Tabular data is common in documents, and plain OCR does not preserve table structure, so it struggles to extract data from tables. Document AI addresses this problem: it identifies the tables within a document and extracts their data with greater accuracy. The AI model first locates the table within the document and then recognizes its structure, identifying the rows, columns and cells. The next step is functional analysis (FA) of the table, which recognizes the role of each cell, such as column or row headers, and hence the key-value relationships within the table.
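Once rows and columns have been detected, the words from OCR can be placed into cells. The sketch below assumes the structure model returns row and column boxes and that OCR returns `(text, box)` pairs; these formats, the helper names and the sample table are hypothetical. Each word is assigned to the cell whose row and column contain its center, and the first row is treated as the header, a simplified stand-in for functional analysis, so that each body row becomes a key-value record.

```python
def center(box):
    """Center point of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def build_grid(row_boxes, col_boxes, words):
    """Place OCR words into the cell whose row and column contain the word's center."""
    rows = sorted(row_boxes, key=lambda b: b[1])  # top to bottom
    cols = sorted(col_boxes, key=lambda b: b[0])  # left to right
    grid = [["" for _ in cols] for _ in rows]
    # Visit words in reading order so multi-word cells are joined correctly.
    for text, box in sorted(words, key=lambda w: (w[1][1], w[1][0])):
        cx, cy = center(box)
        r = next((i for i, b in enumerate(rows) if b[1] <= cy <= b[3]), None)
        c = next((j for j, b in enumerate(cols) if b[0] <= cx <= b[2]), None)
        if r is None or c is None:
            continue  # word lies outside the detected table structure
        grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid


def to_records(grid):
    """Use the first row as column headers and map each later row to header -> value."""
    header, *body = grid
    return [dict(zip(header, row)) for row in body]


row_boxes = [(0, 0, 240, 30), (0, 30, 240, 60), (0, 60, 240, 90)]
col_boxes = [(0, 0, 100, 90), (100, 0, 160, 90), (160, 0, 240, 90)]
words = [
    ("Item", (5, 8, 40, 22)), ("Qty", (110, 8, 140, 22)), ("Amount", (170, 8, 225, 22)),
    ("Paper", (5, 38, 45, 52)), ("A4", (50, 38, 70, 52)), ("10", (115, 38, 130, 52)),
    ("$50.00", (170, 38, 220, 52)), ("Toner", (5, 68, 48, 82)), ("2", (118, 68, 126, 82)),
    ("$260.00", (170, 68, 230, 82)),
]
print(to_records(build_grid(row_boxes, col_boxes, words)))
```

Real tables also contain spanning cells and projected row headers, which this simple row-column intersection does not handle.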

The Table Transformer (TATR), an open-source transformer-based AI model available on GitHub, can be used to detect tables in documents and extract their data.

What are the use cases of Document AI?

Document AI can be used to automate almost any document-intensive process and to reduce manual data entry. Here are some use cases:

Finance and procurement:

Document AI is widely used to process invoices, receipts, financial statements.
Tax documents, balance sheets and other financial documents.
Challans and money receipts processing.

Banking, lending and insurance:

Customer identity documents such as passports, SSN cards, driver’s licenses, photo ID proofs, address proofs and other documents submitted by customers can be processed.
Loan and mortgage documents, PDF forms and other related documents.
Annual reports, balance sheets, tax documents, etc.
Insurance documents, claim forms, claim documents, etc.

Contracts and legal documents:

Relevant data such as payment terms, expiration dates and contract periods can be extracted from contracts and agreements and processed.
Rental bills, money receipts, lease agreements, deeds etc.

HR and Employee management:

Resumes, educational certificates, experience certificates, etc. can be processed to extract the required data.
Employee tax documents and investment proofs, such as insurance and investment documents, can be processed automatically to calculate tax or TDS (tax deducted at source).

Document AI in Government:

Tender documents, supporting documents, company documents etc.
Permit and approval documents processing.

In a nutshell, Document AI is a very useful technology for automation. It is often used alongside RPA (Robotic Process Automation) for end-to-end automation.