Everything about text recognition on PDF invoices with SmartScan

Smartscan is an advanced AI-driven text recognition technology that makes processing PDF invoices in the Spend Cloud faster and easier. Whether an invoice is uploaded manually or added via the mailbox, Smartscan automatically reads various invoice details and enters them into the encoding menu. This minimizes manual entry and significantly speeds up processing.

Smartscan recognizes the following data on a PDF invoice: the creditor, total amount, invoice date, cost center number/BRIN (if enabled)*, and payment reference. Additionally, it recognizes the following data (depending on the active modules): purchase order number, contract number, and commitment number.

Alert
The recognition technology performs best with alphanumeric codes (such as BRIN). For cost centers consisting exclusively of numbers, the reliability of recognition on PDF invoices may decrease. This is because numeric codes are less unique and are more easily confused by the software with other numerical data, such as an invoice number, item number or CoC number. We therefore recommend using cost center recognition only when working with BRIN.
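The ambiguity described above can be illustrated with a minimal sketch. The sample text, the four-digit cost center format, and the BRIN-like format (two digits followed by two capital letters) are assumptions made for illustration only; this is not how Smartscan itself is implemented.

```python
import re

# Hypothetical invoice text fragment, for illustration only.
invoice_text = (
    "Invoice number 4821 Item 1200 CoC 7345 "
    "Cost center 1200 BRIN 01AB Total 4821,00"
)

# Assumed formats: a four-digit numeric cost center and a BRIN-like code
# of two digits followed by two capital letters.
numeric_code = re.compile(r"\b\d{4}\b")
alphanumeric_code = re.compile(r"\b\d{2}[A-Z]{2}\b")

# Every four-digit number is a possible cost center, so there are many candidates.
numeric_candidates = numeric_code.findall(invoice_text)

# The alphanumeric pattern is far more selective.
alphanumeric_candidates = alphanumeric_code.findall(invoice_text)

print(len(numeric_candidates), "numeric candidates:", numeric_candidates)
print(len(alphanumeric_candidates), "alphanumeric candidates:", alphanumeric_candidates)
```

In this sample, the numeric pattern matches several unrelated numbers, while the alphanumeric pattern matches only the BRIN-like code.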

Smartscan offers various advantages:
  1. Optimized for busy periods: the time required to extract invoice data has been significantly reduced. As a result, there are no waiting times during busy periods, such as at the beginning of the month. Invoice processing has become faster and more scalable thanks to an improvement in how our software systems communicate with each other.
  2. AI-driven text recognition: the new Smartscan technology utilizes artificial intelligence (AI), offering opportunities for future innovations. This leads to improved invoice recognition and further optimization, in contrast to previous systems without future expansion capabilities (read more below about the feedback loop within Smartscan).
  3. Efficiency improvement: Smartscan processes more than 10 million documents per month and eliminates a large portion of manual data entry. It is already used in various systems, making it a proven solution. Examples include eAccounting, Dinero, Visma e-conomic, PowerOffice, Visma Raet, and mobile apps such as Visma Attach and Visma Employee.

Read more here about exactly how Smartscan works and discover additional information about the capabilities and benefits of this text recognition technology.  

With this technology, various fields can be automatically extracted from your organization's invoices. This increases the accuracy of the extraction and drastically reduces manual operations.

Why is the invoice not being read correctly by the system?

The Smartscan feedback loop

When an invoice arrives, Smartscan automatically recognizes various details, such as the IBAN and the invoice number. The user codes the invoice and verifies that all information is correct. As soon as the invoice is exported, the corrected values are sent back to Smartscan as feedback. This feedback helps Smartscan recognize patterns and become increasingly accurate in reading invoice data. As a result, the process becomes not only faster but also increasingly reliable. Please note that this process may take time. The more feedback the system receives, the faster recognition should improve, but this depends heavily on the number of invoices processed by our system. Our Support department is unable to influence this process.

Info
This article is about the recognition of PDF invoices. Have you received an invoice in XML format that is not being read correctly? Read this article about which fields the Spend Cloud reads by default from an XML invoice and what you can do if the desired field is not listed.

When adding invoices, the Spend Cloud uses Smartscan recognition to read all information on the invoice and fill in relevant data for the invoice coding. A number of predefined characteristics are used for this purpose, for example that an amount always consists of digits with a comma as the decimal separator, or that an amount is always preceded by a currency symbol. The data on an invoice does not always conform to these generic characteristics, which sometimes makes it difficult to recognize. This is primarily because the structure of invoices varies widely and data cannot always be identified by general characteristics. For instance, recognizing the correct invoice number or payment reference is difficult, as these can consist of numbers, letters, or a mix of the two, and have no fixed number of characters. Furthermore, there are no uniform agreements regarding the formatting of an invoice. Although a purchase invoice is required to include certain items, such as an invoice date and payment reference, the sender is free to determine how and where these are stated on the invoice.
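As a minimal sketch of what such a generic characteristic could look like, the example below matches an amount that is preceded by a currency symbol and uses a comma for decimals. The pattern and the function name `find_amount` are hypothetical and only illustrate the idea; the actual rules used by the recognition are not described in this article.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed generic characteristic: a currency symbol, then digits with
# optional dot thousands separators, then a comma and two decimals.
AMOUNT_PATTERN = re.compile(r"[€$£]\s?(\d{1,3}(?:\.\d{3})*|\d+),(\d{2})\b")


def find_amount(line: str) -> Optional[Decimal]:
    """Return the first amount matching the assumed pattern, or None."""
    match = AMOUNT_PATTERN.search(line)
    if match is None:
        return None
    whole = match.group(1).replace(".", "")  # drop thousands separators
    return Decimal(f"{whole}.{match.group(2)}")


# An amount that follows the generic characteristic is found.
print(find_amount("Total payable € 1.234,56"))

# An amount without a currency symbol and with a dot as decimal separator
# does not fit the characteristic and is missed.
print(find_amount("Total 1234.56 EUR"))
```

The second call shows why invoices that deviate from the generic characteristics are harder to read: the amount is present, but it does not look the way the rule expects.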

If you notice that certain data is not recognized on the PDF, the following points are important to take into account: 
  1. Low PDF quality and the invoice font can cause data to be incorrectly recognized. It is therefore always important to verify the accuracy of the recognized data.
  2. The Spend Cloud only reads the first and last pages of an invoice. If, for example, the total amount is on an intermediate page, it will not be recognized. We advise instructing the creditor to place the total amount on the first or last page.
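
The effect of reading only the first and last pages can be sketched as follows. The function `pages_to_read` and the sample pages are hypothetical and serve only as an illustration of the behavior described above.

```python
from typing import List, Sequence


def pages_to_read(pages: Sequence[str]) -> List[str]:
    """Return only the first and last page of a document."""
    if not pages:
        return []
    if len(pages) == 1:
        return [pages[0]]
    return [pages[0], pages[-1]]


# Hypothetical three-page invoice with the total on the middle page.
invoice_pages = [
    "Page 1: creditor details and invoice date",
    "Page 2: Total 500,00",
    "Page 3: terms and conditions",
]

visible = pages_to_read(invoice_pages)

# The total is on an intermediate page, so it is not among the pages read.
total_found = any("Total" in page for page in visible)
print(total_found)
```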

Below is a checklist per piece of data and a possible reason why the data is not recognized.

Field: possible causes of failed recognition
Creditor
  1. Quality: Poor readability of the IBAN, Chamber of Commerce or VAT number (often in small fonts or footers).
  2. Configuration: Data is missing or outdated in the Spend Cloud master data.
  3. Duplicate data: The IBAN is registered with multiple creditors, preventing the system from making a unique choice.
Invoice date
  1. Naming convention: Use of the general word 'Date' instead of 'Invoice date'.
  2. Overload: Presence of other dates (delivery date, order date, due date) that appear more prominently on the invoice.
Invoice number/payment reference
  1. Naming convention: Use of vague terms such as 'Invoice' or 'Reference'.
  2. Layout: The number is not logically positioned relative to the description, causing the system to miss the context.
Total amount
  1. Naming convention: Absence of terms such as 'Total payable' or 'Invoice amount'.
  2. Calculations: Presence of discounts or set-offs below the total line, causing the final amount to differ from the sum.
  3. Layout: The amount is not at the same line height as the description.
VAT percentage
  1. Symbolism: Absence of the '%' sign or the explicit mention of 'VAT'.
  2. Configuration: Use of percentages not configured in the Spend Cloud (default is 0%, 9%, or 21%).
Payment terms
  1. Lack of clarity: No concrete timeframe (number of days) mentioned.
  2. Overload: Mention of multiple time limits or reminder schedules (e.g. "reminder after 14 days, dunning notice after 30 days"), which can confuse the recognition logic.
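
The 'Duplicate data' cause for the creditor can be illustrated with a minimal sketch. The function `match_creditor`, the structure of the master data, and the IBANs and creditor names are all hypothetical; they only show why an IBAN registered with more than one creditor cannot lead to a unique choice.

```python
from typing import Dict, List, Optional


def match_creditor(iban: str, master_data: Dict[str, List[str]]) -> Optional[str]:
    """Return the creditor for an IBAN only if the match is unique."""
    normalized = iban.replace(" ", "").upper()
    candidates = master_data.get(normalized, [])
    if len(candidates) == 1:
        return candidates[0]
    # No creditor, or several creditors share this IBAN: no unique choice.
    return None


# Hypothetical master data: IBAN mapped to the creditors registered with it.
master_data = {
    "NL00BANK0123456789": ["Creditor A"],
    "NL00BANK0987654321": ["Creditor B", "Creditor C"],
}

print(match_creditor("NL00 BANK 0123 4567 89", master_data))  # unique match
print(match_creditor("NL00BANK0987654321", master_data))      # ambiguous
```

In this sketch, cleaning up the master data so that each IBAN belongs to a single creditor would make the second lookup unique as well.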

Idea
For recurring invoices, we recommend using the templates feature. This way, you can still code these invoices quickly. Read more about using templates here.

When no data is recognized on the invoice


It may happen that the scan and recognition process fails for an invoice. Previously, these invoices would get stuck and were eventually permanently rejected by the system. Now, these invoices are placed in the coding menu instead, so you can process the invoice by entering the details yourself. You will also see a message on the invoice indicating that no details have been entered because the scan and recognition process failed. There can be various reasons why this occurs; we are investigating them and are continuously working on solutions to reduce these failures. You can always try to reapply the recognition using the three dots next to the invoice in the coding overview, or use the general 'reapply text recognition' button in the Coding menu item to do this for multiple invoices at once.


Info
Please note: if you previously used the old view of the encoding menu, you will now notice changes. Previously, only the first three pages of the PDF were displayed as an image, including markings for recognized fields and the document number. In some cases, however, this led to incorrect displays, for example with protected PDFs. To resolve this, the original, complete PDF is now displayed in the encoding menu, without markings or compression. This allows you to view all pages directly in the PDF viewer, without any obstructions. Read more information about this in the DIRK article.

