WFR(ee) Things A Customer Can Do To Improve Extraction

By: William Phelps | Senior Technical Architect

 

When using Oracle Forms Recognition(“OFR”)  or WebCenter Forms Recognition(“WFR”) with the Oracle Solution Accelerator or Inspyrus, clients often engage consulting companies (like TekStream) to fine-tune extraction of invoice data.  Depending on the desired data to extract from the invoice, the terms “confidence”, “training”, and “scripting” are often used in discussing and designing the solution.  While these techniques justifiably have their place, it may be overkill in many situations.

Chances are, if you are reading this article, that you may already be using WFR.  However, the extraction isn’t as good as desired.  You may have been using it for quite a while, with less-than-optimal results.

In reality, there are several no-cost options that a customer can (and should) perform before considering ANY changes to a WFR project file or attempting to bring in consulting.  This approach is the “don’t step over a dollar to pick up a dime” approach.  Many seemingly impossible extraction issues are truly and purely data-related, and in all likelihood, these basic steps are going to be needed anyway as part of any solution.  There is a much greater potential return on investment by simply doing the boring work of data cleanup before engaging consulting.

The areas for free improvement should begin by answering the following questions:

  1. Does the vendor address data found in the ERP match the address for the vendor found on the actual invoice image?
  2. Is the vendor-defined in the ERP designated as a valid pay site?
  3. In the vendor data, are intercompany and employee vendors correctly marked/identified?
  4. Do you know the basic characteristics of a PO number used by your company?
  5. Are the vendors simply sending bad quality invoice images?

Vendor Address Considerations

The absolute biggest free boost that a customer can do to increase extraction is to actually look at the invoice image for the vendor, and compare the address found on the invoice to the information stored in the ERP.  WFR looks at the zip code and address line as key information points.  Mismatches in the ERP data will lower the extraction success rate.  This affects both PO and non-PO invoice vendor extraction from an invoice.

To illustrate this point at a high level, let’s use some basic data tools found within the Oracle database for testing.  The “utl_match” packages will work to get a basic feel for how seemingly minor string differences can affect calculations.

Using utl.match.edit_distance_similarity in a simple query, two strings can be compared as to how similar the first string is to the second.  A higher return value indicates a closer match.

  • This first example shows the result when a string (“expresso”) is compared to itself, which unsurprisingly returns 100.

  • Changing just one letter can affect the calculation in a negative direction. Here, the second letter of the word is changed from an “x” to an “s”.  Note the decrease in the calculation.

  • The case in the words can matter to a degree as well for this comparison. Simply changing the first letter to uppercase will result in a similar reduction.

  • Using the Jaro Winkler function, which tries to account for data entry errors, the results are slightly better when changing from “x” to “s”.

Let’s now move away from theory.  In more of a real-world example, consider the following zip code strings, where the first zip code is a zip + 4 that may be found on the invoice by WFR, and the second zip code is the actual value recorded in the ERP.

In the distance similarity test, the determination is that the strings are 50/50 in resemblance.

However, Jaro Winkler is a bit more forgiving.  There is a difference, but it’s closer to matching both values.

The illustrations above are purely representative and do not reflect the exact process used by WFR to assign “confidence”.  However, it’s a very good illustration to visually highlight the impact of data accuracy.

The takeaway from this ERP data quality discussion should be that small differences in data between what appears on the invoice compared to the data found in the ERP matters.  This data cleanup is “free” in the sense that the customer can (and should) undertake this operation without using consulting dollars.

Both the Inspyrus and Oracle Accelerator implementation of the WFR project leverage a custom vendor view in the ERP.

  • Making sure this view returns all of the valid vendors is critical for correct identification of the vendor. A vendor that is not found in this view cannot be found by WFR – plain and simple since the WFR process collects and stores the vendor information for processing.
  • Also, be sure in this view to filter out intercompany and employee vendor records. These vendor types are typically handled differently, and the addresses of these kinds of vendors typically appear as the bill-to address on an invoice.  Your company address appearing multiple times on the invoice can lead to false positives.
  • In EBS, there is a concept of “pay sites”. A “pay site” is where the vendor/vendor site combination is valid for accepting payments and purchases.  Be sure to either configure the vendor/vendor site combination as a pay site, or look to remove the vendor from the vendor view.

PO Number Considerations

On a similar path, take a good look at your purchase order number information.  WFR operates on the concept of looking for string patterns that may/may not be representative of your organization’s PO number structure.  For example, when describing the characteristics of your company’s PO numbers, these are some basic questions you should answer:

  • How long are our PO numbers? 3 characters? 4 characters? 5 or more characters? A mix?  What is that mix?
  • Do our PO numbers contain just digits? Or letters and digits? Other special characters?
  • Do our PO numbers start with a certain sequence? For example, do our PO numbers always start with 2 random letters? Or two fixed letters like “AB”? Or three characters like “X2Z”?

Answering this seeming basic set of questions allows WFR to be configured to only consider the valid combinations.

  • By discarding the noise candidates, better identification and extraction of PO number data can occur.
  • More accurate PO number extractions can lead to increased efficiency inline data extraction, since the PO data from the ERP can be leveraged/paired, and can lead to better vendor extraction since the vendor can be set based on the PO number.

Avoid trying to be too general with this exercise.  Trying to cast too wide of a net will actually make things worse.  Simply saying “our PO numbers are all numbers 7 to 10 digits long” will result in configurations that pick up zip codes, telephone numbers, and other noise strings. If the number of variations is too many, concentrate on vendors using the 80/20 rule, where 80% of the invoices come from 20% of the vendor base.

General Invoice Quality

Now, one might think “I cannot tell the vendor what kind of invoice to send.”  That’s not an accurate statement at all.  If explained correctly, and provided with a proper incentive, the vendor will typically work to send better invoices.  WFR is very forgiving, but not perfect, and looking at the items in the following list will help.

  • Concentrate initially on the vendors who send in high volumes of invoices.
  • Make sure the invoices are good quality images containing no extra markings on the image that is covering key data, like PO numbers, invoice numbers, dates, total amount, etc.
  • Types of marks could be handwriting, customs stamps, tax identification stamps, mailroom stamps, or other non-typed or machine-generated characters. Dirty rollers on scanners can leave a line across the image.

Hopefully, this article will give an idea of the free things that can be done to increase the efficiency of WFR.

Want to learn more? Contact us today!