Mindee - a simple invoice data extractor

Mindee invoice OCR API - using Python3


Read other Mindee related posts


I found Mindee after a long search through both online and offline solutions. I needed a PDF scraping/data extraction tool built on AI-based services and accessed through an API.
In addition to Mindee's many useful features, a repository of predefined document data extraction models is available through an AI-based API, which registered users can access for free, with a limited number of pages per month, of course.
Among the several "off-the-shelf" solutions (OCRs: invoice, receipt, passport, license plates), the use of the invoice API is presented below. The code is very simple, and Mindee's well-constructed website, which includes sample code, is a great help in using the service.
What is needed:
- Python (or another language, see below) installed; an IDE (Integrated Development Environment) makes coding easier, and I used Spyder, provided in the Anaconda distribution
- Mindee registration (see above)
- mindee Python module installed (see below)
- files (preferably in PDF format) to scrape, or the Mindee invoice sample file
- a reliable internet connection, as the files are uploaded to the Mindee server, processed, and the extracted data is then sent back to your application.

The Mindee documentation on invoice data extraction starts with installing the required Python module (in the system prompt window):

pip install mindee
Although I have the Anaconda distribution, which uses the conda package manager, I had to use pip to install the mindee module. That was the only drawback of the whole project, never mind!
The installation is relatively fast (of course, it also needs an internet connection!).
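
A quick way to confirm the installation from inside Spyder or any other Python session is to ask Python whether the module can be found. The sketch below uses only the standard library; the helper name is_installed is my own and is not part of mindee.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("mindee"):
        print("mindee is available")
    else:
        print("mindee is missing, run: pip install mindee")
```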

As usual, API services require a user account, including a username and an API key generated by the website (the Mindee system) to access the tool.
Pricing was tempting: free up to 250 pages (not files!) of usage per month. See the Mindee website for details!
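
Because the quota counts pages rather than files, it can help to estimate usage before uploading a batch. This is a minimal sketch, assuming you already know the page count of each file. The file names and page counts are made up for illustration, and the 250-page limit is simply the free-tier figure quoted above.

```python
FREE_PAGES_PER_MONTH = 250  # free-tier figure quoted in this post; check the Mindee website


def pages_left(page_counts: dict[str, int], already_used: int = 0,
               quota: int = FREE_PAGES_PER_MONTH) -> int:
    """Return the pages remaining after processing every file in page_counts.

    A negative result means the batch would exceed the quota.
    """
    return quota - already_used - sum(page_counts.values())


# Hypothetical batch: file name -> number of pages
batch = {"invoice_a.pdf": 1, "invoice_b.pdf": 3, "invoice_c.pdf": 2}
print(pages_left(batch, already_used=100))  # 250 - 100 - 6 = 144
```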
To receive an API key, select the required API from the several "off-the-shelf" solutions offered, in this case the invoice API. Click on the Invoice card in the APIs Store.
This is added to your APIs:

Now you only need to copy the API key and username from the website. 
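
Rather than pasting the key directly into a script, one option is to keep it in an environment variable and read it at start-up. The sketch below is my own addition: the variable name MINDEE_API_KEY and the helper load_api_key are assumptions chosen for illustration, so adjust them to your setup.

```python
import os


def load_api_key(env_var: str = "MINDEE_API_KEY") -> str:
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The returned string can then be passed as api_key when the client is created, in place of a hard-coded key.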

It is worth noting that code examples are available in the following languages for all APIs, including the invoice OCR case: Python3, Node.js, Ruby, Java, .NET, PHP.
So if you are more interested in one of those, go ahead and check their website!

The provided code only needs your API key and the file path filled in at the places I highlighted.

from mindee import Client, PredictResponse, product

# Init a new client
mindee_client = Client(api_key="YOUR32CHARACTERAPICODEHERE")

# Load a file from disk or from local network
input_doc = mindee_client.source_from_path('FILEPATH')

# Send the loaded file to the invoice API and parse it
result: PredictResponse = mindee_client.parse(product.InvoiceV4, input_doc)
As noted in the original code, "The endpoint name must be specified since it cannot be determined from the class." It is not mentioned there, but specifying product.InvoiceV4 is not enough: you really have to enable the invoice API service on the website, as I indicated above. Otherwise, despite the correct Python code, the call returns an error because the service is unreachable.
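
Since a disabled API or a wrong key only shows up when the request is sent, it can be worth wrapping the call so that a failure is reported instead of stopping the script. This is a generic sketch, not a pattern taken from the Mindee documentation: try_parse is a hypothetical helper that accepts any callable, and it catches the broad Exception because I do not want to assume the library's specific exception classes.

```python
from typing import Any, Callable, Optional, Tuple


def try_parse(parse_call: Callable[[], Any]) -> Tuple[Optional[Any], Optional[str]]:
    """Run parse_call and return (result, None) or (None, error message)."""
    try:
        return parse_call(), None
    except Exception as exc:  # broad on purpose: library exception types are not assumed
        return None, f"{type(exc).__name__}: {exc}"


# With the client from the code above, the call would look like:
# result, error = try_parse(lambda: mindee_client.parse(product.InvoiceV4, input_doc))
# if error:
#     print("Parsing failed:", error)
```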
If everything goes fine, you can print the whole document or only the "predictions", meaning the text that the invoice API found in the uploaded document:

# Print a summary of the API result
print(result.document)

# Print the document-level summary
print(result.document.inference.prediction)

The output includes (if they are stated on the invoice):

  • Invoice Number, Purchase Date, Due Date,
  • Amounts such as Total Net, Total Amount, Total Tax
  • Supplier data such as Name, Company Address
  • Customer data such as Name, Company Address
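
To use the values in your own program rather than just printing them, the individual fields can be collected into a dictionary. The sketch below assumes that the prediction object exposes attributes such as invoice_number, date, due_date, total_net, total_amount, total_tax, supplier_name, supplier_address, customer_name and customer_address, each carrying a value attribute. These names are my assumption about the InvoiceV4 model, so check them against the Mindee documentation. A stand-in object is used so that the example runs without calling the API.

```python
from types import SimpleNamespace

# Attribute names are assumptions about the InvoiceV4 prediction; verify in the docs.
FIELDS = [
    "invoice_number", "date", "due_date",
    "total_net", "total_amount", "total_tax",
    "supplier_name", "supplier_address",
    "customer_name", "customer_address",
]


def prediction_to_dict(prediction) -> dict:
    """Collect field values; fields missing on the invoice come back as None."""
    out = {}
    for name in FIELDS:
        field = getattr(prediction, name, None)
        out[name] = getattr(field, "value", None)
    return out


# Stand-in for result.document.inference.prediction (made-up values)
fake = SimpleNamespace(
    invoice_number=SimpleNamespace(value="INV-001"),
    total_amount=SimpleNamespace(value=120.0),
)
data = prediction_to_dict(fake)
print(data["invoice_number"], data["total_amount"], data["due_date"])
```

With a real response, the stand-in would be replaced by result.document.inference.prediction.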

For compatibility and other details, check the Release notes section.
