Documents remain the lifeblood of many business processes. They are created, circulated, amended, forwarded and, in many cases, lost within giant directories.

Solving the Document Pipeline Problem

Even a well-designed document pipeline suffers from attrition. It may develop dark spots, where documents are stored within the system but, for whatever reason, cannot be found when they are needed. Worst of all is having no document processing pipeline at all. Without processes and functional systems for ingesting, storing and discovering documents, your organisation is effectively operating in the dark.

There is room in this process for an AI/machine learning categorisation system to function, providing value through its capacity to make sense of large volumes of unstructured data. Before that point, though, it is important to understand the elements that make up a modern document discovery solution.

The Document Ingestion Process

Whether documents are scanned in or are entirely digital, the fundamental issue remains the same: the data is usually semi-structured or unstructured. Even where there are standardised fields that are relatively consistent, some upfront processing is still required to make the data usable.

This can be done with an ELT process, the aim of which is to comprehensively transform the data in order to get it into a structured, relational format. This data can then be stored in a data warehouse and utilised for analytics in a presentation layer. However, an alternative is to retain the documents in their original formats, instead building a rich layer of metadata around them.
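As a rough illustration of the second option, the sketch below keeps a document in its original format and records metadata about it in a separate structure. The `DocumentRecord` class, its fields and the example path are hypothetical names chosen for this sketch; they are not taken from Loome or any particular product.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class DocumentRecord:
    """Hypothetical metadata kept alongside a document left in its original format."""
    path: str                  # location of the untouched original file
    file_format: str           # e.g. "pdf", "docx", "wav"
    extracted_text: str = ""   # text obtained from OCR or speech-to-text, if any
    tags: set[str] = field(default_factory=set)


def make_record(path: str, extracted_text: str = "") -> DocumentRecord:
    # Derive the format from the file extension; the file itself is never modified.
    suffix = PurePosixPath(path).suffix.lstrip(".").lower()
    return DocumentRecord(
        path=path,
        file_format=suffix or "unknown",
        extracted_text=extracted_text,
    )


record = make_record("contracts/2021/supplier-agreement.PDF", "Supply agreement text")
record.tags.add("contract")
```

The point of the design is that search and discovery operate on the record, while the original file stays exactly as it was received.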

Loome takes this second approach in order to build a document search and discovery system which is easier and less disruptive to establish within an existing document pipeline. It is well suited to situations that call for documents to be preserved in their original formats. During the ingestion process, Loome utilises Azure Computer Vision to perform Optical Character Recognition (OCR), which allows documents with imperfections or handwriting to be processed as well. Additionally, voice-to-text Cognitive Services can be used to transcribe speech audio so that it is handled within the same pipeline as more conventional documents.

Enabling Document Discovery

There are two steps to effectively build up the relevant and usable metadata around a document. The first of these is format identification to determine how it should be parsed. The second is keyword and phrase tagging which creates the rich context enabling the document to be discovered and processed effectively. Loome allows you to easily implement routing rules and apply tags all within a single web-based interface.
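The two steps, followed by a routing decision, can be sketched as below. This is a minimal illustration using simple extension lookup and phrase matching. The rule tables, tag names and destination names are all assumptions made for the example and do not describe how Loome implements tagging or routing.

```python
import re

# Hypothetical keyword and phrase rules: tag -> phrases that trigger it.
TAG_RULES = {
    "invoice": ["invoice", "amount due"],
    "contract": ["agreement", "terms and conditions"],
}

# Hypothetical routing rules: tag -> destination queue.
ROUTES = {"invoice": "accounts-payable", "contract": "legal"}

# Hypothetical mapping from file extension to a parser family.
FORMATS = {"pdf": "pdf", "docx": "word", "txt": "text", "wav": "audio"}


def identify_format(filename: str) -> str:
    """Step 1: decide how the document should be parsed, based on its extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return FORMATS.get(ext, "unknown")


def tag_text(text: str) -> set[str]:
    """Step 2: attach every tag whose phrases appear as whole words in the text."""
    lowered = text.lower()
    tags = set()
    for tag, phrases in TAG_RULES.items():
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
            tags.add(tag)
    return tags


def route(tags: set[str]) -> str:
    """Send the document to the first matching destination, else to manual review."""
    for tag in sorted(tags):
        if tag in ROUTES:
            return ROUTES[tag]
    return "manual-review"
```

For example, `route(tag_text("Invoice 1042: amount due in 30 days"))` yields `"accounts-payable"`, while a document that matches no rule falls through to `"manual-review"`.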

Keyword Tagging for Machine Learning

Although manual tagging through a central portal can already be an incredibly powerful document pipeline tool for many organisations, the ability to automate this process opens up entirely new horizons. The tags applied manually to documents are essentially the same kind of labels that are needed to train a machine learning classification model. Because of this, a modern document discovery system is well suited to the implementation of machine learning classification functionality.

Once a certain volume of documents has been manually tagged, Loome allows that tagged data to be easily rolled into a model for Azure Machine Learning Services. All of this is still done within the same single web portal. This model can then be implemented in order to automate the process of categorising documents, providing companies with a solution which can break through the most congested document bottlenecks. The document pipeline can then be augmented further and made even more accessible with tools from Loome, such as the ability to use natural language queries to search for particular documents or key statements.
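To make the link between manual tags and an automated classifier concrete, here is a small sketch that trains a naive Bayes text classifier on manually tagged examples. It is a stand-in written in plain Python for illustration only; it is not Azure Machine Learning Services and not Loome's method, and the sample documents and tag names are invented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace and keep purely alphabetic words.
    return [w for w in text.lower().split() if w.isalpha()]


def train(tagged_docs: list[tuple[str, str]]):
    """Count words per tag from (text, tag) pairs produced by manual tagging."""
    word_counts = defaultdict(Counter)
    tag_counts = Counter()
    for text, tag in tagged_docs:
        tag_counts[tag] += 1
        word_counts[tag].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, tag_counts, vocab


def predict(model, text: str) -> str:
    """Return the tag with the highest naive Bayes score for the text."""
    word_counts, tag_counts, vocab = model
    total_docs = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag in sorted(tag_counts):
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(tag_counts[tag] / total_docs)
        total_words = sum(word_counts[tag].values())
        for word in tokenize(text):
            score += math.log((word_counts[tag][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Invented examples standing in for manually tagged documents.
tagged = [
    ("invoice total amount due", "invoice"),
    ("payment due on invoice", "invoice"),
    ("agreement between the parties", "contract"),
    ("terms of the agreement", "contract"),
]
model = train(tagged)
```

With this toy data, `predict(model, "amount due on this invoice")` returns `"invoice"`. A real deployment would need far more tagged documents and a held-out set to check accuracy before automating any routing.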

How Loome Can Help

Loome Assist is an AI-powered, multi-purpose enterprise problem-solving tool. One of its features is the ability to easily set up machine learning-powered document routing pipelines, with tagging and rule configuration all handled within a single intuitive interface.