Lee Dallas

Last year's announcement of OpenText Magellan was accompanied by plenty of marketing videos and analyst opinions. What I would like to focus on instead is what I think is the most interesting implication of this wave of affordable enterprise AI: the death of unstructured content.

First, some context. Forrester’s Mike Gualtieri presented a fantastic introduction to AI at Enterprise World. I highly recommend his blog. He led off his session by explaining that there are two very distinct classes of artificial intelligence: pragmatic AI and pure AI.




Pragmatic AI represents the set of capabilities where algorithms can process data and predict outcomes based on models as well as humans can, and in most cases faster and with a higher degree of accuracy. The practical application of machine learning, one of the components of pragmatic AI, is improving predictive computing. This is giving us such life-changing advancements as self-driving cars, natural language processing, and robotic process automation. The more data the system ingests, the more accurately it predicts outcomes. It learns and improves without additional programming.

Pure AI is still something in the future. Pure AI does not simply process data. It perceives, processes and formulates new concepts or solutions based on conditions it was not expressly programmed to encounter and is essentially indistinguishable from human interaction. This is plainly the stuff of science fiction at this point. Fun to talk about but not something that is likely to impact your bottom line in the short term.

Pragmatic AI, or cognitive computing as it is now more commonly called, is finding its way into the enterprise through offerings like IBM’s Watson and the newest entry, OpenText’s Magellan. For those of us who are traditional practitioners of ECM, though, it is important to understand the implications of affordable AI in the enterprise.

Not tomorrow, and not next week, but someday in the relatively near future, AI will make unnecessary a term many have built their entire careers on. AI will one day kill unstructured content. As far as I am concerned, good riddance.

To be fair, what dies is not the content itself but rather the distinction. I have long complained about the term unstructured data. I have never liked it because, while it had a practical purpose in guiding how to build a system twenty years ago, it always seemed like giving up. The “content” (data), whether it be a letter or a contract or a policy, did not neatly fit into a row and a column, so you gave up and treated it as a blob with metadata. A file. It is still data. What does it matter if the data is a table or a document, and why should there be less discipline in managing it or less capability expected from it? Unstructured data always had lower expectations because it is “just files.” At least that is what we thought until we figured out that really important stuff was buried in them.

Surrendering to the “idea” of unstructured data had implications of its own. While you can get business value from tracking state and transition of the entity, eventually whole industries developed around trying to understand the data inside the files themselves. SGML and later XML reinserted structure into file formats but that proved too hard and bulky. OCR too was the magic bullet that always missed its target in the eyes of the customer. ECM exists as a practice largely because dividing data into these categories was necessary to manage the sheer scale of content created by any organization of reasonable size. It was necessary because of a machine’s limited ability to understand information that had not been pre-interpreted into fixed contexts (tables, records and fields – oh my).

Enter artificial intelligence. At the lowest level, predictive algorithms have been creeping into the processing of documents for years in capture technologies. We are now at a point of critical mass, however, where the machine can “know” what a document is, understand what it says, recognize errors, infer corrections and decide what actions to take, all without a human in the process to interpret those conditions and make those decisions. Document-centric transactions depend less and less on humans for error correction, and these forms of work are starting to be taken over by robotic process automation. This, frankly, was the easy part.

The ability of AI to analyze big data sets, if applied equally to tables and content, changes everything. Whether data is structured or unstructured stops being important if you have a unified means of interpreting that data, extracting insight and taking action.

The next level of advancement is when documents cease to be necessary at all. To an AI, documents are simply containers. Multiple vendors, for example, are leveraging pragmatic AI not only to assist in contract creation but also to ingest contracts and extract the terms and conditions on their own. Eventually, integrations can then implement the contract in an ERP system without humans (or lawyers) ever having to be involved.

With this level of intelligence, why does the contract file itself need to be managed? You can enforce, track, and control the cost of the instrument at the term level, providing more discrete control of the business relationship. When all of the data within a file is interpretable by the machine, the container itself is rendered obsolete. Extend this to the purchase order or the invoice and again the container matters less than the individual line item records.

Contracts are in fact some of the most structured documents we manage. It makes sense that this is one of the first document classifications to be affected by this “deconstruction” and transformed into smart contracts. Once you give up the idea of unstructured data across the enterprise data set, you can start to see the implications that technologies like blockchain can have on how we actually build these systems in the near future.

Documents as a discrete, trackable unit of work may always be convenient. We are approaching a tipping point, though, where documents as a concept stop being needed by the machines. It is the humans that need the container because we simply cannot deal with data at the level of analytical resolution that an AI can. We must be careful not to let paradigms like the document limit progress toward more efficient design. With this change the distinction of unstructured versus structured becomes obsolete and we need to take advantage of it. Raise your expectations. Expect more from this data.

Expecting more from your data is where AI leads you. To be successful in getting business value from the content we manage, we need to start focusing on a new skill. Beyond simply managing the creation-through-deletion lifecycle, the skill of asking questions about data will grow in importance.

Unstructured Content Is Dead – Long Live the Question.





About The Author

Lee Dallas

Global Presales Programs Leader, OpenText
