CIGNEX and Relevance Lab have joined forces. Learn more at Relevance Lab

Case Study

Automating Document (Text) Classification using Machine Learning

Solutions	Industry	Technology	Expertise Delivered
AI & ML	Media & Publishing	Apache Tika, DL4J, Mallet (Naïve Bayes), TensorFlow	Consulting, Development

Client Overview

Global information service provider and publishing company with over 15,000 employees, active across 150 countries, offering expertise in Health, Tax, Accounting, Governance, Risk and Compliance.

Business Need

The client's document classification process was manual, error-prone and not scalable. With 10M documents and 36 target categories, they wanted an intelligent classification model.

Key Features

Parser (Apache Tika + Custom)

Input: XML files / Output: Text files
Content detection and analysis framework
Customized to process ‘Header’ section
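
The parsing step above can be illustrated with a minimal sketch. The case study used Apache Tika with custom handling; the Python standard-library parser below is only a stand-in for illustration, and the `header` tag name, the function name `xml_to_text` and the sample document are assumptions, not details from the project.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string, header_tag="header"):
    """Return (header_text, body_text) extracted from an XML string.

    `header_tag` is a hypothetical element name; the real schema is not
    described in the case study.
    """
    root = ET.fromstring(xml_string)
    header_parts, body_parts = [], []

    def walk(node, in_header):
        # Everything nested under the header element counts as header text.
        in_header = in_header or node.tag == header_tag
        target = header_parts if in_header else body_parts
        if node.text and node.text.strip():
            target.append(node.text.strip())
        for child in node:
            walk(child, in_header)
            # Text following a child element belongs to the current node.
            if child.tail and child.tail.strip():
                target.append(child.tail.strip())

    walk(root, False)
    return " ".join(header_parts), " ".join(body_parts)


sample = (
    "<doc><header><title>Tax Update</title><topic>VAT</topic></header>"
    "<body><p>Rates changed in <b>2019</b> for firms.</p></body></doc>"
)
header, body = xml_to_text(sample)
print(header)
print(body)
```

Keeping the header text separate from the body lets a later step weight or inspect it on its own, which is one plausible reason for customizing the parser around that section.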

Classifier (DL4J, Naïve Bayes and TensorFlow)

Evaluated machine learning and neural network tools
Created a model and ran it against a test set (with different training : test split ratios)
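
As a rough illustration of evaluating a classifier at several training : test ratios, here is a small multinomial Naïve Bayes written in plain Python. It is a sketch under stated assumptions, not the project's Mallet, DL4J or TensorFlow setup: the whitespace tokenizer, the add-one smoothing, the toy documents and the function names are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit word counts per class from a list of (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(docs, train_ratio, seed=0):
    """Shuffle, split by train_ratio, train, and return test accuracy."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]
    if not train or not test:
        raise ValueError("train_ratio leaves an empty training or test set")
    model = train_nb(train)
    correct = sum(predict_nb(model, text) == label for text, label in test)
    return correct / len(test)


# Toy data for illustration only; the real corpus had 36 categories.
toy_docs = [
    ("tax filing deadline vat", "Tax"),
    ("income tax return audit", "Tax"),
    ("corporate tax rate change", "Tax"),
    ("vat refund tax claim", "Tax"),
    ("tax audit penalty notice", "Tax"),
    ("patient clinical trial drug", "Health"),
    ("hospital patient care nurse", "Health"),
    ("drug dosage patient safety", "Health"),
    ("clinical guideline nurse care", "Health"),
    ("hospital drug trial result", "Health"),
]

for ratio in (0.6, 0.7, 0.8):
    print(f"train ratio {ratio}: accuracy {evaluate(toy_docs, ratio):.2f}")
```

On a corpus this small the accuracy figures mean little; the point is only the shape of the loop that compares split ratios.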

Reviewer

Audit logs, docs parsed and outliers (enrich feature set, training set, classify outliers)
Feedback loop back to the Classifier

Custom Reviewer

Capture the accuracy of each classification iteration
Review unclassified documents (Outliers)
Enhance external feature set and enrich training set
Review accuracy trends, performance etc.
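
The review loop described above could be sketched as follows. The case study does not say how outliers were identified, so the confidence threshold, the `ReviewLog` class and the `enrich_training_set` helper are hypothetical names and assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewLog:
    """Tracks per-iteration accuracy and the documents left unclassified."""

    threshold: float = 0.6  # hypothetical confidence cut-off for outliers
    accuracy_by_iteration: list = field(default_factory=list)
    outliers: list = field(default_factory=list)

    def record(self, predictions, gold):
        """predictions: {doc_id: (label, confidence)}; gold: {doc_id: label}.

        Returns the accuracy over confidently classified documents that
        have a gold label, or None if there are none.
        """
        classified = {
            doc_id: label
            for doc_id, (label, conf) in predictions.items()
            if conf >= self.threshold
        }
        self.outliers = [d for d in predictions if d not in classified]
        scored = [d for d in classified if d in gold]
        accuracy = (
            sum(classified[d] == gold[d] for d in scored) / len(scored)
            if scored
            else None
        )
        self.accuracy_by_iteration.append(accuracy)
        return accuracy


def enrich_training_set(training_set, reviewed_labels, texts):
    """Append human-reviewed outliers as new (text, label) training pairs."""
    additions = [(texts[d], label) for d, label in reviewed_labels.items()]
    return training_set + additions


log = ReviewLog(threshold=0.6)
predictions = {"a": ("Tax", 0.9), "b": ("Health", 0.4), "c": ("Tax", 0.8)}
gold = {"a": "Tax", "c": "Health"}
print(log.record(predictions, gold))
print(log.outliers)
```

Documents that land in `outliers` would be labelled by a reviewer and passed to `enrich_training_set` before the next training iteration, which is one way to close the feedback loop back to the classifier.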

Benefits

High level of accuracy (>95%) using Naïve Bayes. Accuracy to be further enhanced using an external feature set
Scalable, accurate solution design with high performance


CIGNEX is a global consulting company offering solutions, services and platforms on Open Source, Cloud and Automation technologies. Since 2000, CIGNEX has been delivering enterprise class solutions, which are built using leading platforms & can easily be integrated with existing systems to achieve unparalleled results. By leveraging multiple delivery models, we help organizations around the world to increase revenue, achieve business goals, gain competitive advantage, and maximize customer satisfaction while significantly reducing the cost of doing business.

As a leading System Integrator, CIGNEX provides end-to-end services on Liferay | UiPath | Kafka – Confluent | Sitecore | Red Hat | Appian | Salesforce | ServiceNow | MongoDB | Drupal

For any questions, RFP or to get in touch, you can email us at info@cignex.com
