Business Need:
- No way to extract content and metadata from images (GIF, PNG, JPG)
- Unstructured input and output formats
- Need to build rules and a training set, along with a self-learning capability
Key Features:
- Apache Tika to extract complex image content and metadata.
- Content storage in a scalable NoSQL repository (MongoDB)
- Entity Extraction / NLP through IBM Alchemy
- Continuous self-learning and machine learning through Mahout to improve accuracy
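
The extract-then-store flow in the feature list can be illustrated with a minimal Python sketch. It does not use Apache Tika or MongoDB: as a stated assumption, it reads PNG and GIF headers with the standard library only and returns a dictionary shaped like a record that could be stored in a document database. The function name `extract_image_metadata` and the field names are hypothetical and not taken from the solution described here.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_image_metadata(data: bytes) -> dict:
    """Return a small structured record for PNG or GIF bytes.

    Illustrative stand-in for a metadata extractor; field names are hypothetical.
    """
    if data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR":
        # PNG: the IHDR chunk holds width and height as big-endian
        # 32-bit unsigned integers starting at byte offset 16.
        width, height = struct.unpack(">II", data[16:24])
        return {"format": "PNG", "width": width, "height": height}
    if data[:6] in (b"GIF87a", b"GIF89a"):
        # GIF: the logical screen width and height are little-endian
        # 16-bit unsigned integers starting at byte offset 6.
        width, height = struct.unpack("<HH", data[6:10])
        return {"format": "GIF", "width": width, "height": height}
    # Anything else is reported as unknown rather than guessed.
    return {"format": "unknown"}


# Hand-built headers, so the sketch runs without any image files.
sample_png = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
sample_gif = b"GIF89a" + struct.pack("<HH", 320, 200)

png_record = extract_image_metadata(sample_png)
gif_record = extract_image_metadata(sample_gif)
```

Turning each image into a flat dictionary gives the unstructured input a uniform shape, which is the kind of record a document store accepts without a fixed schema.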
Benefits:
- Ability to extract data from 100+ disparate file formats, including GIF, PNG, and JPG
- Accuracy in metadata extraction / output improved by 70% to 86%
- Automated 95% of the review processes