Text2Extract - Extract Text From Any Document

FAQs

Got questions? We have the answers.

General Information

What is Text2Extract?

Text2Extract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.

What document formats does Text2Extract support?

Text2Extract currently supports PNG, JPEG, and PDF formats. For synchronous APIs, you can submit images either as an S3 object or as a byte array. For asynchronous APIs, you can submit S3 objects only. If your document is already in one of these supported formats, don't convert or downsample it before uploading it to Text2Extract.
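
As a rough illustration of the rules above, the following Python sketch checks that a file is a PNG, JPEG, or PDF by its leading bytes and then builds a document description for a synchronous or asynchronous call. The function names (detect_format, build_document) and the field names ("Bytes", "S3Object", "Bucket", "Name") are assumptions made for this example; they are not taken from the Text2Extract API reference, so check the official documentation for the real request shape.

```python
from typing import Optional

# Leading "magic" bytes of the three formats listed in the FAQ.
MAGIC_PREFIXES = {
    "PNG": b"\x89PNG\r\n\x1a\n",
    "JPEG": b"\xff\xd8\xff",
    "PDF": b"%PDF-",
}


def detect_format(data: bytes) -> Optional[str]:
    """Return "PNG", "JPEG", or "PDF" based on the leading bytes, else None."""
    for name, prefix in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return name
    return None


def build_document(
    mode: str,
    data: Optional[bytes] = None,
    bucket: Optional[str] = None,
    key: Optional[str] = None,
) -> dict:
    """Build a hypothetical document description for a request.

    mode is "sync" or "async". The returned field names are assumptions
    for illustration only, not the documented Text2Extract request format.
    """
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")

    # An S3 object is accepted by both synchronous and asynchronous APIs.
    if bucket is not None and key is not None:
        return {"S3Object": {"Bucket": bucket, "Name": key}}

    if data is not None:
        # Byte arrays are only described for synchronous APIs.
        if mode == "async":
            raise ValueError("asynchronous APIs accept S3 objects only")
        if detect_format(data) is None:
            raise ValueError("unsupported format: expected PNG, JPEG, or PDF")
        # Pass the bytes through untouched: no conversion or downsampling.
        return {"Bytes": data}

    raise ValueError("provide either data, or both bucket and key")
```

The sketch passes the original bytes through unchanged, in line with the advice not to convert or downsample a document that is already in a supported format.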

What are the most common use cases for Text2Extract?

The most common use cases for Text2Extract include:

Import Documents and Forms into Business Applications
Create Smart Search Indexes
Build Automated Document Processing Workflows
Maintain Compliance in Document Archives
Extract Text for Natural Language Processing (NLP)
Extract Text for Document Classification