FAQs

Got questions? We have the answers.

General Information
Text2Extract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.
Text2Extract currently supports PNG, JPEG, and PDF formats. For synchronous APIs, you can submit images either as an S3 object or as a byte array. For asynchronous APIs, you can submit S3 objects only. If your document is already in one of the supported file formats (PNG, JPEG, PDF), don't convert or downsample it before uploading it to Text2Extract.
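The input rules above can be sketched as a small pre-upload check. This is a minimal illustration, not part of any Text2Extract SDK: the helper names (is_supported, can_submit), the mode labels "sync" and "async", the source labels "s3_object" and "bytes", and the mapping from formats to file extensions are all assumptions made for the example.

```python
from pathlib import PurePosixPath

# Formats named in the FAQ (PNG, JPEG, PDF). Mapping them to these
# file extensions is an assumption made for this sketch.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}

# Input sources per API mode, as described in the FAQ. The labels
# themselves are hypothetical and not taken from a real SDK.
ALLOWED_SOURCES = {
    "sync": {"s3_object", "bytes"},
    "async": {"s3_object"},
}


def is_supported(filename):
    """Return True if the file extension matches a supported format."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def can_submit(filename, api_mode, source):
    """Return True if this file, API mode, and input source are a valid combination."""
    if api_mode not in ALLOWED_SOURCES:
        raise ValueError("unknown API mode: %r" % (api_mode,))
    return is_supported(filename) and source in ALLOWED_SOURCES[api_mode]
```

For example, can_submit("invoice.pdf", "async", "bytes") is rejected because asynchronous calls accept S3 objects only, while the same file submitted as "s3_object" passes the check.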
The most common use cases for Text2Extract include:
  • Import Documents and Forms into Business Applications
  • Create Smart Search Indexes 
  • Build Automated Document Processing Workflows
  • Maintain Compliance in Document Archives
  • Extract Text for Natural Language Processing (NLP)
  • Extract Text for Document Classification