Problems Using Text Extraction Libraries in Next.js API Routes
Unanswered
James!! posted this in #help-forum
Hey everyone, I'm having trouble getting text extraction libraries to work inside a Next.js API route. I've tried multiple libraries, including office-text-extractor and textract, but I keep running into different issues.
What I'm Trying to Do
I'm building a file upload API in Next.js where users can upload documents (PDFs, Word, Excel, etc.), and I need to extract text from them.
Problems I'm Facing
With textract:
I get errors related to missing modules or unsupported server imports when trying to use textract.fromBufferWithMime().
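For reference, a minimal sketch of how that call can be wrapped for use with async/await inside a route, assuming the callback form `fromBufferWithMime(mimeType, buffer, callback)`. The names `MimeExtractor` and `extractText` are hypothetical and not part of textract; the extractor is passed in as an argument so the sketch does not depend on how the library gets bundled.

```typescript
// Callback-style shape assumed to match textract.fromBufferWithMime(mime, buffer, callback).
// Declared locally (hypothetical interface) so the sketch compiles without the library.
interface MimeExtractor {
  fromBufferWithMime(
    mimeType: string,
    buffer: Uint8Array, // a Node.js Buffer is a Uint8Array, so it can be passed here
    callback: (error: Error | null, text?: string) => void
  ): void;
}

// Hypothetical helper: turns the callback API into a Promise.
function extractText(
  extractor: MimeExtractor,
  mimeType: string,
  buffer: Uint8Array
): Promise<string> {
  return new Promise((resolve, reject) => {
    extractor.fromBufferWithMime(mimeType, buffer, (error, text) => {
      if (error) {
        reject(error);
      } else {
        // Fall back to an empty string if the extractor returns no text.
        resolve(text ?? "");
      }
    });
  });
}
```

In a route this would be called as something like `await extractText(textract, file.type, buffer)`, where `file` and `buffer` come from the uploaded form data. This only tidies up the call itself; it does not address the module resolution errors described here.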
With office-text-extractor:
I run into issues with missing keyv storage adapters (like @keyv/sqlite, @keyv/redis, etc.), and even after installing them, I still face module resolution errors.
General Issues:
Some libraries try to use Node.js built-in modules that don’t work well in a Next.js serverless environment.
Others have trouble resolving dynamic imports when used inside an API route.
What I Need Help With
Has anyone successfully used text extraction libraries in a Next.js API route?
Is there a recommended way to handle this in a serverless environment?
Are there any alternative libraries that work better with Next.js?
Would really appreciate any advice or workarounds! Thanks in advance.