Recommended tech stack for a web-based document OCR system (React/Next.js + FastAPI?)
African Slender-snouted Crocodil… posted this in #help-forum
African Slender-snouted Crocodile (OP)
I’m designing a web-based document OCR system and would like advice on the appropriate frontend, backend, database, and deployment setup.
The system will be hosted and will support two user roles: a general user who uploads documents and reviews OCR results, and an admin who manages users and documents.
There are five document types. Two document types have varying layouts, but I only need to OCR the person’s name and the document type so it can be matched to the uploader. One document type follows a two-column key–value format such as `First Name: John`. For this type, I need to OCR both the field label and its value, then allow the user to manually correct the OCR result if it is inaccurate. The remaining document types follow similar structured patterns.
For the frontend, I am most familiar with React.js and Next.js. I prefer using React.js with shadcn/ui for building the UI and handling user interactions such as file uploads and OCR result editing.
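For the two-column key–value document type described above (`First Name: John`), here is a minimal sketch of how the OCR output could be turned into editable label/value pairs. It assumes the OCR engine returns one text line per row; the `Field` class, `parse_key_value_lines`, and the `corrected_value` attribute are hypothetical names for illustration, not part of PaddleOCR or any other library.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """One extracted field (hypothetical shape, not from any OCR library)."""
    label: str
    value: str                             # raw OCR output
    corrected_value: Optional[str] = None  # set when the user edits the result

    @property
    def final_value(self) -> str:
        # Prefer the user's correction, fall back to the raw OCR value.
        return self.corrected_value if self.corrected_value is not None else self.value


# Split on the first colon only, so values that contain colons stay intact.
_KV_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def parse_key_value_lines(lines: List[str]) -> List[Field]:
    """Turn OCR text lines like 'First Name: John' into Field objects.

    Lines without a 'label: value' shape are skipped rather than guessed at.
    """
    fields = []
    for line in lines:
        match = _KV_PATTERN.match(line)
        if match:
            fields.append(Field(label=match.group(1), value=match.group(2)))
    return fields
```

Keeping the raw OCR value and the user's correction in separate attributes means the original output is never overwritten, which may help later when measuring how often the OCR is wrong.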
For the backend, I am considering FastAPI to handle authentication, file uploads, OCR processing, and APIs. For OCR, I am thinking of using PaddleOCR, but I am also open to other tools that fit my use case.
My main questions are:
* Is React.js with shadcn/ui a good choice for this type of application, or would Next.js provide meaningful advantages?
* Is FastAPI suitable for an OCR-heavy workflow that includes file uploads and asynchronous processing?
* Are there known deployment or scaling issues when using Next.js (or React) together with FastAPI?
* What type of database would be recommended for storing users, document metadata, OCR results, and corrected values?
I’m trying to avoid architectural decisions that could cause issues later during deployment or scaling, so insights from real-world experience would be very helpful.
Thanks in advance.
8 Replies
Poodle
Next.js with shadcn is a good combo here since you get API routes that can handle some of the auth and file stuff before hitting FastAPI. FastAPI is perfect for this kind of OCR work because you can just throw the heavy processing into background tasks and not block anything.
PaddleOCR should work fine for structured docs like yours. For the database just go with Postgres; Supabase is nice if you want auth and storage bundled in.
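As a rough sketch of what a relational layout for users, document metadata, OCR results, and corrected values could look like: the table and column names below are assumptions made up for illustration, and `sqlite3` is used only so the example runs without a server. The same idea would carry over to Postgres with its own column types.

```python
import sqlite3

# Hypothetical schema; every table and column name here is illustrative only.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    role  TEXT NOT NULL CHECK (role IN ('user', 'admin'))
);
CREATE TABLE documents (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(id),
    doc_type     TEXT NOT NULL,
    storage_path TEXT NOT NULL,              -- file lives in object storage, not in the DB
    status       TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE ocr_fields (
    id              INTEGER PRIMARY KEY,
    document_id     INTEGER NOT NULL REFERENCES documents(id),
    label           TEXT NOT NULL,
    ocr_value       TEXT NOT NULL,           -- raw OCR output, never overwritten
    corrected_value TEXT                     -- NULL until a user edits the field
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables on an empty database."""
    conn.executescript(SCHEMA)


def final_fields(conn: sqlite3.Connection, document_id: int) -> dict:
    """Return label -> value, preferring the user's correction when present."""
    rows = conn.execute(
        "SELECT label, COALESCE(corrected_value, ocr_value) "
        "FROM ocr_fields WHERE document_id = ? ORDER BY id",
        (document_id,),
    )
    return dict(rows.fetchall())
```

Storing one row per extracted field, rather than one blob per document, is a design choice that keeps corrections queryable per label.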
Only thing to watch out for is that file uploads on Vercel have size limits, so send those straight to FastAPI or S3 instead.
African Slender-snouted Crocodile (OP)
I was told that Next.js would be overkill for this and that running two separate backends (Next.js + FastAPI) would create headaches. Would Vite + React.js + shadcn with Celery + Redis be the better option?
Poodle
Yeah, that's fair. If FastAPI is handling all your backend logic anyway, then Next.js API routes just add confusion about what lives where. Vite + React + shadcn with FastAPI and Celery + Redis is cleaner: you have one frontend and one backend with a clear separation, which is simpler to reason about and deploy.
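To illustrate the submit-then-poll pattern that a task queue gives you, here is a minimal sketch that uses a standard-library thread pool as a stand-in for Celery + Redis. It is not Celery code: `JobQueue`, `run_ocr`, and the status dictionaries are hypothetical names, and `run_ocr` returns placeholder data instead of calling a real OCR engine. With Celery, the job id would roughly correspond to the task id and the state would live in the result backend rather than in process memory.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def run_ocr(path: str) -> dict:
    """Placeholder for the real OCR call; returns fake data for illustration."""
    return {"path": path, "fields": {"First Name": "John"}}


class JobQueue:
    """In-process stand-in for a task queue (hypothetical, for illustration)."""

    def __init__(self, ocr_fn=run_ocr, max_workers: int = 2):
        self._ocr_fn = ocr_fn
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}  # job_id -> Future

    def submit(self, path: str) -> str:
        """Queue a document and return immediately with a job id."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._ocr_fn, path)
        return job_id

    def status(self, job_id: str) -> dict:
        """What a polling endpoint could return to the frontend."""
        future = self._jobs.get(job_id)
        if future is None:
            return {"state": "unknown"}
        if not future.done():
            return {"state": "processing"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "result": future.result()}

    def wait_all(self) -> None:
        """Block until every queued job has finished (handy in tests)."""
        self._pool.shutdown(wait=True)
```

The upload endpoint would call `submit` and return the job id right away, and the frontend would poll a status endpoint until the state is `done` or `failed`. An in-process pool like this loses jobs on restart, which is one reason to move to a real broker-backed queue for production.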
African Slender-snouted Crocodile (OP)
I see. Would you recommend other OCR tools?
Poodle
PaddleOCR and EasyOCR are both solid for self-hosted; EasyOCR is easier to set up if you want to get moving fast. If you have budget, AWS Textract is built for key-value extraction and pulls form fields automatically, so you skip the parsing logic.
African Slender-snouted Crocodile (OP)
Thanks. I'll check out AWS Textract as well.