Next.js Discord

Discord Forum

is there a way to have a tmp directory on the server?

Answered
Dutch Smoushond posted this in #help-forum
Answered by Ray

75 Replies

Dutch SmoushondOP
[OfficeParser]: Error: ENOENT: no such file or directory, mkdir 'officeParserTemp/tempfiles'

Trying to use officeparser through langchain
Toyger
you already have a tmp directory: it's the Linux /tmp folder
so if you want to create something, do it inside there, like
/tmp/officeParserTemp/tempfiles
but keep in mind that "temporary" can mean only for as long as the request is running: Vercel runs on ephemeral instances, so by the time the next request is invoked this folder may already be deleted.
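To illustrate the point about /tmp being per-invocation scratch space, here is a minimal sketch that uses only Node's built-in modules. The helper names `withTempDir` and `roundTrip` and the `officeParserTemp-` prefix are hypothetical, and it assumes `os.tmpdir()` resolves to a writable location such as /tmp on the host.

```typescript
import { mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: create a unique scratch directory under the system
// temp folder, run `work` with it, and always delete it afterwards so that
// nothing relies on the directory surviving until the next request.
async function withTempDir<T>(work: (dir: string) => Promise<T>): Promise<T> {
  // mkdtemp appends random characters to the prefix to avoid collisions.
  const dir = await mkdtemp(join(tmpdir(), "officeParserTemp-"));
  try {
    return await work(dir);
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}

// Example use: write a file into the scratch directory and read it back.
async function roundTrip(text: string): Promise<string> {
  return withTempDir(async (dir) => {
    const file = join(dir, "scratch.txt");
    await writeFile(file, text, "utf8");
    return readFile(file, "utf8");
  });
}
```

Everything that needs the directory happens inside the callback, so the code never assumes the folder still exists on a later invocation.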
@Dutch Smoushond Thank you for the response. So what do you think would work in this instance? I'm using the package https://js.langchain.com/docs/integrations/document_loaders/file_loaders/pptx which I believe is built on top of this package https://github.com/harshankur/officeParser#readme which states the need for the tmp folder
Toyger
It probably is using it, but they didn't expose a temp folder option. You can either write your own implementation that exposes it, or ask in the LangChain issues whether they can expose an option to customize the temp folder location.
American Crow
I just read over the docs (not carefully). Are you sure you need a temp folder? I don't see that part. Can't you simply read a file from public or whatever?
This is in the officeparser github
Is AWS S3 a solution? I've read that some have used that
Dutch SmoushondOP
Also, if you dig in to the langchain node_modules, you'll find
/**
 * A method that takes a `raw` buffer and `metadata` as parameters and
 * returns a promise that resolves to an array of `Document` instances. It
 * uses the `parseOfficeAsync` function from the `officeparser` module to extract
 * the raw text content from the buffer. If the extracted powerpoint content is
 * empty, it returns an empty array. Otherwise, it creates a new
 * `Document` instance with the extracted powerpoint content and the provided
 * metadata, and returns it as an array.
 */
American Crow
Dutch SmoushondOP
The pdf loader works fine tho, is it writing to the tmp without a problem? Never knew that
@American Crow you right Duke i found the issue: https://github.com/langchain-ai/langchainjs/issues/4000
Dutch SmoushondOP
Unfortunate that they still never addressed this
American Crow
yea sorry can't really help
@Ray could you show some code on how are you using it?
Dutch SmoushondOP
Yeah, the file here is a File object that comes from the frontend; I'm basically doing exactly what the docs show
import { PPTXLoader } from "langchain/document_loaders/fs/pptx";

const loader = new PPTXLoader(file);

const docs = await loader.load();
@Ray on page component?
Dutch SmoushondOP
in api folder
on the backend
add this to your next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    serverComponentsExternalPackages: ["officeparser"],
  },
};

module.exports = nextConfig;
after that, this code works for me
import path from "path";
import { PPTXLoader } from "langchain/document_loaders/fs/pptx";

export async function GET() {
  const loader = new PPTXLoader(path.join(process.cwd(), "test.docx"));
  const docs = await loader.load();

  return Response.json(docs);
}
a Blob works too
  // needs `import fs from "fs/promises";` at the top of the file
  const filePath = path.join(process.cwd(), "test.docx");
  const buffer = await fs.readFile(filePath);
  const loader = new PPTXLoader(new Blob([buffer]));
  const docs = await loader.load();
Dutch SmoushondOP
async function readPPT(file) {
  const filePath = path.join(process.cwd(), "test.docx")
  // readFile returns a promise, so it must be awaited before building the Blob
  const buffer = await fs.readFile(filePath)
  const loader = new PPTXLoader(new Blob([buffer]))
  const docs = await loader.load()
  return docs
}

this is my code, the file comes in as a parameter
Where does the file go in to that code?
what is the type of the file?
Dutch SmoushondOP
It's an object
An object of what? Could you log it out?
PPTXLoader accepts a Blob or a string:
either the Blob of the file or the path of the file
Dutch SmoushondOP
A File object is a specific kind of Blob, and can be used in any context that a Blob can.

According to MDN
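Because a File is a Blob, the same conversion to a Node.js Buffer applies to both. A minimal sketch of that conversion follows; the helper name `blobToBuffer` is hypothetical and is not part of LangChain or officeparser.

```typescript
// Hypothetical helper: read the full contents of a Blob (or a File, which
// is a specific kind of Blob) into a Node.js Buffer.
async function blobToBuffer(blob: Blob): Promise<Buffer> {
  // arrayBuffer() resolves with the raw bytes; Buffer.from wraps them
  // without changing their content.
  return Buffer.from(await blob.arrayBuffer());
}
```

A function that only needs the bytes can therefore declare its parameter as `Blob` and still accept the `File` that arrives from a form upload.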
then try new PPTXLoader(file)
Dutch SmoushondOP
Taking some time because my code is all messed up from trying a bunch of different ways to solve this
This is what the file looks like

file: File {
  size: 647237,
  type: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
  name: 'Dickinson_Sample_Slides.pptx',
  lastModified: 1710868377332
}
This is what the function looks like currently


async function readPPT(file) {
  const loader = new PPTXLoader(file)
  const docs = await loader.load()
  console.log({ docs })
  return docs
}
this works for me
Dutch SmoushondOP
In production?
It works locally but not in production for me
where are you running in production
Dutch SmoushondOP
vercel
let me try it real quick
I can't use PPTXLoader there because I don't know how to configure the tempFilesLocation for officeParser through it
I use officeParser directly and it works
export async function POST(request: Request) {
  const formData = await request.formData();

  const file = formData.get("file") as File;

  const docs = await officeParser.parseOfficeAsync(
    Buffer.from(await file.arrayBuffer()),
    {
      tempFilesLocation: "/tmp",
    }
  );

  //   const loader = new PPTXLoader(file);
  //   const docs = await loader.load();
  return Response.json(docs);
}
Dutch SmoushondOP
how are you importing officeparser?
import officeParser from "officeparser";

export async function POST(request: Request) {
  const formData = await request.formData();

  // The uploaded file arrives as a File (a kind of Blob) in the form data.
  const file = formData.get("file") as File;

  // Call officeparser directly so its temp directory can point at /tmp.
  const docs = await officeParser.parseOfficeAsync(
    Buffer.from(await file.arrayBuffer()),
    {
      tempFilesLocation: "/tmp",
    }
  );

  // PPTXLoader is bypassed here because it does not expose tempFilesLocation:
  //   const loader = new PPTXLoader(file);
  //   const docs = await loader.load();
  return Response.json(docs);
}
Answer
if you need to use PPTXLoader, I think you should ask them how to set the tempFilesLocation
Dutch SmoushondOP
Is there a limit on the file size that is being sent to the backend?
are you talking about the body size?
Dutch SmoushondOP
Because one file worked but a larger one didn't
or /tmp
@Ray are you talking about the body size?
Dutch SmoushondOP
Yes
@Ray or /tmp
Dutch SmoushondOP
/tmp is supposedly 512 MB according to the GitHub discussion from above, which is plenty
@Ray https://nextjs.org/docs/app/api-reference/next-config-js/serverActions#bodysizelimit
Dutch SmoushondOP
Thank you, I'm gonna increase it a bit
@Ray body size is 1mb by default
Dutch SmoushondOP
Thank you for all your help! Been stuck on this for days and I hate being stuck on a bug, can't stop thinking about it
Dutch SmoushondOP
The smaller file worked, so I assume it has to do something with file size
cool
Dutch SmoushondOP
Getting this error
The maximum payload size for the request body or the response body of a Serverless Function is 4.5 MB. If a Serverless Function receives a payload in excess of the limit it will return an error 413: FUNCTION_PAYLOAD_TOO_LARGE. See How do I bypass the 4.5MB body size limit of Vercel Serverless Functions for more information.
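Since the platform rejects oversized bodies before the function runs, one option is to check the file size in the browser before uploading. The sketch below is only an illustration: `checkUploadSize` and `ASSUMED_BODY_LIMIT_BYTES` are hypothetical names, and interpreting "4.5 MB" as 4.5 × 1024 × 1024 bytes is an assumption, not something the error message specifies.

```typescript
// Assumption: "4.5 MB" from the error message, read as mebibytes.
const ASSUMED_BODY_LIMIT_BYTES = 4.5 * 1024 * 1024;

type SizeCheck =
  | { ok: true }
  | { ok: false; status: 413; message: string };

// Hypothetical helper: decide whether a file of `sizeBytes` fits under the
// request body limit, and produce a user-facing message if it does not.
function checkUploadSize(
  sizeBytes: number,
  limitBytes: number = ASSUMED_BODY_LIMIT_BYTES
): SizeCheck {
  if (sizeBytes <= limitBytes) return { ok: true };
  const toMb = (n: number) => (n / (1024 * 1024)).toFixed(1);
  return {
    ok: false,
    status: 413,
    message: `File is ${toMb(sizeBytes)} MB; the limit is ${toMb(limitBytes)} MB.`,
  };
}
```

On the frontend this could be called with `file.size` before sending the request, and the `message` shown to the user when `ok` is false.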
Dutch SmoushondOP
7 mb
I set 10 MB, but I'm thinking maybe I need to upgrade Vercel 🫠
Dutch SmoushondOP
It's alright, I can just send back a React toast or maybe split the doc into two requests
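A rough sketch of what splitting an upload across several requests could look like on the client, using the standard `Blob.slice` method. The helper name `splitBlob` is hypothetical. A slice of a .pptx is not a valid file on its own, so this assumes the server collects all chunks and reassembles them before parsing.

```typescript
// Hypothetical helper: cut a Blob (or File) into consecutive chunks of at
// most `maxChunkBytes` bytes each, preserving the original MIME type.
function splitBlob(blob: Blob, maxChunkBytes: number): Blob[] {
  if (maxChunkBytes <= 0) {
    throw new RangeError("maxChunkBytes must be positive");
  }
  const chunks: Blob[] = [];
  for (let start = 0; start < blob.size; start += maxChunkBytes) {
    // slice() clamps the end index to blob.size, so the last chunk may be shorter.
    chunks.push(blob.slice(start, start + maxChunkBytes, blob.type));
  }
  return chunks;
}
```

Each chunk could then be sent in its own request, with the chunk index included so the server can restore the original order.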
for large files you may need to upload the file to S3 and read it in the API route
Dutch SmoushondOP
Yeah, I'll decide what to do later, but 4.5 MB for a file is not small. Perhaps I'll show a message asking the user to split up the file
Maybe there is a way to filter out all the extra stuff in a pptx file, I just need the words
Anyhow, thank you for your time
you're welcome