Next.js Deduping Calls on Parallel Environment
Unanswered
ᴉuɐpɹɐɐ posted this in #help-forum
ᴉuɐpɹɐɐOP
setting
context:
I am trying to find a way to dedupe fetch() calls and unstable_cache() calls on build time building SSG routes.
Currently fetch() calls get deduped down to 2 calls, while unstable_cache() is horrible at deduping itself during SSG.
cpus: 1 solves this issue but is there any way to store "caches" in between processes?
99 Replies
ᴉuɐpɹɐɐOP
the question is how to communicate between worker threads
and what it does between the cpus if the "workerThreads" is disabled here......
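For reference, the `cpus: 1` workaround mentioned above would look roughly like this. This is a minimal sketch, assuming the `experimental.cpus` and `experimental.workerThreads` options; both are experimental flags, so their behavior may differ between Next.js versions.

```javascript
// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  experimental: {
    // Assumption: a single build worker means every route is rendered in
    // one process, so in-memory dedupe is shared (at the cost of build speed).
    cpus: 1,
    // Assumption: keep the default process-based workers, not worker threads.
    workerThreads: false,
  },
};

module.exports = nextConfig;
```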
Toyger
you can try to use redis as single source of caches.
vercel have example for redis cache-handler https://github.com/vercel/next.js/tree/canary/examples/cache-handler-redis
or based on it you can write your own cache handler.
@Toyger you can try to use redis as single source of caches.
i wonder if this is more designed for build step (what alfon wants here), or just in general (still cool)
ᴉuɐpɹɐɐOP
how does the createLruCache work? if its just storing a global variable then it wont be deduped
also this page doesnt exist "https://caching-tools.github.io/next-shared-cache/configuration/use-file-system"
best docs i found on same page for that: https://caching-tools.github.io/next-shared-cache/configuration/on-creation#arguments
also sounding promising as you kinda have multi of app running
ngl i think its going to be the fetch thing, and now i see why alfon was asking about the external server as this kinda sounds like the only way (3rd party app that runs as its own cache and gives the data/makes it when asked)
@riský also sounding promising as you kinda have multi of app running
ᴉuɐpɹɐɐOP
what about this?
independent instances of same app running (as isnt that what build is doing)
ᴉuɐpɹɐɐOP
so how do we set up the neshca thing
i wonder how much of this one could abuse tho: https://nextjs.org/docs/app/api-reference/next-config-js/incrementalCacheHandlerPath and then ig use unstable_cache
Toyger
if it's for build time it depends what you want to cache, on build next have its own caches in .next/cache folder, but I doubt it cache fetch or anything like that.
theoretically you can use build phase condition https://github.com/vercel/next.js/discussions/48736#discussioncomment-5704784 directly in your code, and you can save cache whenever you want, either in file either in redis.
something like
export async function generateStaticParams() {
  // redisGetAsync, redisSetAsync, redisKey and url are placeholders (pseudocode)
  if (process.env.NEXT_PHASE === "phase-production-build") {
    let posts;
    const cachedData = await redisGetAsync(redisKey);
    if (cachedData) {
      // cache hit: another worker already stored the response
      posts = JSON.parse(cachedData);
    } else {
      const response = await fetch(url);
      if (!response.ok) {
        throw new Error(`Failed to fetch data from ${url}`);
      }
      posts = await response.json();
      await redisSetAsync(redisKey, JSON.stringify(posts));
    }
    return posts.map((post) => ({
      slug: post.slug,
    }));
  }
}
ᴉuɐpɹɐɐOP
im not sure how this could help me in my case
if i use `redisGetAsync` in multiple routes, would those calls be deduped?
Toyger
it's just pseudocode, you can use even files and remove them after build.
redis will be as single source of truth so if key will be already set then in any other thread it will read value instead making new fetch
ᴉuɐpɹɐɐOP
redis is.. in the same node.js environment as the next.js runtime?
Toyger
no, you need to use either https://upstash.com/ or your own self-hosted.
or as I said you can just write your cache to files and read from them
ᴉuɐpɹɐɐOP
so its an external environment?
Toyger
yeah
ᴉuɐpɹɐɐOP
if i can't dedupe fetch() how will i even dedupe connection to upstash?
Toyger
because they're almost free, they return value from memory so it's nanoseconds of cpu work, but your fetch is a couple hundred milliseconds at best
ᴉuɐpɹɐɐOP
even if my fetch is in the same server? upstash, being some distance away be faster?
Toyger
your fetch still get data from somewhere like database
ᴉuɐpɹɐɐOP
isnt that what upstash do? its also "somewhere"
not here
Toyger
database need to calculate data, which take time, and each time it will be 300-400ms which end up in minutes
redis save only cached value to memory, so it's nanoseconds to fetch each time
you can selfhost redis
Toyger
yeah
but not on vercel of course
you can also make a custom proxy app thats simpler too
ᴉuɐpɹɐɐOP
how will using redis dedupe the calls if I have 200 routes, it will call redis 200 times at build time
Toyger
again you can use files instead redis, and it will be local file read.
redis will be just cheap, it will be same 200 requests but instead minutes or hours of fetching it will take less than 5s for all 200
@ᴉuɐpɹɐɐ how will using redis dedupe the calls if I have 200 routes, it will call redis 200 times at build time
you just have to hope one is faster then the other
and can cache quick enough
:)
ᴉuɐpɹɐɐOP
redis wont be quick enough
you cant beat physics
Toyger
you can prebuilt caches
just write script that will fetch all your path and put them in redis before build
either in files
ok can we get back to the op question of doing within nextjs confines maybe
otherwise i think alfon has considered this all external things
ᴉuɐpɹɐɐOP
I have 200 routes of non cached data.
I have to fetch them first before caching it.
Next.js doesn't do sequential fetch, they do parallel
all 200 will be fetched and i only have rate limit quota of 60 per hour.
this is all pointless 200 fetches still happens when instead it could have been one.
Sure i could prefetch them but i dont have that liberty on Vercel, which is what majority of people will be using
also if i want to cache the result of fetch calls, next.js already does that btw
Toyger
if it's 200 different routes, then cache will not help there at all
ᴉuɐpɹɐɐOP
its 200 different routes but is fetching the same data -> localhost:4000 or www.api.com
Toyger
one domain but not one data, you have different data for each route
maybe you should just isr and dont prerender :)
ᴉuɐpɹɐɐOP
huh?
I guess you know more about my project than I do then
Toyger
each of your 200 routes return 200 different results? or single data returned for all 200 routes?
you can still write prebuilt script even on vercel, that will fetch it with timeouts
either even in generateStaticParams you can await promise that will have 60s or whatever how many threads you have to not throttle api.
ᴉuɐpɹɐɐOP
each of my 200 routes uses the same data, therefore its a single data but processed differently for 200 routes
its 200 different routes but is fetching the same data -> localhost:4000/ or www.api.com/getData
Toyger
then you can fetch single time this data in prebuilt script into single file and in generateStaticParams just read directly from local file
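A minimal sketch of that idea, assuming Node 18+ (global `fetch`). The names `prefetchToFile` and `readPrefetched` are hypothetical helpers made up for illustration, and the `fetchImpl` parameter exists only so the fetch can be swapped out in tests.

```javascript
const fs = require("node:fs");

// Hypothetical helper for the prebuild script: fetch `url` once and
// save the JSON body to `outPath`. `fetchImpl` defaults to global fetch.
async function prefetchToFile(url, outPath, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch data from ${url}`);
  }
  const data = await response.json();
  fs.writeFileSync(outPath, JSON.stringify(data));
  return data;
}

// Hypothetical reader for generateStaticParams or a page: returns the
// prefetched data, or null if the prebuild step did not run.
function readPrefetched(outPath) {
  if (!fs.existsSync(outPath)) return null;
  return JSON.parse(fs.readFileSync(outPath, "utf8"));
}
```

The prebuild script would call `prefetchToFile` once with the API URL and an output path before `next build` starts, and each of the 200 routes would then call `readPrefetched` with the same path, so the API is hit a single time regardless of how many build workers run.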
ᴉuɐpɹɐɐOP
that could work, though not as intuitive i hoped it would be
@Toyger then you can fetch single time this data in prebuilt script into single file and in generateStaticParams just read directly from local file
oh yeah that could work (i think i said it some time ago - maybe in just direct msg), but it is a little pain to work it out, however with a utils.ts file you may be able to keep track and so
ᴉuɐpɹɐɐOP
I managed to get 6 calls down to 2 with unstable_cache
ᴉuɐpɹɐɐOP
idk how to extensively test its correctness
but basically implemented a file based locking system
like mutex but done in a file
I like solution proposed by horus
but if its possible i want to decouple it from other things and make it as contained as possible first
i.e doing it without extra configuration
@ᴉuɐpɹɐɐ idk how to extensively test its correctness
but basically implemented a file based locking system
at least sleep for some ms 😭
Toyger
why do you need to lock file? if it will have same data there is no need for that
to force order
ᴉuɐpɹɐɐOP
-of execution
but i dont know how its better then just cpu:1 other then you control the order, but i mean do you really need to know order if you can cache in same process
Toyger
didnt get it. what point of some order? data is same for everyone.
ᴉuɐpɹɐɐOP
if data exists then it wouldnt have to make unnecessary repeated network calls
the problem with initial 200 fetch is that bcause the cache doesn't exist, which is why you suggested to pregenerate the data first via script
in here unstable_cache() is being called in 200 routes. if there is some sort of order,
i.e the first one calls first, then store the data,
then the rest of 199 wouldnt have to make any network calls since its already cached
(context: ssg)
(( i know the motive is questionable ))
Toyger
you didn't understand what I suggested.
you need to make some script and then run it before build
something like that in package.json:
"scripts": {
  ...
  "build": "node prefetch_script.js && next build",
  ...
},
this prefetch_script.js will make fetch to your api and download all data to /tmp/prefetched_data.json because even on vercel you have write access to /tmp
then in your generateStaticParams you read it as
// assumes: import fs from "node:fs";
if (fs.existsSync('/tmp/prefetched_data.json')) {
  const file = await fs.promises.readFile('/tmp/prefetched_data.json', 'utf8');
  // handle file
} else {
  // another fetch just in case, but prefetch already should done work, so most likely this not needed
}
ᴉuɐpɹɐɐOP
i understood
i said i like your solution
i dont need to because thats not the answer im looking for
i wouldnt say i liked it if i dont understand it
Toyger
I just don't understand why do you need abstractions with mutex lock, if this solution basically solve your problem in easiest way.
ᴉuɐpɹɐɐOP
then say that instead of gaslighting me
@Toyger didnt get it. what point of some order? data is same for everyone.
Toyger
but I said that
ᴉuɐpɹɐɐOP
you didnt say that it "solves my problem in easiest way"
also didnt say that you dont understand "the need of abstraction with mutex lock"
i am willing to discuss
insisting that i dont understand your idea would simply means that you arent willing to discuss and dont understand my problem.
ᴉuɐpɹɐɐOP
Im trying to make a function that handles next.js caching out of the box while also taking advantage of the native caching strats
it already handles deduping and memoization on dynamically rendered routes
but it hasnt done so in statically rendered routes.
it would be nice if it could do both
whether by a forcing deduplication on build time, or pregenerating it with a script
maybe it doesnt make sense for a function to handle both scenario to begin with
ᴉuɐpɹɐɐOP
Deduplication has been achieved
@ᴉuɐpɹɐɐ Deduplication has been achieved
Yay, but what was the strat
@riský Yay, but what was the strat
ᴉuɐpɹɐɐOP
file mutex
Ah nice
ᴉuɐpɹɐɐOP
using npm i proper-lockfile
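The OP's actual code was not posted, so the following is only a sketch of the file mutex idea, not their implementation. To stay dependency-free it swaps proper-lockfile for Node's exclusive-create flag (`"wx"`), and the helper name `getOrCreate` and its options are hypothetical.

```javascript
const fs = require("node:fs");

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run `producer` at most once across build workers
// and share its JSON result through `cachePath`.
async function getOrCreate(cachePath, producer, { retryMs = 50, maxTries = 200 } = {}) {
  const lockPath = `${cachePath}.lock`;
  for (let attempt = 0; attempt < maxTries; attempt++) {
    // Fast path: some worker already published the data.
    if (fs.existsSync(cachePath)) {
      return JSON.parse(fs.readFileSync(cachePath, "utf8"));
    }
    let fd;
    try {
      // "wx" creates the file and throws EEXIST if it already exists,
      // so only one worker can hold the lock at a time.
      fd = fs.openSync(lockPath, "wx");
    } catch (err) {
      if (err.code !== "EEXIST") throw err;
      await sleep(retryMs); // another worker holds the lock; wait and retry
      continue;
    }
    try {
      // Re-check: the data may have been published just before we locked.
      if (fs.existsSync(cachePath)) {
        return JSON.parse(fs.readFileSync(cachePath, "utf8"));
      }
      const data = await producer();
      // Write to a temp file, then rename, so readers never see a partial file.
      const tmpPath = `${cachePath}.${process.pid}.tmp`;
      fs.writeFileSync(tmpPath, JSON.stringify(data));
      fs.renameSync(tmpPath, cachePath);
      return data;
    } finally {
      fs.closeSync(fd);
      fs.unlinkSync(lockPath);
    }
  }
  throw new Error(`Timed out waiting for ${lockPath}`);
}
```

With this shape, the first route to reach `getOrCreate` makes the network call and the other 199 read the file. One known gap in the sketch: if a worker crashes while holding the lock, the stale `.lock` file is never cleaned up, which is the kind of case a library like proper-lockfile is meant to handle.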