Dealing with thousands of MDX Files in Server Components
Answered
American Crow posted this in #help-forum
American CrowOP
This question is rather broad, and I am looking more for general guidance.
So I am generating ~4k static pages from MDX using https://velite.js.org/ which is something like a successor of Contentlayer.
Everything is working, however whenever I freshly deploy or freshly build and hit a route for the first time, I get this huge initial load (10 sec+).
The build itself takes 5 minutes, and 4k pages seems to be the max (I run out of memory when I try more; my machine has 32 GB).
So I suspect some webpack limitations, however my understanding of bundling is too poor to really understand the problem.
After building and hitting any page route once, all pages run smoothly. I suspect some caching kicks in.
Velite basically creates a huge JSON file with all the MDX content and frontmatter, which I load in a server component (roughly like the sketch below).
Is Velite the problem?
Am I just stupid, and should I split the MDX files into multiple loaders and not load everything into one big JSON, but rather split them up?
I tried disabling caches, but I can't reproduce the issue manually other than by rebuilding. So I don't really understand where that "cold start problem" is coming from.
I would be very thankful if someone could explain the fundamental process behind this (I know I am asking a lot and missing fundamentals) and guide me to why this might be happening and how to improve it, if possible.
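For context, the loading looks roughly like this (a minimal sketch, not my exact code; the `pages` collection name and the `.velite` import path are placeholders):

```tsx
// app/[slug]/page.tsx — minimal sketch; collection and field names are placeholders
import { notFound } from "next/navigation";
// Velite emits one big JSON/JS module containing every entry's frontmatter and compiled body
import { pages } from "../../.velite";

export function generateStaticParams() {
  // one statically generated route per entry in the generated output
  return pages.map((page) => ({ slug: page.slug }));
}

export default function Page({ params }: { params: { slug: string } }) {
  // in Next.js 15+ `params` is a Promise and would need to be awaited
  const page = pages.find((p) => p.slug === params.slug);
  if (!page) notFound();
  // rendering the compiled MDX body via an MDX runtime component is omitted here
  return <article>{page.title}</article>;
}
```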
Answered by fuma π joulev
I think even if you split it into multiple files, it will still crash. When compiling MDX files, the tooling usually stores all page info and the compiled output in runtime memory.
Next.js itself already has high memory usage during a production build; with 4k+ MDX files it is no longer possible to handle that.
The huge JSON file itself is not a problem (the file system can handle that easily), although reading it can lead to higher memory consumption.
Most importantly, it also implies that the entire JSON file is the output of compilation: in the process, webpack and the MDX plugin compile all of these files, and all the caches and outputs are kept in memory.
Luckily, after the build all MDX files have been compiled into HTML files + JavaScript, hence everything is fast once the build is done.
42 Replies
American CrowOP
@fuma π joulev
Answer
Yes, on the initial load Next.js loads the page content, and this includes the 4k+ pages compiled from MDX files.
Honestly, if you really have such a large amount of content, I think you should either split it into a few separate websites or use a CMS that can handle it.
American CrowOP
So you are saying I am hitting a natural, technical barrier here?
True, sites rarely reach this size.
Of course, with faster programming languages like Rust it's possible, but in Node.js it's pretty much the max you can get.
A bottleneck, in other words.
@fuma π joulev Yes, on the initial load Next.js loads the page content, and this includes the 4k+ pages compiled from MDX files.
American CrowOP
Can you explain this a little more? Why are all 4k pages part of the initial load? Also, can this initial load happen more than once? I feel like when I don't interact with the website for long enough in production, this cold start / initial load reappears. My impression might be wrong, but I think I have witnessed it.
Yes, the initial load happens every time the server performs a cold start.
American CrowOP
Sorry for the slow responses. I have to read documentation in between to understand things better.
Can caching my serverless Node.js functions (I am assuming this is what's used, since I am running filesystem operations for the MDX reading) improve my situation?
https://vercel.com/guides/how-can-i-improve-serverless-function-lambda-cold-start-performance-on-vercel
^ No. 3 on that list.
Can I theoretically set the Cache-Control headers very high so a cold start rarely (or only once) happens, or is this whole concept not applicable to my situation?
No. Imagine you have 4k+ files, how would you cache them all?
The best way, as mentioned above, is to switch to a better alternative like a CMS.
Just out of curiosity, what are you actually building?
American CrowOP
website sent in pm (not public yet)
I mean, can you elaborate on what your app is doing and why it needs that amount of content?
@fuma π joulev No. Imagine you have 4k+ files, how would you cache them all?
American CrowOP
Well, it's a website for name generation.
It has multiple categories, e.g. Dungeons and Dragons or gaming.
And some have multiple subcategories, e.g. Pokemon, Valorant, ...
Each category x subcategory combination has its own page with some name suggestions. Those are the MDX files.
All of those are available in 13 languages, so it's categories x subcategories x languages = number of pages. That's where the big number comes from.
Actually, I'm fairly sure SSR is enough for your case.
Just render the page according to the category/language from the query parameters
and let the user select those using a select element.
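Something like this, as a rough sketch (the `getNames` helper and the parameter names are assumptions, not an existing API; also note that in Next.js 15+ `searchParams` is a Promise you'd await):

```tsx
// app/names/page.tsx — sketch of rendering on demand from query parameters
import { getNames } from "@/lib/names"; // hypothetical helper that loads one category's data

export default async function NamesPage({
  searchParams,
}: {
  searchParams: { category?: string; sub?: string; lang?: string };
}) {
  // reading searchParams makes this route dynamic, so nothing is pre-built
  const { category = "gaming", sub = "valorant", lang = "en" } = searchParams;
  const names = await getNames(category, sub, lang); // only load what this request needs
  return (
    <ul>
      {names.map((name) => (
        <li key={name}>{name}</li>
      ))}
    </ul>
  );
}
```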
American CrowOP
I'm starting to believe that this is the best way. However, my SEO heart cries a little bit.
You want to index your 4k+ pages on Google?
American CrowOP
I was planning on it, that's why I do it like this, yeah.
You will need a sitemap.xml, and Google will not index every single page of your website anyway.
The best approach should be indexing each category and language (usually using different domains for i18n).
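For the sitemap part, a minimal sketch (the domain, locale list, and category list are placeholders):

```ts
// app/sitemap.ts — sketch; domain, locales, and categories are placeholders
import type { MetadataRoute } from "next";

const locales = ["en", "de", "fr"]; // subset of the 13 languages
const categories = ["dungeons-and-dragons", "gaming"];

export default function sitemap(): MetadataRoute.Sitemap {
  // one entry per language × category landing page instead of every single sub-page
  return locales.flatMap((locale) =>
    categories.map((category) => ({
      url: `https://example.com/${locale}/${category}`,
      lastModified: new Date(),
    }))
  );
}
```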
American CrowOP
I have done all that: the sitemap.xml is live at /sitemap.xml, and using next-intl I have the alternate hrefs in the header (not in the sitemap), but that's fine for Google too.
American CrowOP
Oh yeah, it will definitely find duplicates and just crawl some but not index them.
From a technical perspective, why does SSRing the pages solve my problem? I understand that a page is now rendered on demand when a user requests it. But why are the 4k+ files not a problem anymore?
1. It doesn't make sense in terms of performance; on-demand rendering is way better than literally building 4k+ files.
2. Google doesn't index every page of an incredibly large website; according to their current strategy they are more likely to ignore pages (all your pages look similar, and very few users click them).
3. SSR also supports the Metadata API, so SEO isn't terrible (see the sketch below).
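For point 3, a minimal sketch of per-request metadata on an SSR route (the segment and field names are just placeholders):

```tsx
// app/[lang]/[category]/page.tsx — sketch; segment names are placeholders
import type { Metadata } from "next";

export async function generateMetadata({
  params,
}: {
  params: { lang: string; category: string };
}): Promise<Metadata> {
  // computed on the server for every request, so SSR pages still get proper titles/descriptions
  return {
    title: `${params.category} name generator (${params.lang})`,
    description: `Name suggestions for ${params.category}.`,
  };
}

export default function Page({ params }: { params: { lang: string; category: string } }) {
  return <h1>{params.category}</h1>;
}
```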
@American Crow From a technical perspective, why does SSRing the pages solve my problem? I understand that a page is now rendered on demand when a user requests it. But why are the 4k+ files not a problem anymore?
With SSR you don't actually build and serve 4k+ pages as static assets.
American CrowOP
You answered it with #1: the build is not necessary.
Man, I was so lost; you helped me a tremendous amount.
I'm glad to hear that
American CrowOP
I feel like the most stupid person on earth, but I am relieved at the same time.
Devs may feel they're dumb at some point in their career, but it's a sign of gaining knowledge
American CrowOP
I will SSR most of the pages, and do a little hybrid approach to keep at least three or four pages of each category static.
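Roughly what I have in mind (only a sketch; `topPages` is a hypothetical helper returning the handful of pages worth pre-building):

```tsx
// app/[lang]/[category]/[sub]/page.tsx — sketch of the hybrid idea: pre-build a few pages, SSR the rest
import { topPages } from "@/lib/content"; // hypothetical: returns the 3–4 most important combos per category

// params not returned below are still allowed and get rendered on demand (this is the default)
export const dynamicParams = true;

export async function generateStaticParams() {
  // only the handful of pages worth building statically at build time
  const pages = await topPages();
  return pages.map((p) => ({ lang: p.lang, category: p.category, sub: p.sub }));
}

export default function Page({
  params,
}: {
  params: { lang: string; category: string; sub: string };
}) {
  return (
    <h1>
      {params.category} / {params.sub} ({params.lang})
    </h1>
  );
}
```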
You have no idea how much time I wasted chasing this static idea.
But yeah,
I won't forget.
Thank you again very, very much.
Just to elaborate on one thing: at one point in time I switched the internationalization lib to next-intl just because they introduced that
unstable_static
function to make i18n and static site generation possible. So I was willing to use unstable functions and swap out libs; I also swapped out like 3 different MDX libs, ... CHASING THAT PHANTOM OF SSG. While all signs pointed in the other direction. :<