Next.js Discord

How to hide __NEXT_DATA__ from Google?

Unanswered
MacGillivray's Warbler posted this in #help-forum
MacGillivray's Warbler (OP)
Next uses a script element with the MIME type application/json to conveniently provide page data. The problem is that our page props contain a bunch of links, some of which are automatically generated and point at invalid sites. Google discovers these links and crawls them, causing SEO issues, since it now thinks the site links to a bunch of 404 pages.

With the problem identified, we are currently weighing our options. We are aware that we could remove or fix the invalid links, and there appears to be a way to make Google ignore parts of the HTML document. However, we would like to know whether there is a recommended approach in Next for solving this kind of issue, something like automatically preventing Google from crawling this data or hiding it. I'd be grateful for any and all help.
[Image: screenshot of the __NEXT_DATA__ script element]
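For context, here is a minimal sketch (Pages Router, with a hypothetical page and prop names) of how props returned from getServerSideProps end up serialized into the __NEXT_DATA__ script tag, which is where a crawler can pick the URLs up:

```tsx
// pages/example.tsx — hypothetical illustration, not the actual code from this thread.
// Whatever getServerSideProps returns under `props` is serialized by Next.js
// into the <script id="__NEXT_DATA__" type="application/json"> element of the
// rendered HTML, even if the page never renders the values as <a> tags.
import type { GetServerSideProps } from "next";

type Props = {
  relatedLinks: string[];
};

export const getServerSideProps: GetServerSideProps<Props> = async () => {
  return {
    props: {
      // Auto-generated links end up verbatim in the serialized page data,
      // including any that point at invalid or 404 destinations.
      relatedLinks: ["https://example.com/valid", "https://example.com/broken"],
    },
  };
};

export default function ExamplePage({ relatedLinks }: Props) {
  return (
    <ul>
      {relatedLinks.map((href) => (
        <li key={href}>{href}</li>
      ))}
    </ul>
  );
}
```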

12 Replies

B33fb0n3
You can disallow specific pages in your robots.txt that Google is not allowed to crawl. You can read more about it here: https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt
@MacGillivray's Warbler
MacGillivray's Warbler (OP)
Yes, but the page itself should still be crawled; the __NEXT_DATA__ element is the only thing that should not be. As far as I'm aware, fragments are not supported in robots.txt, so we could not block elements by id. robots.txt may provide a good approach, though, since the invalid URLs are all similar. Thanks for reminding me of it.
That is only an indirect solution, though. Is there something that directly prevents Google from crawling only the __NEXT_DATA__ JSON?
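A minimal sketch of that pattern-based approach, assuming the auto-generated links share a common URL prefix; the /generated-links/ prefix below is hypothetical and would need to be replaced with the real one:

```txt
# public/robots.txt (served at /robots.txt)
# Hypothetical: /generated-links/ stands in for whatever prefix the
# auto-generated, invalid URLs actually share.
User-agent: *
Allow: /
Disallow: /generated-links/
```

This keeps Google from fetching the invalid URLs, so they stop showing up as crawled 404s, but it does not remove them from the serialized page data.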
B33fb0n3
Yeah, you can also specify which paths should be crawled and which shouldn't, so you can configure it however you want.
MacGillivray's Warbler (OP)
And how would I do that with Next? I am aware of <!--googleoff: all--> and <!--googleon: all-->
B33fb0n3
You can set specific rules so that crawling is only allowed on specific paths.
[Image]
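If the project is on the App Router, one way to express those rules in Next itself is a robots metadata route; a sketch, again assuming the hypothetical /generated-links/ prefix (on the Pages Router, a static public/robots.txt like the one above does the same job):

```ts
// app/robots.ts — sketch of a Next.js robots metadata route (App Router).
// Next.js compiles this into the /robots.txt served by the site.
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: "*",
      allow: "/",
      // Hypothetical prefix: keep crawlers away from the auto-generated,
      // invalid URLs only; regular pages stay crawlable.
      disallow: "/generated-links/",
    },
  };
}
```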
MacGillivray's Warbler (OP)
I know, but the data is INSIDE the page.
So if I blocked the path, the whole page would be disallowed, instead of just the JSON inside the page.
B33fb0n3
Oh ok. So the JSON is inside the page, and only that specific JSON should be hidden?
MacGillivray's Warbler (OP)
Yes, precisely.
I shared a screenshot of the element here.