How to hide __NEXT_DATA__ from Google?
Unanswered
MacGillivray's Warbler posted this in #help-forum
MacGillivray's WarblerOP
Next uses a script element with the MIME type application/json to conveniently provide page data. The problem with that is that we have a bunch of links in our page props, some of which are automatically generated and point at invalid sites. Google discovers these links and crawls them, causing SEO issues since it now thinks there are a bunch of linked 404 pages.
With the problem identified, we are currently weighing our options. We are aware that we could remove or fix the invalid links, and there appears to be a way to make Google ignore parts of the HTML document. However, we would like to know whether there is a recommended approach with Next to solve this kind of issue, something like automatically preventing Google from crawling this data or hiding it. I'd be grateful for any and all help.
12 Replies
You can disallow specific pages in your robots.txt that Google is not allowed to crawl. You can read more about it here: https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt
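For example, if the auto-generated invalid links all share a common path prefix (the `/generated/` prefix below is an assumption), a `public/robots.txt` in a Next.js project could disallow just those URLs:

```txt
# public/robots.txt — served as-is at /robots.txt
User-agent: *
Disallow: /generated/
Allow: /
```

Next.js projects using the App Router can alternatively generate this file from an `app/robots.ts` route, which produces the same output.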
@MacGillivray's Warbler
MacGillivray's WarblerOP
Yes, but the page itself should still be crawled. The
__NEXT_DATA__
element is the only thing that should not be. As far as I'm aware, fragments are not supported in robots.txt, so we could not block elements by id. The robots.txt possibly provides a good approach, though, as the invalid URLs are all similar. Thanks for reminding me of it. That is only an indirect solution, however. Is there something that directly prevents Google from crawling only the
__NEXT_DATA__
JSON?
Yeah, you can also specify which paths should be crawled and which shouldn't, so you can configure it however you want.
MacGillivray's WarblerOP
And how would I do that with Next? I am aware of
<!--googleoff: all--> and <!--googleon: all-->
MacGillivray's WarblerOP
I know, but the data is INSIDE the page.
So if I blocked the path, the whole page would be disallowed.
instead of just the json inside the page
oh ok. So the json is inside the page and only this specific json inside the page should be hidden?
MacGillivray's WarblerOP
Yes, precisely.
I shared a screenshot of the element here.
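Since robots.txt can only block whole URLs, not parts of a page, the direct fix is to sanitize the props before they are serialized into __NEXT_DATA__. A minimal sketch, assuming the links live in an array prop and that broken auto-generated links can be recognized by a placeholder host (both the prop shape and the `invalid.example` host below are assumptions):

```typescript
// Drop links that point at invalid targets BEFORE they are returned as
// page props, so they never end up in the serialized __NEXT_DATA__ JSON.
type PageLink = { href: string; label: string };

function isValidLink(link: PageLink): boolean {
  try {
    const url = new URL(link.href);
    // Assumption: auto-generated broken links use this placeholder host.
    // Replace this with whatever check identifies your invalid URLs.
    return url.hostname !== "invalid.example";
  } catch {
    return false; // malformed URLs are dropped as well
  }
}

function stripInvalidLinks(links: PageLink[]): PageLink[] {
  return links.filter(isValidLink);
}
```

Calling `stripInvalidLinks` on the generated link list inside `getStaticProps` or `getServerSideProps` before returning the props keeps the bad URLs out of both the rendered HTML and the JSON blob, so there is nothing left for Google to discover.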