Need Help with LinkedIn and Twitter Not Scraping My Next.js Page

Unanswered
Transylvanian Hound posted this in #help-forum
Transylvanian HoundOP
Hey everyone,

I'm facing a bit of a challenge with getting LinkedIn and Twitter to properly scrape and preview one of my pages built with Next.js. Here’s what’s happening:

### **The Issue:**

- **LinkedIn Post Inspector:** When I try to scrape the page with LinkedIn’s Post Inspector, I get the following error:
  Ingestion feedback
  Error: We did not scrape the content because its size was larger than our threshold. Maximum size for this input stream is 3145728 bytes.

This suggests that the page exceeds LinkedIn’s size limit of 3 MiB (3,145,728 bytes), so LinkedIn can’t scrape the content or generate a preview.

- **Twitter:** On Twitter, the preview image isn’t showing up at all. I’ve checked the Open Graph and Twitter card tags, and everything seems correct, but Twitter still doesn’t display the image.
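
For reference, the tags are set through the App Router Metadata API in my layout. A simplified sketch of the shape (the titles and image URLs here are placeholders, not my real values):

```tsx
// app/yir-2023/layout.tsx (simplified sketch; titles and URLs are placeholders)
import type { Metadata } from 'next'

export const metadata: Metadata = {
  title: 'Year in Review 2023',
  openGraph: {
    title: 'Year in Review 2023',
    url: 'https://fortis-frontend.vercel.app/yir-2023',
    images: [{ url: 'https://fortis-frontend.vercel.app/og.png', width: 1200, height: 630 }],
  },
  twitter: {
    card: 'summary_large_image',
    images: ['https://fortis-frontend.vercel.app/og.png'],
  },
}
```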

### **What I Need Help With:**

- **Handling Size Limits:** Does anyone know effective strategies for dealing with content size limits when scraping with LinkedIn? How can I ensure that the page gets scraped and a preview is generated even if the page is large?

- **Twitter Image Preview Issues:** Why might Twitter not be displaying the preview image even when the correct `twitter:image` tag is present? Are there known issues or tricks to ensure the image appears?

- **Fallback Solutions:** If reducing the page size isn’t an option, are there any alternative solutions or workarounds? For example, is it possible to force LinkedIn or Twitter to scrape a specific URL for the image, even if the page itself is large?

- **General Best Practices:** What are the best practices for ensuring social media platforms can properly scrape and generate previews for Next.js pages? Are there any tools or plugins that could help with this?

I’d really appreciate any insights or advice you might have. This issue is causing some frustration, and any help would be valuable. Thanks in advance!

7 Replies

Transylvanian HoundOP
This is a screenshot of the layout file.
Transylvanian HoundOP
Help here 😭
@Transylvanian Hound Hey everyone, I'm facing a bit of a challenge with getting LinkedIn and Twitter to properly scrape and preview one of my pages built with Next.js. …
if you mean fortis-frontend.vercel.app, your robots.txt
User-Agent: *
Allow: /yir-2023
Allow: /yir-2023/*
Disallow: /

pretty much bans all bots from crawling anything except /yir-2023 (which means social media crawlers can't fetch the data they need for embeds on any other page)
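
if that file is generated by the App Router, loosening it could look roughly like this (a sketch, assuming an app/robots.ts; the user agents are the usual social crawlers):

```ts
// app/robots.ts: a sketch, adjust the allowed paths and user agents as needed
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // let the social crawlers fetch pages so embeds/previews can be built
      { userAgent: ['Twitterbot', 'LinkedInBot', 'facebookexternalhit'], allow: '/' },
      // keep everything except /yir-2023 closed for everyone else
      { userAgent: '*', allow: ['/yir-2023', '/yir-2023/*'], disallow: '/' },
    ],
  }
}
```

afaik both Twitterbot and LinkedInBot respect robots.txt when building previews, so allowlisting them explicitly keeps embeds working while the rest of the site stays closed.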
Transylvanian HoundOP
I'm testing the fortis-frontend.vercel.app/yir-2023 page
@Transylvanian Hound I'm testing the fortis-frontend.vercel.app/yir-2023 page
so i can tell why it failed: the html response of https://fortis-frontend.vercel.app/yir-2023 is 5.75 MB, which is way too big for crawlers (and for basically everyone else too).

the cause seems to be that although the page content itself is small, it's followed by gigantic inline <script> tags.

it's not normal at all. massive pages i'm maintaining at work total only some 50-100 kB of html. it's weird that you somehow got 5.75 MB...

do you have some special stuff in your server components or something? how do you deploy? dynamic/static routes? if you remove stuff from the page, when does the number significantly drop to a safe level? (say, 100 kB-ish)
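
one common culprit for giant inline <script>s (just a guess, your code may differ): passing big data as props from a server component into a client component, since everything crossing that boundary gets serialized into the HTML. a hypothetical sketch:

```tsx
// app/yir-2023/page.tsx (hypothetical sketch; Chart is an assumed 'use client' component)
import Chart from './chart'

type Row = { label: string; value: number }

export default async function Page() {
  // imagine this returns tens of thousands of rows
  const rows: Row[] = await fetch('https://example.com/api/stats').then((r) => r.json())

  // BAD: all of `rows` gets serialized into inline <script> tags in the HTML
  // return <Chart rows={rows} />

  // better: reduce on the server and pass only what the client actually renders
  const total = rows.reduce((sum, r) => sum + r.value, 0)
  return <Chart total={total} />
}
```

same goes for anything you fetch in a server component and hand to 'use client' children wholesale.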

btw to check the number you can just use postman or any other http client and GET the page.
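
or with a few lines of node (a sketch):

```ts
// check-size.mts: print the raw html size of the page
const res = await fetch('https://fortis-frontend.vercel.app/yir-2023')
const html = await res.text()
// linkedin's cutoff is 3,145,728 bytes (3 MiB)
console.log(`${Buffer.byteLength(html, 'utf8').toLocaleString()} bytes of html`)
```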