Next.js Discord

Discord Forum

lighthouse crawler file malformed

Answered
Arinji posted this in #help-forum
ArinjiOP
```ts
return {
  title: `${seoJson.data.name} (@${params.biolink}) · Feds.lol`,
  description: `Stay connected with @${params.biolink} and discover their online world in one convenient location.`,
  openGraph: {
    title: `${seoJson.data.name} (@${params.biolink}) · Feds.lol`,
    description: `Stay connected with @${params.biolink} and discover their online world in one convenient location.`,
    images: [
      {
        url: seoJson.data.profile_picture,
        alt: `${seoJson.data.name} Profile Picture`,
      },
    ],
  },
  robots: {
    index: true,
    follow: true,
    nocache: true,
    googleBot: {
      index: true,
      follow: true,
      noimageindex: false,
      "max-video-preview": -1,
      "max-image-preview": "large",
      "max-snippet": -1,
    },
  },
};
```

When I do a Lighthouse report my SEO score drops because "crawler file is malformed".

Any idea why this happens?
Answered by not-milo.tsx

27 Replies

ArinjiOP
The docs show the same code, so I'm really confused now lol
ArinjiOP
Bump
not-milo.tsx
I don't think it's related to the code you posted. That generates some meta tags for web crawlers.

If you want to have a properly formed robots.txt file, either include one at the root of your app directory or generate one dynamically with a [robots.ts](https://nextjs.org/docs/app/api-reference/file-conventions/metadata/robots) file.
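As a rough sketch of that second option, an app/robots.ts could look like the one below. The /dash path and the sitemap URL are assumptions for illustration; in a real Next.js app the return value would be typed as MetadataRoute.Robots from "next", but the import and annotation are omitted here so the snippet stands alone.

```typescript
// Sketch of app/robots.ts (type annotation from "next" omitted so this runs standalone).
// Next.js serves the returned object as a generated robots.txt.
export default function robots() {
  return {
    rules: {
      userAgent: "*",
      allow: "/",        // explicitly allow crawling of public pages
      disallow: "/dash", // keep the (assumed) dashboard route out of the index
    },
    sitemap: "https://example.com/sitemap.xml", // hypothetical sitemap URL
  };
}
```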
ArinjiOP
oh that could be it actually, lemme try :D
ArinjiOP
ok so i just got this new Lighthouse SEO issue
(screenshot of the Lighthouse SEO issue)
my robots.txt file:

```
User-agent: *
Disallow: /dash
```

The page im trying to index is https://website-git-frontend-v2-fedslol.vercel.app/techy
not-milo.tsx
You need an Allow entry like this:

```
User-Agent: *
Allow: /
Disallow: /private/

Sitemap: https://acme.com/sitemap.xml
```
Answer
ArinjiOP
oh ok, lemme try :D
also for the sitemap, i have a lot of users with their own pages, like /techy
so do i need to like dynamically generate all of them?
if i want them to be indexed as well
not-milo.tsx
Yup, you have to and you can do that dynamically too as explained here: https://nextjs.org/docs/app/api-reference/file-conventions/metadata/sitemap#generate-a-sitemap
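For the per-user pages, a minimal app/sitemap.ts sketch could look like this. fetchUsernames() is a hypothetical stand-in for whatever database or API call returns the registered biolinks; it's stubbed with fixed data here so the snippet runs on its own, and the domain is a placeholder.

```typescript
// Hypothetical stand-in for a real DB/API call that returns every user's biolink.
async function fetchUsernames(): Promise<string[]> {
  return ["techy", "arinji"]; // stubbed data for illustration
}

// Sketch of app/sitemap.ts: one entry for the home page, plus one per user page.
export default async function sitemap() {
  const users = await fetchUsernames();
  return [
    { url: "https://example.com", lastModified: new Date() },
    ...users.map((name) => ({
      url: `https://example.com/${name}`,
      lastModified: new Date(),
    })),
  ];
}
```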
ArinjiOP
Alright, thank you, lemme check if the Allow entry works
Also sorry about the cross posting thing, i thought it was like one question per thread
not-milo.tsx
Right now it's only limited to the url and lastModified fields. So if you need something more sophisticated you'll have to generate it in a different way. (If you need it I can show you)
ArinjiOP
Oh no need, the lastModified thing is optional right?
not-milo.tsx
Oh no, not necessarily. If the questions are related you can post them in the same thread. The only thing the server doesn't allow is posting the same question in multiple places.
ArinjiOP
yea, got it now. Thanks :D
not-milo.tsx
Yup, the sitemap return type is this one:

```ts
type Sitemap = Array<{
  url: string
  lastModified?: string | Date
}>
```
ArinjiOP
alright, actually i was thinking, we might want to add some more details like the name and the description of the user. So could you show me how i would have to set that up?
not-milo.tsx
The trick is to use middleware and return a custom xml response when the crawler requests /sitemap.xml.

See it in action here: https://github.com/milovangudelj/milovangudelj.com/blob/master/middleware.ts#L31-L39

And here: https://github.com/milovangudelj/milovangudelj.com/blob/master/lib/sitemap.ts
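The XML-building half of that approach can be sketched like this. The entry shape (url + lastModified) follows the standard sitemap protocol; the helper name and data are illustrative, not taken from the linked repo. In actual middleware you would return the string via something like `new NextResponse(xml, { headers: { "Content-Type": "application/xml" } })`.

```typescript
// Builds a sitemap XML string by hand, as the middleware approach does.
type Entry = { url: string; lastModified: Date };

function buildSitemapXml(entries: Entry[]): string {
  // One <url> element per entry, with <loc> and <lastmod> children.
  const body = entries
    .map(
      (e) =>
        `  <url>\n    <loc>${e.url}</loc>\n    <lastmod>${e.lastModified.toISOString()}</lastmod>\n  </url>`
    )
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${body}\n</urlset>`;
}
```

Because you build the string yourself, any extra per-user detail you can fetch can be embedded in the entries.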
ArinjiOP
oh, cool lemme check it out
oh that's very neat, thank you so much :D
not-milo.tsx
No problem ✌🏻
If that's all you can mark the question as resolved
ArinjiOP
Yup, it's deploying the site, once i can verify the Allow thing has worked imma resolve it :D
yay 100 SEO, thank you so much man, you are awesome.
not-milo.tsx
Great success 💪🏻