Google reports a problem with the robots.txt

  • #3287620

    rrg69
    Member

    Google reports a problem with the robots.txt file for whatnowdoc.com (robinrg69.wordpress.com). Their report states:
    “URL is on Google, but has issues What it means: The URL has been indexed and can appear in Google Search results, but there are some problems that might prevent it from appearing with the enhancements that you applied to the page. This might mean a problem with an associated AMP page, or malformed structured data for a rich result (such as a recipe or job posting) on the page. What to do next: Read the warnings or errors information in the report and try to fix the problems described.”
    The robots.txt states:
    # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
    # Please see https://developer.wordpress.com/docs/firehose/ for more details.

    Sitemap: http://whatnowdoc.com/sitemap.xml
    Sitemap: http://whatnowdoc.com/news-sitemap.xml

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: /wp-login.php
    Disallow: /activate/ # har har
    Disallow: /cgi-bin/ # MT refugees
    Disallow: /mshots/v1/
    Disallow: /next/
    Disallow: /public.api/

    User-agent: IRLbot
    Crawl-delay: 3600

    # This file was generated on Thu, 19 Jul 2018 21:13:09 +0000

    The blog I need help with is whatnowdoc.com.

    #3287657

    rinazrina
    Member

    Hello,

    Google reports a problem with the robots.txt file for whatnowdoc.com

    I’ve just checked your site’s robots.txt and everything looks normal and standard for wordpress.com sites. There are no issues with the robots.txt, so nothing crucial to worry about.

    Their report states:
    “URL is on Google, but has issues. What it means: The URL has been indexed and can appear in Google Search results, but there are some problems that might prevent it from appearing with the enhancements that you applied to the page. This might mean a problem with an associated AMP page or malformed structured data for a rich result (such as a recipe or job posting) on the page. What to do next: Read the warnings or errors information in the report and try to fix the problems described.”

    The report means that while your site is indexed by Google, it has some problems that might prevent it from appearing with the rich results that apply to the page.

    The report in Google Search Console would include the details of the problems. Could you share the details here so I can take a look?
    If you’d like to share a screenshot, it would be even better.

    Rina

    #3287692

    rrg69
    Member

    Hi Rinazrina,
    Thanks for taking an interest. https://robinrg69.files.wordpress.com/2019/03/screenshot.odt is what you’re asking for, I believe, but I don’t see any more info than I’ve previously reported.
    The post in question has been in place for a considerable time; this error report arrived out of the blue today.
    Cheers
    Robin

    #3287699

    justjennifer
    Moderator

    FWIW, Google Search Console started pinging me today about this exact same thing on two of my sites. Click on the link in the email to go to Search Console and see what the issue is.

    In my case, both “issues” had to do with a link in WP Admin or the Customizer, which shouldn’t be in Google search returns to begin with as it is correctly blocked in our robots.txt file.

    #3287739

    rinazrina
    Member

    Thanks, @justjennifer, for chiming in, and thanks, Robin, for sharing the screenshot.

    I see the below message there:

    The following warnings were found on your site:

    • Indexed, though blocked by robots.txt

    It means Google found some URLs on your site that are indexed even though they are blocked in your robots.txt. This can happen because a Disallow rule in robots.txt controls crawling; it does not prevent indexing. Google can still index a blocked URL without crawling it, for example when other pages link to it.

    As @justjennifer already suggested, you can click the blue button “Fix Coverage issues” in the email you received to check which URLs are mentioned there.

    However, as your robots.txt is the default for WordPress.com sites, this issue is unlikely to harm your site. You can safely ignore the warning.
    URLs that are indexed but not crawlable are usually not prominent in Google search results.
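
    To see which of the reported URLs fall under a Disallow rule, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are copied from the robots.txt quoted at the top of this thread; the `blocked_urls` helper and the sample URLs are made up for illustration and stand in for whatever your own Search Console report lists. One caveat: Python's parser applies rules in file order, while Google uses the most specific matching rule, so the two can disagree on `/wp-admin/admin-ajax.php`.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted earlier in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /activate/
Disallow: /cgi-bin/
Disallow: /mshots/v1/
Disallow: /next/
Disallow: /public.api/
"""


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical URLs standing in for the ones listed in a Search Console report.
sample = [
    "http://whatnowdoc.com/wp-login.php",
    "http://whatnowdoc.com/wp-admin/customize.php",
    "http://whatnowdoc.com/2019/03/some-post/",
]

for url in blocked_urls(ROBOTS_TXT, sample):
    print("blocked:", url)
```

    With these sample URLs, the login page and the Customizer link are reported as blocked and the post URL is not. If every URL in your report is an admin or login link like the first two, the warning is the harmless case described above.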

    Cheers,
    Rina

    #3287803

    rrg69
    Member

    Well thanks to both of you, I’ll just ignore it then ;-)

    Cheers
    Robin

    #3289419

    supernovia
    Staff

    Thanks for pinging us on this thread, too. If anyone is having trouble with Google actually indexing the site, please let us know!

    #3290864

    sjbraun
    Member

    I got the warning today from Google also. 29 pages affected. “Indexed, though blocked by robots.txt”

    I really don’t understand it. Before having this wordpress.com site, I had my own domain site, and I know I could work with the robots.txt there. On wordpress.com, I don’t think I can access that.

    Any suggestions? I see that I get very little referral traffic from search engines, and I’m wondering if this is why.

    My site:
    http://www.girlsinwhitedresses.wordpress.com

    Thank you for any insight!

    #3290923

    Hi,

    I blog at The Tattooed Book Geek on WordPress.com and I’m having issues with “indexed, though blocked by robots.txt” on my blog. I received an email from the Google search team saying that it affects 280+ pages of my blog. I have no idea what it is, how to fix it, or whether I should be worried about it. Any help you can offer would be greatly appreciated.

    My blog link is: https://thetattooedbookgeek.wordpress.com

    #3290983

    supernovia
    Staff

    Hi again folks, this is what your robots.txt file should look like on an indexable site. If yours does not look like this, please update us:

    # If you are regularly crawling WordPress.com sites, please use our firehose to receive real-time push updates instead.
    # Please see https://developer.wordpress.com/docs/firehose/ for more details.
    
    Sitemap: https://(youractualaddress)/sitemap.xml
    Sitemap: https://(youractualaddress)/news-sitemap.xml
    
    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: /wp-login.php
    Disallow: /wp-signup.php
    Disallow: /press-this.php
    Disallow: /remote-login.php
    Disallow: /activate/
    Disallow: /cgi-bin/
    Disallow: /mshots/v1/
    Disallow: /next/
    Disallow: /public.api/
    
    # This file was generated on (date)

    This blocks program directories like /cgi-bin and /wp-admin — but those are things visitors don’t need to access anyway.

    You can check it by putting this in your address bar, and changing the parentheses and everything in between to show your actual site address.

    https://(youractualaddress)/robots.txt

    If your file does look like that and Google still says it’s blocking them from indexing your content, please ask Google for more information on that. We’ll also try to work with them here.
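
    If comparing the two files by eye is tedious, the check can be sketched in Python. This is only an illustration: `EXPECTED_DISALLOW` is copied from the sample file above, and the `disallowed_paths` and `unexpected_rules` helpers are hypothetical names, not part of any WordPress.com or Google tool. Paste the text of your own robots.txt in place of the example string.

```python
# Disallow paths from the sample robots.txt shown above in this thread.
EXPECTED_DISALLOW = {
    "/wp-admin/", "/wp-login.php", "/wp-signup.php", "/press-this.php",
    "/remote-login.php", "/activate/", "/cgi-bin/", "/mshots/v1/",
    "/next/", "/public.api/",
}


def disallowed_paths(robots_txt):
    """Collect Disallow paths from the `User-agent: *` group of a robots.txt."""
    paths = set()
    in_star_group = False
    prev_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            # Consecutive User-agent lines share one group of rules.
            in_star_group = value == "*" or (prev_was_agent and in_star_group)
            prev_was_agent = True
            continue
        prev_was_agent = False
        if field.lower() == "disallow" and in_star_group and value:
            paths.add(value)
    return paths


def unexpected_rules(robots_txt):
    """Return Disallow paths that are not in the sample file from this thread."""
    return disallowed_paths(robots_txt) - EXPECTED_DISALLOW


# Example: a file with one extra, site-wide rule.
my_robots = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /\n"
print(sorted(unexpected_rules(my_robots)))
```

    An empty result means your file blocks nothing beyond the sample. Anything it does print, such as a site-wide `Disallow: /`, is worth reporting in this thread.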

    #3290988

    supernovia
    Staff

    And just adding extra emphasis to what @justjennifer said:

    In my case, both “issues” had to do with a link in WP Admin or the Customizer, which shouldn’t be in Google search returns to begin with as it is correctly blocked in our robots.txt file.

    It’s OK for Google to be blocked from indexing wp-admin, cgi-bin, your customizer, etc. Those are not things you need in search engines, plus they wouldn’t help you anyway since they’re programs and basically the same as everyone else’s.

    The important thing is that they can access your content. And everything in the sample robots.txt file I included above says they can. I don’t know why they’d complain about being unable to index wp-admin, but if you can see that is what the complaint is about, I recommend just ignoring that.

    Hoping this helps!

    #3290992

    I’ve looked through a few of the posts I’m having an issue with and checked the robots.txt on the Google fix, and they all have the same issue: the “disallow wp – login.php” line is the one causing it. Is that OK to ignore? Sorry, I’m clueless on these things.

    #3291028

    “disallow wp – login.php”

    This is fine. You do not need that to be indexed by a search engine.

    #3296506

    jas0nw0ng123
    Member

    Hi all,

    I have received the same email from Google earlier, and I have ignored the warning after reading this thread.

    However, when I visit the Search Console today, I find that Google suddenly refuses to index many of my blog posts, as you can see in the below reports.

    I don’t know if this is related to the aforementioned problem, but I do not have any other warnings in Search Console, and I have not been updating my blog in the past few weeks.

    Any advice on the issue would be greatly appreciated. Thank you.

    #3296515

    rinazrina
    Member

    Hello all, just wanted to add my two cents here.

    @jas0nw0ng123

    However, when I visit the Search Console today, I find that Google suddenly refuses to index many of my blog posts, as you can see in the below reports.

    I believe this is a different issue from the original post. The original post is about URLs that are “indexed, though blocked by robots.txt” that can safely be ignored since the robots.txt is working as intended.

    Your issue is “crawled – currently not indexed”.
    Google officially acknowledged a known indexing issue this past week in this tweet thread:
    https://twitter.com/searchliaison/status/1114961119699804160

    If the issue on your site happened recently, your site may have been affected. As mentioned in that tweet thread, “…the issues are mostly resolved and don’t require any special efforts on the part of site owners.”

    Note that the Search Console report is not live. It can take weeks to catch up with the current indexing status.

    If you have more questions about the Search Console report, I would suggest posting in the Google Webmasters Forum for better assistance from the experts.

    Cheers,
    Rina

    #3296518

    jas0nw0ng123
    Member

    Thank you very much for your help!

The topic ‘Google reports a problem with the robots.txt’ is closed to new replies.