Baidu spider keeps downloading an old file

  • Hello,
    On my personal portfolio site on WordPress.com, the Stats page shows what I assume is a Baidu spider turning up every few days and “downloading” a copy of my CV under a document name I used last February.

    I don’t know whether it succeeds, because the stats page doesn’t say whether a download was successful. But when I search my media page, I cannot find a file with that name, so I assume I deleted it as an old version.

    Is there a way to know whether a “download” succeeded (at least, from my end)? If it didn’t, then this is just a spam or error query, and I’m assuming it should not be counted in the stats as a “download”?

    Thanks!

  • Hello there,

    Many thanks for reaching out.

    Could you please confirm the URL of the website you need assistance with?

    I ask because I don’t see anything related to a Baidu spider on the site cliffwatson.com.

    Many thanks.

  • Hi, Aleone89. Yes, that’s the one. Re: spider/not spider… Well, I see the file “downloaded” but cannot definitively confirm the origin country or referrer (if present in the data), since the two are not clearly correlated on the stats page. I also missed that the most recent one was from the USA. Here is what I see for CV downloads in the last few months (with the country of origin and/or referrer in parentheses):
    Dec 7 (USA)
    Nov 25 (Baidu, China), 16 (Baidu, FB, China, USA), 3 (Baidu, China)
    Oct 28 (USA), 22 (Baidu, China), 18 (Baidu, China), 7 (but NO visitors noted??), 1 (Baidu, China)

    I don’t believe I have artist connections in China, so I’m not sure why anything other than a bot would keep trying to retrieve a non-existent file. But you’re the expert! Let me know if you have another idea, e.g. should I filter the view, or block something, or something else? Thank you!

  • Can you let us know where you’re seeing that specifically? I’d like to make sure we’re looking at the same thing here.

    Thanks!

  • Sure, it’s here: https://wordpress.com/stats/day/cliffwatson.com
    Then scroll back/forward in time using the arrows.

  • Nothing seems terribly out of place there.

    One thing to keep in mind is that our stats don’t link up actions. You may have a referrer from Baidu and a download of that file on the same day, but that doesn’t mean the download came from Baidu.

    Also, keep in mind that what you’re seeing is a referrer from Baidu, not Baidu’s spider; we don’t record bot hits. It would be a human who found the file by searching Baidu.

    If it did come from Baidu, keep in mind that search engines don’t index the web in real time. It’s quite possible the file ranks highly over there. Eventually, Baidu will remove it from its index, but that can take quite some time.

  • Sorry, that last bit seems out of place because I forgot the recommendation that goes with it: if this is a concern, delete the file and re-upload it with a new name (or keep it deleted).
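
For anyone who wants to check from their own end whether the old file can still be fetched, here is a minimal Python sketch. It only reports what the server returns right now for a given URL; it says nothing about how the stats page counts a request. The function names and the example URL are made up for illustration, and some servers may not answer HEAD requests the same way they answer a normal download.

```python
import urllib.error
import urllib.request


def classify_status(code: int) -> str:
    """Map an HTTP status code to a rough verdict about the file."""
    if 200 <= code < 300:
        return "available"   # the server would actually serve the file
    if code in (404, 410):
        return "gone"        # the request fails; nothing is downloaded
    return "unclear"         # auth walls, server errors, and so on


def check_file(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request (headers only, no body) and classify the reply."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx replies; the code is on the exception.
        return classify_status(err.code)


# Usage (hypothetical URL; substitute the old file's real address):
# print(check_file("https://example.com/wp-content/uploads/old-cv.pdf"))
```

If the result is "gone", whoever requests the old name is getting an error page, not the CV. Network failures (DNS errors, timeouts) are deliberately not caught here, so they surface as exceptions.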

  • Ok, thanks. I’ll keep it deleted and just ignore the hits.

    I’ll also just ignore the Oct 7 entry as a data aberration where there were no visitors yet there *was* a file download (I’m guessing it occurred around midnight, since there is a visit with no download on Oct 8).

  • Re: those aberrations, also keep in mind that the various tracker blockers out there (some built into browsers these days) will affect our ability to record everything.

    For example, a file download can be recorded (because we have the server activity) with no visitor recorded (because they blocked our tracker).
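
To make that concrete, here is a toy Python model of two independent tallies. It is only an illustration of the reasoning above and is not how WordPress.com stats are actually implemented; `DayStats`, `record_download`, and the `tracker_blocked` flag are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DayStats:
    """Toy daily tallies; NOT WordPress.com's actual data model."""
    visitors: int = 0    # counted only when the (hypothetical) tracker runs
    downloads: int = 0   # counted from (hypothetical) server activity


def record_download(stats: DayStats, tracker_blocked: bool) -> None:
    """Record one person fetching a file on a given day."""
    stats.downloads += 1        # the server always sees the file request
    if not tracker_blocked:
        stats.visitors += 1     # the tracker reports only if it was allowed to load


oct_7 = DayStats()
record_download(oct_7, tracker_blocked=True)
print(oct_7)  # DayStats(visitors=0, downloads=1)
```

Under those assumptions, a single visitor with a tracker blocker produces exactly the pattern noted for Oct 7: one download, zero visitors.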
