How important is the API, or comparing Yandex.XML with real search results

November 09, 2017

I have always wondered whether there is a difference between Yandex's live search results and its API (xml.yandex.ru), given that both solve the same task. (The official position: Yandex.XML is a way to send search queries to Yandex and publish the search results on your own site.)

It is well known that the data in Yandex.Webmaster always lags considerably and is at odds with reality: information that can be obtained from the search results themselves (number of indexed pages, links, etc.) shows up in Yandex.Webmaster only a few days later.

But since Yandex opposes direct scraping of its search results, it offers an alternative: receiving the data as XML.

By the way, Yandex.XML used to be open to everyone who simply confirmed a phone number in their account (if I am not mistaken, unconfirmed accounts had a limit of 1000 queries). About a year or two ago Yandex abandoned this policy and introduced a metric that is strongly correlated with traffic (or, to be more precise, with the "number of impressions in the search results").

In general it is a very interesting metric (for example, the more often a site appears in the results, the more often Yandex's antivirus bot checks its pages). Last year I happened to collect it for about 3 million queries from different groups. That data could be discussed in a separate article. I first heard the term at Yet Another Conference 2013, in the security track.

But back to XML.

The experiment:

1. I took 2,778 queries from 4 groups (commerce, women's topics, tourism, informational queries).
2. Parsing of the live search results and of the XML output was started almost simultaneously (the XML run took longer because of internal limits).
3. For access to Yandex.XML I used my own limits from Yandex.Webmaster; for scraping the live results I used a private proxy service. To keep the experiment clean, the region was set to lr=1 (the proxy IPs are located in RU according to whois, and Moscow was specified in the address field).
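
The XML side of this setup can be sketched roughly as follows. This is a minimal illustration, not the code used in the experiment: the endpoint URL, the `user`/`key`/`query` parameter names, and the `<doc>`/`<url>` response layout are assumptions that should be checked against the current Yandex.XML documentation, and `SAMPLE` is a hand-written stand-in for a real response. Only `lr=1` comes from the article.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint; verify against the Yandex.XML documentation before use.
BASE_URL = "https://yandex.ru/search/xml"


def build_request_url(user, key, query, region=1):
    """Build a GET URL for one query; lr=1 is the region used in the experiment."""
    params = {"user": user, "key": key, "query": query, "lr": region}
    return BASE_URL + "?" + urlencode(params)


def extract_urls(xml_text):
    """Return result URLs in ranked order from a response body.

    Assumed layout: every result is a <doc> element with a <url> child.
    """
    root = ET.fromstring(xml_text)
    return [doc.findtext("url") for doc in root.iter("doc")]


# Hand-written sample in the assumed response shape (not real output).
SAMPLE = """<yandexsearch version="1.0">
  <response>
    <results>
      <grouping>
        <group><doc><url>https://example.com/a</url></doc></group>
        <group><doc><url>https://example.org/b</url></doc></group>
      </grouping>
    </results>
  </response>
</yandexsearch>"""

if __name__ == "__main__":
    print(build_request_url("my_user", "my_key", "buy phone"))
    print(extract_urls(SAMPLE))
```

The position of a URL in the returned list (starting from 1) is what gets compared with the position of the same URL in the scraped live results.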

The last search index update was on January 9 and the data was collected on the 13th, so the turbulence that follows an update had already settled and the data can be considered reliable.

A little about the disadvantages of XML:

does not return the contents of the title, only the snippet
the snippet can differ from the snippet in the live results
shows the results (so you can evaluate the competitors and how commercial the query is)
indicates whether Yandex services are present in the results

(Also, on my other projects I check domains for various indicators (indexing, TIC, etc.). I have noticed that when checking the index through XML, the numbers Yandex returns change very often. The difference can reach a hundred pages in either direction, and sometimes the index is reported as 0.)

Now the conclusions:

Most discrepancies are plus or minus 1 position.
Somewhat fewer are plus or minus 5 positions.
In a few cases there are different sites in the positions.

In numbers:

Positions match: 75%
Positions differ: 25%
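
A comparison of this kind can be sketched as below. This is an illustration under stated assumptions, not the script behind the figures above: the function names and bucket labels are hypothetical, the thresholds follow the categories listed in the conclusions (exact match, within 1, within 5, different site), and the article does not say exactly how its 75% was computed.

```python
from collections import Counter


def position_deltas(serp_urls, xml_urls):
    """For each live-result URL, return (XML position - live position).

    Positions start at 1. A URL missing from the XML list yields None.
    If a URL repeats in xml_urls, its first position is used.
    """
    xml_pos = {}
    for i, url in enumerate(xml_urls, start=1):
        xml_pos.setdefault(url, i)
    return [
        xml_pos[url] - i if url in xml_pos else None
        for i, url in enumerate(serp_urls, start=1)
    ]


def bucket(delta):
    """Classify one position difference into a (hypothetical) bucket label."""
    if delta is None:
        return "other site"
    if delta == 0:
        return "match"
    if abs(delta) == 1:
        return "within 1"
    if abs(delta) <= 5:
        return "within 5"
    return "more than 5"


def summarize(serp_urls, xml_urls):
    """Return the share of each bucket, in percent of live-result positions."""
    deltas = position_deltas(serp_urls, xml_urls)
    if not deltas:
        return {}
    counts = Counter(bucket(d) for d in deltas)
    return {name: 100.0 * n / len(deltas) for name, n in counts.items()}


if __name__ == "__main__":
    live = ["a.ru", "b.ru", "c.ru", "d.ru"]
    from_xml = ["a.ru", "c.ru", "b.ru", "e.ru"]
    print(summarize(live, from_xml))
```

Running `summarize` per query and then averaging (or pooling all positions across queries) gives different totals, so it is worth stating which aggregation is used when comparing results between experiments.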

I would be glad to hear about possible errors, and especially about comparisons with the results of similar experiments.

Random sample with the highlighted data: yadi.sk/i/i4imHJ8qmvgTd
All results in CSV: yadi.sk/d/X5SYWxl7mvgUe
Database dump: yadi.sk/d/O5viMlrRmvgKD

The numbers in the results are the query frequencies from Wordstat (broad and exact); they have a particular role to play, but there is

Article based on information from habrahabr.ru
