How Google Maps searches, and how it doesn't

I sometimes have to answer this question (see the title), because I'm working on an alternative local search service. Google describes where its data comes from only vaguely. The main sources for this article are my own observations and the patent application listed under Sources.



The main misconception is that "Google Maps finds information about companies on the Internet". That is not the case. Information about your company may appear on hundreds of indexed web pages and still never make it into Google Maps results.


Unlike web search, which searches an index of cached web pages, Google Maps contains a structured directory. Each company record consists of key-value fields holding machine-readable data. In principle this should make it possible to find "a restaurant with a vegetarian menu and pre-ordering within a 10 km radius of the Kiev station", but most catalog records contain exact values only for the address and phone number.
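To make the idea of a key-value directory concrete, here is a minimal sketch of such a structured query. The records, field names (`vegetarian_menu`, `pre_order`) and coordinates are all invented for illustration; nothing here reflects Google's actual schema.

```python
import math

# Hypothetical catalog: names, fields and coordinates are invented.
CATALOG = [
    {"name": "Green Table", "category": "restaurant", "lat": 55.7450, "lon": 37.5700,
     "vegetarian_menu": True, "pre_order": True},
    {"name": "Steak Point", "category": "restaurant", "lat": 55.7500, "lon": 37.5800,
     "vegetarian_menu": False, "pre_order": True},
    {"name": "Far Garden", "category": "restaurant", "lat": 56.3000, "lon": 38.1000,
     "vegetarian_menu": True, "pre_order": True},
    # A typical sparse record: only the basic fields are known.
    {"name": "Cafe Unknown", "category": "restaurant", "lat": 55.7440, "lon": 37.5660},
]

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def find(catalog, center, radius_km, **required):
    """Names of records within radius_km of center whose fields equal `required`."""
    lat0, lon0 = center
    return [
        rec["name"] for rec in catalog
        if distance_km(lat0, lon0, rec["lat"], rec["lon"]) <= radius_km
        and all(rec.get(key) == value for key, value in required.items())
    ]

STATION = (55.7431, 37.5656)  # assumed coordinates of the reference point
matches = find(CATALOG, STATION, 10, category="restaurant",
               vegetarian_menu=True, pre_order=True)
print(matches)
```

Note that the sparse record is nearby but cannot match: a field that was never filled in cannot satisfy the query, which is exactly the limitation described above.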

So the interesting question is not how Google searches its own directory, but where it gets the information to put into it.

Where the data in Google Maps comes from


According to Google, the "catalog combines information from different sources to give the best result." The sources fall into two groups:

Structured and semi-structured sources are those whose data is easy to convert into key-value form programmatically. These are usually:
  • purchased commercial business databases
  • websites containing large company directories; data from these sites is collected by a dedicated crawler that parses the directory entries with regular expressions
  • Google Local Business Center, where business owners fill in the information themselves
  • KML (and similar) files, which are used to display points via the Google Maps API
  • custom maps
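As a rough illustration of the directory-crawler source in the list above, here is a sketch that turns one site's listing markup into key-value records with a regular expression. The HTML layout, class names and the `parse_directory` helper are hypothetical; a real crawler would need a separate pattern for every directory site.

```python
import re

# Invented markup for a hypothetical directory site.
PAGE = """
<div class="company"><h2>Apteka No. 1</h2><span class="addr">12 Main St</span><span class="tel">+380 44 111-22-33</span></div>
<div class="company"><h2>Hostel Central</h2><span class="addr">5 River Rd</span><span class="tel">+380 44 444-55-66</span></div>
"""

# One pattern per site: named groups become the keys of the record.
ENTRY = re.compile(
    r'<div class="company">\s*'
    r'<h2>(?P<name>.*?)</h2>\s*'
    r'<span class="addr">(?P<address>.*?)</span>\s*'
    r'<span class="tel">(?P<phone>.*?)</span>',
    re.S,
)

def parse_directory(html, source):
    """Extract key-value records from a directory page and tag them with their source."""
    records = []
    for match in ENTRY.finditer(html):
        record = {key: value.strip() for key, value in match.groupdict().items()}
        record["source"] = source
        records.append(record)
    return records

records = parse_directory(PAGE, "example-directory")
for record in records:
    print(record)
```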

Unstructured sources are indexed websites that may contain information about a business, but whose data cannot readily be structured.

How the information is structured


This process can be described in three basic steps:
  1. Data converted to key-value form arrives from several structured sources.
  2. The data about each company is clustered: values from different sources are compared, and an accuracy and a weight are determined for each.
  3. The structured data is supplemented with unstructured data. *
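The clustering step can be sketched as follows. This is only one plausible reading of it: records are grouped by normalized phone number, and each field takes the value backed by the largest total source weight. The source names, the weights in `SOURCE_WEIGHT` and the grouping key are assumptions for illustration, not details from Google or the patent application.

```python
from collections import defaultdict

# Hypothetical trust weights per source; real weights are not public.
SOURCE_WEIGHT = {"owner_form": 3.0, "bought_db": 2.0, "crawled_dir": 1.0}

def normalize_phone(phone):
    """Keep digits only, so differently formatted numbers compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())

def cluster(records):
    """Group records that share a normalized phone number."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_phone(record["phone"])].append(record)
    return list(groups.values())

def merge(group):
    """For each field, pick the value with the highest total source weight."""
    votes = defaultdict(lambda: defaultdict(float))
    for record in group:
        weight = SOURCE_WEIGHT[record["source"]]
        for field, value in record.items():
            if field != "source":
                votes[field][value] += weight
    merged = {}
    for field, candidates in votes.items():
        value, weight = max(candidates.items(), key=lambda item: item[1])
        merged[field] = {"value": value, "confidence": weight / sum(candidates.values())}
    return merged

RECORDS = [
    {"source": "bought_db", "name": "Apteka 1", "address": "12 Main St", "phone": "+380 44 111-22-33"},
    {"source": "crawled_dir", "name": "Apteka No. 1", "address": "12 Main St", "phone": "380441112233"},
    {"source": "owner_form", "name": "Apteka No. 1", "address": "12 Main St", "phone": "+380 44 111-22-33"},
    {"source": "crawled_dir", "name": "Hostel Central", "address": "5 River Rd", "phone": "+380 44 444-55-66"},
]

clusters = cluster(RECORDS)
merged = [merge(group) for group in clusters]
for company in merged:
    print(company["name"])
```

Here the three pharmacy records collapse into one company, and the name supplied by the owner and the crawler outweighs the variant from the purchased database.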

* Structured data usually contains accurate but sparse information about a company. This makes two things difficult:
  • search: how do you find a "private kindergarten" if the directory has no field for the form of ownership?
  • ranking: how do you decide which "pharmacy" should come first in the results if all the data comes from a single directory?
So once the main fields of a business (name, address, phone number) have been determined, a web search is run with a query of the form:
company name + address
and the pages found (and, most importantly, the keywords from those pages) are associated with the company's record.
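The enrichment step just described can be sketched with a toy web index. Everything below is an assumption made for illustration: the pages, the address, the substring "search" and the crude keyword extraction stand in for whatever Google actually does. The sketch also shows how keywords from a page that merely mentions a business end up on that business's record.

```python
import re
from collections import Counter

# A toy stand-in for a web index; URLs and page text are invented.
WEB_PAGES = {
    "hostel-site.example/useful": "Hostel guide: embassies and consulates. "
                                  "Consular Section, 1 Example St. Book a hostel bed online.",
    "news.example/visa": "Visa news. Consular Section, 1 Example St, opening hours.",
    "other.example/": "Cheap hostel in the city centre.",
}

def web_search(record, pages):
    """Toy search: URLs of pages that mention both the name and the address."""
    name, address = record["name"].lower(), record["address"].lower()
    return [url for url, text in pages.items()
            if name in text.lower() and address in text.lower()]

def attach_keywords(record, pages, top_n=3):
    """Copy the record and add the most frequent words from the pages found."""
    own_words = set(re.findall(r"[a-z]+", (record["name"] + " " + record["address"]).lower()))
    counts = Counter()
    for url in web_search(record, pages):
        words = re.findall(r"[a-z]+", pages[url].lower())
        # Skip very short words and words already present in the record itself.
        counts.update(w for w in words if len(w) > 3 and w not in own_words)
    enriched = dict(record)
    enriched["keywords"] = [word for word, _ in counts.most_common(top_n)]
    return enriched

def local_search(query, catalog):
    """Match the query against the name or the attached keywords."""
    q = query.lower()
    return [rec["name"] for rec in catalog
            if q in rec["name"].lower() or q in rec.get("keywords", [])]

consulate = {"name": "Consular Section", "address": "1 Example St"}
enriched = attach_keywords(consulate, WEB_PAGES)
print(enriched["keywords"])
print(local_search("hostel", [enriched]))
```

Because the hostel page lists the consular section's name and address, the word "hostel" becomes the strongest keyword on that record, and a search for "hostel" returns it.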

How it doesn't search


Here are a few examples where the algorithm produces incorrect results.

Looking for "hostel" and finding the U.S. consular section

Reason: hostel association sites routinely publish lists of embassies and consulates. The consular section got into the catalog from one of the structured sources, but was then associated with the site hihostels.com.ua

Looking for "rent a flat" and finding a GEC (housing utilities office)

Reason: real-estate rental sites publish lists of utility offices. The GECs got into Google from one of the business databases, but were then associated with the site toprealty.org.ua

What to do to get into Google Maps results


Obviously, whatever information about your company exists on the web, the most important thing is that this information gets into one (and preferably several) structured sources. The problem is that Google does not list the databases and directories it takes information from. The only known entry point is the Google Local Business Center (LBC).

Summary


Google Maps is not as transparent as Google Web Search:
  • Most users are not aware of how Google Maps searches
  • It is often impossible to determine the source of a piece of information
  • Sometimes the result violates the "principle of least astonishment"

I think Google can do better.

I would be grateful for corrections, additions and comments.

Sources


Generating structured information (patent application US 2006/0200478 A1)
Google's Local Search Patent Application (at SEO by the Sea)
Local listings: Where do they come from?
Article based on information from habrahabr.ru
