New Google patent proves "sandbox" exists
I have read that this patent confirms the sandbox theory, which I feel is quite untrue. Google's decision to include historical data as one of the variables in website scoring is new for its algorithm, but it is not a new method*. Remember that the patent looks really long, but lawyers like to cover everything, so there's a lot of repetition.
A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data.
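To make "a score based, at least in part, on history data" a bit more concrete, here's a rough sketch of how several history signals could be folded into an existing relevance score. This is my own illustration, not anything from the patent: the signal names (`age_days`, `days_since_change`, `click_through_rate`), the `WEIGHTS` values and the 0.5 to 1.5 scaling are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistoryData:
    """Hypothetical history signals for one document (illustrative only)."""
    age_days: float            # days since the document was first seen
    days_since_change: float   # days since its content last changed
    click_through_rate: float  # clicks / impressions, between 0 and 1


# Hypothetical weights: the patent abstract gives no formula or values.
WEIGHTS = {"age": 0.4, "freshness": 0.3, "ctr": 0.3}


def history_score(history: HistoryData, base_score: float) -> float:
    """Adjust an existing relevance score using history data."""
    # Map each signal into [0, 1] so the weights are comparable.
    age_signal = history.age_days / (history.age_days + 365.0)  # older -> closer to 1
    freshness_signal = 1.0 / (1.0 + history.days_since_change / 30.0)  # recent change -> 1
    ctr_signal = min(max(history.click_through_rate, 0.0), 1.0)

    history_factor = (
        WEIGHTS["age"] * age_signal
        + WEIGHTS["freshness"] * freshness_signal
        + WEIGHTS["ctr"] * ctr_signal
    )
    # The score depends "at least in part" on history: here history scales
    # the base score by a factor between 0.5 and 1.5.
    return base_score * (0.5 + history_factor)
```

With everything else equal, a year-old page scores higher than a ten-day-old one under these made-up weights, which is the sort of effect people attribute to the "sandbox".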
Links:
Claim 22 (23, 25), claim 23 (24), claim 1 (26, 31, 32), claim 6 (27), claim 26 (28)
Content:
The frequency at which the content of the document changes:
Method 6 (11) basically tells you that they will be able to put a date on the last time the content of your website changed. How do you think they propose to do that?
This also tells you that, as well as changes to parts of your pages, they will take into account the number of pages changed and assess it in the same way.
Relevancy and fresh content are important.
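As for how they might date a change: one plausible way (my assumption, the patent text quoted here doesn't say) is for the crawler to keep a fingerprint of each page and compare it on every visit. The `ChangeTracker` class below is hypothetical. Note that it can only record the crawl on which a change was *detected*, so the real change happened somewhere between two crawls. It also covers the second point, the share of a site's pages that changed.

```python
import hashlib
from datetime import date


def content_hash(text: str) -> str:
    """Fingerprint of a page's content; any change to the text changes the hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ChangeTracker:
    """Hypothetical crawl-to-crawl record of when each URL's content changed."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}         # url -> hash seen on the previous crawl
        self.last_changed: dict[str, date] = {}  # url -> crawl date a change was detected
        self.change_counts: dict[str, int] = {}  # url -> number of detected changes

    def observe(self, url: str, text: str, crawl_date: date) -> bool:
        """Record one crawl of `url`; return True if its content changed since the last crawl."""
        new_hash = content_hash(text)
        old_hash = self.hashes.get(url)
        self.hashes[url] = new_hash
        if old_hash is None:
            # First sighting: treat the crawl date as the content date.
            self.last_changed[url] = crawl_date
            return False
        if new_hash != old_hash:
            self.last_changed[url] = crawl_date
            self.change_counts[url] = self.change_counts.get(url, 0) + 1
            return True
        return False

    def fraction_changed(self, pages: dict[str, str], crawl_date: date) -> float:
        """Crawl a whole site (url -> text) and return the share of pages that changed."""
        if not pages:
            return 0.0
        changed = sum(self.observe(url, text, crawl_date) for url, text in pages.items())
        return changed / len(pages)
```

Crawl a two-page site twice, edit one page in between, and `fraction_changed` reports 0.5 on the second crawl while `change_counts` tells you how often each page tends to change.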
Document Division:
Inclusion in search results:
This is surely the most interesting claim, don't you think? The history variable will include the number of times your documents appear in search results and the number of times people choose to click through to them. Perhaps ranking well becomes even more important now.
You get more points the more popular your document proves to be.
These claims ensure that you keep users interested by providing lots of new and interesting content on a regular basis, and give them a good reason to return. Syndication helps with this.
(Search terms and queries) This ensures that documents purporting to be associated with certain keywords, when in fact they are not, are not included in the results for those keywords in future. It ensures that websites stick to their subject matter, making classification a lot easier.
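"Appearing in results" versus "people choosing to go there" is basically impressions versus clicks, i.e. a click-through rate. Here's a small sketch of one common way to compute it without letting a page with only a handful of impressions look amazing: shrink it toward a prior. Whether Google does anything like this is pure assumption on my part, and `prior_ctr` and `prior_weight` are hypothetical values.

```python
def smoothed_ctr(clicks: int, impressions: int,
                 prior_ctr: float = 0.05, prior_weight: float = 100.0) -> float:
    """Click-through rate pulled toward a prior so that a handful of
    impressions cannot produce an extreme value.

    prior_ctr and prior_weight are hypothetical tuning values: prior_weight is
    how many 'virtual' impressions at prior_ctr are added to the real data.
    """
    if not 0 <= clicks <= impressions:
        raise ValueError("need 0 <= clicks <= impressions")
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)


# A page shown 10 times and clicked every time is not yet treated as a sure thing.
print(round(smoothed_ctr(10, 10), 3))     # 15 / 110, prints 0.136
print(round(smoothed_ctr(300, 1000), 3))  # 305 / 1100, prints 0.277
```

The more impressions a page collects, the less the prior matters, so "the more popular your document proves to be" only counts once it has been shown often enough.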
Stale documents: Claim 19 (20, 21)
Traffic:
Domains:
Ranking:
Penalizations include:
Also, none of this is implemented yet, is it? So how does it explain the sandbox and the length of time people believe it has been around?
I think it's a great patent; well done, Google. You still rock. I knew you were going to come up with something. This definitely makes them much more likely to provide great results. Let's still keep in mind that MSN Search is a baby just out of beta, though.
I also wanted to add that everyone has immediately assumed that the patent and the methods presented in it refer explicitly to web search. Digital libraries use these methods, and it seems quite probable to me that Google may use them for Google Scholar.
* Web Structure, Age and Page Quality - Baeza-Yates, Saint-Jean, Castillo (2002) "This paper is aimed at the study of quantitative measures of the relation between Web structure, age, and quality of Web pages. Quality is studied from different link-based metrics and their relationship with the structure of the Web and the last modification time of a page"
"as expected PageRank is biased against new pages..."
"we also gathered time information (last modified date) for each page as informed by the web servers."... "here we focus on web page age, that is, the time elapsed after the last modification. As the web is young, we use months as a unit, and our study considers only the last 3 years as most websites are that young."
"the low correlation between PageRank and authority is surprising because both ranks are based on incoming links..." "Notice the correlation between hub/authority, which is relatively low but with a higher value for pages about 8 months old..."
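The paper's notion of age (time since the last modification, taken from what the web server reports, measured in months) is easy to reproduce from an HTTP `Last-Modified` header. A minimal sketch, assuming whole calendar months and ignoring time of day; the function name `age_in_months` is mine, not the paper's.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _as_utc(dt: datetime) -> datetime:
    """Treat naive datetimes as UTC; convert aware ones to UTC."""
    return dt.replace(tzinfo=timezone.utc) if dt.tzinfo is None else dt.astimezone(timezone.utc)


def age_in_months(last_modified_header: str, now: datetime) -> int:
    """Whole calendar months elapsed since an HTTP Last-Modified date."""
    modified = _as_utc(parsedate_to_datetime(last_modified_header))
    now = _as_utc(now)
    months = (now.year - modified.year) * 12 + (now.month - modified.month)
    # Don't count the current month until its day-of-month has been reached.
    if now.day < modified.day:
        months -= 1
    return max(months, 0)  # clamp dates in the future to 0


print(age_in_months("Tue, 15 Jun 2004 10:00:00 GMT",
                    datetime(2005, 2, 15, tzinfo=timezone.utc)))  # prints 8
```

Bear in mind the paper's own caveat: this is only as good as the date the server reports, and plenty of servers send no `Last-Modified` at all or just send the current time.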
