Saturday, April 25, 2015

How to Audit a Website for Search Engine Optimization

Website Audit


Methodology and technical SEO audit: case study on a site
The semantic audit: methodology and tools
Auditing an e-commerce site

Call it a semantic audit, an editorial audit, or a content audit: content is the heart of any website audit and web-marketing strategy. I will present my methodology and some tools for carrying out the semantic part of an SEO and content-marketing audit.

    The objectives of a semantic audit
    How Google evaluates the quality of content
    Tools to find keywords
    Select and weight keywords
    Categorize keywords
    Find content creation opportunities
    The duplication of content
    Page titles
    Text size
    Conclusion

The objectives of a semantic audit
A semantic audit helps you accomplish one or more of the following:

Categorize a website, whether editorial, e-commerce, or otherwise
Identify the pages whose content needs enriching
Create new pages: landing pages, articles, hub pages, ...
Target your netlinking strategy
Find ways out of a Google Panda penalty
Determine the best keywords for each page to target
Finally, the aim of all these objectives is to generate traffic and conversions

To sum up, a semantic audit aims to reconcile the requirements of content marketing and SEO, the needs of users and of search engines.
How Google evaluates the quality of content

Before going further into my semantic-analysis methodology, it is worth understanding how Google judges the relevance and quality of content, or at least grasping the main fundamentals. Google has several advanced semantic algorithms that make it far more relevant than in the days when it was enough to repeat the same keyword several times or to create as many pages as targeted keywords. I will not go into the details of each algorithm, because that would quickly become overwhelming, and it would be presumptuous to present conjectures about what remains, in many ways, a well-kept secret.
The semantic PageRank

Also called topic-sensitive PageRank, this is a variant of the traditional PageRank that takes into account the theme and context of links. Unlike the historical PageRank calculation, which considers only the number and PageRank of the linking pages, the semantic PageRank adds a more qualitative dimension. Another major difference: a single page can have several different semantic PageRank scores for several different subjects.
The Hilltop algorithm

With the Hilltop algorithm, we are again on the border between linking and semantic relevance, since this algorithm looks at the relationship between documents considered expert on a particular topic and authoritative pages (those that receive external links from expert pages).
Latent Semantic Indexing (LSI)

LSI, for "latent semantic indexing", establishes the relationship between a document and a semantic corpus. A semantic corpus is the set of terms used around the same concept by a group of web pages (or documents). Onto this model are grafted keyword-weighting methods such as TF-IDF, popularized (or not ;)) by the Peyronnet brothers. This system measures the relevance of a piece of content based on its use of both common and rarer terms from the thematic corpus.

Co-citation

Terms that are often associated together in a thematic context reinforce each other's relevance on that topic. For example, if your brand is often cited alongside A, B, and C on a theme, your brand naturally becomes relevant on that subject. Co-citation works without any link, purely through semantic proximity.

Hummingbird

The Hummingbird algorithm has been in place since August 2013. It applies, in modified form, some of the algorithms seen above. Hummingbird tries to answer user queries by analyzing the semantic field, rather than taking into account only the exact query keywords. In particular, it allows Google to be more relevant on long-tail queries.

The quality of content for Google

Semantics as Google sees it is the result of objective calculations, not of quality as perceived by the user, which is subjective, even if the objective measure tries to approximate it. We will see later how to define the corpus for a particular keyword, but when writing content, you can hardly do mathematical calculations for every word that lands on the page or the screen.

At this stage, and without overheating your brain, here is what to remember when writing content:

Define the keywords to place in your content
Look at a few sites that rank well on these target keywords to identify the semantic field they use
Use the rarer terms, and also bring in terms of your own that add value
Use a rich but simple vocabulary
With these basics in mind, write content for the user: they are the one you must convince
Use clear and natural language; call a spade a spade
Structure and air out your content
Repeat your main keyword naturally
Place related terms
Contextualize your pages with links to other pages that reuse your key phrases
Beware of keyword-density rules: density varies from one topic to another and evolves over time
Think content marketing: interest your visitors while serving your conversion goals
Don't try to be perfect; it slows inspiration

Tools to find keywords

The goal is to build a list of keywords from several sources, which we will then be able to work on. We will proceed in two steps:

Extract the keywords on which your site already ranks
Then find those on which you are not visible: your missing semantic universe.

Even the expressions for which you already have positions are of interest, because they may correspond to pages that generate only a single visit and/or poorly positioned pages. These represent your immediate semantic potential (easy to move up).
Extract "visible" keywords

1 / Analyze search-engine logs

A website's logs record all visits to its pages. Whether on a dedicated server or shared hosting, the web host has a legal obligation to keep the logs for at least one year. For SEO, the interesting data are the visits (date, volume, queries used, HTTP code, ...) of visitors coming from search engines, and Google's visits to the pages (the Googlebot crawl). Despite "(not provided)", if the site has enough visits we can usually recover enough keywords to be representative.

Here is an excerpt of a log line containing a visit with the query typed on Google:

... HTTP / 1.1 "200 15040 "http://www.google.fr/url?sa=t&rct=j&q=peut%20on%20arroser%20une%20dalle%20beton%20lisser&source=web&cd=7&ved=0CF8QFjAG&url=http%...

The query typed by the user appears after the "q=" parameter.
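Extracting that parameter can be automated with a few lines of Python's standard library; the referrer below is a hypothetical one in the same format as the log excerpt above:

```python
from urllib.parse import urlparse, parse_qs

def query_from_referrer(referrer):
    """Extract the `q=` search query from a Google referrer URL, if present."""
    params = parse_qs(urlparse(referrer).query)  # parse_qs also percent-decodes
    return params["q"][0] if "q" in params else None

# Hypothetical referrer, shaped like the log excerpt above.
ref = ("http://www.google.fr/url?sa=t&rct=j"
       "&q=peut%20on%20arroser%20une%20dalle%20beton%20lisser&source=web")
print(query_from_referrer(ref))  # → peut on arroser une dalle beton lisser
```

Run over a full log file, this yields the raw keyword list that the rest of the audit builds on.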

2 / Recover data from your analytics tool

Whether with Google Analytics or another tool, it is quite easy to extract the keywords that have generated visits.

3 / Use a visibility tool

Position-monitoring tools query large databases of keywords, several thousand or even millions of them, among which are necessarily some of the terms on which your site ranks. To name only one, the best known: Searchmetrics.

 

With that, we already have a good starting list.
Expanding the keyword list

We will now try to go further and expand our list.

1 / Retrieve data from the internal search engine

If an internal search engine is installed on the site and is used enough, you can extract the searches and learn more about your visitors' behavior. This source is rarely used, yet it can be rich in ultra-qualified lessons, since these searches happen on your own site.

2 / Use a keyword research tool

There are many, but I will name just one, because the others often rely on it: quite simply the Google AdWords keyword planning tool. Whether you run a PPC campaign or not, you can use it for free. You enter your starting keyword list into the tool, and it suggests other related keywords.

3 / Analyze the competition

We can use the same methods as on our own site to analyze competitors, except of course those that require access to the website itself.

4 / Analyze searches on social networks

There are also several tools here, some of them paid (yes, it is a business). One free tool that analyzes trends, searches on social networks, and even reputation is Social Mention.

5 / Analyze what is being searched on a large forum

If the site's theme lends itself to it, and there are forums (or forum sections) covering the same topics as the site, then analyzing the forum discussions will surface the most frequently asked questions.

6 / Harvest the search engines' suggestions

Google and Bing display suggestions related to the user's query, both as autocomplete while typing and directly on the results page. Tools such as Ubersuggest can be used to collect these keyword phrases.

7 / Find keywords from a semantic corpus

As part of a semantic analysis for one or a few keywords, it is possible to identify the expressions used in the same semantic field as a given keyword. With a statistical text-analysis tool such as TextSTAT, you can extract the semantic corpus of, for example, the pages ranked in Google's top ten results. As we saw above, using keywords from the same semantic universe matters to Google. This method is effective, but it remains an approximation of the actual semantic corpus used by Google.
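The principle of this corpus extraction can be sketched in a few lines; the stopword list and page snippets below are invented stand-ins, while a real run would use the full text of the top-ranking pages:

```python
import re
from collections import Counter

# Minimal stopword list for the sketch; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "is", "on"}

def corpus_terms(pages, top_n=10):
    """Rank the terms shared by a set of top-ranking pages for one query."""
    counts = Counter()
    for text in pages:
        tokens = re.findall(r"[a-zàâçéèêîôùû'-]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(top_n)

# Hypothetical snippets standing in for the top-ranked pages of one query.
pages = [
    "Lavender essential oil is used for relaxation and sleep.",
    "Essential oil of lavender: benefits, uses and precautions.",
]
print(corpus_terms(pages, top_n=3))
```

The terms that recur across several top-ranking pages approximate the thematic corpus to reuse in your own content.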

8 / Use the semantic web

Free databases give access to structured information from the web. For example DBpedia, one of the Open Data projects, gathers information from Wikipedia in a structured and queryable form. These data are accessed via the SPARQL query language. The possibilities are immense, and the interest for semantic analysis is obvious. It will surely be the subject of a future article, because it is a bit too involved to cover here.
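As a small taste of what is possible, here is an illustrative SPARQL query (the seed resource and predicates are standard DBpedia ones, chosen for the example) that retrieves topics sharing a Wikipedia category with a seed concept, which can feed a keyword list:

```sparql
# Illustrative only: topics sharing a category with "Essential oil".
# Run against the public endpoint https://dbpedia.org/sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?label WHERE {
  <http://dbpedia.org/resource/Essential_oil> dct:subject ?cat .
  ?related dct:subject ?cat ;
           rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 20
```

Each returned label is a candidate term from the same semantic universe as the seed concept.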
Select and weight keywords

Once the keyword list has been retrieved with these semantic-analysis tools, we will rank the keywords according to certain criteria:

Search potential
Relevance
Level of competition
Click-through rate
Web-marketing aspects (social engagement, conversion, content marketing)

Categorize keywords

In a semantic audit, the analysis is usually done at the macro level, although for certain strategic pages a micro-level analysis is necessary. Categorizing this keyword base helps reveal occurrences and the most frequently shared stems. This is especially useful for sectioning a website, and thus for defining and prioritizing the different navigation menus optimally.

Here is an example with an extract of occurrences of keywords on a health forum:

[Table: keyword occurrences on a health forum, grouped by leading term]

These are groups of keywords that start with the same term. Some sets are fairly obvious, others less so, especially further down the table. Cross-referenced with the selection and weighting criteria seen above, these data are used, for example, to categorize a website.
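The grouping itself is simple to script; here is a sketch in which the health-themed queries are invented for illustration:

```python
from collections import Counter

def categorize(keywords):
    """Group keyword phrases by their first word and count occurrences."""
    return Counter(kw.split()[0] for kw in keywords if kw.strip())

# Hypothetical queries, as they might come from log or forum extraction.
queries = [
    "pregnancy symptoms", "pregnancy test", "pregnancy diet",
    "headache causes", "headache remedies",
    "fever in children",
]
for stem, count in categorize(queries).most_common():
    print(stem, count)  # pregnancy 3, headache 2, fever 1
```

The most frequent stems are natural candidates for top-level sections and navigation entries.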
Find content creation opportunities
Harnessing the semantic potential

This semantic-analysis work also identifies the immediate potential: the terms on which the site can quickly improve its positions. Indeed, positions on page 2 of Google can rise faster than positions on page 10, for example. In this case, go back over the pages concerned and work on content enrichment, along with SEO techniques such as PR sculpting, to optimize them.

Bridging the missing universe

The audit will also reveal keywords on which the site is not present at all. This is useful for creating landing pages, new categories, articles, and other pages that will fill these semantic universes. The point is not to create a page for every expression, but only for the most competitive phrases. A dedicated page is often necessary for keywords where the level of competition is high.

An example of a visualization of a client site's missing universe:

[Figure: visualization of a client site's missing semantic universe]

The size of each term reflects the number of times competitors use the keyword while the client does not. The arrows correspond to relationships between words (e.g. essential oil ...). There is a table that goes with it, more comprehensive and more detailed, but this view has the merit of being ... more visual.

 

We see in this example that the client is very visible on product expressions built around "essential oils", but has no dedicated category pages. The recommendation was therefore to create them. Category pages associated with brands (less visible in this visual) were also the subject of SEO optimizations.

Tips for creating content easily

Creating new pages is not always the first solution for increasing traffic; you can often even gain traffic by deleting pages. I cover this in the technical part of the SEO audit: it consists of removing pages that are useless for SEO, which waste crawl budget and dilute link juice at the expense of important pages. But once that cleanup is done, we can get on with creating new pages that are useful for SEO. And for that, there are a few tips:

Partially opening tag pages to indexing: partial, because the content analysis decides whether to create them, along with the potential for results on these tags.
Partially opening internal search-result pages to indexing: here too, take precautions so they do not cannibalize other strategic pages.
For e-commerce sites, partially opening facet and filter pages: same principle, do not open everything or you will sink your SEO.
User-generated content (UGC): reviews, social comments, ...


The duplication of content

In a semantic audit, we also detect duplicate or content-poor pages. These pages are penalized by Google through the Panda filter. To identify them:

Categorize the URLs identified as duplicates
Crawl with tools that analyze page content and assign each page a duplication rate

Here is what can be done with a categorization of duplicate URLs:

[Figure: categorization of duplicate URLs]

In this example, 49% of the duplicate URLs are session URLs (osCsid). Once the sources of duplication and the content-poor pages are identified, we can apply solutions (noindex, canonical tags, redirects, content enrichment, ...) to clean up the Google index and get out of a Google Panda penalty.
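A sketch of this URL categorization; the patterns below are hypothetical, to be adapted to the parameters your own crawl reveals:

```python
import re
from collections import Counter

# Hypothetical duplicate-URL patterns; adapt to the CMS's actual parameters.
PATTERNS = {
    "session id": re.compile(r"[?&]osCsid="),
    "sort/filter": re.compile(r"[?&](sort|order|filter)="),
    "pagination": re.compile(r"[?&]page=\d+"),
}

def categorize_duplicates(urls):
    """Count duplicate URLs per category; unmatched ones go to 'other'."""
    counts = Counter()
    for url in urls:
        for name, pattern in PATTERNS.items():
            if pattern.search(url):
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts

urls = [
    "/shop/oils?osCsid=ab12",
    "/shop/oils?osCsid=cd34",
    "/shop/oils?sort=price",
    "/shop/oils",
]
counts = categorize_duplicates(urls)
print(counts["session id"] / len(urls))  # → 0.5
```

The per-category shares tell you which duplication source to fix first, exactly as the 49% session-URL figure did in the example above.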

Page titles

The importance of "on-page" factors such as the presence of keywords in the title, H1, and H2 headings has declined, as shown in this Searchmetrics study published on Moz:

[Figure: correlations between keywords in H1/H2 headings and Google positions]

Here we see the correlation between the presence of keywords in H1 and H2 headings and positions on Google. Moreover, Google often rewrites the titles it displays in its results pages so that they reflect the query typed by the user. That said, titles are still important elements, and they are among the points analyzed in a semantic audit.

Title length

Beyond optimizing titles for keyword placement, we can go further and look at the correlation between title length and visits:

[Figure: title length in words vs. visits]

The x-axis is the number of words in the title; the right-hand y-axis is the sum of visits; and the left-hand y-axis is the average visits per URL. This analysis shows that the most visited pages are those averaging six words in the meta title. Pages with a 4-word title receive on average 2 times fewer visits than pages with 6 words in the title.

We must not turn these figures into a generality, because, like many ranking factors, they depend on the site's theme. Hence the interest of this kind of site-specific analysis.
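This kind of title-length analysis is easy to reproduce from a crawl or an analytics export; here is a sketch with invented data:

```python
from collections import defaultdict

def visits_by_title_length(pages):
    """Average visits per URL, bucketed by number of words in the title."""
    buckets = defaultdict(list)
    for title, visits in pages:
        buckets[len(title.split())].append(visits)
    return {n: sum(v) / len(v) for n, v in sorted(buckets.items())}

# Hypothetical (title, visits) pairs from an analytics export.
pages = [
    ("buy essential oils", 40),
    ("lavender oil guide", 60),
    ("how to use lavender essential oil safely", 180),
    ("essential oil diffuser tips for beginners now", 220),
]
print(visits_by_title_length(pages))  # → {3: 50.0, 7: 200.0}
```

Plotting these buckets for a whole site gives the word-count curve the figure above describes, specific to that site's theme.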
Text size

It is an often-asked question: "What is the right text length for SEO?". It is difficult to answer in general, since it depends on the subject, but it is possible to find out for a given site, or for sites in the same semantic field:

[Figure: text size vs. visits]

Several things stand out in this analysis of the article sizes on one site:

14% of the pages account for 73% of the visits
These 14% of pages have a text size greater than 7,000 characters
Conversely, 56% of the pages have a text size below 6,000 characters and account for only 7% of the visits.

The 7,000 figure counts all the characters on the page, which, in this client's case, corresponds to a body text of about 2,500 characters. The recommendation was therefore to write articles of at least 2,500 characters for this site. You may tell me we could have said that without all this analysis; perhaps, but writing takes resources, so why strain for the same result? Here we had proof that it works, and that is a strong argument for getting the recommendation implemented.
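The page-share versus visit-share calculation behind these figures can be sketched as follows; the (character count, visits) pairs are invented, not the client's data:

```python
def visits_share_by_size(pages, threshold=7000):
    """Share of pages and share of visits above a character-count threshold."""
    total_visits = sum(v for _, v in pages)
    big = [(size, v) for size, v in pages if size > threshold]
    page_share = len(big) / len(pages)
    visit_share = sum(v for _, v in big) / total_visits
    return page_share, visit_share

# Hypothetical (character count, visits) pairs from a crawl + analytics join.
pages = [(8200, 500), (9100, 230), (4300, 40), (3800, 20), (5200, 10)]
page_share, visit_share = visits_share_by_size(pages)
print(round(page_share, 2), round(visit_share, 2))  # → 0.4 0.91
```

When a small share of long pages concentrates most of the visits, as in the 14%/73% example above, the minimum-length recommendation follows directly from the numbers.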