
The Semantic Audit: Methodology and Tools
Call it a semantic audit, an editorial audit or a content audit: content is the heart of any website audit and of any web marketing strategy. In this article I present my methodology and some tools for carrying out the semantic part of an SEO and content marketing audit.
The objectives of a semantic audit
How does Google evaluate the quality of content?
Tools to find keywords
Select and weight keywords
Categorize keywords
Find content creation opportunities
Content duplication
Page titles
Text size
Conclusion
The objectives of a semantic audit
A semantic audit helps you accomplish one or more of the following:
Categorize a website, whether an editorial site, an e-commerce site or something else
Identify the pages whose content needs to be enriched
Create new pages: landing pages, articles, hub pages, ...
Target your netlinking strategy
Find a way out of Google Panda
Determine the best page to position for each keyword
Ultimately, the purpose of all these objectives is to generate traffic and conversions.
To sum up, a semantic audit aims to reconcile the requirements of content marketing and SEO, the needs of users and those of search engines.
How does Google evaluate the quality of content?
Before going further into my semantic analysis methodology, it is worth understanding how Google judges the relevance and quality of content, or at least grasping the main fundamentals. Google has several advanced semantic algorithms that allow it to be far more relevant than in the days when it was enough to repeat the same keyword several times or to create as many pages as targeted keywords. I will not go into the details of each algorithm, because that would quickly become overwhelming, and it would be presumptuous to offer conjectures or hypotheses on what remains, in many ways, a well-kept secret.
Semantic PageRank
Also called topic-sensitive PageRank, it is a variant of traditional PageRank that takes into account the theme and context of links. Unlike the historical PageRank calculation, which is based on the number of linking pages and their PR, semantic PageRank adds a more qualitative dimension. Another major difference: a single page can have several different semantic PageRank values on several different subjects.
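To make the idea more concrete, here is a minimal sketch of topic-sensitive PageRank using the networkx library and an invented mini-graph of pages; the topic weights bias the random surfer toward pages already associated with a theme. This is only an illustration of the principle, not Google's actual computation.

# Minimal sketch of topic-sensitive PageRank with networkx (illustrative graph and weights).
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("home", "gardening-guide"),
    ("home", "concrete-slab-howto"),
    ("gardening-guide", "concrete-slab-howto"),
    ("blog-post", "gardening-guide"),
])

# Bias the random surfer toward pages known to belong to the "gardening" theme.
topic_bias = {"gardening-guide": 0.5, "blog-post": 0.5, "home": 0.0, "concrete-slab-howto": 0.0}

scores = nx.pagerank(G, alpha=0.85, personalization=topic_bias)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))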
The Hilltop algorithm
With the Hilltop algorithm, we are also on the border between linking and semantic relevance, since this algorithm looks at the relationship between documents considered experts on a particular topic and authoritative pages (the pages to which the expert pages link externally).
Latent Semantic Indexing (LSI)
LSI, for "latent semantic indexing", establishes the relationship between a document and a semantic corpus. A semantic corpus is the set of terms used around the same concept by a group of web pages (or documents). Keyword-weighting methods are grafted onto this model, such as TF-IDF, popularized (or not ;)) by the Peyronnet brothers. This system measures the relevance of a piece of content based on its use of both the common and the rarer terms of the thematic corpus.
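As a quick illustration of term weighting, here is a minimal TF-IDF sketch using scikit-learn on an invented three-document corpus; it approximates the idea of balancing common and rarer terms, it is not the exact weighting Google uses.

# Minimal TF-IDF sketch with scikit-learn (invented corpus, illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "pour the concrete slab and let it cure before smoothing it",
    "a concrete slab for a garden shed needs a compacted gravel base",
    "water the garden in the morning to limit evaporation",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)

# Show the highest-weighted terms of the first document.
terms = vectorizer.get_feature_names_out()
weights = tfidf[0].toarray()[0]
print(sorted(zip(terms, weights), key=lambda pair: -pair[1])[:5])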
Co-citation
Terms that are often associated together around a theme reinforce each other's relevance on that topic. For example, if your brand is often cited alongside A and B on a given theme, your brand naturally becomes relevant on that subject. Co-citation works without any link, only through semantic proximity.
Hummingbird
The Hummingbird algorithm (Colibri in French) has been in place since August 2013. It applies, in modified versions, some of the algorithms seen above. Hummingbird tries to answer user queries by analyzing their semantic field rather than taking only the exact query keywords into account. It notably allows Google to be more relevant on long-tail queries.
The quality of content for Google
Semantics as seen by Google is the result of objective calculations, not of the quality perceived by the user, which is subjective, even if the objective measure tries to get close to it. We will see later how to define the corpus of a particular keyword, but when writing content you can hardly run mathematical calculations on every word that lands on the paper or the screen.
At this stage, and without overheating our brains, here is what to remember when writing content:
Look at a few sites that rank well for your target keywords to identify the semantic field they use.
Use the rarer terms of the corpus and also bring your own value.
Use a rich but simple vocabulary.
With these basics in mind, write the content for the user: they are the one you must convince.
Use clear and natural language; call a spade a spade.
Structure and air out your content.
Naturally repeat your main keyword.
Place related terms.
Contextualize your pages by linking to other pages that reuse your key phrases.
Beware of keyword-density rules: density varies from one topic to another and evolves over time.
Think content marketing: interest your visitors while serving your conversion goals.
Do not try to be perfect; it slows down inspiration.
Tools to find keywords
The goal is to build a keyword list from several sources that we can then work with. We will proceed in two steps:
Extract the keywords on which your site is already positioned
Then those on which you are not visible: your missing semantic universe.
Even the expressions for which you already have positions are of interest, because they may correspond to pages that generate only a single visit and/or poorly positioned pages. In that case, they represent your immediate semantic potential (easy to move up).
Extract "visible" keywords
1 / Analyze search engine logs
A website's logs record all visits to its pages. Whether on a dedicated server or shared hosting, the web host has a legal obligation to keep the logs for at least one year. For SEO, the interesting data are the visits from search engines (date, volume, query used, HTTP code, ...) and Google's visits to the pages (the Googlebot crawl). Despite (not provided), if the site gets enough visits, we can usually recover enough keywords for the sample to be representative.
... HTTP/1.1" 200 15040
"http://www.google.fr/url?sa=t&rct=j&q=peut%20on%20arroser%20une%20dalle%20beton%20lisser&source=web&cd=7&ved=0CF8QFjAG&url=http%...
The query typed by the user appears after the "q=" parameter.
2 / Recover data from your analytics tool
Whether with Google Analytics or another tool, it is quite easy to extract the keywords that have generated visits.
3 / Use a visibility tool
Position-monitoring tools query large keyword databases, several thousand or even millions of keywords, among which are necessarily some of the terms on which your site is positioned. To name only one, and the best known: Searchmetrics.

Expanding the keyword list
We will now try to go further and expand our list.
1 / Retrieve data from the internal search engine
If an internal search engine is installed on the site and is used enough, you can extract the searches and thus learn more about the behaviour of your visitors. This source is rarely used, yet it can be rich in ultra-qualified insights since it comes from your own site.
2 / Use a keyword research tool
There are many, but I will name just one because the others often rely on it: the Google AdWords Keyword Planner. Whether you run a PPC campaign or not, you can use it for free. You enter your starting keyword list into the tool, which then suggests other related keywords.
3 / Analyze the competition
We can use the same methods on competitors as on our own site, except of course for those that require access to the website itself.
4 / Analyze searches on social networks
There are also several tools here, including some paid ones (yes, it is a business). A free tool that analyzes trends, searches on social networks and even reputation is Social Mention.
5 / Analyze what is being searched for on a big forum
If the site's theme lends itself to it and there are forums, or a forum section, covering the same topic as the site, then analyzing the forum discussions will surface the most frequently asked questions.
6 / Collect the suggestions of the search engines
Google, and Bing as well, display suggestions related to the user's query, either as autocomplete while typing or directly on the results page. Tools such as Ubersuggest can be used to collect these suggested phrases.
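For a quick look at what such tools collect, here is a hedged sketch that queries the unofficial Google suggest endpoint; the URL, its parameters and the JSON shape are undocumented assumptions and may change or be rate-limited at any time.

# Hedged sketch: fetch autocomplete suggestions from Google's unofficial suggest endpoint.
# Endpoint, parameters and response format are assumptions and may change.
import json
import urllib.parse
import urllib.request

def google_suggestions(keyword, lang="en"):
    url = ("https://suggestqueries.google.com/complete/search?client=firefox"
           f"&hl={lang}&q={urllib.parse.quote(keyword)}")
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8", errors="ignore"))
    return data[1]  # the second element is assumed to hold the suggestion list

print(google_suggestions("concrete slab"))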
7 / Find keywords from a semantic corpus
As part of a semantic analysis for one or a few keywords, it is possible to find the expressions used in the same semantic field as a given keyword. With a statistical text-analysis tool such as TextSTAT, you can extract the semantic corpus of the pages positioned in Google's top ten results, or beyond. As we saw above, using keywords from the same semantic universe matters for Google. This method is effective, but it remains an approximation of the actual semantic corpus used by Google.
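A bare-bones version of this corpus extraction can be done in a few lines: fetch the top-ranking pages and count the terms they share. The URL list below is a placeholder (in practice it would come from a SERP export) and the tag stripping and stop-word list are deliberately crude.

# Bare-bones semantic corpus sketch: count the terms used by top-ranking pages.
# The URL list is a placeholder; tag stripping and stop words are deliberately crude.
import re
from collections import Counter
from urllib.request import urlopen

top_urls = ["https://example.com/page-1", "https://example.com/page-2"]  # hypothetical SERP export
stopwords = {"the", "and", "for", "with", "that", "this", "are", "you", "your"}

counter = Counter()
for url in top_urls:
    html = urlopen(url).read().decode("utf-8", errors="ignore")
    text = re.sub(r"<[^>]+>", " ", html)                  # crude tag stripping
    words = re.findall(r"[a-zà-ÿ]{3,}", text.lower())     # keep words of 3+ letters
    counter.update(word for word in words if word not in stopwords)

print(counter.most_common(30))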
8 / Use the semantic web
Free databases give access to structured information from the web. For example, DBpedia, one of the open data projects, gathers information from Wikipedia in a structured and queryable form. Access to these data is done via the SPARQL query language. The possibilities are immense and the interest, as part of a semantic analysis, obvious. It will surely be the subject of a future article, because it is a bit too involved to cover here.
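To give a flavour of what a SPARQL query looks like, here is a minimal sketch against the public DBpedia endpoint using the SPARQLWrapper package; the query simply lists pages linked to the "Essential oil" resource and is purely illustrative.

# Minimal SPARQL sketch against DBpedia with the SPARQLWrapper package (illustrative query).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbr:  <http://dbpedia.org/resource/>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?related ?label WHERE {
      ?related dbo:wikiPageWikiLink dbr:Essential_oil ;
               rdfs:label ?label .
      FILTER (lang(?label) = "en")
    } LIMIT 20
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["label"]["value"])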
Select and weight keywords
Once the keyword list has been gathered with these semantic analysis tools, we classify the keywords according to certain criteria (a simple scoring sketch follows the list):
Search potential
Relevance
Level of competition
Click-through rate
Web marketing and other aspects (social engagement, conversion, content marketing)
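Here is an illustrative way to combine those criteria into a single score with pandas; the sample keywords, column names and weights are assumptions, not a standard formula, and should be tuned to each project.

# Illustrative keyword-weighting sketch: sample data, column names and weights are assumptions.
import pandas as pd

keywords = pd.DataFrame([
    {"keyword": "concrete slab",          "volume": 12000, "relevance": 0.9, "competition": 0.8, "ctr": 0.25},
    {"keyword": "smooth a concrete slab", "volume": 1300,  "relevance": 1.0, "competition": 0.3, "ctr": 0.35},
    {"keyword": "garden shed base",       "volume": 2400,  "relevance": 0.6, "competition": 0.5, "ctr": 0.30},
])

# Normalise search volume, then weight each criterion; competition lowers the score.
keywords["volume_norm"] = keywords["volume"] / keywords["volume"].max()
keywords["score"] = (0.4 * keywords["volume_norm"]
                     + 0.3 * keywords["relevance"]
                     + 0.2 * keywords["ctr"]
                     - 0.3 * keywords["competition"])

print(keywords.sort_values("score", ascending=False)[["keyword", "score"]])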
Categorize keywords
In a semantic audit, the analysis is usually done at the macro level, although for certain strategic pages it is necessary to go down to the micro level. Categorizing this keyword base helps reveal the most frequent occurrences and the common cores shared by many queries. This is particularly useful for sectioning a website and thus defining and prioritizing the different navigation menus in an optimal way.
Here is an example with an extract of keyword occurrences on a health forum:
[Table: keyword categorization]
The table shows groups of keywords that start with the same term. Some sets are fairly obvious, others less so, especially further down the table. These data, cross-referenced with the selection and weighting criteria we saw above, are used to categorize a website, for example.
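As a simple illustration of this grouping step, the snippet below clusters keywords by their first word to surface the most frequent stems; the keyword list is invented for the example.

# Simple categorization sketch: group keywords by their first word (invented keyword list).
from collections import Counter, defaultdict

keywords = [
    "pregnancy symptoms", "pregnancy test", "pregnancy calendar",
    "back pain causes", "back pain exercises", "flu symptoms",
]

groups = defaultdict(list)
for keyword in keywords:
    groups[keyword.split()[0]].append(keyword)

counts = Counter({stem: len(kws) for stem, kws in groups.items()})
for stem, count in counts.most_common():
    print(stem, count, groups[stem])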
Find content creation opportunities
Harnessing the semantic potential
This semantic analysis work also identifies the immediate potential: the terms on which the site can quickly improve its positions. Indeed, positions on page 2 of Google can move up faster than positions on page 10, for example. In this case, go back to the pages concerned and work on them: enrich the content and apply SEO techniques such as PR sculpting to optimize them.
Bridging the missing universe
The audit will also reveal the keywords on which the site has no content at all. This is useful for creating landing pages, new categories, articles and pages that will fill these semantic universes. The idea is not to create a page for every expression, but only for the most competitive phrases: a dedicated page is often necessary for keywords with a high level of competition.
Here is an example visualization of the missing universes for a client site:
[Visualization: missing semantic universes]
The size corresponds to the number of times the keywords are used by competitors but not by the client. The arrows correspond to relationships between words (e.g. essential oil ...). There is a table that goes with it, more comprehensive and more detailed, but this view has the merit of being ... more visual.

Tips for creating content easily
Creating new pages is not always the first solution for increasing traffic; it is often possible to gain traffic by deleting pages. I cover this in the technical part of the SEO audit: it consists of removing pages that are useless for SEO, which waste crawl budget and dilute link juice at the expense of important pages. Once that cleaning is done, we can get on with creating new pages that are useful for SEO. And for that, there are a few tips:
The partial opening of tag pages to indexing: partial, because it is the analysis of their content that will decide whether to create them or not, as well as their potential in the results.
The partial opening of internal search result pages: here too, take precautions so that they do not cannibalize other strategic pages.
For e-commerce sites, the partial opening of facet and filter pages: same principle, you do not open everything at the risk of sinking your SEO.
User-generated content (UGC): reviews, social comments, ...
Content duplication
A semantic audit also includes detecting pages that are duplicated or poor in content. These pages are penalizing for Google under the Panda filter. To identify them:
Categorize the URLs identified as duplicates.
Crawl the site with tools that analyze page content to obtain their duplication rate.
Here is what can be done with a categorization of duplicate URLs:
[Chart: duplicate content analysis by URL category]
In this example, 49% of the duplicate URLs are session URLs (osCsid). Once the sources of duplication and the thin-content pages are identified, we can apply solutions (indexing directives, canonical tags, redirects, content enrichment, ...) to clean up the Google index and get out of the Google Panda penalty.
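As an illustration of how duplicate URLs can be categorized automatically, the snippet below groups URLs by the parameters they carry (session IDs such as osCsid, sorting, pagination); the sample URLs and category names are invented.

# Illustrative duplicate-URL categorization by query parameters (sample URLs are invented).
from collections import Counter
from urllib.parse import urlparse, parse_qs

urls = [
    "https://example.com/product-12?osCsid=abc123",
    "https://example.com/product-12?osCsid=def456",
    "https://example.com/product-12",
    "https://example.com/category?page=2&sort=price",
]

categories = Counter()
for url in urls:
    params = parse_qs(urlparse(url).query)
    if "osCsid" in params:
        categories["session id (osCsid)"] += 1
    elif "sort" in params or "page" in params:
        categories["sorting / pagination parameters"] += 1
    else:
        categories["clean url"] += 1

print(categories)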
Page titles
The importance of 'on-page' factors such as the presence of keywords in the title, H1 and H2 tags has declined, as shown in this Searchmetrics study published on Moz:
[Chart: correlation between keywords in H1/H2 titles and Google positions]
Here we see the correlation between the presence of keywords in the H1 and H2 tags and positions on Google. Moreover, Google often rewrites the titles shown in its results pages so that they reflect the query typed by the user. That said, titles remain important elements, and they are part of the points analyzed in a semantic audit.
Title length
Beyond optimizing titles for keyword placement, we can go further and look at the correlation between title length and visits:
[Chart: title length versus visits]
The x-axis shows the number of words in the title; the right-hand y-axis, the total number of visits; and the left-hand y-axis, the average visits per URL. This analysis shows that the most visited pages are those with an average of six words in the meta title. Pages with a 4-word title receive on average half as many visits as pages with 6 words in the title.
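This kind of personalized analysis is straightforward to reproduce with pandas: group the pages by the number of words in their title and compare visit counts. The CSV file and column names below are assumptions standing in for whatever export you have.

# Sketch of the title-length analysis; "pages.csv" and its columns are assumptions.
import pandas as pd

pages = pd.read_csv("pages.csv")  # hypothetical export with "title" and "visits" columns
pages["title_words"] = pages["title"].str.split().str.len()

by_length = pages.groupby("title_words")["visits"].agg(["sum", "mean", "count"])
print(by_length.sort_values("mean", ascending=False))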
These figures should not be taken as a general rule, because many positioning factors are tied to the site's theme. Hence the interest of this kind of personalized analysis.
Text size
This is an often-asked question: "What is the right text length for SEO?". It is difficult to answer in general terms because it depends on the subject, but it is possible to find out for a given site or for sites in the same semantic field:
[Chart: text size versus visits]
Several things stand out in this analysis of the article sizes of a site:
14% of the pages account for 73% of the visits
These 14% of pages have a text size greater than 7,000 characters
Conversely, 56% of the pages have a text size of less than 6,000 characters and account for only 7% of the visits.
The 7,000 figure takes into account all the characters on the page, which in this client's case corresponds to a body text of about 2,500 characters. The recommendation for this site is therefore to write articles of at least 2,500 characters. You may say that I could have stated that without all this analysis; perhaps, but writing requires resources, so why make the effort without knowing the payoff? Here, there is proof that it works, and that is a strong argument for getting the recommendation implemented.
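To reproduce this text-size analysis, one approach is to bucket pages by character count and measure each bucket's share of pages and visits, as sketched below; the CSV file and its columns are assumptions.

# Sketch of the text-size analysis; "articles.csv" and its columns are assumptions.
import pandas as pd

pages = pd.read_csv("articles.csv")  # hypothetical export with "chars" and "visits" columns
bins = [0, 2000, 4000, 6000, 7000, 10000, 100000]
pages["size_bucket"] = pd.cut(pages["chars"], bins=bins)

summary = pages.groupby("size_bucket", observed=True).agg(
    page_share=("chars", lambda s: round(100 * len(s) / len(pages), 1)),
    visit_share=("visits", lambda s: round(100 * s.sum() / pages["visits"].sum(), 1)),
)
print(summary)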