SEO is simply following good web development practices
The purpose of this article is to focus on the internal structure of a site for SEO and on ways to maximize document relevancy for user searches. A general rule of thumb is to follow good usability and accessibility practices, which essentially result in better ranking. The following areas are not given in order of importance; good coverage of all of them is encouraged for a sound, stable site that will gain authority over time. These are:
HTTP and URLs
If a URL is not reachable and the server does not wish to indicate any further reason why the request has been refused, an HTTP 404 (Not Found) status code is returned. All URLs on the site that give this response should be cleaned up immediately: either a redirect or an HTTP 410 (Gone) status code is a good way to resolve them. Similarly, fix any requests that resolve in an HTTP 501 (Not Implemented) status.
Changing temporary redirects such as HTTP 302 (Found) to permanent redirects, HTTP 301 (Moved Permanently), where appropriate signals that a URL can safely be found elsewhere in the future. Until that change is made, the indexer cannot give full credit to the resource since it may still change, so all of the associated information (i.e. existing backlinks, PR values, authority) will be held off temporarily [1].
Avoid duplicate content pages and minimize repeated sub-content in any document. If duplicate URLs exist, use the HTTP 301 status code to point all of them to a single authoritative edition of the content.
If a URL is no longer available on the site (with no forwarding resource), and for reasons other than an HTTP 404, it is appropriate to indicate this with an HTTP 410 (Gone). It simply says that the resource is no longer at that location and advises the visitor to remove all references to it [2].
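As an illustrative sketch, assuming an Apache server with mod_alias enabled, the two cases above might be configured as follows (the paths and hostname are hypothetical):

    # Permanently redirect an old URL to its new, authoritative location (301)
    Redirect permanent /old-article.html http://example.com/articles/new-article

    # Tell clients and crawlers that a resource has been removed for good (410)
    Redirect gone /discontinued-page.html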
Use the robots.txt file to indicate which URIs crawlers are allowed or disallowed to access. For sensitive data that should not be indexed, use .htaccess and/or .htpasswd to protect a file or a directory. Server-side sessions can also be used, along with restricting certain sections of the site to authorized logins. Rest assured, site crawlers do not try to guess your login and password [3].
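A minimal robots.txt sketch; the disallowed paths below are hypothetical and should mirror your own site structure:

    # Keep crawlers out of areas that should not appear in an index
    User-agent: *
    Disallow: /admin/
    Disallow: /tmp/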
With respect to the URL naming convention for a document, keeping usability in mind solves a number of concerns at the same time, and server-side rewrites can accomplish this easily. Making the document-specific URL easy to read helps the user bookmark and remember it; exposing only physical file names as URLs is confusing and certainly not friendly. Rewrites can also help indicate that a document is permanent and reliable for future requests. Search engines look for indicators such as query strings, filename extensions or even the length of the URL to determine whether a page is dynamically generated, and hence whether the URL is worth visiting with respect to their available resources. They will also not try to crawl a URL with such common machine-generated patterns unless one or more external trustworthy sites contain a backlink to it. Rewrites are a way to take a requested URL and transform it into a version that is suitable for the server, which can then be processed [4]. A concise, meaningful URL also appears on SERPs as a match. Pick a cool URL for your document and stick to it! [5]
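For example, assuming Apache with mod_rewrite enabled, a readable URL can be mapped internally to the script that actually serves it; the rule below is an .htaccess-style sketch with a hypothetical script name, not drop-in configuration:

    RewriteEngine On
    # Map /articles/seo-basics to the underlying script without exposing the query string
    RewriteRule ^articles/([a-z0-9-]+)$ /article.php?slug=$1 [L]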
The efficiency and complexity of the server and of server-side scripts are important because of their potential adverse impact on crawlers and human visitors. Frequent downtime or slow transfer speeds will certainly hurt the experience. Also keep the total page size in mind [6].
Avoid using any user-agent detection to customize information depending on the visitor. This not only causes a performance issue but also makes it difficult to maintain. Keep in mind that user-agent detection is also not always reliable.
Meta information
An HTML meta description of around 150 characters should be sufficient. Although it does not hurt to run a little longer, this data should contain the most concise information about the document. The uniqueness of this information also plays a fair role as far as SERPs are concerned.
Meta keywords, on the other hand, are not really necessary, since it is the responsibility of search engine indexers to determine the nature and relevancy of a document. For the sake of accuracy, they cannot rely on what the document claims itself to be. A transition is under way on the Web toward providing this sort of meta information about documents in other ways (I will revisit this later in this article). Today, the results gained from meta keywords are negligible.
A page title should be as specific and concise as possible with respect to the document. This will ensure its uniqueness and click-through in SERPs. A structure similar to "Page name | Section name | Site title - Tagline" is encouraged for clarity, uniqueness and better usability for the visitor. Focus on delivering a title that runs from specific keywords (closer to the beginning) to general ones. The title needs to be no more than about 80 characters long.
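A sketch of how such a title and description might look in the document head (the wording and names are invented for illustration):

    <head>
      <title>Choosing status codes | HTTP and URLs | Example Site - Practical SEO</title>
      <meta name="description"
            content="How HTTP status codes such as 301, 404 and 410 affect the way search engines index and rank your URLs.">
    </head>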
Separation of code layers
Less interpretation increases accuracy.
Although today's user-agents (crawlers included) are quite forgiving, as we move forward it becomes vital to maintain the well-formedness of HTML documents. This allows indexers to successfully interpret the nature of a document and organize it appropriately, and allows crawlers to reach other linked URLs in less time.
Avoid any critical functionality that does not degrade gracefully in the absence of certain user-agent features. For instance, if obtrusive JavaScript reveals content that is not initially within the HTML document but should be indexed by search engines, then other ways of presenting that data should exist. This also makes a document more accessible, whether a human or a machine is looking at it. Any given content should be reachable without the need for JavaScript if it is meant to appear in SERPs; the technology to seek out content that is only available through behaviour (JavaScript) has not matured. The separation of behaviour from structure and data (HTML) is always important, and it is important to cover the bare minimum before adding such features. Similarly, any content that needs to be indexed should not be hidden behind an HTML form.
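For example, keeping the content in the markup and treating script purely as an enhancement might look like this (the element id and class name are hypothetical):

    <!-- The content is present in the HTML and reachable without script -->
    <div id="details">
      <p>Full product details that should be indexable...</p>
    </div>
    <script>
      // Enhancement only: collapse the section when script is available;
      // without JavaScript the content simply stays visible and crawlable
      document.getElementById('details').className = 'collapsed';
    </script>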
The readable content in a document should be written in a natural language voice and aimed at a human reader. Today, search engines are not capable of understanding the meaning of a document; however, they do look for other indicators within the document to judge relevancy in response to user searches.
Single content rule
A user looking for information with the keyword phrase `foo bar baz` is more focused on their search than using `foo bar` keywords.
A proper keyword strategy should be identified before most of the heavy-duty coding, the organization of the data, or the writing of the site's main content. The ways in which a site's visitors look for information can be established after extensive analysis; knowing that this is not always possible, it may be enough to concentrate on two or three keyword combinations. Given the complexity of the content topic, the number of keywords may be greater, in which case keyword proximity plays an important part. Since a larger number of keywords indicates a more focused search by the user, information that is in close, but relevant, proximity can increase the likelihood of click-through as it appears in the matched description part of the SERP.
Keep it simple, and concise.
An increase in keyword density is not necessarily a good indicator of the relevancy of a document and therefore does not equate to better rankings. A well-written document will naturally use appropriate keywords in proportion. Search engine algorithms essentially compare similar documents to get a better understanding of the nature of a document; if a document is not well written and produces off-balance scores, it will raise flags and may be marked as not relevant, since that indicates a document written for the machine rather than the human reader. Keep in mind that indexing exists to assist human searches.
It is not in the interest of current crawlers to interpret the presentation (CSS) of a structure, so they will only parse the content in the HTML in the order it appears. In the case of table-based layouts, then, the content should make sense when the page is linearized, as it may otherwise jeopardize keyword proximity [7].
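As a brief illustration, a two-column table layout is linearized cell by cell, row by row, so related text split across columns ends up separated in the reading order (markup simplified and hypothetical):

    <table>
      <tr>
        <td>Navigation links...</td>
        <td>Article introduction...</td>
      </tr>
      <tr>
        <td>Advertising...</td>
        <td>Article body that continues the introduction...</td>
      </tr>
    </table>
    <!-- Linearized reading order: navigation, introduction, advertising, article body -->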
The indexer starts to parse a page from the top, and it is in the search engine's interest to establish the relevancy of the page as quickly as possible. Therefore, relevant and important content should appear higher in the HTML source order. Writing markup that contains only the essentials, with content that is concise, increases the likelihood of successful rankings.
Providing h1, h2 and other heading content for a page that consists mostly of links and dynamic content will not make the page any more authoritative or show any improvement in SERPs. Page headings should, in other words, describe the content within a page and follow a hierarchy. If keyword saturation is the goal, it should be done within paragraphs with supporting content.
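A sketch of a source order that keeps important content high up and uses headings to describe it (the content is invented for illustration):

    <h1>Practical SEO for developers</h1>
    <p>Concise introduction containing the page's primary keywords...</p>

    <h2>HTTP status codes</h2>
    <p>Supporting content for this sub-topic...</p>

    <h2>Meta information</h2>
    <p>Supporting content for this sub-topic...</p>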
Document inter-connectivity
Internal linking between documents should be used contextually and be relevant to the targeted page. Consider the following extract from one of Google's patents:
a search engine modifies the relevance rankings for a set of documents based on the inter-connectivity of the documents in the set. A document with a high inter-connectivity with other documents in the initial set of relevant documents indicates that the document has "support" in the set, and the document's new ranking will increase. In this manner, the search engine re-ranks the initial set of ranked documents to thereby refine the initial rankings.
Other search engines work in similar ways, so it only helps to link URLs that are relevant to one another. It allows search engines to group a set of documents that are similar and also helps users trust the information within [8].
Relevancy is in everyone's interest.
With respect to the content of hyperlinks, using descriptive information helps the human user identify the resource. Ideally, describing the page that a hyperlink points to signifies its relevancy. In terms of keyword proximity, if a paragraph containing an anchor holds information that relates well to the URL being pointed to, it can help the relevancy of that document. From the perspective of the document receiving the back-reference, the anchor text and the information surrounding it will indicate its relevancy. This is, of course, in the interest of search engines.
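For instance, a descriptive anchor placed in a related sentence tells both readers and indexers what the target document is about, while a bare "click here" anchor tells them nothing (the URLs and wording are hypothetical):

    <!-- Descriptive, contextual anchor -->
    <p>Permanent redirects are covered in more detail in
       <a href="/articles/http-redirects">our guide to HTTP redirects</a>.</p>

    <!-- Avoid anchors that describe nothing about the target -->
    <p>For redirects, click <a href="/articles/http-redirects">here</a>.</p>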
Ideally, any page on the site should be reachable within three URL steps (or more if the scope of the site contains multidimensional content). For organizational purposes and better usability, concentrating on minimizing the number of levels needed to reach a document will benefit users. If the three pages are interlinked with one another, the relevancy of one of the documents may obscure another page's importance; having three independent steps in this case will ensure a URL's uniqueness even if it is deep down in the site hierarchy, as it will maintain its relevancy in SERPs. Having said that, a sitemap depicting all levels of content and directory structures is a must. This keeps at least one page of the Web site within one click of all pages [9].
Usability
Break a document into multiple pages if the content within is too long, can be read in sections, or can be provided under different topics. Essentially, it is better to keep every document focused.
A document should contain enough content to compensate for the number of URLs found on the page. Having plenty of rich content on a page is important, since a common templating structure often produces many (if not duplicate) anchors across pages, and rich content is a good way to differentiate one URL from another as far as document uniqueness and relevancy are concerned. Similarly, reducing the ratio of markup to content by following best front-end design methods is encouraged.
If an image is used as a title and makes use of a font that is similar to a system font, use text instead when possible. Be careful with common fonts found on different systems, as they differ from one another.
Use the HTML alt attribute on images for accessibility reasons as well as to provide additional content for search engines. When it describes an image, this content can be quite useful, as search engines retrieve this information. Use the HTML title attribute for additional usability; for instance, apart from anchor text, additional information about the URL reference can be supplied with the title attribute.
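A short sketch of both attributes in use (the file names, paths and text are hypothetical):

    <img src="/images/linearized-table.png"
         alt="Diagram showing the reading order of a linearized table layout">

    <a href="/articles/http-redirects"
       title="How and when to use 301 and 410 status codes">HTTP redirects</a>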
Semantics and future compatibility
It is important to note that much effort is going into closing the gap between the information that is available to humans and the way machines mine and retrieve that same information. Establishing technologies and methods to serve us better benefits all of us. Therefore, while building a site from the ground up, a good rule of thumb is to cover both cases with a single implementation.
Writing (X)HTML markup and content that can be interpreted by both humans and machines not only increases a site's consistency but also plays a major role in its expansion and accessibility. Microformats were established with this in mind. For future compatibility it is important not only to write markup that is easily parsed (well-formed) [10] and meaningful (semantic), but also to offer additional standardised information for machines. This may sound a bit like science fiction, but it is already here, and there is no better time than now to take advantage of these methods in our implementations. The bridge between SEO and machine-readable information is in search engines' best interest, as it allows them to mine data that is relevant and faster to get hold of. All of this is established without sacrificing the human needs for the very same information.
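As one example, the hCard microformat marks up contact details with agreed class names so that machines can extract them while humans read ordinary HTML (the person and organisation below are invented):

    <div class="vcard">
      <a class="fn url" href="http://example.com/">Jane Example</a>
      <div class="org">Example Web Studio</div>
      <div class="adr">
        <span class="locality">Toronto</span>,
        <span class="country-name">Canada</span>
      </div>
    </div>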
References and Resources:
- [1] W3C Link Checker
- [2] HTTP/1.1: Status Code Definitions
- [3] The Web Robots FAQ
- [4] Apache Module mod_rewrite
- [5] Cool URIs don't change
- [6] Web Page Analyzer
- [7] Linearizing Tables
- [8] Ranking search results by reranking the results based on local inter-connectivity
- [9] The Anatomy of a Large-Scale Hypertextual Web Search Engine
- [10] W3C Markup Validation Service
- Microformats
- Tags