Stemming Search Terms in Sitecore Lucene Indexes

Sitecore content search is a great technology that lets you add search to your Sitecore website with minimal effort. But one thing that has always disappointed me is that it doesn’t understand word forms. The singular and plural forms of a noun are saved as two separate terms in the index (e.g. “tool” and “tools”). The base form, third-person singular and past tense of a verb result in three different terms (e.g. “deny”, “denies” and “denied”). This makes search results worse: if you search for “deny”, documents that only contain “denies” or “denied” will not be found.

There are a few ways to “fix” this. The first is to use the similarity parameter in the query: x => x.YourFieldName.Like("tools", 0.8f). It is a quick and dirty solution: content search will now return results containing similar words. But there is another side to the coin: you will also get results with similar words where you don’t expect them. For example, a search for “Ireland” will also return “Iceland” and “Island”.
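Why does “Ireland” pick up “Iceland”? Assuming that Like with a similarity argument is translated into a Lucene FuzzyQuery, an indexed term matches when its edit-distance similarity to the search term is above the threshold. The standalone C# below is a simplified sketch of that score (no prefix length, none of Lucene's optimisations); FuzzySimilaritySketch and its methods are hypothetical helpers for illustration, not part of Sitecore or Lucene.Net.

```csharp
using System;

// Hypothetical helper: a simplified version of the similarity score that a
// Lucene.Net 3.x FuzzyQuery compares against minimumSimilarity
// (assumes prefix length 0 and lowercase terms, as produced by the analyzer).
public static class FuzzySimilaritySketch
{
    // Classic Levenshtein distance: insertions, deletions and substitutions each cost 1.
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(Math.Min(current[j - 1] + 1, previous[j] + 1),
                                      previous[j - 1] + cost);
            }
            // Swap rows so 'previous' always holds the last completed row.
            var tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[b.Length];
    }

    // similarity = 1 - distance / length of the shorter term
    public static float Similarity(string queryTerm, string indexedTerm)
    {
        int shorter = Math.Min(queryTerm.Length, indexedTerm.Length);
        if (shorter == 0) return 0f;
        return 1f - (float)EditDistance(queryTerm, indexedTerm) / shorter;
    }

    // An indexed term is accepted when its similarity is above the threshold.
    public static bool IsFuzzyMatch(string queryTerm, string indexedTerm, float minimumSimilarity)
    {
        return Similarity(queryTerm, indexedTerm) > minimumSimilarity;
    }
}
```

With this formula, "ireland" and "iceland" differ by one letter, so the similarity is 1 - 1/7 ≈ 0.857 and clears 0.8. On the other hand, "deny" and "denied" are three edits apart, giving only 0.25, so a fuzzy query does not help much with verb forms either.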

The other option is stemming: the process of reducing inflected (or sometimes derived) words to their word stem. There are different stemming algorithms, and Lucene.Net includes an implementation of the Porter stemming algorithm, which can be used to extend Sitecore content search. To use it, we need to implement our own analyser:

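using Lucene.Net.Analysis;
using System.IO;

namespace Feature.StemmedSearch.Search
{
    // Analyzer that lowercases a field value and reduces it to its Porter stem.
    // It builds on KeywordAnalyzer, so KeywordTokenizer emits the whole
    // field value as a single token before the filters run.
    public class PorterStemLowerCaseKeywordAnalyzer : KeywordAnalyzer
    {
        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            // Filters apply inside-out: tokenize, lowercase, then stem.
            return new PorterStemFilter(new LowerCaseFilter(new KeywordTokenizer(reader)));
        }
    }
}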
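Note that KeywordTokenizer emits the entire field value as a single token, so with the analyzer above a multi-word value is lowercased and stemmed as one term. If your field holds running text, a variant that first splits the text into words may fit better. The sketch below follows the PorterStemFilter example from the Lucene documentation (PorterStemFilter over LowerCaseTokenizer) and assumes the Lucene.Net 3.0.3 API used by Sitecore's Lucene provider; the class name PorterStemLowerCaseAnalyzer and the AnalyzerDemo helper are hypothetical, and you would reference the class from the analyzer element in the configuration instead of the original one.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

namespace Feature.StemmedSearch.Search
{
    // Hypothetical variant: LowerCaseTokenizer splits the text on non-letter
    // characters and lowercases each word, so PorterStemFilter stems every
    // word individually instead of the whole field value.
    public class PorterStemLowerCaseAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(string fieldName, TextReader reader)
        {
            return new PorterStemFilter(new LowerCaseTokenizer(reader));
        }
    }

    // Hypothetical helper for inspecting which terms an analyzer produces.
    public static class AnalyzerDemo
    {
        public static List<string> Analyze(Analyzer analyzer, string text)
        {
            var terms = new List<string>();
            using (TokenStream stream = analyzer.TokenStream("_content", new StringReader(text)))
            {
                // ITermAttribute exposes the text of the current token.
                ITermAttribute term = stream.AddAttribute<ITermAttribute>();
                stream.Reset();
                while (stream.IncrementToken())
                {
                    terms.Add(term.Term);
                }
                stream.End();
            }
            return terms;
        }
    }
}
```

For example, analyzing "Deny denies DENIED tools" with this variant should yield deni, deni, deni and tool: Porter stems are not always dictionary words, but all three verb forms now share a single term. Because LowerCaseTokenizer splits on anything that is not a letter, it also drops digits, which may or may not suit your content.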

Then register a field mapping in the content search configuration:

<fieldNames>
  <field fieldName="_content" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
    <analyzer type="Feature.StemmedSearch.Search.PorterStemLowerCaseKeywordAnalyzer, Feature.StemmedSearch" />
  </field>
</fieldNames>

I used the _content field as an example, but it is better not to change the fields that Sitecore ships out of the box and to use your own custom fields instead. After rebuilding the indexes, we can see that all terms are stored in stemmed form:

[Screenshot: stemmed terms stored in the rebuilt index]

If you turn on verbose logging and inspect the search queries, you will see that the query terms for the _content field are stemmed as well:

[Screenshot: stemmed query terms in the verbose search log]

Hurray! Now, our Sitecore website search is more similar to Google search. :-)

Stay tuned: in the second part, we will do the same for Solr indexes.

You can read more of my Sitecore blogs here!

Whatever your business, be it a regional or global brand, the content you produce plays a vital role in your success. You know that… hence you’re reading this.

A well-formulated and well-executed content strategy not only drives more traffic; at its core, it defines what your business is and helps build a strong connection between you and your audiences.

So let's quickly look at why developing a coherent content strategy is important and how setting clear goals and understanding your audience will elevate your online performance. 

What is a Content Strategy?

It's basic, right? Content is at the core of how you define the way your business presents itself, and an effective strategy should ensure that tone of voice, messaging and core values are surfaced across all channels, from service or product pages on your website, to blog posts, through to social media updates, blah blah blah.

But let's keep it simple: your content strategy should be a clear roadmap that connects your marketing activities to your business goals. Align it to your customers’ wants and needs, engage them at every interaction point and, boom, you're in business.

Who are my Audience?

You likely start all your projects with this chalked on the wall, because your business knows “exactly” who its customers are, right? It sounds obvious, but we often find it's not been done forensically enough (not based on data), is too old (more than 12 months old? Forget it) or is a spin-off from some brand work that was legitimately aspirational but doesn’t face the reality of who your business is actually engaging today.

So start (or circle back) with audience research, building out those personas to understand their ambitions, their lifestyle, their pain points or concerns and, crucially, their wants and needs in your context.

Do I need to tailor content?

As part of your research, find out where your audiences spend their time online and how they interact with content. Some may spend time thoroughly researching a product or service, whereas others may want their content to be quick, snappy or easily digestible in the form of a video, an infographic or a short blog post.

Ultimately, the key is to produce a strategy that creates the type of content your customers want to see:

  • What are the problems that your product or service will help them solve?

  • Who are they most influenced by?

  • What voices influence their behaviour?

  • What type of content do they consume?

  • Where do they consume content and engage with brands?

Different Content, Different Objectives

Not all content is born equal: when producing your strategy, it is important to define the objectives for each individual piece and make sure they fulfil your marketing objectives and tie to the overarching goals of your business.

Various content frameworks exist to aid content development in this way, and one that is popular and effective is Google’s hero, hub and hygiene method. It provides a framework for developing content to achieve different goals and gives guidance on the effort needed to create each type of content.

Hero Content

Hero content is essentially campaign content: big-splash ideas designed to appeal to a large audience, with the aim of telling your brand’s story at scale.

Ways of measuring hero content include the number of PR mentions or links from authoritative domains, plus social interactions and mentions of your brand across all channels.

Given the scale of hero campaigns, this content is not produced regularly; it is reserved for peak promotional periods, when it’s important for a business to stand out from its competitors.

Hub Content

Hub content is the stuff that keeps your audience engaged: it expands on the themes of product- or service-level content, educates users and helps create a connection between them and your brand.

Hygiene Content

Hygiene content is the bread and butter of any website. It is the BAU (business-as-usual) content for products and services; it is SEO-focused and targets important keywords at a product, service or guide level.

How do I manage all this?

Content development is only one part of the ongoing work needed to run an effective content strategy. We call it “feeding the beast” because it really is the fuel in your brand vehicle, and once you start you really can’t stop (if it’s delivering results). That’s where performance measurement comes in.

Your greatest gift in managing the outputs of your hero/hub/hygiene-style efforts is understanding whether your content is working. To truly deliver results, your business must first understand the objectives and goals of each piece of content so that it can measure that content's success effectively. With that as a guiding light from day one, you can slow down, speed up, stop or start new content briefs and projects.

Remember: content strategies are not set in stone. They are living, breathing things and should adapt and pivot as insights become available and your brand naturally evolves.

If you ever want to chat about content and explore new initiatives, we’re always here to help.

Anton Tishchenko
Head of Digital Engineering
Anton has worked as a developer since 2007. He is a highly experienced Sitecore developer who previously worked as a Technical Team Lead at Sitecore. Anton's expertise in the Sitecore platform is formidable; he is definitely one of the world's finest Sitecore ninjas, and in 2019 he was recognised as the only Sitecore MVP in Ukraine when he achieved his Technology MVP status.

02 Nov 2018 - 7 minute read