Stemming Search Terms in Sitecore Solr Indexes

I wrote about stemming in Sitecore Lucene content search in my previous article. As a reminder, stemming is the process of reducing inflected (or sometimes derived) words to their word stem or root form. It lets a search for one form of a word match documents that contain other forms of the same word, so your search returns more relevant results. That makes stemming a good and easy way to improve your search.

Configuring stemming in Solr is even easier than configuring it in Sitecore Lucene Content Search. You don’t need to write a single line of code. All you need is configuration.

Each Solr core has a schema.xml file in its configuration. When you open it, you will see a field type named text_en:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
	<tokenizer class="solr.StandardTokenizerFactory" />
	<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
	<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
	<filter class="solr.LowerCaseFilterFactory" />
	<filter class="solr.EnglishPossessiveFilterFactory" />
	<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
	<!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
	<filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
	<filter class="solr.PorterStemFilterFactory" />
  </analyzer>
  <analyzer type="query">
	<tokenizer class="solr.StandardTokenizerFactory" />
	<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
	<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
	<filter class="solr.LowerCaseFilterFactory" />
	<filter class="solr.EnglishPossessiveFilterFactory" />
	<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
	<!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
	<filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
	<filter class="solr.PorterStemFilterFactory" />
  </analyzer>
</fieldType>
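
The commented-out filter in the listing above points to a gentler alternative. As a minimal sketch, assuming you want only plural forms stemmed, a separate field type could swap the stemmer like this. The name text_en_minimal is hypothetical, and a single analyzer element without a type attribute is used so that the same chain applies at both index and query time:

```xml
<!-- Hypothetical field type: text_en_minimal is an illustrative name, not part of the stock schema. -->
<fieldType name="text_en_minimal" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: this analyzer is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
    <!-- Terms listed in protwords.txt are marked as keywords and skipped by the stemmer. -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <!-- Less aggressive than PorterStemFilterFactory: it only strips plural endings. -->
    <filter class="solr.EnglishMinimalStemFilterFactory" />
  </analyzer>
</fieldType>
```

Whichever stemmer you choose, keep it the same in the index and query analyzers. Otherwise the stemmed terms in the index will not match the terms produced from the query.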

The text_en field type contains the solr.PorterStemFilterFactory filter, which stems both your indexed documents and your queries. Compared to Lucene.Net, you have three stemmers to choose from for English: Porter, Lovins, or Porter2. You can also stem documents in other languages: Armenian, Basque, Catalan, Danish, Dutch, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish.
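
As a rough illustration of choosing a different stemmer, the sketch below uses solr.SnowballPorterFilterFactory, whose language attribute selects the Snowball stemmer to apply. The field type name text_en_porter2 is hypothetical, and the analyzer chain is trimmed for brevity, so treat it as a starting point rather than a drop-in replacement for text_en:

```xml
<!-- Hypothetical field type: text_en_porter2 is an illustrative name, not part of the stock schema. -->
<fieldType name="text_en_porter2" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: this analyzer is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
    <!-- language="English" selects the Snowball English (Porter2) stemmer.
         Other values, such as "German" or "Russian", select the stemmer for that language. -->
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>
```

For a non-English field you would normally also point the stop word filter at the matching language file, which is left out here.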

To use stemming on a field, change its type from text_general to text_en:

<field name="_content" type="text_en" indexed="true" stored="false" /> 

Then restart Solr and rebuild your indexes. This one small configuration change can improve search quality on your website.

Anton Tishchenko
Head of Digital Engineering
Anton has worked as a developer since 2007. He is a highly experienced Sitecore developer who previously worked as a Technical Team Lead at Sitecore. Anton's expertise in the Sitecore platform is formidable; he's definitely one of the world's finest Sitecore ninjas, and in 2019 he was recognised as the only Sitecore MVP in Ukraine when he achieved his Technology MVP status.
20 Nov 2018 - 5 minute read