Add WordNet Synonym Support to ElasticSearch with English Stemming

Taking this detailed and helpful post as a base, let’s add English stemming while preserving WordNet-based synonym token replacement.

First, to add the WordNet Prolog file to existing ElasticSearch nodes (Ubuntu in my case), perform the following:

  • sudo su #switch to superuser to access the ElasticSearch folder freely
  • wget http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz #download the ANSI Prolog version of the WordNet database
  • tar -xvzf WNprolog-3.0.tar.gz #decompress the archive (the files land in a prolog/ subdirectory)
  • cd /etc/elasticsearch #go to the ElasticSearch config directory
  • mkdir analysis #create the analysis subdirectory
  • mv /home/onehydraadmin/prolog/wn_s.pl /etc/elasticsearch/analysis/wn_s.pl #move the WordNet synonyms file into it (replace /home/onehydraadmin with the directory you downloaded to)
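Before creating the index, it can help to peek at what was just moved. In wn_s.pl each line is a Prolog fact of the form s(synset_id, w_num, 'word', ss_type, sense_number, tag_count), and the wordnet synonym format treats words that share a synset_id as one synonym group. The sketch below groups words that way; the sample lines use made-up synset ids, so they illustrate the format rather than real WordNet data.

```python
import re
from collections import defaultdict

# One WordNet Prolog "s" fact:
#   s(synset_id, w_num, 'word', ss_type, sense_number, tag_count).
# A quote inside the word is escaped by doubling it ('' stands for ').
S_FACT = re.compile(r"^s\((\d+),(\d+),'((?:[^']|'')*)',(\w),(\d+),(\d+)\)\.$")


def load_synsets(lines):
    """Group the words of wn_s.pl-style lines by synset id (insertion order kept)."""
    synsets = defaultdict(list)
    for line in lines:
        match = S_FACT.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything that is not an s(...) fact
        synset_id = match.group(1)
        word = match.group(3).replace("''", "'")
        synsets[synset_id].append(word)
    return dict(synsets)


if __name__ == "__main__":
    # Illustrative input only: these synset ids and groupings are made up.
    sample = [
        "s(900000001,1,'baby',n,1,0).",
        "s(900000001,2,'child',n,2,0).",
        "s(900000002,1,'course of action',n,1,0).",
        "s(900000003,1,'o''clock',r,1,0).",
    ]
    for synset_id, words in load_synsets(sample).items():
        print(synset_id, words)
```

To inspect the real file, pass an open file handle instead, for example load_synsets(open("/etc/elasticsearch/analysis/wn_s.pl")), and look up the groups that contain a word of interest.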

Now we can create an ElasticSearch index that has access to the WordNet database.

What do we need in terms of synonym mapping? Both the synonyms and the queries need to be tokenized, stripped of English stop words, and then stemmed with the English stemmer. Query tokens then need to be mapped to tokens in the synonyms source, and the resulting list of synonym tokens needs to act as the search query tokens against the indexed documents.
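The pipeline above can be modelled in a few lines of Python to see why “babies” ends up matching both documents. This is a toy sketch, not ElasticSearch’s implementation: the tokenizer, the small stop list, the suffix-stripping stem() (which only imitates a few Porter-style rules) and the single synonym group [“baby”, “child”] are all simplifying assumptions. It also assumes, as newer ElasticSearch versions do, that the synonym rules are run through the same filters that precede the synonym filter, and that one analyzer is used at both index and search time (the mapping below sets no separate search analyzer).

```python
import re

# Simplified stand-ins, NOT ElasticSearch's implementations:
# - tokenize(): splits on non-word characters (the real standard tokenizer is Unicode-aware)
# - STOP_WORDS: a small subset of an English stop list
# - stem(): crude suffix stripping that imitates a few Porter-style rules (babies, baby -> babi)
# - SYNONYM_GROUPS: a hypothetical single-word group standing in for WordNet synsets
STOP_WORDS = {"a", "an", "and", "are", "of", "the", "to"}
SYNONYM_GROUPS = [["baby", "child"]]


def tokenize(text):
    return re.findall(r"\w+", text)


def stem(token):
    if token.endswith("ies") and len(token) > 4:
        token = token[:-3] + "y"  # babies -> baby
    elif token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        token = token[:-1]  # toys -> toy
    if token.endswith("y") and len(token) > 2:
        token = token[:-1] + "i"  # baby -> babi
    return token


def pre_synonym(text):
    """tokenizer -> english_stop -> english_stemmer (everything before the synonym filter)."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def build_synonym_map(groups):
    """Analyze every synonym with the same chain, then map each token to its whole group."""
    synonym_map = {}
    for group in groups:
        tokens = {tok for word in group for tok in pre_synonym(word)}
        for tok in tokens:
            synonym_map.setdefault(tok, set()).update(tokens)
    return synonym_map


SYNONYM_MAP = build_synonym_map(SYNONYM_GROUPS)


def analyze(text):
    """Full chain: stop words removed, tokens stemmed, then expanded to their synonym tokens."""
    out = set()
    for tok in pre_synonym(text):
        out |= SYNONYM_MAP.get(tok, {tok})
    return out


def match(query, docs):
    """Like a match query with the default OR operator: a hit shares at least one token."""
    query_tokens = analyze(query)
    return [doc_id for doc_id, text in docs.items() if analyze(text) & query_tokens]


if __name__ == "__main__":
    docs = {"1": "baby", "2": "child"}
    print(sorted(analyze("babies")))  # ['babi', 'child']
    print(match("babies", docs))      # ['1', '2']
    print(match("the babies", docs))  # ['1', '2']
```

Because “babies” and “baby” both stem to “babi”, the query lands on the same synonym group as the indexed documents, and the expanded token sets overlap for both of them.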

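To achieve this, we create an index with a custom synonym analyser that applies three filters in this order (the order matters!): english_stop, english_stemmer, synonym. The synonyms_path is resolved relative to the ElasticSearch config directory, so analysis/wn_s.pl points at the file we just moved.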
PUT request to http://localhost:9200/synonym_test/


{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "standard",
            "filter": ["english_stop", "english_stemmer", "synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "format": "wordnet",
            "synonyms_path": "analysis/wn_s.pl"
          },
          "english_stop": {
            "type": "stop",
            "stopwords": "_english_"
          },
          "english_stemmer": {
            "type": "stemmer",
            "language": "english"
          }
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "synonym"
        }
      }
    }
  }
}
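Filter order also affects how the synonym file itself is parsed. Newer ElasticSearch versions run the synonym rules through the tokenizer and the token filters that precede the synonym filter. With english_stop in front, a multi-word WordNet entry such as “course of action” loses “of”, which leaves a gap in token positions, and building the synonym map fails; this is the error one reader quotes in the comments. The sketch below reproduces only that position bookkeeping, using a naive tokenizer and a hypothetical stop list; it is not Lucene’s code.

```python
import re

# Hypothetical subset of an English stop list, for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def tokens_with_increments(text, stop_words):
    """Tokenize on word characters and drop stop words, recording for each kept
    token its position increment: 1 for adjacent tokens, more when stop words
    were removed in between (or before the first kept token)."""
    kept = []
    last_position = -1
    for position, token in enumerate(re.findall(r"\w+", text)):
        if token in stop_words:
            continue  # the removed token leaves a hole in the position sequence
        kept.append((token, position - last_position))
        last_position = position
    return kept


def check_synonym_rule(phrase, stop_words):
    """Return a message shaped like the one in the reader's error when a synonym
    entry analyzes to a token with position increment != 1, else None."""
    for token, increment in tokens_with_increments(phrase, stop_words):
        if increment != 1:
            return (f"term: {phrase} analyzed to a token ({token}) "
                    f"with position increment != 1 (got: {increment})")
    return None


if __name__ == "__main__":
    print(check_synonym_rule("course of action", STOP_WORDS))  # gap after dropping "of"
    print(check_synonym_rule("course of action", set()))       # no stop filter: None
```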

Following the blog post, let’s index two documents: “baby” and “child”:

POST request to http://localhost:9200/synonym_test/project/1


{
    "name" : "baby"
}
POST request to http://localhost:9200/synonym_test/project/2

{
    "name" : "child"
}
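The two indexing calls can also be scripted. This is a minimal sketch using only Python’s standard library, assuming a node at http://localhost:9200 and the ES 2.x-style /{index}/{type}/{id} path; the type name project is an assumption taken from the _type in the response below. The function only builds the requests, so nothing is sent until you pass one to urllib.request.urlopen().

```python
import json
import urllib.request

# Assumption: a single local node listening on the default port.
ES_URL = "http://localhost:9200"


def index_request(index, doc_type, doc_id, doc):
    """Build, but do not send, a POST that indexes `doc` under an explicit id.

    Uses the ES 2.x-style /{index}/{type}/{id} path; `doc_type` is whatever
    type name you choose (newer versions use /{index}/_doc/{id} instead).
    """
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # required by newer ES versions
        method="POST",
    )


# The two documents from the walkthrough, as (id, body) pairs.
DOCUMENTS = [(1, {"name": "baby"}), (2, {"name": "child"})]

requests_to_send = [index_request("synonym_test", "project", i, d) for i, d in DOCUMENTS]
```

With the node running, each request can be sent with urllib.request.urlopen(req); the JSON reply echoes the _index, _type and _id that were stored.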

Now we can search with singular and plural queries alike and still get all synonyms in the response.

POST request to http://localhost:9200/synonym_test/_search?pretty=true

{
  "query": {
    "match": {
      "name": {
        "query": "babies"
      }
    }
  }
}

Response


{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.30685282,
        "hits": [
            {
                "_index": "projects6",
                "_type": "project",
                "_id": "1",
                "_score": 0.30685282,
                "_source": {
                    "name": "baby"
                }
            },
            {
                "_index": "projects6",
                "_type": "project",
                "_id": "2",
                "_score": 0.19178301,
                "_source": {
                    "name": "child"
                }
            }
        ]
    }
}
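To consume this response from code, note that the matched documents sit under hits.hits, each with its _id, _score and _source. Below is a small helper built on Python’s json module, assuming the response body is already available as a string and that documents carry the name field from the mapping above. From ElasticSearch 7 onwards hits.total is an object with a value field rather than a plain number, so the helper accepts both shapes.

```python
import json


def total_hits(body):
    """hits.total is a number in older versions and {"value": ..., "relation": ...} from 7.0 on."""
    total = body["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def summarize_hits(body):
    """Return (id, name, score) for every hit in a parsed _search response."""
    return [
        (hit["_id"], hit["_source"]["name"], hit["_score"])
        for hit in body["hits"]["hits"]
    ]


if __name__ == "__main__":
    # Trimmed copy of the response shown above.
    raw = """{"hits": {"total": 2, "max_score": 0.30685282, "hits": [
        {"_id": "1", "_score": 0.30685282, "_source": {"name": "baby"}},
        {"_id": "2", "_score": 0.19178301, "_source": {"name": "child"}}]}}"""
    body = json.loads(raw)
    print(total_hits(body))
    for hit in summarize_hits(body):
        print(hit)
```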

17 thoughts on “Add WordNet Synonym Support to ElasticSearch with English Stemming”

  1. Hi,
    I tried the above method but am facing the issue below.

    {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"failed to build synonyms"}],"type":"illegal_argument_exception","reason":"failed to build synonyms","caused_by":{"type":"parse_exception","reason":"Invalid synonym rule at line 109","caused_by":{"type":"illegal_argument_exception","reason":"term: course of action analyzed to a token (action) with position increment != 1 (got: 2)"}}},"status":400}

    Could you please suggest a fix?

    Thanks,
    Ashwin rao
