The big move – Wikimedia chose Elasticsearch


Wikimedia, the organization that runs Wikipedia (the world’s largest open encyclopedia), has announced that it is rolling out a new search infrastructure based on Elasticsearch to all of its wikis.

Following GitHub, Livechat and Quora, Wikimedia is the next big organization to choose Elasticsearch for a key piece of its functionality: search.

Until now, all Wikimedia sites have used a home-grown search system built on Apache Lucene. Wikimedia has now chosen to migrate its search engine to Elasticsearch.

Elasticsearch – the cloud-aware search

According to the official overview page, Elasticsearch is a flexible and powerful open-source, distributed, real-time search and analytics engine built on Apache Lucene.

Elasticsearch aims to be the industry leader for cloud-aware full-text search clusters, where reliability and scalability are must-haves.

The core concepts of the technology are as follows:

  • Distributed Environment – as the name (Elastic) suggests, it was built to scale out of the box, with automatic replication and sharding
  • High Availability – failed nodes are detected automatically, which supports high availability
  • Full-Text Search – Elasticsearch uses Lucene to provide the most powerful full-text search capabilities available in any open-source product
  • Document-Oriented – Elasticsearch can be used like a typical NoSQL database to store structured JSON documents
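The sharding and replication mentioned in the list above rest on a simple idea that Elasticsearch documents as `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. The Python sketch below illustrates that rule, plus a toy way of keeping each replica off the node that holds its primary. Assumptions: `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses (so shard numbers will not match a real cluster), and `place_copies` is a hypothetical round-robin allocator, much simpler than the real one.

```python
import zlib
from collections import Counter


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard."""
    if num_primary_shards <= 0:
        raise ValueError("num_primary_shards must be positive")
    # ASSUMPTION: crc32 replaces Elasticsearch's Murmur3 hash for illustration.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def place_copies(num_primary_shards: int, nodes: list[str]) -> dict[int, tuple[str, str]]:
    """HYPOTHETICAL placement: put each replica on a different node than its primary."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate primary and replica")
    placement = {}
    for shard in range(num_primary_shards):
        primary = nodes[shard % len(nodes)]
        # The next node in round-robin order is always a different node.
        replica = nodes[(shard + 1) % len(nodes)]
        placement[shard] = (primary, replica)
    return placement


if __name__ == "__main__":
    doc_ids = [f"article-{i}" for i in range(1000)]
    # Show how documents spread over 5 primary shards.
    print(sorted(Counter(pick_shard(d, 5) for d in doc_ids).items()))
    print(place_copies(5, ["node-a", "node-b", "node-c"]))
```

Because the shard is derived from the key and the primary shard count, the count cannot simply change after an index is created without reindexing; keeping replicas on separate nodes is what lets a cluster survive the loss of a single node.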

Beyond these core features, Elasticsearch provides real-time access to stored data and supports analytics operations such as aggregations.

The benefits of change

The main motivation for such a big migration is that maintaining a dedicated, home-grown search engine is expensive in both money and time. Moreover, the in-house system still could not offer all of the features that come with Elasticsearch:

  • Expressive query language – the Query DSL makes it easy to write very expressive ad-hoc queries
  • Maintenance – Elasticsearch exposes a fully featured REST API that makes the cluster easy to maintain
  • Scalability – with mechanisms ranging from automatic node discovery to shard rebalancing, horizontal scaling is provided out of the box
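To make the Query DSL and REST API points from the list above more concrete, here is a minimal Python sketch that builds, but does not send, two HTTP requests using only the standard library: a `_search` request whose JSON body combines a `bool` query (full-text `match` plus a `term` filter) with a `terms` aggregation, and a `_cluster/health` request. Assumptions: a cluster at `http://localhost:9200`, and the index name `enwiki_content` and fields `text` and `namespace` are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # ASSUMPTION: a local test cluster
INDEX = "enwiki_content"  # HYPOTHETICAL index name


def build_search_body(text: str, size: int = 10) -> dict:
    """Query DSL body: full-text match, exact-value filter, and an aggregation."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"text": text}}],  # HYPOTHETICAL field "text"
                "filter": [{"term": {"namespace": 0}}],  # HYPOTHETICAL field "namespace"
            }
        },
        # Count matching documents per namespace alongside the hits.
        "aggs": {"by_namespace": {"terms": {"field": "namespace"}}},
    }


def search_request(text: str) -> urllib.request.Request:
    """POST /<index>/_search with a JSON body."""
    body = json.dumps(build_search_body(text)).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def health_request() -> urllib.request.Request:
    """GET /_cluster/health, a basic maintenance check."""
    return urllib.request.Request(f"{ES_URL}/_cluster/health", method="GET")
```

Against a running cluster, `urllib.request.urlopen(health_request())` would return a JSON document whose `status` field is `green`, `yellow` or `red`; in practice an official client library would usually replace the hand-built requests.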

…and as they say:

We’re very happy with Lucene but we wanted to get out of the business of maintaining a special-purpose open-source search system

More information can be found on the official Wikimedia Blog.

Photo by Lady Jenn D


Are you looking for a team to develop your search engine based on Elasticsearch? Check out Octivi!
