Old Ya: First night hack on elasticsearch (3 hrs)

Elasticsearch is a Lucene-based, distributed, RESTful (PUT, GET, POST, DELETE) search server written in Java, similar to Apache Solr.
For motivation, read the SoundCloud case study and watch Search and Discovery at SoundCloud.

Elasticsearch in 15 minutes from David Pilato is a nice resource to follow.

The following slide deck is also useful (it includes a Rails tutorial at slide #44).

Elasticsearch Basics from Shifa Khan

Now, let's get our hands dirty.

[STEP 1] Download and extract elasticsearch 0.90.5

prayag@prayag:~$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.tar.gz

prayag@prayag:~$ tar -zxvf elasticsearch-0.90.5.tar.gz

[STEP 2] Start an elasticsearch node (in the foreground)

prayag@prayag:~$ elasticsearch-0.90.5/bin/elasticsearch -f
[2013-10-21 23:19:19,127][INFO ][node                     ] [Aardwolf] version[0.90.5], pid[15897], build[c8714e8/2013-09-17T12:50:20Z]
[2013-10-21 23:19:19,128][INFO ][node                     ] [Aardwolf] initializing ...
[2013-10-21 23:19:19,140][INFO ][plugins                  ] [Aardwolf] loaded [], sites []
[2013-10-21 23:19:23,523][INFO ][node                     ] [Aardwolf] initialized
[2013-10-21 23:19:23,524][INFO ][node                     ] [Aardwolf] starting ...
[2013-10-21 23:19:23,800][INFO ][transport                ] [Aardwolf] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.195.8:9300]}
[2013-10-21 23:19:27,045][INFO ][cluster.service          ] [Aardwolf] new_master [Aardwolf][pK-h7KRwTWamnQNNrR2gwQ][inet[/192.168.195.8:9300]], reason: zen-disco-join (elected_as_master)
[2013-10-21 23:19:27,143][INFO ][discovery                ] [Aardwolf] elasticsearch/pK-h7KRwTWamnQNNrR2gwQ
[2013-10-21 23:19:27,233][INFO ][http                     ] [Aardwolf] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.195.8:9200]}
[2013-10-21 23:19:27,234][INFO ][node                     ] [Aardwolf] started
[2013-10-21 23:19:27,311][INFO ][gateway                  ] [Aardwolf] recovered [0] indices into cluster_state

Verify that elasticsearch is running on port 9200.
First, you may need to install curl:
sudo apt-get update
sudo apt-get install curl

Then, run the following request in a terminal:

$ curl -XGET 'http://localhost:9200'
{
  "ok" : true,
  "status" : 200,
  "name" : "Lady Octopus",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search"
}
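When scripting this check, it can help to pull just the version number out of the response. The following is a minimal sketch and not part of the original walkthrough: es_version is a hypothetical helper name, and it assumes the response has the shape shown above, with a "number" field inside "version".

```shell
# Hypothetical helper: read the root endpoint's JSON on stdin and
# print the value of the first "number" field (the node version).
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage against a running node (assumes the default localhost:9200):
#   curl -s -XGET 'http://localhost:9200' | es_version
```

This is plain text matching, not real JSON parsing, so treat it as a quick sanity check only.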

[STEP 3] Create the index movies

Put the following script in create_index.sh, then execute it.

curl -XPUT 'http://localhost:9200/movies/'

$ bash create_index.sh

[STEP 4] Apply the mapping for type 'Movie'
Put the following script in movie_mapping.sh, then execute it.

# Core field type names are written in lowercase ("string", "long").
curl -X PUT localhost:9200/movies/Movie/_mapping -d '{
    "Movie" : { "properties"  : {
                    "title"    : { "type":"string" },
                    "director" : { "type":"string" },
                    "year"     : { "type":"long" }
                }
              }
}'

$ bash movie_mapping.sh

[STEP 5] Indexing (create/update) documents
Syntax :
$ curl -XPUT 'http://localhost:9200/<index>/<type>/<id>'
where a type (in ES) is roughly analogous to a table (in an RDBMS).

Put the following script in movie_document.sh, then execute it.

curl -XPUT 'http://localhost:9200/movies/Movie/1' -d '
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972
}'

$ bash movie_document.sh

The response is:
{"ok":true,"_index":"movies","_type":"Movie","_id":"1","_version":1}

The elasticsearch console shows the following updates:

[2013-10-22 00:12:46,835][INFO ][cluster.metadata         ] [Aardwolf] [movies] creating index, cause [auto(index api)], shards [5]/[1], mappings []
[2013-10-22 00:12:47,944][DEBUG][action.index             ] [Aardwolf] Sending mapping updated to master: index [movies] type [movie]
[2013-10-22 00:12:47,961][INFO ][cluster.metadata         ] [Aardwolf] [movies] update_mapping [movie] (dynamic)
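The <index>/<type>/<id> URL pattern from this step can be wrapped in small shell functions, which is handy when indexing more than one document. This is only a sketch: ES_HOST, doc_url, movie_json and index_movie are hypothetical names introduced here, the node address is assumed to be the default localhost:9200, and movie_json does not escape quotes inside values.

```shell
# Assumed default node address; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Build the document URL: <host>/<index>/<type>/<id>
doc_url() {
  printf '%s/%s/%s/%s' "$ES_HOST" "$1" "$2" "$3"
}

# Build the JSON body for a movie from title, director and year.
# Hypothetical helper: values containing double quotes are not escaped.
movie_json() {
  printf '{"title": "%s", "director": "%s", "year": %s}' "$1" "$2" "$3"
}

# Index (create or update) one document in movies/Movie under the given id.
index_movie() {
  curl -s -XPUT "$(doc_url movies Movie "$1")" -d "$(movie_json "$2" "$3" "$4")"
}

# Usage (needs a running node):
#   index_movie 1 "The Godfather" "Francis Ford Coppola" 1972
```

Because the id is part of the URL, running index_movie again with the same id updates the existing document rather than creating a new one, and the "_version" in the response should increase.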

[STEP 6.1] Retrieve document by id
Syntax :
$ curl -XGET http://localhost:9200/<index>/<type>/<id>

Execute the following command to retrieve the movie with document id 1:

$ curl -XGET "http://localhost:9200/movies/Movie/1?pretty=true"
{
  "_index" : "movies",
  "_type" : "Movie",
  "_id" : "1",
  "_version" : 1,
  "exists" : true, 
"_source" : 
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972 
}
}

[STEP 6.2] Search documents of a type by field (title)

$ curl -XGET "localhost:9200/movies/Movie/_search?q=title:Godfather"

 {"took":22,
"timed_out":false,
"_shards":{"total":5,"successful":5,"failed":0},
"hits":{
"total":1,
"max_score":0.30685282,
"hits":
[{"_index":"movies",
"_type":"Movie",
"_id":"1",
"_score":0.30685282, 
"_source" : 
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972 
}
}]
}
}

The same search can also be expressed with the Elasticsearch Query DSL.
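A minimal sketch of a Query DSL request equivalent to the q=title:Godfather search above, assuming the match query is available in your version and the node runs at the default localhost:9200. ES_HOST, match_query and search_movies are hypothetical names introduced for illustration; this is not necessarily the exact query the original post used.

```shell
# Assumed default node address; override by exporting ES_HOST.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Build a Query DSL body with a single match query: field, then text.
# Hypothetical helper: arguments containing double quotes are not escaped.
match_query() {
  printf '{"query": {"match": {"%s": "%s"}}}' "$1" "$2"
}

# Send the query body to the _search endpoint of movies/Movie.
search_movies() {
  curl -s -XGET "$ES_HOST/movies/Movie/_search?pretty=true" -d "$(match_query "$1" "$2")"
}

# Usage (needs a running node with the document from STEP 5 indexed):
#   search_movies title Godfather
```

Sending the query as a JSON body instead of a q= parameter makes it easier to grow the request later, for example by adding filters or sorting.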

[7] Not to miss
[7.1] 40-minute presentation by Shay Banon

Shay Banon - ElasticSearch: Big Data, Search, and Analytics

The presentation covers the following topics:
data design patterns
The "kagillion" shards problem
Simple data flows
"Users" data flow
"time" data flow (clicks etc)
more than search
questions?
etc

[7.2] ElasticSearch, "You know, for search", presentation by Clinton Gormley
Also watch his talk "Getting down and dirty with Elasticsearch".

ElasticSearch at berlinbuzzwords 2010 from elasticsearch

[7.3] Book: Exploring Elasticsearch, by Andrew Cholakian

References
ElasticSearch 101– a getting started tutorial, http://joelabrahamsson.com/elasticsearch-101/

ElasticSearch in 5 minutes, http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html

http://blog.florian-hopf.de/2013/09/simple-event-analytics-with.html

http://www.javacodegeeks.com/2013/04/getting-started-with-elasticsearch.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-health.html

http://thediscoblog.com/blog/2013/09/03/effortless-elasticsearch-clustering/

Scaling massive elastic search clusters - Rafał Kuć - Sematext, http://www.slideshare.net/kucrafal/scaling-massive-elastic-search-clusters-rafa-ku-sematext

Shards and replicas in Elasticsearch, http://stackoverflow.com/a/15705989/432903

Monday, 21 October 2013