Monday, 21 October 2013

First night hack on elasticsearch (3 hrs)

Elasticsearch is a distributed, RESTful (PUT, GET, POST, DELETE) search server built on Apache Lucene and written in Java, similar to Apache Solr.
To get motivated, read the SoundCloud case study and watch "Search and Discovery at SoundCloud".

Elasticsearch in 15 minutes from David Pilato is a nice resource to follow.

The following slide deck is also useful (it includes a Rails tutorial; see slide #44).

Now, let's get our hands dirty.

[STEP 1] Download and extract Elasticsearch 0.90.5
prayag@prayag:~$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.tar.gz

prayag@prayag:~$ tar -zxvf elasticsearch-0.90.5.tar.gz

[STEP 2] Start an Elasticsearch node (in the foreground)
prayag@prayag:~$ elasticsearch-0.90.5/bin/elasticsearch -f
[2013-10-21 23:19:19,127][INFO ][node                     ] [Aardwolf] version[0.90.5], pid[15897], build[c8714e8/2013-09-17T12:50:20Z]
[2013-10-21 23:19:19,128][INFO ][node                     ] [Aardwolf] initializing ...
[2013-10-21 23:19:19,140][INFO ][plugins                  ] [Aardwolf] loaded [], sites []
[2013-10-21 23:19:23,523][INFO ][node                     ] [Aardwolf] initialized
[2013-10-21 23:19:23,524][INFO ][node                     ] [Aardwolf] starting ...
[2013-10-21 23:19:23,800][INFO ][transport                ] [Aardwolf] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.195.8:9300]}
[2013-10-21 23:19:27,045][INFO ][cluster.service          ] [Aardwolf] new_master [Aardwolf][pK-h7KRwTWamnQNNrR2gwQ][inet[/192.168.195.8:9300]], reason: zen-disco-join (elected_as_master)
[2013-10-21 23:19:27,143][INFO ][discovery                ] [Aardwolf] elasticsearch/pK-h7KRwTWamnQNNrR2gwQ
[2013-10-21 23:19:27,233][INFO ][http                     ] [Aardwolf] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.195.8:9200]}
[2013-10-21 23:19:27,234][INFO ][node                     ] [Aardwolf] started
[2013-10-21 23:19:27,311][INFO ][gateway                  ] [Aardwolf] recovered [0] indices into cluster_state
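The -f flag keeps the node in the foreground, which is handy for watching the log. For scripted setups, here is a minimal sketch of starting the 0.90.x launcher in the background with a pid file (-p) and stopping it later. The ES_HOME and PID_FILE defaults and the start_es/stop_es helper names are my own assumptions, not part of the distribution.

```shell
# Hypothetical helpers; ES_HOME and PID_FILE are assumed locations, adjust them.
ES_HOME="${ES_HOME:-$HOME/elasticsearch-0.90.5}"
PID_FILE="${PID_FILE:-/tmp/elasticsearch.pid}"

# Without -f, the 0.90.x launcher runs the node in the background;
# -p writes the node's pid to PID_FILE.
start_es() {
  "$ES_HOME/bin/elasticsearch" -p "$PID_FILE"
}

# Stop the node started above by signalling the recorded pid.
stop_es() {
  if [ ! -f "$PID_FILE" ]; then
    echo "no pid file at $PID_FILE" >&2
    return 1
  fi
  kill "$(cat "$PID_FILE")"
}
```

Usage: run start_es, work with the node, then run stop_es when you are done.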

Verify that Elasticsearch is answering on port 9200. First, you may need to install curl:
sudo apt-get update
sudo apt-get install curl

Then, send the following request from a terminal:
$ curl -XGET 'http://localhost:9200'
{
  "ok" : true,
  "status" : 200,
  "name" : "Lady Octopus",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search"
}
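When the node is started from a script, the first request can arrive before the HTTP port is open. A small polling loop, sketched below under the assumption that the node listens on http://localhost:9200 by default, retries the same request once per second. wait_for_es is a hypothetical helper name.

```shell
# Hypothetical helper: poll an Elasticsearch HTTP endpoint until it answers.
# Usage: wait_for_es [url] [max_tries]
wait_for_es() {
  local url="${1:-http://localhost:9200}"
  local tries="${2:-30}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -s: silent, -f: treat HTTP errors (>= 400) as failures; output is discarded
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "Elasticsearch is up at $url"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "Elasticsearch did not answer at $url after $tries tries" >&2
  return 1
}
```

For example, `wait_for_es && bash create_index.sh` only creates the index once the node responds.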

[STEP 3] Create the index movies

Put the following command in create_index.sh and execute it:
curl -XPUT 'http://localhost:9200/movies/'

$ bash create_index.sh
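A plain PUT creates the index with default settings, which is why the console later reports shards [5]/[1] (five primary shards, one replica each). If you want different numbers, the create index API also accepts a settings body. A minimal sketch, assuming a local node; the helper names are hypothetical.

```shell
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Build the create-index body: primary shard count and replicas per shard.
index_settings_json() {
  printf '{"settings":{"index":{"number_of_shards":%d,"number_of_replicas":%d}}}' "$1" "$2"
}

# Create an index with explicit settings (fails if the index already exists).
create_index() {
  curl -XPUT "$ES_HOST/$1/" -d "$(index_settings_json "${2:-5}" "${3:-1}")"
}
```

For example, `create_index movies_small 1 0` would create a one-shard index without replicas (replicas cannot be allocated on a single node anyway). Running it against movies after this step fails, because that index already exists.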

[STEP 4] Apply a mapping for the type 'Movie'
Put the following script in movie_mapping.sh and execute it:

curl -XPUT 'http://localhost:9200/movies/Movie/_mapping' -d '{
    "Movie" : {
        "properties" : {
            "title"    : { "type" : "string" },
            "director" : { "type" : "string" },
            "year"     : { "type" : "long" }
        }
    }
}'

$ bash movie_mapping.sh
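To check that the mapping was actually stored (for example, that title really is a string field), you can read it back with the get mapping API. A small sketch, assuming a local node; mapping_url and show_mapping are hypothetical helper names.

```shell
ES_HOST="${ES_HOST:-http://localhost:9200}"

# URL of the mapping for one type inside one index.
mapping_url() {
  printf '%s/%s/%s/_mapping?pretty=true' "$ES_HOST" "$1" "$2"
}

# Print the mapping Elasticsearch actually stored for <index>/<type>.
show_mapping() {
  curl -XGET "$(mapping_url "$1" "$2")"
}
```

Usage: `show_mapping movies Movie`.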


[STEP 5] Index (create/update) documents
Syntax:
$ curl -XPUT http://localhost:9200/<index>/<type>/<id>
where a type in Elasticsearch roughly corresponds to a table in an RDBMS.

Put the following script in movie_document.sh and execute it:
curl -XPUT 'http://localhost:9200/movies/Movie/1' -d '
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972
}'

$ bash movie_document.sh

The response is:
{"ok":true,"_index":"movies","_type":"Movie","_id":"1","_version":1}

The Elasticsearch console shows the following updates:
[2013-10-22 00:12:46,835][INFO ][cluster.metadata         ] [Aardwolf] [movies] creating index, cause [auto(index api)], shards [5]/[1], mappings []
[2013-10-22 00:12:47,944][DEBUG][action.index             ] [Aardwolf] Sending mapping updated to master: index [movies] type [movie]
[2013-10-22 00:12:47,961][INFO ][cluster.metadata         ] [Aardwolf] [movies] update_mapping [movie] (dynamic)
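Indexing one document per request is fine for a tutorial, but for more data the bulk API accepts many actions in a single POST to /_bulk. The body is newline-delimited: an action line followed by the document source, and it must end with a newline. A sketch, assuming a local node; the two extra movies are just sample data and the helper names are hypothetical.

```shell
ES_HOST="${ES_HOST:-http://localhost:9200}"

# One "index" action line followed by its source document, each newline-terminated.
bulk_index_line() {
  printf '{"index":{"_index":"%s","_type":"%s","_id":"%s"}}\n%s\n' "$1" "$2" "$3" "$4"
}

# Sample documents (illustrative data) for the movies index.
build_movies_bulk() {
  bulk_index_line movies Movie 2 '{"title":"Apocalypse Now","director":"Francis Ford Coppola","year":1979}'
  bulk_index_line movies Movie 3 '{"title":"Taxi Driver","director":"Martin Scorsese","year":1976}'
}

# --data-binary @- reads the body from stdin and keeps the newlines intact.
send_bulk() {
  build_movies_bulk | curl -XPOST "$ES_HOST/_bulk" --data-binary @-
}
```

Using --data-binary rather than -d matters here: -d strips newlines from data read from a file or stdin, which would break the bulk format.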


[STEP 6.1] Retrieve a document by id
Syntax:
$ curl -XGET http://localhost:9200/<index>/<type>/<id>

Execute the following command to retrieve the movie with document id 1:
$ curl -XGET "http://localhost:9200/movies/Movie/1?pretty=true"
{
  "_index" : "movies",
  "_type" : "Movie",
  "_id" : "1",
  "_version" : 1,
  "exists" : true,
  "_source" : {
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972
  }
}
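If you only need to know whether a document exists, the same endpoint also answers HEAD requests: 200 when the document is there, 404 when it is not, with no body. A sketch, assuming a local node; doc_exists is a hypothetical helper.

```shell
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Exit status 0 if /<index>/<type>/<id> exists, non-zero otherwise.
# --head sends a HEAD request; -w prints only the HTTP status code
# (curl prints 000 if it could not connect at all).
doc_exists() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --head "$ES_HOST/$1/$2/$3" 2>/dev/null || true)
  [ "$code" = "200" ]
}
```

Usage: `if doc_exists movies Movie 1; then echo "found"; fi`.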

[STEP 6.2] Search documents by field (title)
$ curl -XGET "localhost:9200/movies/Movie/_search?q=title:Godfather"

{
  "took" : 22,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "movies",
      "_type" : "Movie",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source" : {
        "title": "The Godfather",
        "director": "Francis Ford Coppola",
        "year": 1972
      }
    } ]
  }
}
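The q parameter takes Lucene query syntax, so queries with spaces or quotes need URL encoding. curl's -G flag turns -d/--data-urlencode pairs into a query string on a GET request, which handles that for you. The sketch below also passes the size and sort URI parameters; search_movies and its default size of 10 are my own choices, and a local node is assumed.

```shell
ES_HOST="${ES_HOST:-http://localhost:9200}"

# URI search on movies/Movie: $1 is a Lucene query string, $2 an optional page size.
# -G appends the data pairs to the URL as a query string and sends a GET.
search_movies() {
  curl -sS -G "$ES_HOST/movies/Movie/_search" \
    --data-urlencode "q=$1" \
    -d "size=${2:-10}" \
    -d "sort=year:desc" \
    -d "pretty=true"
}
```

For example, `search_movies 'director:"Francis Ford Coppola"'` runs a phrase query on the director field and returns the newest films first.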


Or, using the Elasticsearch Query DSL:
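The Query DSL sends a JSON body instead of the q parameter. A match query on title is roughly equivalent to q=title:Godfather (both analyze the text against the title field, though they are different query types). A sketch, assuming a local node; title_match_query and search_title are hypothetical helpers, and the title is inserted without JSON escaping, so it must not contain double quotes or backslashes.

```shell
ES_HOST="${ES_HOST:-http://localhost:9200}"

# Query DSL body: a match query on the title field.
# Assumes the title contains no characters that need JSON escaping.
title_match_query() {
  printf '{"query":{"match":{"title":"%s"}}}' "$1"
}

# curl sends the -d body along with the GET request.
search_title() {
  curl -XGET "$ES_HOST/movies/Movie/_search?pretty=true" -d "$(title_match_query "$1")"
}
```

Usage: `search_title Godfather` should return the same hit as the URI search above.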

[7] Not to miss 
[7.1] 40-minute presentation by Shay Banon

Shay Banon - ElasticSearch: Big Data, Search, and Analytics

The presentation covers the following topics:
Data design patterns
The "kagillion" shards problem
Simple data flows
"Users" data flow
"Time" data flow (clicks, etc.)
More than search
Questions
...and more

[7.2] ElasticSearch, "You know, for search", presentation by Clinton Gormley
Also watch Clinton Gormley's talk "Getting down and dirty with Elasticsearch".

[7.3] Book: Exploring Elasticsearch, by Andrew Cholakian

References
ElasticSearch 101 – a getting started tutorial, http://joelabrahamsson.com/elasticsearch-101/

ElasticSearch in 5 minutes, http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html

http://blog.florian-hopf.de/2013/09/simple-event-analytics-with.html

http://www.javacodegeeks.com/2013/04/getting-started-with-elasticsearch.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-health.html

http://thediscoblog.com/blog/2013/09/03/effortless-elasticsearch-clustering/

Scaling massive elastic search clusters - Rafał Kuć - Sematext, http://www.slideshare.net/kucrafal/scaling-massive-elastic-search-clusters-rafa-ku-sematext

Shards and replicas in Elasticsearch, http://stackoverflow.com/a/15705989/432903
