Friday, 27 September 2013

map reduce in ruby

Install ruby

$ irb
.map()
Invokes the given {block} once for each element of self, creating a new array with the values returned by the block.
Documentation here.

1.9.3-p448 :007 > patients = ['Prayag', 'Paras', 'James', 'Angelina', 'Scarlette', 'Christian']
 => ["Prayag", "Paras", "James", "Angelina", "Scarlette", "Christian"] 

1.9.3-p448 :008 > patients.map {|patient| patient.upcase}
 => ["PRAYAG", "PARAS", "JAMES", "ANGELINA", "SCARLETTE", "CHRISTIAN"] 

1.9.3-p448 :010 > patients.map {|patient| patient.reverse}
 => ["gayarP", "saraP", "semaJ", "anilegnA", "ettelracS", "naitsirhC"] 

1.9.3-p448 :011 > patients.map {|patient| patient.size}
 => [6, 5, 5, 8, 9, 9]
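
To make the description of .map() concrete, here is a minimal sketch of how such a method could be written by hand with each. The name my_map is hypothetical and exists only for this illustration; it is not how Ruby implements Array#map internally.

```ruby
# Hypothetical helper: a hand-rolled version of Array#map, for illustration only.
def my_map(array)
  result = []
  # Call the block once per element and collect whatever it returns.
  array.each { |element| result << yield(element) }
  result
end

patients = ['Prayag', 'Paras', 'James']

upcased = my_map(patients) { |patient| patient.upcase }
sizes   = my_map(patients) { |patient| patient.size }

p upcased  # ["PRAYAG", "PARAS", "JAMES"]
p sizes    # [6, 5, 5]
```

The original array is left untouched; the block's return values go into a new array.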


.reduce()
Combines all elements of enum by applying a binary operation specified by a {block}.
Documentation here.

1.9.3-p448 :012 > rainbow.reduce(0) {|acc, n| acc += 1}
 => 30 

1.9.3-p448 :014 > patients.reduce(0) {|acc, n| acc += n.length}
 => 42 

1.9.3-p448 :016 > patients.reduce([]) {|acc, n| acc.push(n.upcase)}
 => ["PRAYAG", "PARAS", "JAMES", "ANGELINA", "SCARLETTE", "CHRISTIAN"] 
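
The accumulator mechanics behind .reduce() can be sketched by hand as well. my_reduce below is a hypothetical helper, not Ruby's implementation. It shows that the block's return value becomes the next accumulator, so a plain acc + n.length is enough; acc += n.length works only because the assignment expression also returns the new value.

```ruby
# Hypothetical helper: a hand-rolled fold, for illustration only.
def my_reduce(array, initial)
  acc = initial
  # The block's return value becomes the accumulator for the next element.
  array.each { |element| acc = yield(acc, element) }
  acc
end

patients = ['Prayag', 'Paras', 'James', 'Angelina', 'Scarlette', 'Christian']

count        = my_reduce(patients, 0)  { |acc, _name| acc + 1 }
total_length = my_reduce(patients, 0)  { |acc, name| acc + name.length }
upcased      = my_reduce(patients, []) { |acc, name| acc + [name.upcase] }

p count         # 6
p total_length  # 42
p upcased
```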


References
Understanding Ruby’s Select, Map, and Reduce, http://feynmanliang.com/?p=319
Understanding map and reduce, http://railspikes.com/2008/8/11/understanding-map-and-reduce

Friday, 6 September 2013

Hibernate Madness

Hibernate is a database-agnostic Java ORM. Gavin King created it out of frustration with EJB 2 style entity beans and brittle handwritten persistence layers. (Interview with Gavin King, founder of Hibernate, 2009)


automatic dirty checking
A Hibernate feature that saves the time and effort of writing explicit updates when the state of objects is modified inside a transaction.
Hibernate monitors all persistent objects. It detects which objects have been modified and issues update statements for them when the session is flushed.
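
The idea can be sketched outside Java. The toy below is written in Ruby and is not Hibernate code: ToySession, Patient, and the generated "SQL" strings are all hypothetical. It assumes a snapshot-comparison approach, where the session remembers each object's state at load time and compares it with the current state on flush.

```ruby
# Hypothetical stand-ins; none of these names come from Hibernate.
Patient = Struct.new(:id, :name)

class ToySession
  attr_reader :statements

  def initialize
    # object => snapshot of its state when tracking started.
    # Keys are compared by identity so that mutating an object does not lose it.
    @tracked = {}.compare_by_identity
    @statements = []  # the "SQL" this toy session would issue
  end

  # Start monitoring an object and remember a deep copy of its current state.
  def track(entity)
    @tracked[entity] = snapshot_of(entity)
    entity
  end

  # Compare every tracked object with its snapshot; emit an update for dirty ones.
  def flush
    @tracked.each do |entity, snapshot|
      next if entity.to_a == snapshot
      @statements << "UPDATE patients SET name='#{entity.name}' WHERE id=#{entity.id}"
      @tracked[entity] = snapshot_of(entity)
    end
  end

  private

  def snapshot_of(entity)
    Marshal.load(Marshal.dump(entity.to_a))
  end
end

session = ToySession.new
prayag  = session.track(Patient.new(1, 'Prayag'))
paras   = session.track(Patient.new(2, 'Paras'))

prayag.name = 'PRAYAG'  # modify state; no explicit update call
session.flush

puts session.statements  # only the modified object produces an update
```

Only the changed object yields a statement, and a second flush without further changes yields nothing.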

Example at Automatic Dirty Checking

[1] hql join @CollectionTable

For an @Entity Service with a property tags (an @ElementCollection stored in a @CollectionTable),

import java.util.Set;
import javax.persistence.*;

@Entity
public class Service extends AbstractEntity<Long> {
    private static final long serialVersionUID = 9116959642944725990L;

    // Each tag is one row in service_tags, joined to its service by s_id.
    @ElementCollection(fetch = FetchType.EAGER, targetClass = java.lang.String.class)
    @CollectionTable(name = "service_tags", joinColumns = @JoinColumn(name = "s_id"))
    @Column(name = "tag")
    private Set<String> tags;
}

To get Services with tags.tag="Telecome", 

select s from Service s INNER JOIN s.tags t where s.status=0 and t in ('Telecome')


[2] hibernate spring query_cache

STEP 1 ADD the hibernate-ehcache 3.3.1.GA dependency as a cache provider
Add the repository to artifactory.

Then add the dependency to build.gradle as below:
description 'eccount-core'

configurations {
    all*.exclude module: 'commons-logging'
    all*.exclude module: 'log4j'
    all*.exclude module: 'icu4j'
    all*.exclude module: 'catalina'
}

dependencies {
    compile 'org.hibernate:hibernate-ehcache:3.3.1.GA'
}



STEP 2 ADD hibernate query caching conf to resources/hibernate.cfg.xml


<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE hibernate-configuration PUBLIC
        "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
        "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
    <session-factory>
        <property name="hibernate.default_batch_fetch_size">16</property>
        <property name="hibernate.max_fetch_depth">5</property>

        <property name="hibernate.cache.use_query_cache">true</property>
        <property name="hibernate.cache.use_second_level_cache">true</property>
        <property name="hibernate.generate_statistics">true</property>
        <property name="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</property>
        <property name="hibernate.cache.provider_configuration_file_resource_path">ehcache.xml</property>
        <property name="hibernate.c3p0.timeout">300</property>

    </session-factory>
</hibernate-configuration>

STEP 3 ADD resources/ehcache.xml


<?xml version="1.0" encoding="UTF-8"?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="http://ehcache.org/ehcache.xsd">

<diskStore path="/home/prayag/cache_"/>
<defaultCache
        eternal="false"
        maxElementsInMemory="1000"
        overflowToDisk="true"
        diskPersistent="true"
        timeToLiveSeconds="300"
        />
</ehcache>



STEP 4 ADD Spring's CacheManager conf to WEB-INF/jpa-context.xml


<?xml version="1.0" encoding="UTF-8"?>
<beans
    xmlns:p="http://www.springframework.org/schema/p" 
    xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:cache="http://www.springframework.org/schema/cache" 
xmlns:mvc="http://www.springframework.org/schema/mvc"
xmlns:context="http://www.springframework.org/schema/context" xmlns:tx="http://www.springframework.org/schema/tx"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-3.0.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx.xsd
http://www.springframework.org/schema/cache
        http://www.springframework.org/schema/cache/spring-cache.xsd"
default-autowire="byName">

<!-- Scans within the base package of the application for @Components to 
configure as beans -->

<bean id="entityManagerFactory"
class="com.zazzercode.server.jpa.monitor.JavamelodyContainerEntityManagerFactoryBean">
<property name="jpaVendorAdapter">
<bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter">
<property name="showSql" value="${db.showSql}" />
<property name="generateDdl" value="${db.generateDdl}"/>
<property name="databasePlatform" value="${db.dialect}" />
</bean>
</property>
<property name="persistenceUnitName" value="${db.persistenceUnit}"/>
</bean>

<bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
<property name="driverClass" value="${db.driver}"/>
<property name="jdbcUrl" value="${db.url}"/>
<property name="user" value="${db.username}"/>
<property name="password" value="${db.password}"/>
<property name="minPoolSize" value="${db.poolsize.min}"/>
<property name="maxPoolSize" value="${db.poolsize.max}"/>
<property name="maxStatementsPerConnection" value="15"/>
<property name="idleConnectionTestPeriod" value="3000"/>
</bean>

<bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
<property name="entityManagerFactory" ref="entityManagerFactory"></property>
</bean>

<tx:annotation-driven />
<cache:annotation-driven />

<bean
class="org.springframework.orm.jpa.support.PersistenceAnnotationBeanPostProcessor" />

     <!-- generic cache manager -->
<bean id="cacheManager" class="org.springframework.cache.support.SimpleCacheManager">
 <property name="caches">
    <set>
     <bean class="org.springframework.cache.concurrent.ConcurrentMapCacheFactoryBean" p:name="merchantServices"/>
   </set>
      </property>
</bean>

</beans>


STEP 5 ADD Spring's @Cacheable to the Spring Data repository method


@Cacheable("merchantServices")
@Query("select s from Service s JOIN s.statusList sas where s.status=?1 and s.priviligedUser.priviligedUserType IN (2,4) and s.id IN (?2) and sas.active=true and sas.transactorType=?3 ORDER BY s.name")
List<Service> getAllMerchantServicesByStatusAndServiceId(ServiceStatus status, List<Long> services, TransactorType transactor);
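
What the annotation buys can be sketched in a few lines. The toy below is Ruby, not Spring: ToyCache and find_services are hypothetical names, and the fake query result is made up. It assumes the default behaviour where the method arguments form the cache key, so a repeated call with the same arguments skips the query.

```ruby
# Hypothetical stand-in for one named cache such as "merchantServices".
class ToyCache
  attr_reader :misses

  def initialize
    @store  = {}
    @misses = 0
  end

  # Return the cached value for key, or run the block once and remember its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @misses += 1
    @store[key] = yield
  end
end

cache = ToyCache.new

# Hypothetical stand-in for the repository query; the arguments form the cache key.
find_services = lambda do |status, ids|
  cache.fetch([status, ids]) { ids.map { |id| "service-#{id} (#{status})" } }
end

first  = find_services.call(:active, [1, 2])  # runs the "query"
second = find_services.call(:active, [1, 2])  # served from the cache
third  = find_services.call(:active, [3])     # different arguments, new entry

puts cache.misses  # 2
```

The trade-off is the usual one for caching: results can go stale until the entry is evicted or the cache is cleared.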


STEP 6 Check diskStore
Cache elements are stored with a key.
In the diskStore, I see the following files:

prayag@prayag:~/cache_$ ls -l
total 0
-rw-rw-r-- 1 prayag prayag 0 Sep 20 13:31 org.hibernate.cache.StandardQueryCache.data
-rw-rw-r-- 1 prayag prayag 0 Sep 20 13:31 org.hibernate.cache.StandardQueryCache.index
-rw-rw-r-- 1 prayag prayag 0 Sep 20 13:31 org.hibernate.cache.UpdateTimestampsCache.data
-rw-rw-r-- 1 prayag prayag 0 Sep 20 13:31 org.hibernate.cache.UpdateTimestampsCache.index


Reference
Spring Data Repository caching results, http://stackoverflow.com/q/17896118/432903

How to configure Hibernate statistics in Spring 3.0 application, http://stackoverflow.com/a/6708644/432903

Sunday, 1 September 2013

mongodb hacks

Connect to the mongod server using ssh.

Use the mongo client to connect to the server and choose a database:
> use recommendation;
switched to db recommendation

> db.getCollectionNames()
[ "movies", "system.indexes" ]

> show collections;
movies
system.indexes

Indexing
---------------------------------------------

> db.movies.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "recommendation.movies"
}
]

> db.movies.stats()
{
"ns" : "recommendation.movies",
"count" : 967696,   // ~968K documents
"size" : 232274176, // ~232 MB
"avgObjSize" : 240,
"storageSize" : 335900672,
"numExtents" : 14,
"nindexes" : 1,
"lastExtentSize" : 92585984,
"paddingFactor" : 1,
"systemFlags" : 1,
"userFlags" : 1,
"totalIndexSize" : 32818464,
"indexSizes" : {
"_id_" : 32818464
},
"ok" : 1
}


[1.1] filter/ projection
db.movies.find({"title":"Apollo 13 (1995)"});
{
"_id" : ObjectId("55599a61f84a9472b69e2a42"),
"movieId" : 28,
"title" : "Apollo 13 (1995)",
"similarId" : 729,
"similarTitle" : "Nell (1994)",
"regularizedCorRelation" : 0.20162457556082056
}
{
"_id" : ObjectId("55599a69f84a9472b69e39f8"),
"movieId" : 28,
"title" : "Apollo 13 (1995)",
"similarId" : 809,
"similarTitle" : "Rising Sun (1993)",
"regularizedCorRelation" : 0.17383996759460163
}



db.movies.find({"title":"Apollo 13 (1995)", 
                "similarMovies.regularizedCorRelation" : 
                           {$gte : 0.09}
               });
{
"_id" : ObjectId("55599a61f84a9472b69e2a42"),
"movieId" : 28,
"title" : "Apollo 13 (1995)",
 "similarMovies" : [
  {
"similarId" : 729,
"similarTitle" : "Nell (1994)",
"regularizedCorRelation" : 0.20162457556082056
  },
  {
"similarId" : 809,
"similarTitle" : "Rising Sun (1993)",
"regularizedCorRelation" : 0.17383996759460163
  }
 ]
}


[1.2] sort (_id) in descending order
> db.payment_transactios.find().sort({_id:-1});

[1.3] limit 
> db.payment_transactios.find().sort({_id:-1}).limit(1);

> db.payment_transactios.find().sort({_id:-1}).limit(1).pretty();
OR
> db.payment_transactios.find().sort({_id:-1}).limit(1).forEach(printjson);
> db.payment_transactios.find().sort({_id:-1}).limit(1).toArray();

// pretty-print query results by default in this shell session
DBQuery.prototype._prettyShell = true


Aggregation
----------------------------------------

//get DISTINCT movie id
> db.movies.aggregate( [ { $group : { _id : "$movieId" } } ] )
{ "_id" : 1596 }
{ "_id" : 1309 }
{ "_id" : 1672 }
{ "_id" : 1679 }
{ "_id" : 1463 }
{ "_id" : 1510 }
{ "_id" : 1650 }
{ "_id" : 1523 }
{ "_id" : 1430 }
{ "_id" : 1482 }
{ "_id" : 1500 }
{ "_id" : 1432 }
{ "_id" : 1504 }
{ "_id" : 711 }
{ "_id" : 1662 }
{ "_id" : 1156 }
{ "_id" : 1420 }
{ "_id" : 1130 }
{ "_id" : 1577 }
{ "_id" : 1403 }

// sort the DISTINCT movie ids
> db.movies.aggregate( [ { $group : { _id : "$movieId" } }, {$sort :{_id:1}} ] )
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
{ "_id" : 4 }
{ "_id" : 5 }
{ "_id" : 6 }
{ "_id" : 7 }
{ "_id" : 8 }
{ "_id" : 9 }
{ "_id" : 10 }
{ "_id" : 11 }
{ "_id" : 12 }
{ "_id" : 13 }
{ "_id" : 14 }
{ "_id" : 15 }
{ "_id" : 16 }
{ "_id" : 17 }
{ "_id" : 18 }
{ "_id" : 19 }
{ "_id" : 20 }
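
For comparison, the same "distinct, then sort" steps can be sketched on an in-memory array in plain Ruby, with no MongoDB driver involved. The movies array is made-up sample data; only movieId 28 with similarId 729 and 809 mirrors the documents shown earlier.

```ruby
# Made-up sample documents; only the movieId field matters here.
movies = [
  { movieId: 28, similarId: 729 },
  { movieId: 28, similarId: 809 },
  { movieId: 3,  similarId: 12 },
  { movieId: 11, similarId: 28 }
]

# $group on "$movieId" with no accumulators keeps one entry per distinct value.
distinct_ids = movies.map { |movie| movie[:movieId] }.uniq

# {$sort: {_id: 1}} orders those entries ascending.
sorted_ids = distinct_ids.sort

p distinct_ids  # [28, 3, 11]
p sorted_ids    # [3, 11, 28]
```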



// count by movieId
db.movies.aggregate([{
              $group : {
                 _id : "$movieId", 
                 count: { $sum:1 }
              }}]);

{ "_id" : 1596, "count" : 1 }
{ "_id" : 1309, "count" : 1 }
{ "_id" : 1672, "count" : 1 }
{ "_id" : 1679, "count" : 1 }
{ "_id" : 1463, "count" : 1 }
{ "_id" : 1510, "count" : 1 }
{ "_id" : 1650, "count" : 1 }
{ "_id" : 1523, "count" : 1 }
{ "_id" : 1430, "count" : 4 }
{ "_id" : 1482, "count" : 1 }
{ "_id" : 1500, "count" : 1 }
{ "_id" : 1432, "count" : 4 }
{ "_id" : 1504, "count" : 2 }
...
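
The count-per-group pipeline is a reduce at heart. A minimal sketch in plain Ruby, again on made-up in-memory data rather than a real collection, assuming each document is a hash with a movieId key:

```ruby
# Made-up sample documents standing in for the movies collection.
movies = [
  { movieId: 1430, similarId: 1 },
  { movieId: 1504, similarId: 2 },
  { movieId: 1430, similarId: 3 },
  { movieId: 1596, similarId: 4 },
  { movieId: 1504, similarId: 5 },
  { movieId: 1430, similarId: 6 }
]

# Equivalent of { $group: { _id: "$movieId", count: { $sum: 1 } } }:
# start from an empty tally and add 1 per document under its movieId.
counts = movies.reduce(Hash.new(0)) do |acc, movie|
  acc[movie[:movieId]] += 1
  acc  # the block must return the accumulator for the next step
end

counts.each { |movie_id, count| puts "{ \"_id\" : #{movie_id}, \"count\" : #{count} }" }
```

each_with_object(Hash.new(0)) would do the same without the explicit acc return; reduce is used here to match the map/reduce vocabulary.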


References
How do I drop a MongoDB database, from the command line?, http://stackoverflow.com/a/8857323/432903

Troubleshoot the Map Function, http://docs.mongodb.org/manual/tutorial/troubleshoot-map-function/

Mongo - get occurrence of lastnames, http://stackoverflow.com/a/5544226/432903

Javascript and MapReduce, http://jcla1.com/blog/2013/05/11/javascript-mapreduce/

http://www.mongovue.com/2010/11/03/yet-another-mongodb-map-reduce-tutorial/