Open Source in Big Data is Gaining Traction

Without big data, most IT Heads of E-Commerce firms would be headless chickens


ShopClues has been a metrics-driven organisation from day one, and it has maintained business intelligence (a cousin of big data) reports ever since. Every line of business, along with core management, reviews these reports on a daily, weekly and monthly basis to gain insight into operations, business opportunities and challenges.

ShopClues is gaining traction. So is another new e-commerce player, Koovs. Both companies are betting big on big data as their weapon in a market where the likes of Flipkart and Myntra have already made a mark.

Mrinal Chatterjee, CIO, ShopClues.com, says, “Without Big Data, we wouldn’t know how to survive. It’s oxygen to our business.”

Amit Shukla, CIO and Co-Founder, Koovs.com, says, “We use a big data solution. We have a large volume of data that gets updated every second, and it has to be available across multiple networks and servers. On our portal, we monitor the browsing pattern, the shopping pattern, past purchase trends and other customer data to better understand customers’ behaviour. This helps us sell and cross-sell products, and provide informed visual merchandising, personalised offers and predictive marketing.”

Data is not getting longer, it's getting wider. Fundamental to most successful Big Data strategies is a deep understanding of the business.

Sanjay Kharb, Vice President - Infrastructure and BI at MakeMyTrip, says, “We look at leveraging it in two ways: a) to better understand problems or strategic areas that we are targeting, and b) to come up with ‘Aha’ discoveries from the data that we did not know about before. So, we do deep analytics as well as mining of the data.”

Identifying Big Data is a major task for IT decision makers (ITDMs). Big Data is a large volume of valuable, mostly unstructured data that is difficult to manage with conventional database solutions. It requires advanced architectures to store it, analyse it and make it available across the network at high velocity.

Big Data Solutions – Is Open Source Doing Better?

The biggest question facing IT decision makers is which Big Data solution to choose. Some are proprietary; others are open source.

After extensive research, the Koovs technical team migrated to Apache Cassandra, an open source distributed database management system designed to handle large amounts of data across many nodes, making data highly available with no single point of failure. It also offers data analytics options, which are used to identify trends and to filter data into micro-categories.

Shukla says, “We need fast data processing and results – to provide good experience to our users ‘On the fly’.”

ShopClues.com is big on open source technologies. It has taken some of the industry's best open source platforms, then integrated and customised them to meet its needs.

Chatterjee says, “These tools allow us to run our BI reporting. We have also integrated core business metrics back into our transaction systems for real time decision support systems.”

Kharb's Big Data strategy at MakeMyTrip leans heavily on open source and uses proprietary solutions as needed.

He says, “We have developed our own data hub and warehouse to enable us to derive batch and near real-time intelligence. Besides, we have created a service-oriented architecture layer to distribute this data in a consumable format for our online and offline products and services. Our data warehouse supports a lot of post-transaction operations and reconciliations.”

Big Data Architecture

Koovs uses Cassandra primarily to provide high availability of data. Shukla and his team point to the CAP theorem, under which a distributed system cannot guarantee consistency, availability and partition tolerance all at the same time. That is why the design accepts eventual consistency in exchange for availability.

Koovs’s Architecture:

A client request arrives at the ELB (load balancer).
The ELB forwards the request to the application servers where the web services are deployed.
The web service calls the cache servers. If the data is cached, it is returned to the client; if not, the service reads it from the Cassandra cluster, updates the cache and returns the data to the client.
Data is partitioned by user, and the cluster uses Cassandra's RandomPartitioner (RP) to distribute it across nodes.
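The read path described above is commonly called the cache-aside pattern. The following is a minimal Python sketch of that flow, not Koovs's actual code: the class name, the in-memory dictionary standing in for the cache servers, and the sample user row are all hypothetical, and a real deployment would query Cassandra through a driver rather than a dictionary.

```python
from typing import Callable, Dict, Optional


class CacheAsideReader:
    """Try the cache first; on a miss, read the store and fill the cache."""

    def __init__(self, fetch_from_store: Callable[[str], Optional[dict]]):
        self._cache: Dict[str, dict] = {}          # stand-in for the cache servers
        self._fetch_from_store = fetch_from_store  # stand-in for a Cassandra read
        self.store_reads = 0                       # how often the store was hit

    def get(self, user_id: str) -> Optional[dict]:
        cached = self._cache.get(user_id)
        if cached is not None:
            return cached                          # cache hit: no store access
        self.store_reads += 1                      # cache miss: go to the store
        row = self._fetch_from_store(user_id)
        if row is not None:
            self._cache[user_id] = row             # update the cache for next time
        return row


# Hypothetical per-user data, keyed by user because the partition is per user.
USER_ROWS = {"user-42": {"recently_viewed": ["shoes", "jackets"]}}

reader = CacheAsideReader(USER_ROWS.get)
first = reader.get("user-42")    # miss: read from the store, then cached
second = reader.get("user-42")   # hit: served from the cache
```

In this sketch only the first call reaches the store. A production version would also need cache expiry or invalidation, which the article does not describe.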

Key points in designing the Cassandra-based architecture:

O(1) node lookup
Explicit replication
Eventually consistent
Scalability: new machines are added with no downtime or interruption to applications
Automatic replacement of broken nodes
Full and incremental backups to S3
Cassandra clusters in separate availability zones
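To make the node lookup and replication points more concrete, here is a simplified Python sketch of a token ring. It hashes a partition key with MD5 in the spirit of RandomPartitioner, finds the owning node and then walks clockwise to pick replicas. The node names, the evenly spaced tokens and the helper functions are illustrative assumptions, and the token arithmetic is not byte-for-byte what Cassandra computes.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple

TOKEN_SPACE = 2 ** 127  # assumed token range for this sketch


def token_for(key: str) -> int:
    """Map a partition key to a ring token using an MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def build_ring(nodes: List[str]) -> List[Tuple[int, str]]:
    """Assign each node one evenly spaced token; the result is sorted by token."""
    step = TOKEN_SPACE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def replicas_for(key: str, ring: List[Tuple[int, str]],
                 replication_factor: int) -> List[str]:
    """Return the owner of the key's token plus the next nodes clockwise."""
    tokens = [token for token, _ in ring]
    # The owner is the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node cluster with three copies of every row.
ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
owners = replicas_for("user-42", ring, replication_factor=3)
```

Because every node can hold the full ring, any node can work out the owners of a key locally and forward the request in a single hop, which is one common reading of "O(1) node lookup". The local search itself is a binary search over the node tokens.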

The system architecture at ShopClues is confidential, but Chatterjee describes it at a high level.

Chatterjee says, “We have an ETL platform to correlate and push data into a central OLAP platform, and we run BI reporting on top of this platform. This OLAP platform also feeds data back into OLTP system for real time automated decision support systems.”
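Since the real ShopClues design is confidential, the following Python sketch only illustrates the general shape of such a loop: transaction rows are aggregated into a metric (the ETL and OLAP side), and a decision derived from that metric is handed back to the transaction system. The order rows, the cancellation-rate metric and the review threshold are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical rows extracted from the transaction (OLTP) system.
ORDERS = [
    {"merchant": "m1", "status": "delivered"},
    {"merchant": "m1", "status": "cancelled"},
    {"merchant": "m1", "status": "cancelled"},
    {"merchant": "m2", "status": "delivered"},
    {"merchant": "m2", "status": "delivered"},
]


def cancellation_rates(orders: Iterable[dict]) -> Dict[str, float]:
    """Aggregate step: share of cancelled orders per merchant."""
    totals: Dict[str, int] = defaultdict(int)
    cancelled: Dict[str, int] = defaultdict(int)
    for order in orders:
        totals[order["merchant"]] += 1
        if order["status"] == "cancelled":
            cancelled[order["merchant"]] += 1
    return {merchant: cancelled[merchant] / totals[merchant] for merchant in totals}


def merchants_to_review(rates: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Feedback step: merchants the transaction system should flag."""
    return sorted(merchant for merchant, rate in rates.items() if rate > threshold)


rates = cancellation_rates(ORDERS)   # what a BI report would show
flags = merchants_to_review(rates)   # what gets fed back for automated decisions
```

With this sample data, merchant m1 has two cancellations out of three orders and is the only one flagged.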

For MakeMyTrip, the fundamental requirement was a system that is nimble, extensible, high-performing and capable of both traditional batch analysis and real-time data processing.

Kharb says, “We evaluated several proprietary solutions and then decided to build our own, using open source. The key to our strategy is a distributed data storage and processing strategy leveraging Hadoop and a bouquet of related technologies. For near real-time needs, we use a combination of open source Kafka, Storm, ElasticSearch and Couchbase as the stores. We treat all our data activities as events and all processing is done on them. QlikView and SAS also find use in our overall BI initiatives that are tied together with Big Data.”
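As a rough illustration of treating activity as a stream of events, the Python sketch below keeps a sliding-window count per key, the kind of near real-time aggregate a stream processor might maintain. It is not MakeMyTrip's pipeline and uses no Kafka or Storm APIs. The class name, the route codes and the 60-second window are hypothetical, and it assumes events arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque
from typing import Deque, Tuple


class SlidingWindowCounter:
    """Count events per key over the most recent `window_seconds`."""

    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[int, str]] = deque()  # (timestamp, key), oldest first
        self._counts: Counter = Counter()

    def add(self, timestamp: int, key: str) -> None:
        """Record one event, then drop events that fell out of the window."""
        self._events.append((timestamp, key))
        self._counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now: int) -> None:
        # Assumes timestamps never go backwards, so the oldest event is at the left.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def count(self, key: str) -> int:
        return self._counts.get(key, 0)


# Hypothetical search events: (seconds since start, route searched).
counter = SlidingWindowCounter(window_seconds=60)
for ts, route in [(0, "DEL-BOM"), (10, "DEL-BOM"), (30, "BLR-GOI"), (65, "DEL-BOM")]:
    counter.add(ts, route)
```

After the last event at second 65, the event from second 0 has left the 60-second window, so two DEL-BOM searches and one BLR-GOI search remain counted.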

Buying into Line of Business

Shukla looks for a solution that is easy to integrate with the existing system, scalable and easy to monitor. With large volumes of data coming from different sources, Koovs relies on its business functions to identify and make better use of that data:

Data analysts filter the data and identify what can be used for analysis.
The marketing team identifies the campaigns and promotions that should be launched.
The sales team identifies the potential customer base and suggests relevant offers for each campaign.
Logistics identifies location-wise trends in product demand.
Customer service identifies which parts of the website are visited most and which products are most sought after.

At MakeMyTrip, the BI team does not work in isolation. There is a strong partnership between the BI/Tech, Product and Marketing teams. All of them are aware of the data being leveraged and jointly determine any additional data sources that could provide useful insights.

At ShopClues.com, Big Data and BI have always been a close partnership between technology and business.

Chatterjee says, “Our technology team excels in working closely with business to identify what business metrics the system needs to log. This is done at the time the OLTP system is developed. These metrics are then harnessed in Big Data initiatives. Big Data involvement starts at the time the transaction system is developed.”
