
KickStart to Big Data!

Hello friends,

After writing lots of blogs on events and meet-ups, it's time for some change. So this time I am writing a blog on a very big topic: Big Data.

After learning about ‘Big Data’ and its concepts, I personally think the name should have been ‘Vig Data’, because there are a lot of V’s involved here. (That was a joke 🙂 )

So before you go through this long read, I just want to tell you that the purpose of writing this blog is to give you an initial, summarized introduction to Big Data.

[Image: word cloud of “Big Data” terms]

Definitions of Big Data:

  • Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
  • Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.
  • Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data is big data.
  • The term big data, especially when used by vendors, may refer to the technology (which includes tools and processes) that an organization requires to handle the large amounts of data and storage facilities.
  • Big Data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.
  • Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

The Origin of Big Data :

  • Big Data originated with the three Vs: Volume, Velocity, and Variety.
  • This is the most venerable and well-known definition, first coined by Doug Laney (then at META Group, later acquired by Gartner) in 2001.
  • Since then, many others have tried to take it to eleven with additional Vs, including Validity, Veracity, Value, and Visibility.

Why Big Data?:

You’ll Manage Data Better:

  • Many of today’s big data and business intelligence tools let users sit in the driver’s seat and work with data without going through too many complicated technical steps.

  • This added layer of abstraction has enabled numerous use cases where data in a wide variety of formats has been successfully mined for specific purposes.

  • One example is real-time video processing.

End Users Can Visualize Data:

  • A big data initiative is going to require next-level data visualization tools, which present big data in easy-to-read charts, graphs and slideshows.

  • Given the vast quantities of data being examined, these applications must offer processing engines that let end users query and manipulate information quickly.

New Business Opportunities:

  • As big data analytics tools continue to mature, more users are realizing the competitive advantage of being a data-driven enterprise.

  • Social media sites have identified opportunities to generate revenue from the data they collect by selling ads based on an individual user’s interests. This lets companies target specific sets of individuals that fit an ideal client or prospect profile.

  • Big data use cases abound in retail, where the focus is on gaining insights by studying consumer behaviour in online stores or physical shopping centres.

Your Data Analysis Methods, Capabilities Will Evolve:

  • Data is no longer simply numbers in a database. Text, audio and video files can also provide valuable insight; the right tools can even recognize specific patterns based on predefined criteria.

  • This happens using natural language processing tools, which can prove vital to text mining, sentiment analysis, clinical language processing, and named entity recognition efforts.

  • One example that highlights the use of audio analysis and big data comes from Mattersight. This call centre tool can match incoming callers to the appropriate customer service agent using predictive behavioural routing and other analytics technology.
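As a toy illustration of the kind of text analysis mentioned above, here is a minimal Python sketch of dictionary-based sentiment scoring. The word lists are tiny, hypothetical examples invented for this post; real sentiment analysis tools use trained models and much larger lexicons.

```python
# Toy sentiment scorer: count positive vs. negative words in a text.
# POSITIVE and NEGATIVE are hypothetical mini-lexicons for illustration only.

POSITIVE = {"great", "good", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "angry"}

def sentiment_score(text):
    """Return (positive word count) - (negative word count) for the text."""
    words = (w.strip(".,!?") for w in text.lower().split())
    score = 0
    for w in words:
        if w in POSITIVE:
            score += 1
        elif w in NEGATIVE:
            score -= 1
    return score

print(sentiment_score("I love this great product"))    # positive overall
print(sentiment_score("terrible support, I hate it"))  # negative overall
```

Even this crude approach hints at how unstructured text can be turned into a number a dashboard can chart; production systems just do the same thing with far more sophistication.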

Other Points:

  • Big Data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.

  • Sophisticated analytics can substantially improve decision-making, minimize risks, and unearth valuable insights that would otherwise remain hidden.

  • Big Data can unlock significant value by making information transparent.

  • Big Data can be used to develop the next generation of products and services.

So how does it work?

  • It depends on the technology used and what you are trying to achieve through the use of Big Data.
  • Big Data involves different types of technologies which work together to achieve the end goal: extracting value from data that would previously have been considered ‘dead’.
  • Here are some of the key technologies and concepts associated with Big Data:
    Hadoop, HDFS, NoSQL, MapReduce, MongoDB, Cassandra, Pig, Hive, HBase.

Hadoop and Big Data:

  • Doug Cutting, Cloudera’s Chief Architect, helped create Apache Hadoop out of necessity as data from the web exploded, and grew far beyond the ability of traditional systems to handle it.
  • Hadoop was initially inspired by papers published by Google outlining its approach to handling an avalanche of data, and has since become the de facto standard for storing, processing and analyzing hundreds of terabytes, and even petabytes of data.
  • Apache Hadoop is 100% open source, and pioneered a fundamentally new way of storing and processing data.
  • Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data.
  • With Hadoop, no data is too big. And in today’s hyper-connected world where more and more data is being created every day, Hadoop’s breakthrough advantages mean that businesses and organizations can now find value in data that was recently considered useless.

Reveal Insight From All Types of Data, From All Types of Systems

  • Hadoop can handle all types of data from disparate systems: structured, unstructured, log files, pictures, audio files, communications records, email – just about anything you can think of, regardless of its native format.

  • You don’t need to know how you intend to query your data before you store it. Hadoop lets you decide later and over time can reveal questions you never even thought to ask.

  • Hadoop lets you see relationships that were hidden before and reveal answers that have always been just out of reach.
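The “decide later” idea above is often called schema-on-read: you store raw records as-is and impose structure only at query time. Here is a minimal plain-Python sketch of the concept (the records and field names are hypothetical, and a real cluster would store the raw lines in HDFS rather than a Python list):

```python
import json

# Schema-on-read: keep raw records exactly as they arrived.
raw_records = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "purchase", "amount": 42}',
    '{"user": "alice", "action": "purchase", "amount": 7}',
]

# Months later, ask a question nobody planned for at write time:
# total purchase amount per user.
totals = {}
for line in raw_records:
    record = json.loads(line)  # structure is applied only at read time
    if record.get("action") == "purchase":
        user = record["user"]
        totals[user] = totals.get(user, 0) + record["amount"]

print(totals)  # {'bob': 42, 'alice': 7}
```

Because the click event was stored even though no purchase report needed it, a future question about browsing behaviour can still be answered from the same raw data.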

Redefine the Economics of Data:

  • Hadoop’s cost advantages over legacy systems redefine the economics of data.

  • Legacy systems, while fine for certain workloads, simply were not engineered with the needs of Big Data in mind and are far too expensive to be used for general purpose with today’s largest data sets.

  • Because Hadoop relies on an internally redundant data structure and is deployed on industry-standard servers rather than expensive specialized data storage systems, you can afford to store data that was previously not viable to keep.

  • Enterprises that build their Big Data infrastructure on Hadoop can afford to store literally all the data in their organization, and keep it all online for real-time interactive querying, business intelligence, analysis and visualization.

Restructure Your Thinking:

  • With data growing so rapidly, and unstructured data accounting for 90% of the data created today, the time has come for enterprises to re-evaluate their approach to data storage, management and analytics.
  • Legacy systems will remain necessary for specific high-value, low-volume workloads, and will complement the use of Hadoop, optimizing the data management structure in your organization by putting the right Big Data workloads in the right systems.
  • The cost-effectiveness, scalability and streamlined architectures of Hadoop will make the technology more and more attractive.

Apache Hadoop:

Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.

The Apache Hadoop framework is composed of the following modules:

  • Hadoop Common – contains libraries and utilities needed by other Hadoop modules
  • Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
  • Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users’ applications.
  • Hadoop MapReduce – a programming model for large scale data processing.

Apache Hadoop’s MapReduce and HDFS components originally derived respectively from Google’s MapReduce and Google File System (GFS) papers.
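To make the MapReduce model concrete, here is the classic word-count example sketched in pure Python. This is a single-process illustration of the map → shuffle → reduce phases, not how an actual Hadoop job is written (real jobs use Hadoop's Java API or Hadoop Streaming, and the framework handles the shuffle across the cluster for you):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: each input line emits (word, 1) pairs."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data is big", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

The power of the model is that the map and reduce functions are independent per key, so Hadoop can run thousands of them in parallel across a cluster while the programmer only writes the two small functions above.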

Hello!! If you are still reading this blog, I just want to thank you for your patience.

If you don’t like reading too much text, then we have something for you too.

In my next blog we will discuss how to set up a single-node Hadoop cluster backed by the Hadoop Distributed File System, running on Ubuntu Linux.

Hope you enjoyed reading my blog.

Have a great time!

Cheers! 🙂


Terms And Conditions Of Use:

All content provided on this blog is for informational purposes only. The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site.

The owner of this blog will not be liable for any errors or omissions in this information, nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of this information.





Mozilla India Inter-Community Meetup 2014

Community is never “I”; it’s always “we”. Building a community is not a one-person task: a huge amount of labor is required to build an awesome community. Mozilla India is one of the awesome communities I am involved with.

After organizing and attending a series of events, it was time for the most important and awesome one: the “Mozilla India Inter-Community Meetup 2014”. It was one of the most fruitful and productive meetups I have ever been to. It was a three-day event: the first day concentrated mainly on Task Forces, and the other two days were for community discussions.


Without stretching my blog too much, I will concentrate on L10N, Webmaker and FSA, as I was involved in these discussions.

Firefox Student Ambassador:


What are you most proud of in 2013?

  • Reached the figure of 2,000 FSAs in India in 2013.
  • We have 500 Firefox Clubs in India.
  • FSAs made a big contribution toward reaching 5,000 Mozillians.
  • About 100 FSAs are in institutions of national importance like IITs, IIITs and NITs.

What was your biggest challenge in 2013?

  • Providing swag for FSA events.
  • Communication between FSAs and Reps.

What is your big goal for 2014?

  • 5,000 FSAs and 1,000 Clubs in India.
  • Quality training of FSAs for better contribution.
  • Develop a community of FSA developers.

Suggested Solution to some common FSA related problems:

  1. Swag: Most FSAs are not getting any swag for their events.

    1. Encourage FSAs to organize events without swag, because learning comes first and swag next.

    2. We can ask the FSAs to fill out the “Event Response Form” so that a Rep can be assigned to that event.

    3. The assigned Rep will help with swag and budget.

  2. Permission from the institution: Sometimes FSAs face problems getting dates from their institution, because some institutions require an official letter from the respective organization.

    1. We can help FSAs by sending a letter to the institution authorities for conducting a successful event.

    2. We can ask the respective assigned Reps to send a letter to the institution.

  3. Too little communication between Reps and FSAs:

    1. Engage more Reps who volunteer to communicate with FSAs on a regular basis.

    2. Encourage FSAs by analyzing their past events and helping them improve.

  4. How do I (as an FSA) share my activities with Mozilla?

    1. FSA activities can be shared using the FSA Facebook group or via Twitter (@mozstudents).

    2. The best way to share is to write a blog and take lots of pics.




Webmaker:

Challenges:

  • Off-line resources – needed to #teachtheweb in rural areas and places with low bandwidth

  • Productive event format – what makes a Webmaker event productive?

  • Follow-up – people are recruited but don’t respond when contacted over email

  • Infrastructure – not all places have the minimal resources needed during an event

  • Training of mentors – needed to make sure the speaker is conveying the right information

  • Templates (region-specific) – needed to get people interested in making things with the Webmaker tools

  • Hacking – a lot of people ask, “Aren’t we hacking someone’s website?”

  • Understanding of tools – how to work with the tools, e.g. trimming the length of an audio clip

  • Handling advanced users – make users understand the definition of Webmaker and its use beyond the tools

Solutions:

  • Sustainability of contributors: Make contributors truly understand the Webmaker project so they will be passionate about their contribution.
  • Off-line resources: The Webmaker team is currently working on offline kits; we may have them soon! A lot of Webmaker offline activities are already available on the portal.
  • Productive event format: People should understand the need to be web literate and be willing to share their learning with others.
  • Follow-up: Design a structured follow-up procedure, and create a structured process for asking organizers to send in the best (filtered) makes and event photos.
  • Training of mentors: Run train-the-trainer sessions, and provide a guide for mentors covering a better understanding of the tools, web literacy and the Webmaker mission.
  • Templates (region-specific): Every Webmaker mentor should create a template every month, to be showcased during the Mozilla India online (IRC) meetup.
  • Handling advanced users: Make users understand the definition of Webmaker and its use beyond the tools.

Localization (L10N):



Challenges:

  • Lack of awareness of input methods, resources and tools

  • Translation vs. transliteration

  • Use of machine translation

  • Use of standard tools

  • No standard review process or quality checks

  • Lack of training for new joiners

  • Issues in communication

  • The SUMO translation process

  • Lack of L10n teams in other languages (only 12 languages in Mozilla)

  • Too few localization events


Solutions:

  • Conduct training events for localizers, schools, colleges etc.

  • Need for context-based translation and trans-creation.

  • Need for proper documentation.

  • Tools available in FOSS: Pootle, Transifex, Zanata, translatewiki.net… Mozilla should take a call on one standard tool.

  • Stable review process – reviews have to be done for the most release-ready projects such as the Firefox browser, Fennec and Firefox OS.

  • Localization-specific events: awareness sprints using social media, or direct visits to institutions to make the masses aware of the localized products and thereby generate interest.

  • More inclusive communication by Reps with the respective communities.

  • SUMO should adopt Publican or a similar tool to simplify the translation process.

  • Encourage other language communities to work with Mozilla L10n.

This time I tried to make my blog more productive by using more text and fewer images.

Personally I enjoyed being with all the Mozillians.


Have a great time!

Cheers! 🙂

Event Page: Click

Event Pics: Click Click  Click Click

Other Blogs:  Blog 1 Blog 2 Blog 3  Blog 4  Blog 5   Blog 6



Organizing events at IITs is always a pleasure. So this time we geared up to organize a MozSetup at IIT KGP.

Planning was done with the help of my mentor Sayak and the FSAs of the IIT KGP Firefox Club.

The event was planned for 29th and 30th March, so we organized a small pre-meetup on 26th March. (MozMeetup pics)

The agenda for the first day of the event was:

Talk Sessions:

  • Inauguration of Firefox Club @ IIT KGP .
  • Introduction to Mozilla and the Mozilla Mission.
  • Introduction to Mozilla Products and Innovations.
  • How to start with Contributions and where to start from.
  • Introduction to Webmaker tools.
  • Hands-on session on Webmaker tools.
  • Introduction to Localization
  • Introduction to FSA.
  • Getting Started with Mozilla Central Codebase.
  • Introduction to Firefox OS.

So we began by calling an FSA, Devavrat Walinjkar, from the IIT KGP Firefox Club. He explained how they formed the club and what their future plans are.


Then we moved on to the Mozilla mission with Gauthamraj. He told everyone about Mozilla, its products and its innovations.

Next Saurabh took the stage to explain “How to start with Contributions and where to start from”. He explained about different areas of contribution.

Now it was time for the Webmaker session, so we called Gauthamraj again to explain the awesome Webmaker tools, i.e. Thimble, Popcorn Maker and X-Ray Goggles. He also gave the participants a hands-on session.

Next we moved on to the Localization session by Biraj. He explained the importance of localization and how we can contribute to it.

After the Localization session, Saurabh explained the Mozilla Central codebase and how to contribute to development.

The last session of the first day was on Firefox OS, by Biraj and Sumantro. Both of them explained the awesome Firefox OS and how we can make apps for it.

We ended the first day with some good talks.

The agenda for the second day of the event was:

  • Firefox OS App Development.
  • BugsAhoy, and submitting your first patch to Mozilla.
  • Bugs resolution code sprint.

On the second day, everyone joined in to teach each and every participant how to start contributing to app development, how to solve bugs, and how to submit patches to Mozilla.

So this time Manish, Saurabh, Sankha, Sayak, Soumya, Sumantro, Biraj, Gaurav and others were fully engaged with the participants, teaching them as much as possible.

Firefox OS app development was taken care of by Biraj, Sayak and Sumantro.

BugsAhoy and submitting your first patch to Mozilla were taken care of by Manish, Saurabh, Sankha, Gaurav and Soumya.

We got nearly six successful submissions from participants, which can be found here.

After their successful submissions in the bug resolution code sprint, participants were awarded for their efforts.

Then we also awarded the FSAs of the IIT KGP Firefox Club for their efforts.

Special thanks to Gaurav Jain and the other FSAs of the IIT KGP Firefox Club.

Some pics of community Lunch:

Event Page:

Event Pics:

Hope you enjoyed reading my blog! 🙂


