How to choose a Hadoop File Format?

This is an interesting question, and one that Big Data developers commonly face when they consider storing data in Hadoop, which stores large files in a distributed manner. The question is tough, and the answer is a little tricky too: it requires a proper analysis before you make the call on a format.
Based on my experience and understanding, I am noting down my thoughts here, and hopefully they will be of help to you. As I mentioned, this requires an analysis, and it comes down to considering a few particular factors:

  1. The tool of choice for data processing
  2. Does the data structure/schema change over time?
  3. Is there any impact if we split the data and process the splits separately?
  4. Whether the data is to be compressed or not
  5. What type of data is to be processed
  6. Data volume
  7. Whether the job is read intensive, write intensive or update intensive
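
To make the checklist above a little more concrete, here is a minimal Python sketch that records the seven answers and turns them into a rough starting point. Everything in it is my own assumption for illustration: the `Workload` class, its field names, the `suggest_format` function, the 1 GB threshold, and the particular formats it returns (Avro, Parquet, ORC, SequenceFile, plain text) are common rules of thumb, not a definitive recommendation, and you should still benchmark with your own data.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Answers to the seven factors; all names here are hypothetical."""
    tool: str              # 1. processing tool, e.g. "hive", "spark", "mapreduce"
    schema_evolves: bool   # 2. does the schema change over time?
    needs_splitting: bool  # 3. must files be split for parallel processing?
    compressed: bool       # 4. will the data be compressed?
    data_type: str         # 5. "structured", "semi-structured" or "unstructured"
    volume_gb: float       # 6. approximate data volume (illustrative unit)
    access: str            # 7. "read", "write" or "update" intensive


def suggest_format(w: Workload) -> str:
    """Return a rough first candidate based on rules of thumb (assumptions)."""
    if w.data_type == "unstructured":
        # Binary key/value container that stays splittable when compressed.
        return "SequenceFile"
    if w.schema_evolves or w.access == "write":
        # Row-oriented format that stores the schema with the data.
        return "Avro"
    if w.access == "read":
        # Columnar formats suit scans over a subset of columns.
        return "ORC" if w.tool == "hive" else "Parquet"
    if not w.compressed and not w.needs_splitting and w.volume_gb < 1:
        # Small, uncompressed data: plain text is often good enough.
        return "Text"
    return "Avro"


if __name__ == "__main__":
    reporting = Workload("hive", False, True, True, "structured", 500.0, "read")
    print(suggest_format(reporting))
```

The order of the checks is itself a design choice: in this sketch the type of data and schema changes are looked at before the read/write pattern, so swap the branches around if your priorities differ.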
