How to Choose a Hadoop File Format?
Well, this is an interesting question, and a very common one for Big Data developers who want to store data in Hadoop, which stores large files in a distributed manner. The question is not the only tough part: the answer is also a little tricky and requires proper analysis before you decide on a format.
Based on my experience and understanding, I have tried to note down my thoughts on this, and I hope they will help you. As I mentioned, this requires analysis, and it comes down to considering a few particular factors:
- The tool of choice for processing the data
- Does the data structure/schema change over time?
- Will splitting the data for processing have any impact?
- Whether the data is to be compressed or not
- What type of data is to be processed
- Data volume
- Whether the job is read-intensive, write-intensive, or update-intensive
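To make a few of these factors concrete, here is a minimal sketch of how they might be turned into a ranked shortlist. It assumes common rules of thumb that are not part of this post: columnar formats (Parquet, ORC) for read-heavy analytics, row-oriented formats (Avro, SequenceFile) for write-heavy appends, Avro when the schema evolves, and ORC for updates through Hive's ACID tables. The Workload class and candidate_formats function are hypothetical names for illustration, and the tool of choice, data type and data volume are left out because they depend too much on your environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Hypothetical workload description, one field per factor in the list."""
    access_pattern: str      # "read", "write" or "update"
    schema_evolves: bool     # does the schema change over time?
    split_processing: bool   # will jobs process the files in parallel splits?
    compressed: bool         # will the data be compressed?


def candidate_formats(w: Workload) -> List[str]:
    """Rank candidate formats using rules of thumb (assumptions, not hard rules)."""
    if w.access_pattern == "read":
        # Columnar formats let analytical queries scan only the columns they need.
        ranked = ["Parquet", "ORC", "Avro"]
    elif w.access_pattern == "write":
        # Row-oriented formats append whole records cheaply.
        ranked = ["Avro", "SequenceFile", "Text"]
    elif w.access_pattern == "update":
        # HDFS files are write-once; Hive's full ACID tables are stored as ORC.
        ranked = ["ORC"]
    else:
        raise ValueError(f"unknown access pattern: {w.access_pattern!r}")

    if w.schema_evolves and "Avro" in ranked:
        # Avro stores the writer's schema with the data, which eases schema evolution.
        ranked.remove("Avro")
        ranked.insert(0, "Avro")

    if w.split_processing and w.compressed and "Text" in ranked:
        # Assumes gzip: gzip-compressed text files cannot be split across mappers.
        ranked.remove("Text")

    return ranked


if __name__ == "__main__":
    print(candidate_formats(Workload("read", schema_evolves=False,
                                     split_processing=True, compressed=True)))
    # → ['Parquet', 'ORC', 'Avro']
```

The output is only a shortlist: the formats it suggests should still be benchmarked against your own data and processing tool before you commit to one.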