How to Choose a Hadoop File Format?
Well, this is an interesting question, and a very common one for Big Data developers who plan to store data in Hadoop, which stores large files in a distributed manner. The question is not the only tough part; the answer is also a little tricky and requires proper analysis before you make the call on a format. Based on my experience and understanding, I am noting down my thoughts here, and hopefully they will be of help to you.

As I mentioned, this requires analysis, and it comes down to considering a few particular factors:
- The tool of choice for processing the data
- Whether the data structure/schema changes over time
- Whether splitting the data and processing it in parts has any impact
- Whether the data needs to be compressed
- The type of data to be processed
- The data volume
- Whether the job is read intensive, write intensive or update intensive