%0 Journal Article
%A HU Zhi-gang
%A LIU Xiu-lei
%A YU Jun-yang
%T Smallfiles on HDFS Merging based on the Energy Efficiency
%D 2015
%R 10.13190/j.jbupt.2015.06.008
%J Journal of Beijing University of Posts and Telecommunications
%P 34-38
%V 38
%N 6
%X

MapReduce programs running on the Hadoop Distributed File System (HDFS) incur high energy costs when the data consists of many small files. To address this problem, the article establishes a new energy model of a Hadoop node cluster and uses it to prove that there is an optimal file size on Hadoop that minimizes the energy cost of running a program. Based on this result and marginal analysis theory, a decision strategy is proposed that finds the optimal file size from the perspective of both energy cost and access cost. The strategy merges small files on HDFS up to the optimal file size according to cost efficiency, so as to obtain the greatest benefit. Experiments confirm that an optimally sized data block exists, and that determining the data block size by combining cost and efficiency under marginal analysis is reasonable and valid.

%U https://journal.bupt.edu.cn/EN/10.13190/j.jbupt.2015.06.008