For testing purposes I'm trying to load a large number of small files into HDFS: about 1 million (1,000,000) files, each between 1 KB and 100 KB in size. I generated these files with an R script on a Linux system, all in one folder. Every file has the same structure: a header with product information, followed by a varying number of columns with numeric data.
The problem occurs when I try to upload these local files into HDFS with the command:
hdfs dfs -copyFromLocal /home/user/Documents/smallData /
Then I get one of the following Java heap errors:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
I use the Cloudera CDH5 distribution with a Java heap size of about 5 GB. Is there another way than increasing the heap size even more? Maybe a better way to load this large amount of data into HDFS?
I'm very thankful for every helpful comment!
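For context, raising the heap for this command would typically look like the sketch below. It assumes the OutOfMemoryError is thrown by the client JVM that `hdfs dfs` starts, and that the standard Hadoop launcher scripts are in use, which read the `HADOOP_CLIENT_OPTS` environment variable. The 8 GB value is purely illustrative, and a Cloudera-managed setup may configure the heap elsewhere.

```shell
# Assumption: the error comes from the client JVM started by `hdfs dfs`.
# HADOOP_CLIENT_OPTS is picked up by the standard Hadoop launcher scripts;
# the 8g value is an illustrative placeholder, not a recommendation.
export HADOOP_CLIENT_OPTS="-Xmx8g"

# Same upload command as in the question; skipped when no hdfs client is on the PATH.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -copyFromLocal /home/user/Documents/smallData /
else
    echo "hdfs client not found; skipping upload" >&2
fi
```

This only moves the limit further out, which is why an approach that does not depend on the heap size would be preferable.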
via Chebli Mohamed