Reading HDFS and local files in Java

I want to read files from their paths, whether they are on HDFS or local. Currently I pass local paths with the file:// prefix and HDFS paths with the hdfs:// prefix, and I write code like this:

    Configuration configuration = new Configuration();
    FileSystem fileSystem = null;
    if (filePath.startsWith("hdfs://")) {
        fileSystem = FileSystem.get(configuration);
    } else if (filePath.startsWith("file://")) {
        fileSystem = FileSystem.getLocal(configuration).getRawFileSystem();
    }

From there, I use the FileSystem APIs to read the file.

Could you please let me know whether there is a better way than this?

Does this make sense?

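    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter the file path...");
        String filePath = br.readLine();
        Path path = new Path(filePath);
        // The scheme of the path (hdfs://, file://, ...) decides which FileSystem is returned
        FileSystem fs = path.getFileSystem(conf);
        // try-with-resources closes the stream once we are done with it
        try (FSDataInputStream inputStream = fs.open(path)) {
            System.out.println(inputStream.available());
        }
        fs.close();
    }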

If you go this route, you don't need that prefix check at all. Get the FileSystem directly from the Path and do whatever you need with it.
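A minimal sketch of that idea, assuming you want the file's text content rather than just available(): the class PathReader and its readLines method are hypothetical names, the file is assumed to be UTF-8 text, and the Hadoop client libraries are assumed to be on the classpath. The same call works for hdfs:// and file:// paths.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathReader {

    // Hypothetical helper: reads a text file line by line from whatever file system
    // its scheme points to (hdfs://..., file://..., or a scheme-less path,
    // which is resolved against the configured default file system).
    public static List<String> readLines(String pathString, Configuration conf) throws IOException {
        Path path = new Path(pathString);
        // No prefix check needed: the Path's scheme selects the FileSystem implementation
        FileSystem fs = path.getFileSystem(conf);
        List<String> lines = new ArrayList<>();
        // Only the stream is closed here; the FileSystem is a cached, shared instance
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PathReader hdfs://namenode:8020/some/file.txt  (or file:///tmp/file.txt)
        for (String line : readLines(args[0], new Configuration())) {
            System.out.println(line);
        }
    }
}
```

The FileSystem itself is deliberately left open: getFileSystem() hands out an instance from Hadoop's internal cache, and closing it would also close it for any other code in the same JVM that uses it.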

You can get the FileSystem as follows:

    Configuration conf = new Configuration();
    Path path = new Path(stringPath);
    FileSystem fs = FileSystem.get(path.toUri(), conf);

You don't need to check whether the path starts with hdfs:// or file://: FileSystem.get(URI, Configuration) picks the right FileSystem implementation from the URI's scheme.
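To see why the prefix check is unnecessary, here is a simplified, JDK-only model of the scheme rule. It is not Hadoop's actual source (the real logic also looks at the authority, i.e. host:port), and the class and method names are hypothetical. An explicit scheme such as hdfs or file decides the file system, while a path without a scheme falls back to the default file system configured by fs.defaultFS (fs.default.name in Hadoop 1.x). This is also why FileSystem.get(configuration) in the question always returns the default file system: it never looks at the path.

```java
import java.net.URI;

public class SchemeResolution {

    // Simplified model, not Hadoop's implementation: returns the scheme whose
    // file system would serve pathString. defaultFsUri stands in for fs.defaultFS.
    public static String effectiveScheme(String pathString, String defaultFsUri) {
        String scheme = URI.create(pathString).getScheme();
        if (scheme == null) {
            // e.g. "/user/data/a.txt" has no scheme -> use the default file system
            return URI.create(defaultFsUri).getScheme();
        }
        // An explicit scheme always wins
        return scheme;
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://namenode:8020";
        System.out.println(effectiveScheme("hdfs://namenode:8020/data/a.txt", defaultFs)); // hdfs
        System.out.println(effectiveScheme("file:///tmp/a.txt", defaultFs));              // file
        System.out.println(effectiveScheme("/user/data/a.txt", defaultFs));               // hdfs
    }
}
```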

Please check the code snippet below, which lists the files under an HDFS path, i.e. a path string that starts with hdfs://. If you provide the Hadoop configuration and a local path, it will also list the files from the local file system, i.e. a path string that starts with file://.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Queue;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Helper method to get the list of files from an HDFS (or any Hadoop-supported) path
    public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration, String hdfsPath, boolean recursive) {
        // Resulting list of files
        List<String> filePaths = new ArrayList<>();
        FileSystem fs = null;
        // try-catch-finally for all possible exceptions
        try {
            // Get the Path from the string, then its FileSystem.
            // new Path(...) throws IllegalArgumentException; the other calls only throw IOException
            Path path = new Path(hdfsPath);
            fs = path.getFileSystem(hadoopConfiguration);
            // Resolve hdfsPath first to check that it exists (either a real directory or a real file);
            // resolvePath() returns the fully qualified variant of the path
            path = fs.resolvePath(path);
            if (recursive) {
                // Iterative traversal with a queue instead of recursive calls
                Queue<Path> fileQueue = new LinkedList<>();
                fileQueue.add(path);
                while (!fileQueue.isEmpty()) {
                    Path filePath = fileQueue.remove();
                    if (fs.isFile(filePath)) {
                        // filePath refers to a file
                        filePaths.add(filePath.toString());
                    } else {
                        // filePath refers to a directory: queue its entries
                        FileStatus[] fileStatuses = fs.listStatus(filePath);
                        for (FileStatus fileStatus : fileStatuses) {
                            fileQueue.add(fileStatus.getPath());
                        }
                    }
                }
            } else {
                // Non-recursive approach: only the direct children of a directory
                if (fs.isDirectory(path)) {
                    FileStatus[] fileStatuses = fs.listStatus(path);
                    for (FileStatus fileStatus : fileStatuses) {
                        // Keep files only, skip sub-directories
                        if (fileStatus.isFile()) {
                            filePaths.add(fileStatus.getPath().toString());
                        }
                    }
                } else {
                    // The path is a file: return it as the one and only entry
                    filePaths.add(path.toString());
                }
            }
        } catch (Exception ex) {
            // Catches all exceptions, including IOException and IllegalArgumentException
            ex.printStackTrace();
            // If some problem occurs, return an empty list
            return new ArrayList<>();
        } finally {
            // Close the file system; no more operations.
            // Note: getFileSystem() returns a cached instance shared within the JVM,
            // so closing it also closes it for any other code using the same instance.
            try {
                if (fs != null) fs.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        // The list can be empty if the path is an empty directory without files and sub-directories
        return filePaths;
    }
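On Hadoop 2.x and later, FileSystem.listFiles(Path, boolean) already performs this directory walk and returns only files, so the hand-written queue could be replaced by a sketch like the one below. It assumes a Hadoop 2+ client on the classpath (the method may not be available in the Hadoop 1.x line used earlier in this thread), and the class name ListFilesSketch is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListFilesSketch {

    // Lists the files (not directories) under pathString, descending into
    // sub-directories when recursive is true. Works for hdfs:// and file:// alike,
    // because the FileSystem is obtained from the Path itself.
    public static List<String> listFiles(Configuration conf, String pathString, boolean recursive)
            throws IOException {
        Path path = new Path(pathString);
        FileSystem fs = path.getFileSystem(conf);
        List<String> result = new ArrayList<>();
        // listFiles does the traversal and yields file entries only
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(path, recursive);
        while (it.hasNext()) {
            result.add(it.next().getPath().toString());
        }
        return result;
    }
}
```

Unlike the helper above, this version lets exceptions propagate (for example FileNotFoundException for a missing path) and leaves the shared FileSystem open.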

If you really want to use the java.io.File API, the following method will help you list files from the local file system only, i.e. a path string that starts with file://.

    import java.io.File;
    import java.net.URI;
    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Queue;

    // Helper method to list files from a local path in the local file system
    public static List<String> listFilesFromLocalPath(String localPathString, boolean recursive) {
        // Resulting list of files
        List<String> localFilePaths = new ArrayList<>();
        // java.io.File does not understand the "file://" prefix, so convert file: URIs first
        // (new File(URI) expects an absolute URI such as file:///tmp/data)
        File localPath = localPathString.startsWith("file:")
                ? new File(URI.create(localPathString))
                : new File(localPathString);
        // The path may not exist at all, i.e. be neither a file nor a directory
        if (!localPath.exists()) {
            System.err.println("\n" + localPathString + " is neither a file nor a directory; please provide a correct local path");
            // Return an empty list
            return new ArrayList<>();
        }
        // At this point localPath exists, either as a directory or as a file
        if (recursive) {
            // Iterative traversal using a queue
            Queue<File> fileQueue = new LinkedList<>();
            fileQueue.add(localPath);
            while (!fileQueue.isEmpty()) {
                File file = fileQueue.remove();
                if (file.isFile()) {
                    // Record the file's absolute path
                    localFilePaths.add(file.getAbsolutePath());
                } else {
                    // Directory: queue its entries (listFiles() returns null on an I/O error)
                    File[] listedFiles = file.listFiles();
                    if (listedFiles != null) {
                        for (File listedFile : listedFiles) {
                            fileQueue.add(listedFile);
                        }
                    }
                }
            }
        } else {
            // Non-recursive approach: only the direct children of a directory
            if (localPath.isDirectory()) {
                File[] listedFiles = localPath.listFiles();
                if (listedFiles != null) {
                    for (File listedFile : listedFiles) {
                        // Keep files only, skip sub-directories
                        if (listedFile.isFile()) {
                            localFilePaths.add(listedFile.getAbsolutePath());
                        }
                    }
                }
            } else {
                // The path is a file: return its absolute path as the only entry
                localFilePaths.add(localPath.getAbsolutePath());
            }
        }
        // The list can be empty if the path is an empty directory without files and sub-directories
        return localFilePaths;
    }
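If only local files matter, java.nio.file (Java 8+) can do the same walk without a hand-written queue. A minimal sketch with the hypothetical class name LocalListing; it accepts either a plain path or a file: URI such as file:///tmp/data, and returns only regular files.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalListing {

    // Lists regular files under a local path using java.nio.file instead of java.io.File.
    public static List<String> listFiles(String localPathString, boolean recursive) throws IOException {
        // Paths.get(URI) understands file: URIs; plain strings go through Paths.get(String)
        Path start = localPathString.startsWith("file:")
                ? Paths.get(URI.create(localPathString))
                : Paths.get(localPathString);
        // maxDepth 1 = the start path plus its direct children; MAX_VALUE = the whole tree
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        // Files.walk returns a lazily populated stream that must be closed
        try (Stream<Path> paths = Files.walk(start, maxDepth)) {
            return paths.filter(Files::isRegularFile)
                        .map(p -> p.toAbsolutePath().toString())
                        .collect(Collectors.toList());
        }
    }
}
```

One behavioral difference: Files.walk throws NoSuchFileException for a path that does not exist, instead of printing a message and returning an empty list.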