I want to read files from paths that can be either HDFS or local. Currently I pass local paths with the prefix file:// and HDFS paths with the prefix hdfs://, and I wrote the following code:
```java
Configuration configuration = new Configuration();
FileSystem fileSystem = null;
if (filePath.startsWith("hdfs://")) {
    fileSystem = FileSystem.get(configuration);
} else if (filePath.startsWith("file://")) {
    fileSystem = FileSystem.getLocal(configuration).getRawFileSystem();
}
```
From there, I use the FileSystem APIs to read the file.
Could you please let me know whether there is a better way to do this?
Does this make sense?
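```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
    conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));

    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    System.out.println("Enter the file path...");
    String filePath = br.readLine();

    // the scheme of the path (hdfs://, file://, ...) selects the FileSystem implementation
    Path path = new Path(filePath);
    FileSystem fs = path.getFileSystem(conf);

    FSDataInputStream inputStream = fs.open(path);
    System.out.println(inputStream.available());
    inputStream.close();
    fs.close();
}
```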
You don't need that check if you go this way: get the FileSystem directly from the Path and do whatever you need with it.
You can get the FileSystem as follows:
```java
Configuration conf = new Configuration();
Path path = new Path(stringPath);
FileSystem fs = FileSystem.get(path.toUri(), conf);
```
You don't need to check whether the path starts with hdfs:// or file://; this API takes care of it.
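To see why no startsWith check is needed: Path.getFileSystem(conf) and FileSystem.get(URI, conf) choose the implementation from the scheme of the path's URI, and a path without a scheme falls back to the default file system (fs.defaultFS, or fs.default.name in Hadoop 1.x). The hypothetical PathSchemes helper below uses only java.net.URI to show which parts of a path string that lookup looks at. It is an illustration, not Hadoop code; Hadoop's Path does its own parsing and normalization, so treat this as an approximation.

```java
import java.net.URI;

// Hypothetical illustration (not part of Hadoop): which components of a path
// string decide the file system. Hadoop performs this lookup itself inside
// Path.getFileSystem(conf) and FileSystem.get(uri, conf).
final class PathSchemes {
    private PathSchemes() {}

    // "hdfs", "file", ... or null when the string has no scheme;
    // with no scheme, Hadoop uses the configured default file system.
    static String schemeOf(String pathString) {
        return URI.create(pathString).getScheme();
    }

    // e.g. "namenode:8020" for an HDFS URI, or null when absent.
    static String authorityOf(String pathString) {
        return URI.create(pathString).getAuthority();
    }
}
```

For example, schemeOf("hdfs://namenode:8020/user/data") returns "hdfs", while schemeOf("/user/data") returns null, which is the case where Hadoop falls back to the configured default.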
Please check the code snippet below, which lists the files under an HDFS path, i.e. a path string starting with hdfs://. If you pass a Hadoop configuration and a local path, it will also list files from the local file system, i.e. a path string starting with file://.
```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// helper method to get the list of files from the HDFS path
public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration, String hdfsPath, boolean recursive) {
    // resulting list of files
    List<String> filePaths = new ArrayList<String>();
    FileSystem fs = null;

    // try-catch-finally all possible exceptions
    try {
        // get path from string and then the filesystem
        Path path = new Path(hdfsPath);  // throws IllegalArgumentException; all others will only throw IOException
        fs = path.getFileSystem(hadoopConfiguration);

        // resolve hdfsPath first to check whether the path exists => either a real directory or a real file
        // resolvePath() returns the fully-qualified variant of the path
        path = fs.resolvePath(path);

        if (recursive) {
            // iterative traversal with a queue instead of recursive calls
            Queue<Path> fileQueue = new LinkedList<Path>();
            fileQueue.add(path);
            while (!fileQueue.isEmpty()) {
                Path filePath = fileQueue.remove();
                if (fs.isFile(filePath)) {
                    // filePath refers to a file
                    filePaths.add(filePath.toString());
                } else {
                    // filePath refers to a directory: list its entries and add them to the queue
                    FileStatus[] fileStatuses = fs.listStatus(filePath);
                    for (FileStatus fileStatus : fileStatuses) {
                        fileQueue.add(fileStatus.getPath());
                    }
                }
            }
        } else {
            // non-recursive approach => only the direct children are inspected
            if (fs.isDirectory(path)) {
                FileStatus[] fileStatuses = fs.listStatus(path);
                for (FileStatus fileStatus : fileStatuses) {
                    // keep files only; sub-directories are skipped
                    if (fileStatus.isFile()) {
                        filePaths.add(fileStatus.getPath().toString());
                    }
                }
            } else {
                // the path is a file: return the one and only file path
                filePaths.add(path.toString());
            }
        }
    } catch (Exception ex) {
        // catches all exceptions, including IOException and IllegalArgumentException
        ex.printStackTrace();
        // if some problem occurs, return an empty list
        return new ArrayList<String>();
    } finally {
        // close the filesystem; no more operations
        // note: FileSystem instances are cached and shared by default, so closing this one
        // also closes it for any other code that obtained the same cached instance
        try {
            if (fs != null) fs.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    // the list can be empty if the given path is an empty directory without files and sub-directories
    return filePaths;
}
```
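The recursive branch above is an iterative breadth-first traversal: each path taken from the queue is either collected as a file or expanded as a directory, and no recursive calls are made. The hypothetical QueueTraversal helper below reduces that pattern to plain Java collections so the order can be checked without a Hadoop cluster. As an assumption for illustration, a Map stands in for the file system: a key is a directory, its value lists the children, and any name that is not a key is treated as a file.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free sketch of the queue-based traversal:
// directories are expanded, files are collected, no recursion is used.
final class QueueTraversal {
    private QueueTraversal() {}

    // tree: directory name -> children; a name that is not a key is a file.
    static List<String> collectFiles(Map<String, List<String>> tree, String root) {
        List<String> files = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();        // FIFO => breadth-first order
            List<String> children = tree.get(current);
            if (children == null) {
                files.add(current);                 // leaf: a file
            } else {
                queue.addAll(children);             // directory: expand its entries later
            }
        }
        return files;
    }
}
```

With /data containing /data/a and /data/sub, and /data/sub containing /data/sub/b and /data/sub/c, collectFiles returns [/data/a, /data/sub/b, /data/sub/c]; an empty directory yields an empty list, as in the helper above.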
If you really want to use the java.io.File API, the following method lists files from the local file system only. Note that new File(String) does not parse URIs: a string such as file:///tmp/data is treated as a relative path, so strip the file:// prefix first or use new File(URI) instead.
```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// helper method to list files from the local path in the local file system
public static List<String> listFilesFromLocalPath(String localPathString, boolean recursive) {
    // resulting list of files
    List<String> localFilePaths = new ArrayList<String>();

    // get the Java file instance from the local path string
    File localPath = new File(localPathString);

    // the given localPathString does not exist => neither a file nor a directory
    if (!localPath.exists()) {
        System.err.println("\n" + localPathString + " is neither a file nor a directory; please provide correct local path");
        // return with an empty list
        return new ArrayList<String>();
    }

    // at this point localPath exists in the file system => either a directory or a file
    if (recursive) {
        // iterative traversal with a queue instead of recursive calls
        Queue<File> fileQueue = new LinkedList<File>();
        fileQueue.add(localPath);
        while (!fileQueue.isEmpty()) {
            File file = fileQueue.remove();
            if (file.isFile()) {
                // file instance refers to a file: record its absolute path
                localFilePaths.add(file.getAbsolutePath());
            } else {
                // file instance refers to a directory: list its entries and add them to the queue
                File[] listedFiles = file.listFiles();
                // listFiles() returns null on an I/O error (e.g. missing read permission)
                if (listedFiles != null) {
                    for (File listedFile : listedFiles) {
                        fileQueue.add(listedFile);
                    }
                }
            }
        }
    } else {
        // non-recursive approach => only the direct children are inspected
        if (localPath.isDirectory()) {
            File[] listedFiles = localPath.listFiles();
            if (listedFiles != null) {
                for (File listedFile : listedFiles) {
                    // keep files only; sub-directories are skipped
                    if (listedFile.isFile()) {
                        localFilePaths.add(listedFile.getAbsolutePath());
                    }
                }
            }
        } else {
            // the path is a file: return its absolute path as the only entry
            localFilePaths.add(localPath.getAbsolutePath());
        }
    }
    // the list can be empty if the given path is an empty directory without files and sub-directories
    return localFilePaths;
}
```
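On Java 8 or later, java.nio.file can do the same local listing with less code. The hypothetical LocalFileLister below is a sketch, not a drop-in replacement: it accepts either a plain local path or a file:// URI string (converted with Paths.get(URI)), uses Files.walk for the recursive case and Files.list for the non-recursive case, and throws an IOException (for example NoSuchFileException) for a missing path instead of printing a message and returning an empty list.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical java.nio.file alternative to listFilesFromLocalPath.
final class LocalFileLister {
    private LocalFileLister() {}

    // "file:///tmp/data" is parsed as a URI; a plain string such as "/tmp/data" is used as-is.
    static Path toLocalPath(String pathString) {
        if (pathString.startsWith("file:")) {
            return Paths.get(URI.create(pathString));
        }
        return Paths.get(pathString);
    }

    // Absolute paths of regular files: the whole tree when recursive, direct children otherwise.
    // A path that is itself a regular file yields a one-element list.
    static List<String> listFiles(String pathString, boolean recursive) throws IOException {
        Path root = toLocalPath(pathString);
        if (Files.isRegularFile(root)) {
            return Collections.singletonList(root.toAbsolutePath().toString());
        }
        // both streams hold open directory handles, so close them with try-with-resources
        try (Stream<Path> entries = recursive ? Files.walk(root) : Files.list(root)) {
            return entries.filter(Files::isRegularFile)
                    .map(p -> p.toAbsolutePath().toString())
                    .collect(Collectors.toList());
        }
    }
}
```

Unlike the queue-based version, Files.walk traverses the tree depth-first and neither method guarantees a particular order, so sort the result if the order matters.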