Reposted from:
Getting block locations and related information from the NameNode
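Before the client can open a streaming TCP connection to a DataNode and read file data, it has to find out where that data is. So the client first calls the remote method ClientProtocol.getBlockLocations() from DFSClient.callGetBlockLocations(). The call returns a LocatedBlocks object holding a list of LocatedBlock instances, and from these the client learns which DataNodes to fetch the data from. On the server side the request is handled by NameNode.getBlockLocations(), which delegates the actual work to the FSNamesystem method of the same name. FSNamesystem has three overloads of this method; the code is as follows: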
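Before looking at that code, it may help to see what the client does with the result. The sketch below is a simplified model under stated assumptions, not Hadoop's implementation: `SimpleLocatedBlock`, `findBlock` and `chooseDataNode` are hypothetical names, and the real `LocatedBlock`/`LocatedBlocks` classes carry more state. It assumes the blocks are listed in file order and that each block's replica list has already been ordered by the NameNode so the preferred (closest) DataNode comes first, which is what the distance sort in the FSNamesystem code is for. The client binary-searches for the block covering its read offset and takes the first replica.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for HDFS's LocatedBlock / LocatedBlocks.
public class BlockLookupSketch {

    /** One block of a file: where it starts, how long it is, which DataNodes hold it. */
    static final class SimpleLocatedBlock {
        final long startOffset;   // offset of the block's first byte within the file
        final long length;        // number of bytes in this block
        final List<String> hosts; // replica holders, assumed ordered preferred-first by the NameNode

        SimpleLocatedBlock(long startOffset, long length, List<String> hosts) {
            this.startOffset = startOffset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /**
     * Binary-searches blocks (in file order, non-overlapping) for the one
     * covering fileOffset; returns null if no block covers it.
     */
    static SimpleLocatedBlock findBlock(List<SimpleLocatedBlock> blocks, long fileOffset) {
        int lo = 0, hi = blocks.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            SimpleLocatedBlock b = blocks.get(mid);
            if (fileOffset < b.startOffset) {
                hi = mid - 1;                      // target lies in an earlier block
            } else if (fileOffset >= b.startOffset + b.length) {
                lo = mid + 1;                      // target lies in a later block
            } else {
                return b;                          // offset falls inside this block
            }
        }
        return null;
    }

    /** Picks the DataNode to read from: the first (preferred) replica, or null if none. */
    static String chooseDataNode(SimpleLocatedBlock block) {
        return block.hosts.isEmpty() ? null : block.hosts.get(0);
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size
        List<SimpleLocatedBlock> blocks = new ArrayList<>();
        blocks.add(new SimpleLocatedBlock(0, blockSize, Arrays.asList("dn1", "dn2", "dn3")));
        blocks.add(new SimpleLocatedBlock(blockSize, blockSize, Arrays.asList("dn2", "dn4", "dn5")));
        blocks.add(new SimpleLocatedBlock(2 * blockSize, 1000, Arrays.asList("dn3", "dn1", "dn6")));

        long readOffset = blockSize + 10; // a byte inside the second block
        SimpleLocatedBlock b = findBlock(blocks, readOffset);
        // prints: offset 67108874 -> block@67108864, read from dn2
        System.out.println("offset " + readOffset + " -> block@" + b.startOffset
                + ", read from " + chooseDataNode(b));
    }
}
```

Binary search keeps the lookup cheap when a large file has many blocks. A linear scan would also be correct for short lists.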
LocatedBlocks getBlockLocations(String clientMachine, String src,
    long offset, long length) throws IOException {
  LocatedBlocks blocks = getBlockLocations(src, offset, length, true, true, true);
  if (blocks != null) {
    // blocks is non-null, so sort the DataNodes that hold each block
    //sort the blocks
    // In some deployment cases, cluster is with separation of task tracker
    // and datanode which means client machines will not always be recognized
    // as known data nodes, so here we should try to get node (but not
    // datanode only) for locality based sort.
    Node client = host2DataNodeMap.getDatanodeByHost(
        clientMachine);
    if (client == null) {
      List<String> hosts = new ArrayList<String>(1);
      hosts.add(clientMachine);
      String rName = dnsToSwitchMapping.resolve(hosts).get(0);
      if (rName != null)
        client = new NodeBase(clientMachine, rName);
    }

    DFSUtil.StaleComparator comparator = null;
    if (avoidStaleDataNodesForRead) {
      comparator = new DFSUtil.StaleComparator(staleInterval);
    }
    // Note: the last block is also included and sorted
    for (LocatedBlock b : blocks.getLocatedBlocks()) {
      clusterMap.pseudoSortByDistance(client, b.getLocations());
      if (avoidStaleDataNodesForRead) {
        Arrays.sort(