
Directory and File Paths

slide 001 Hello and welcome to the fourth episode of the Software
Carpentry lectures on handling directories and files in
Python. In the previous episodes, we’ve seen how to explore
directories and enquire about their contents. In this one we’ll
look more at handling directory and file paths, again using the
os.path module.
slide 002 We may want to build up paths from variables containing
directory or file names. These variables might come from other
functions, from configuration files or from the user, via a GUI,
or the command-line. For example, here we have three variables
we might use to build a file path.
slide 003 We could create a new variable, path, by concatenating base,
a string containing a file separator, user, another file separator
string and datadir. This would work just fine.
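Here is a minimal sketch of this hand-built approach. The values of base, user and datadir are hypothetical, because the slide's actual values are not reproduced in this transcript:

```python
# Hypothetical values standing in for the slide's variables.
base = "/users"
user = "vlad"
datadir = "data"

# Concatenate the pieces, hard-coding "/" as the file separator.
path = base + "/" + user + "/" + datadir
print(path)  # /users/vlad/data
```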
slide 004 But the use of the file separator string isn’t very clean. More
seriously, it assumes we’re running on Linux or UNIX which means
our code isn’t very portable. What if we want to run on Windows
too, which uses a backslash as its file separator?
slide 005 Python provides a join function in its os.path module that means
we don’t have to worry about file separators.
slide 006 Join is one of those useful functions that takes a variable
number of arguments, so we can pass it as many path components as we
need.
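This short sketch shows join accepting different numbers of hypothetical path components. The separator in the output depends on the operating system. The last line uses posixpath, the module os.path refers to on Linux and Unix, to show one behaviour worth knowing: an absolute component discards everything that was joined before it.

```python
import os
import posixpath

# Hypothetical path components.
one = os.path.join("data")                    # a single component comes back unchanged
two = os.path.join("users", "vlad")           # users/vlad on Linux/Unix, users\vlad on Windows
four = os.path.join("users", "vlad", "data", "methane.pdb")
print(one, two, four)

# An absolute component restarts the path; earlier components are dropped.
restarted = posixpath.join("/users", "vlad", "/data")
print(restarted)  # /data
```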
slide 008 Join picks a file separator based upon what it knows to be the
current operating system.
slide 009 And if we ran this on Windows, this is what we would get.
slide 010 Note the backslashes in the path. They appear doubled only
because the interpreter is showing us the string’s representation,
in which each backslash is escaped. The string itself contains
single backslashes.
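You can see the difference between a string and its displayed representation on any platform by importing ntpath directly. This is the module that os.path refers to on Windows. The path components are hypothetical.

```python
import ntpath

# ntpath implements os.path for Windows, but can be imported on any OS.
windows_path = ntpath.join("/users", "vlad", "data")  # hypothetical components

print(windows_path)        # /users\vlad\data      (what the string contains)
print(repr(windows_path))  # '/users\\vlad\\data'  (escaped representation)
print(len(windows_path))   # 16: each backslash is a single character
```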
slide 011 But, you might say, what about that initial forward slash?
How do we handle that?
slide 012 Python again comes to our rescue with its normpath
function. Normpath converts paths to be consistent with the
current operating system.
slide 013 So for Windows it will convert forward slashes to backslashes.
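The sketch below shows that conversion. It uses ntpath directly so that it runs anywhere, and the paths are hypothetical. For contrast, posixpath.normpath leaves backslashes alone, because on Linux and Unix a backslash is an ordinary filename character.

```python
import ntpath
import posixpath

# Windows normpath turns every forward slash into a backslash.
win1 = ntpath.normpath("/users\\vlad\\data")  # '\\users\\vlad\\data'
win2 = ntpath.normpath("C:/Users/vlad/data")  # 'C:\\Users\\vlad\\data'

# Unix normpath does not treat a backslash as a separator.
unix = posixpath.normpath("users\\vlad")      # 'users\\vlad' (unchanged)

print(win1, win2, unix)
```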
slide 014 And here’s another example.
slide 015 Normpath does more than just convert file separators. Take,
for example, this messy-looking path. Putting it through normpath
gives us…
slide 016 …something far cleaner.
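The example below runs a hypothetical messy path through posixpath.normpath. On Linux or Unix this is the same function as os.path.normpath, and calling it directly means the result does not depend on the platform.

```python
import posixpath

messy = "/users//vlad/./data/../data/pdb/"  # hypothetical messy path
clean = posixpath.normpath(messy)
print(clean)  # /users/vlad/data/pdb
```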
slide 017 Normpath also removes duplicated file separators.
slide 018 …and removes the dot shorthand for the current directory.
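slide 019 It also resolves the double-dot shorthand that represents
parent directories. It does this purely by manipulating the string,
so if the path contains symbolic links the result may not point to
the same place.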
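Each of these clean-ups can be seen on its own in this sketch, which uses hypothetical paths and posixpath so the output is platform-independent. The last two lines show the limits of resolving dot-dot. A leading dot-dot in a relative path is kept, because nothing in the string can cancel it. A dot-dot directly after the root is dropped, because the root has no parent.

```python
import posixpath

dup = posixpath.normpath("users//vlad")        # 'users/vlad': duplicate separator removed
dot = posixpath.normpath("./users/./vlad")     # 'users/vlad': current-directory dots removed
up = posixpath.normpath("users/vlad/../data")  # 'users/data': dot-dot cancels 'vlad'
kept = posixpath.normpath("../data")           # '../data': nothing to cancel against
at_root = posixpath.normpath("/../data")       # '/data': the root has no parent

print(dup, dot, up, kept, at_root)
```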
slide 020 Sometimes we have a path and want just its last part, for
example the file name or the last directory, or everything that
comes before that last part. Python provides the dirname and
basename functions to do this.
slide 021 Here is a path…
slide 022 Dirname returns everything in the path up to, but not
including, the last component, which in this example is a file.
slide 023 Basename returns the last component of the path. In this
case it’s a file name.
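Here is a sketch with a hypothetical path, using posixpath so the output is the same on every platform. Note the trailing-slash case: basename returns an empty string because, as far as the string is concerned, the last component is empty.

```python
import posixpath

path = "/users/vlad/data/methane.pdb"  # hypothetical path
directory = posixpath.dirname(path)    # '/users/vlad/data'
name = posixpath.basename(path)        # 'methane.pdb'

# With a trailing separator, the final component is empty.
trailing = "/users/vlad/data/"
trailing_dir = posixpath.dirname(trailing)    # '/users/vlad/data'
trailing_name = posixpath.basename(trailing)  # ''

print(directory, name, trailing_dir, repr(trailing_name))
```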
slide 024 Split combines the behaviour of both dirname and basename and
returns a pair.
slide 025 The first element in the pair is the same as what dirname
returns.
slide 026 And the second, the same as what basename returns.
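For the same hypothetical path, split gives both pieces at once. Joining them again recovers the original:

```python
import posixpath

path = "/users/vlad/data/methane.pdb"  # hypothetical path
head, tail = posixpath.split(path)     # ('/users/vlad/data', 'methane.pdb')

# split agrees with dirname and basename...
assert (head, tail) == (posixpath.dirname(path), posixpath.basename(path))

# ...and join reverses it for a path like this one.
rebuilt = posixpath.join(head, tail)
print(head, tail, rebuilt)
```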
slide 027 Another similar function is splitext.
slide 028 Splitext returns a pair consisting of…
slide 029 All of the path up to but not including the file extension.
slide 030 And the file extension itself, including its leading dot.
If there is no file extension, this is just an empty string.
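The sketch below tries a few hypothetical cases, using posixpath for platform-independent output. Only the last extension is split off. A leading dot, as in a Unix hidden file, is not treated as the start of an extension.

```python
import posixpath

pdb = posixpath.splitext("data/methane.pdb")   # ('data/methane', '.pdb')
no_ext = posixpath.splitext("data/README")     # ('data/README', '')
double = posixpath.splitext("archive.tar.gz")  # ('archive.tar', '.gz'): only the last extension
hidden = posixpath.splitext(".bashrc")         # ('.bashrc', ''): a leading dot isn't an extension

print(pdb, no_ext, double, hidden)
```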
slide 031 Splitdrive also returns a pair.
slide 032 The first element is the drive name. On Linux or Unix this
is always an empty string.
slide 033 And it also returns the rest of the path.
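This sketch contrasts the Windows and Linux/Unix behaviour by importing ntpath and posixpath directly. The paths are hypothetical.

```python
import ntpath
import posixpath

win = ntpath.splitdrive("C:\\Users\\vlad\\data")  # ('C:', '\\Users\\vlad\\data')
unix = posixpath.splitdrive("/users/vlad/data")   # ('', '/users/vlad/data')

print(win, unix)
```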
slide 034 We may not know if a path is relative or absolute.
slide 035 isabs is a function that checks this.
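slide 036 On Linux and Unix it just checks whether the path begins
with a forward slash. On Windows it checks whether, once any drive
has been removed, the path begins with a slash or backslash. Since
Python 3.13, a path that starts with a single slash or backslash but
has no drive is no longer treated as absolute.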
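The following sketch uses hypothetical paths. It deliberately avoids a Windows path that starts with a single backslash but no drive, because that is the case whose result changed in Python 3.13. The drive-relative path C:data is not absolute, because Windows resolves it against the current directory on drive C.

```python
import ntpath
import posixpath

unix_abs = posixpath.isabs("/users/vlad")     # True
unix_rel = posixpath.isabs("data/pdb")        # False
win_abs = ntpath.isabs("C:\\Users\\vlad")     # True
win_rel = ntpath.isabs("data\\pdb")           # False
win_drive_rel = ntpath.isabs("C:data")        # False: drive-relative, not absolute

print(unix_abs, unix_rel, win_abs, win_rel, win_drive_rel)
```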
slide 037 Abspath converts a relative path to an absolute path.
slide 039 It uses the current working directory, returned by
os.getcwd, which we saw in an earlier episode. If the path is
relative, abspath joins this directory onto the front of it, then
normalizes the result just as normpath would. And let’s check that
it is indeed now an absolute path.
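Here is a minimal sketch. The relative path is hypothetical. Because abspath only manipulates strings, the path doesn't need to exist.

```python
import os

relative = "data"                     # hypothetical; need not exist
absolute = os.path.abspath(relative)

print(os.getcwd())
print(absolute)                       # the working directory followed by data
print(os.path.isabs(relative), os.path.isabs(absolute))  # False True
```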
slide 040 It is.
slide 041 And here’s another example, with more normalisation needed.
slide 042 This first sets the absolute path to be
"/users/vlad/data/../..".
slide 043 But then it normalizes the dot-dot parent directory
shorthands to get to "/users".
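On Linux and Unix, CPython's abspath works by joining the current working directory onto a relative path and then normalizing the result. This sketch reproduces those two steps with posixpath. It uses a hypothetical working directory of /users/vlad, so it gives the same answer wherever it runs.

```python
import posixpath

cwd = "/users/vlad"       # hypothetical current working directory
relative = "data/../.."   # hypothetical relative path

joined = posixpath.join(cwd, relative)  # '/users/vlad/data/../..'
absolute = posixpath.normpath(joined)   # '/users'
print(joined, "->", absolute)
```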
slide 044 It is important to remember that none of these operations check
whether the directories or files in the paths actually
exist. They are useful, though, as they allow you to build paths
for directories or files you will create later.
slide 045 But it also means you need to do these checks yourself. So,
remember os.path’s exists function.
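This sketch builds a path first and checks for it afterwards. It works inside a temporary directory from the standard tempfile module, so it leaves nothing behind. The results/output.txt names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results", "output.txt")  # hypothetical names

    # Building the path string creates nothing on disk.
    existed_before = os.path.exists(path)

    # Create the directory and the file that the path describes.
    os.makedirs(os.path.dirname(path))
    with open(path, "w") as f:
        f.write("done\n")

    existed_after = os.path.exists(path)

print(existed_before, existed_after)  # False True
```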
slide 046 In this episode we saw a number of useful os.path
functions. Join joins path components together using the file
separator of the current operating system. Normpath converts a path
to be consistent with the current operating system, as well as
cleaning it up and removing redundancy. Dirname gets everything in a
path before its final directory or file. Basename gets the name of
that final directory or file. Split combines dirname and basename,
returning both the path leading to the final directory or file and
the directory or file itself. Splitext separates a file extension
from the rest of a path. And splitdrive separates a drive name from
the rest of a path. Finally, isabs tells us whether a path is
absolute or relative, and abspath converts a relative path to an
absolute one.
slide 047 Thank you for listening.