If you're like most novice Python programmers, you likely are able to envision entire applications in your head but, when it comes time to begin writing code and a blank editor window is staring you in the face, you feel lost and overwhelmed. In today's article, I'll discuss the method I use to get myself started when beginning a program from scratch. By the end of the article, you should have a good plan of attack for starting development for any application.
Before a line of code is ever written, the first thing I do is create a virtual environment.
What is a virtual environment? It's a Python installation completely segregated
from the rest of the system (and the system's default Python installation). Why
is this useful? Imagine you have two projects you work on locally. If both use
the same library (like
requests) but the first uses an older version (and
can't upgrade due to other libraries depending on the old version of
requests), how do you manage to use the newest version of
requests in your
new project? The answer is virtual environments.
To get started, install
virtualenvwrapper (a wrapper around the fantastic
virtualenv package). Add a line to your .bashrc or equivalent file to source
/usr/local/bin/virtualenvwrapper.sh and reload your profile by sourcing the
file you just edited. You should now have a command, mkvirtualenv, available
via tab-completion. If you're using Python 3.3+, virtual environments are
supported by the language, so no package installation is required.
mkvirtualenv <my_project> will
create a new virtualenv named
my_project, complete with pip
already installed (for Python 3,
python -m venv <my_project> followed by
source <my_project>/bin/activate will do the trick).
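The same environment can also be created from Python itself through the standard library's venv module. This is a minimal sketch; the helper name create_project_env is hypothetical, and with_pip=True assumes a Python build that ships ensurepip:

```python
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_dir = Path(env_dir)
    # with_pip=True asks venv to bootstrap pip into the new environment.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Calling create_project_env('my_project') has roughly the effect of python -m venv my_project; the environment still has to be activated from the shell afterwards.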
Now that you've got your virtual environment set up, it's time to initialize
your source control tool of choice. Assuming it's
git (because, come on...), that means running
git init . in the project directory. It's also helpful to add a
.gitignore file right away
to ignore compiled Python files and
__pycache__ directories. To do so, create
a file named
.gitignore with the following contents:
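A minimal sketch of such a file, assuming the only patterns needed are the two just mentioned (compiled files and __pycache__ directories). The helper name write_gitignore is hypothetical; typing the same two lines into the file by hand works just as well:

```python
from pathlib import Path

# Assumed patterns: compiled bytecode files and bytecode cache directories.
GITIGNORE_LINES = ['*.pyc', '__pycache__/']


def write_gitignore(project_dir='.'):
    """Write a minimal .gitignore into project_dir and return its path."""
    path = Path(project_dir) / '.gitignore'
    path.write_text('\n'.join(GITIGNORE_LINES) + '\n')
    return path
```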
Now is also a good time to add a
README to the project. Even if you are the only
person who will ever see the code, it's a good exercise to organize your
thoughts. A good README should describe what the project does, its requirements,
and how to use it. I write
READMEs in Markdown, both because GitHub
auto-formats any file named
README.md and because I write all documents in Markdown.
Lastly, create your first commit containing the two files (.gitignore and
README.md) you just created. Do so via
git add .gitignore README.md, followed by
git commit -m "initial commit".
I begin almost every application the same way: by creating a "skeleton" for the application consisting of functions and classes with docstrings but no implementation. I find that, when forced to write a docstring for a function I think I'm going to need, if I can't write a concise one I haven't thought enough about the problem.
To serve as an example application, I'll use a script recently created by a tutoring client during one of our sessions. The goal of the script is to create a CSV file containing the top-grossing movies of last year (from IMDB) and the keywords on IMDB associated with them. This was a simple enough project that it could be completed in one session, but meaty enough to require some thought.
First, create a main file to serve as the entry point to your application. I'll
call mine imdb.py. Next, copy-and-paste the following code into your editor
(and change the docstring as appropriate):
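A minimal skeleton along these lines, using nothing beyond the standard library. The module docstring wording is illustrative, and returning 0 explicitly from main is an assumption made so the exit code is visible in the code:

```python
"""Gather IMDB keywords for last year's top-grossing movies."""
import sys


def main():
    """Main entry point for the script."""
    return 0  # nothing to do yet; 0 signals success to the shell


if __name__ == '__main__':
    sys.exit(main())
```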
While it may not look like much, this is a fully functional Python program. You
can run it directly and get back the proper return code (
0, though to be
fair, running an empty file will also return the proper code). Next I'll create
stubs for the functions and/or classes that I think I'll need:
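A sketch of what such stubs might look like. The two function names and the movie_url parameter come from the article; the url parameter name and the docstring wording are assumptions:

```python
"""Gather IMDB keywords for last year's top-grossing movies."""
import sys


def get_top_grossing_movie_links(url):
    """Return a list of (title, link) tuples for the movies listed at url."""
    pass


def get_keywords_for_movie(movie_url):
    """Return a list of keywords associated with the movie at movie_url."""
    pass


def main():
    """Main entry point for the script."""
    pass


if __name__ == '__main__':
    sys.exit(main())
```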
That seems reasonable. Notice that both functions include parameters (e.g.,
get_keywords_for_movie takes the parameter
movie_url). This may seem
odd when implementing stubs. Why include any parameters at this point? The
reasoning is the same as for pre-writing the docstring: if I don't know what
arguments the function will take, I haven't thought it through enough.
At this point, I'd probably commit to
git, as I've done
a bit of work that I wouldn't like to lose. After that's done, it's on to the
implementation. I always begin implementing
main, as it's the "hub" connecting
all other functions. Here's the implementation for main:
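A minimal sketch of a main along those lines. To stay runnable on its own it includes canned stand-ins for the two unimplemented functions; TOP_GROSSING_URL, the output_path parameter, and the one-row-per-movie layout are all assumptions rather than the article's code:

```python
import csv

# Hypothetical placeholder; the real listing URL is not given here.
TOP_GROSSING_URL = 'https://example.com/top-grossing'


def get_top_grossing_movie_links(url):
    """Stand-in for the real function: returns canned (title, link) pairs."""
    return [('Movie A', '/title/a/'), ('Movie B', '/title/b/')]


def get_keywords_for_movie(movie_url):
    """Stand-in for the real function: returns canned keywords."""
    return ['keyword-one', 'keyword-two']


def main(output_path='output.csv'):
    """Write one CSV row per movie: the title followed by its keywords."""
    movies = get_top_grossing_movie_links(TOP_GROSSING_URL)
    # newline='' lets the csv module control line endings itself.
    with open(output_path, 'w', newline='') as output:
        writer = csv.writer(output)
        for title, movie_url in movies:
            keywords = get_keywords_for_movie(movie_url)
            writer.writerow([title] + keywords)
    return 0
```

Because main only relies on what the two functions promise to return, it can be written and exercised before either of them does any real work.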
Despite the fact that get_top_grossing_movie_links and get_keywords_for_movie
haven't been implemented yet, I know enough about them to make use of them.
main does exactly what we discussed in the beginning: gets the top-grossing
movies and outputs a CSV file of their keywords.
Now all that remains is the implementation of the missing functions.
Interestingly enough, even though we know
get_keywords_for_movie will be called after
get_top_grossing_movie_links, we can implement them in whatever order we like.
This wouldn't be the case if you had simply started writing the script from scratch,
adding things as you went. You would be forced to write the first function before
you could move on to the second. The fact that we can implement (and test!) the
functions in any order shows they are loosely coupled.
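As an illustration, here is a dependency-free sketch of get_keywords_for_movie. It swaps in the standard library's html.parser and urllib for third-party fetching and parsing libraries, and it assumes, purely for the sake of the example, that the keywords sit in table cells on the page. The helper names _TableCellParser and extract_keywords are hypothetical:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TableCellParser(HTMLParser):
    """Collect the text of every <td>, grouped by the <tr> that contains it."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
            self._in_cell = False
        elif tag == 'td' and self.rows:
            self.rows[-1].append('')
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def extract_keywords(html):
    """Return the stripped, non-empty cell texts found in html."""
    parser = _TableCellParser()
    parser.feed(html)
    # Nested iteration: every row, then every cell within that row.
    return [cell.strip() for row in parser.rows for cell in row if cell.strip()]


def get_keywords_for_movie(movie_url):
    """Return a list of keywords for the movie page at movie_url."""
    with urlopen(movie_url) as response:
        html = response.read().decode('utf-8', errors='replace')
    return extract_keywords(html)
```

Splitting the parsing into extract_keywords keeps the network call out of the way, so the parsing logic can be checked against a small HTML string.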
We're using both requests and
BeautifulSoup, so we need to install them with
pip. Now would be a good time to list the project's requirements in
requirements.txt (for example, via pip freeze) and commit them. This way, we can always create a virtual
environment and install exactly the packages and versions we need to run the
application.
The list comprehension that is returned may look odd, but it's simply doing an
additional, nested iteration over the results of the first and using the
elements from the nested iteration. With list comprehensions, you can chain as
many for clauses as you'd like.
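A small standalone example of that nesting, using made-up data in place of parsed table rows, next to the explicit loops it is equivalent to:

```python
# Made-up stand-in for rows of table cells.
rows = [['heist', 'robot'], ['sequel'], []]

# Nested comprehension: the first `for` walks the rows, the second walks
# the cells of each row, and the leading expression picks what to keep.
flat = [cell for row in rows for cell in row]

# The equivalent explicit loops, in the same left-to-right order.
flat_loop = []
for row in rows:
    for cell in row:
        flat_loop.append(cell)
```

Both flat and flat_loop end up holding the same cells in the same order; the comprehension simply reads its for clauses in the order the loops would be nested.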
The last step is the implementation of get_top_grossing_movie_links:
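A dependency-free sketch of how this function might look, again substituting the standard library's html.parser and urllib for third-party libraries. It assumes, for illustration only, that movie links are the anchors whose href contains "title"; the helper names _LinkParser and extract_movie_links are hypothetical:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collect (text, href) pairs for every <a> whose href contains 'title'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the matching <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            if 'title' in href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((''.join(self._text).strip(), self._href))
            self._href = None


def extract_movie_links(html):
    """Return (title, link) tuples, skipping bogus links titled 'X'."""
    parser = _LinkParser()
    parser.feed(html)
    return [(movie_title, href) for movie_title, href in parser.links
            if movie_title != 'X']


def get_top_grossing_movie_links(url):
    """Return (title, link) tuples for the movies listed at url."""
    with urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    return extract_movie_links(html)
```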
Reasonably straightforward. The
if movie_title != 'X' was due to my
selection of links being a bit too permissive. Rather than try to get it just right, I simply filter out
the links that are bogus with the if statement.
Here are the contents of
imdb.py in their entirety:
The application, which began as a blank editor window, is now complete. Running
it produces output.csv, containing exactly what we'd hoped for. With a
script of this length, I wouldn't write tests as the output of the script is
the test. However, it would certainly be possible (since our functions are
loosely coupled) to test each function in isolation.
Hopefully, you now have a plan of attack when faced with starting a Python project from scratch. While everyone has their own method of starting a project, mine is just as likely to work for you as any other, so give it a try. As always, if you have any questions, feel free to ask in the comments or email me at firstname.lastname@example.org.
Posted by Jeff Knupp