Introduction to Python


Python is an excellent, cross-platform, object-oriented interpreted language. Besides ease of use, its most visible characteristic is that indentation is part of the syntax: blocks are delimited by their indentation, so a badly indented program won't run.

As of September 2004, though, there are weaknesses to be aware of if you intend to use Python to write GUI apps for Windows:


At least three distributions of Python are currently available for the Windows platform (PythonWare used to be yet another package, but it's been deprecated):

If you only need a basic distribution, try out Tiny Python.

The "import" statement looks for module files in the directories listed in sys.path, which includes those specified in the $PYTHONPATH environment variable. If the named module isn't found in these directories, an ImportError is raised. The first time Python imports a module, it automatically compiles the module and saves it as bytecode; this bytecode file has the same name as the module file, but ends in a .pyc extension (Python 3 stores it in a __pycache__ subdirectory and adds the interpreter version to the name). These .pyc files are automatically recompiled if the module changes in any way.
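To see the lookup and caching described above, here is a minimal sketch, assuming Python 3. It prints the directories that "import" searches and asks the standard library where the bytecode of a module file would be cached. The file name mymodule.py is hypothetical and does not need to exist.

```python
import importlib.util
import sys


def bytecode_path(source_path):
    """Return the path where Python 3 would cache the bytecode of source_path."""
    return importlib.util.cache_from_source(source_path)


# Directories searched by "import", in order
for directory in sys.path:
    print(directory)

# "mymodule.py" is a hypothetical file name; nothing is read or written here
print(bytecode_path("mymodule.py"))
```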

"On Windows, you can also use extension .pyw and interpreter program pythonw.exe instead of .py and python.exe. The w variants run Python without a text-mode console, and thus without standard input and output. These variants are appropriate for scripts that rely on GUIs. You normally use them only when the script is fully debugged, to keep standard output and error available for information, warnings, and error messages during development."

If you are using UltraEdit as your favorite editor, here's the section to add in UE's wordfile.txt to handle Python documents.

Installing Python 2 and Python 3 on the same Windows host



Checking script syntax






Wheel, .whl, "allows for binary redistribution of libraries"


Why import a library twice?


import mylib

import mylib.lib

How to find the list of methods/properties a library offers?

How to uninstall a module?

c:\>pip list

c:\>pip uninstall somemodule

Data Structures


All entries must be of the same data type.

import array as arr

a = arr.array("I",[3,6,9])


Collection of heterogeneous items. Mutable.

x = []

x1 = [1,'apple',3]


List vs. array?

"With arrays, you can perform an operations on all its item individually easily, which may not be the case with lists", eg.


"NumPy arrays are very heavily used in the data science world to work with multidimensional arrays. They are more efficient than the array module and Python lists in general."


"Tuples are another standard sequence data type. The difference between tuples and list is that tuples are immutable, which means once defined you cannot delete, add or edit any values inside it."

Tuples are enclosed in parentheses.

x_tuple = (1,2,3,4,5)

y_tuple = ('c','a','k','e')



"Dictionaries are made up of key-value pairs. key is used to identify the item and the value holds as the name suggests, the value of the item."

Dictionaries are built with curly brackets.

x_dict = {'Edward':1, 'Jorge':2, 'Prem':3, 'Joe':4}

del x_dict['Joe']


{'Edward': 1, 'Jorge': 2, 'Prem': 3}

x_dict['Edward'] # Prints the value stored with the key 'Edward'.


'Sets are a collection of distinct (unique) objects. These are useful to create lists that only hold unique values in the dataset. It is an unordered collection but a mutable one, this is very helpful when going through a huge dataset.'

y_set = set('COOKIE')

print(y_set) # Single unique 'o'

{'I', 'O', 'E', 'C', 'K'}

Collections, heapq

Those are additional data structures.
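As a quick taste of these two standard-library modules, the sketch below uses Counter, defaultdict and deque from collections, and a heap from heapq. The sample words and numbers are made up for illustration.

```python
import heapq
from collections import Counter, defaultdict, deque

# Counter: count how often each item occurs
letters = Counter("COOKIE")

# defaultdict: missing keys get a default value (here an empty list)
by_length = defaultdict(list)
for word in ["pie", "cake", "tart", "bun"]:
    by_length[len(word)].append(word)

# deque: fast appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
first = queue.popleft()

# heapq: turns a plain list into a heap whose smallest item sits at index 0
heap = [5, 1, 4]
heapq.heapify(heap)
smallest = heapq.heappop(heap)

print(letters["O"], dict(by_length), first, smallest)
```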

Code Snippets

Running an external program


Leaving for loop early

"break" or "continue"


Watch out when using shortcuts like += on large strings, as they seem to be much slower than the lengthier "mystring = mystring + something".
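Whichever form of concatenation turns out faster on your setup, a common way to sidestep the question is to collect the pieces in a list and join them once at the end. A minimal sketch (build_report is a hypothetical helper name):

```python
def build_report(lines):
    """Assemble one big string from many small ones with a single join."""
    parts = []
    for line in lines:
        parts.append(line)
    # One allocation for the final string instead of one per concatenation
    return "".join(parts)


print(build_report(["spam", " ", "eggs"]))
```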

File I/O

Checking if a directory exists


import os

... or

import os
if not os.path.isdir("./mydir"):
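What to do when the directory is missing is up to you; a common choice is to create it. The sketch below assumes that is what you want, and works in a temporary directory so it can be run safely. ensure_dir is a hypothetical helper name.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path if it is not already a directory; return True if it was created."""
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


# Work in a scratch location rather than in the current directory
base = tempfile.mkdtemp()
target = os.path.join(base, "mydir")
print(ensure_dir(target))  # the directory is created on the first call
print(ensure_dir(target))  # and found on the second
```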

Writing to a text file

log = open('test.txt','w')
log.write("Some string")
log.close()

Caution: Under Windows, \r\n turns into 0D0D0A. To get the expected 0D0A, just use \n .
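If you do need to control the exact bytes written at the end of each line, Python 3's open() accepts a newline parameter. A minimal sketch, assuming Python 3: with newline="" no translation takes place, so "\r\n" is written as given on every platform. The file lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# newline="" turns off newline translation on write
with open(path, "w", newline="") as log:
    log.write("Some string\r\n")

# Read the file back in binary mode to inspect the raw bytes
with open(path, "rb") as raw:
    data = raw.read()
print(data)
```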

Important: Although Python3 uses Unicode, it happily writes data in Latin1 under Windows unless told otherwise:

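stuff = "Crème"
#No encoding given: the locale's default (eg. cp1252) is used
with open("cp1252.txt", 'w') as outFile:
    outFile.write(stuff)
with open("utf8.txt", mode='w',encoding='utf-8') as outFile:
    outFile.write(stuff)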

Reading from a text file in one go

f = open("c:/test.txt", "r")
data = f.read()
f.close()
print(data)

Reading from a text file, line by line

f = open("c:/test.txt", "r")
textlines = f.readlines()
for line in textlines:
    print line

Reading from a text file, editing each line, saving into a new file

import re,sys
MAGIC = 10
f = open("C:\\input.txt", "r")
textlines = f.readlines()
#rewrite lines to new file
log = open('output.txt','w')
#search for pattern using regex
p = re.compile('^\{(.+?)\}')
for line in textlines:
        m =
        nugget = int(
        nugget += MAGIC
        #update line
        start = str(start)
        end = str(end)
        new = "{%s){%s}" % (start,end)
        line = p.sub(new,line)
        #print line
        #adds extra newline :-/
        #print>>log, line
        log.write("%s" % line)
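The listing above lost a few pieces, so here is a self-contained sketch of the same idea. It assumes each line starts with two numbers in braces, such as {10}{20}, and that both should be increased by MAGIC; that input format is a guess based on the "{%s}{%s}" template. shift_line and shift_file are hypothetical helper names.

```python
import re

MAGIC = 10

# Assumed line format: "{start}{end}rest of line"
pattern = re.compile(r'^\{(\d+)\}\{(\d+)\}')


def shift_line(line, offset=MAGIC):
    """Add offset to the two leading {numbers}; other lines pass through unchanged."""
    def bump(match):
        start = int(match.group(1)) + offset
        end = int(match.group(2)) + offset
        return "{%s}{%s}" % (start, end)
    return pattern.sub(bump, line)


def shift_file(source, destination, offset=MAGIC):
    """Rewrite every line of source into destination."""
    with open(source, "r") as infile, open(destination, "w") as outfile:
        for line in infile:
            # Each line keeps its own newline, so write() adds nothing extra
            outfile.write(shift_line(line, offset))


print(shift_line("{10}{20}Hello\n"), end="")
```

Passing a function to sub() avoids having to build the replacement string by hand for each line.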

Finding if a file is missing from a directory

We'll read a list of files from a text file, and then check if the file exists:

import os.path
f = open(PATH + "files.txt", "r")
textlines = f.readlines()
for line in textlines:
        line = line.strip()
        if not os.path.isfile(PATH + line):
                print "%s NOT FOUND" % line

Append stuff to a text file

A first way is to open a file in "a" mode:

f = open("c:/test.txt", "a")
f.write("This is an appended line.\n")
f.close()

Another way:

import glob
f = open("", "r")
template = "\n\n" +
for frm in glob.glob('*.txt'):
        f = open(frm, "r+")
        content =
        if 'my pattern' not in content:

Checking that a file exists


import os
def exists(file):
    if os.path.exists(file):
        return 1
    return 0

... or

import os
def exists(file):
    return os.access(file, os.F_OK)

Checking the size of a file

import os
from stat import ST_SIZE
print(os.stat(file)[ST_SIZE])

Displaying the last modified date of a file

os.stat() returns the date a file was last modified as an epoch value, ie. the number of seconds elapsed since January 1st 1970. To turn an epoch into eg. YYYY-MM-DD:

import os, time
from stat import ST_MTIME
filetime = os.stat('myfile.txt')[ST_MTIME]
#turns epoch into tuple such as (2004, 8, 13, 2, 35, 2, 4, 226, 0)
filetime = time.gmtime(filetime)
#turns tuple into formatted string
print(time.strftime("%Y-%m-%d",filetime))

Reading a value from a key in a section of an INI file

import ConfigParser
p = ConfigParser.ConfigParser()
p.readfp (open('index.ini'))
try:
    print(p.get('files',file))
except ConfigParser.NoSectionError:
    print("section 'files' not found")
else:
    print("ok")

Reading all the key/value items in a section in an INI file

import ConfigParser
p = ConfigParser.ConfigParser()
p.readfp (open('index.ini'))
for item in p.items('files'):
    print("key = " + item[0] + " value = " + item[1])

Writing data to an INI file

ConfigParser cannot change a file in place, so you need to read the INI file, make the changes in memory, open the file in write mode, and write the whole configuration back with write():

def writeini(file,size):
    p = ConfigParser.ConfigParser()
    p.read('index.ini')
    p.set('files', file, size)
    fp = open('index.ini','w')
    p.write(fp)
    fp.close()
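For Python 3, where the module is named configparser, the same read-modify-write cycle can be sketched as follows. The section, key and value are made up, write_ini is a hypothetical helper name, and the file is created in a temporary directory so the example can be run as is.

```python
import configparser
import os
import tempfile


def write_ini(path, section, key, value):
    """Set section/key to value in the INI file at path, creating what is missing."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a file that does not exist yet is silently skipped
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, str(value))  # values must be strings
    with open(path, "w") as fp:
        parser.write(fp)


ini_path = os.path.join(tempfile.mkdtemp(), "index.ini")
write_ini(ini_path, "files", "myfile.txt", 1234)

# Read the file back to check what was saved
check = configparser.ConfigParser()
check.read(ini_path)
print(check.get("files", "myfile.txt"))
```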

Setting the current directory

import os

Looping through each file in a directory

import glob
for file in glob.glob('*.htm*'):

Note: On the Windows platform, glob() mixes forward- and backslashes, while open() doesn't allow backslashes altogether ("IOError: [Errno 2] No such file or directory: '.\\mydir\myfile.txt' ".)

Reading information from MS Word files

import win32com.client
app = win32com.client.Dispatch('Word.Application')
doc = app.Documents.Add('c:\\stuff.doc')
for rev in doc.Revisions:
    print rev.Author

Using SQLite as file-based database


When reading data from SQLite3 (which is saved in UTF-8/16), and saving them into a plain text file, Python3 uses the locale as default, eg. cp1252. To save data as UTF-8, make sure you use the following switch:

import sqlite3
con = sqlite3.connect('input.sqlite')
con.row_factory = sqlite3.Row
cur = con.cursor()
cur.execute("SELECT name FROM table1")
results = cur.fetchall()
output = open("output.txt", "w", encoding='UTF-8')
for row in results:
    output.write(row["name"] + "\n")
output.close()


Several wrappers are available to access SQLite from Python, but two stand out: "pysqlite implements Python's DBAPI and was integrated into Python [2.5]. There is another wrapper, APSW ("Another Python SQLite Wrapper"), which is thinner and closer to SQLite's C API."


Installing APSW: Just run the EXE that matches your version of Python, eg. apsw-3.3.13-r1.win32-py2.5.exe

Here's how to display information:

import os, sys, time
import apsw
print "Using APSW file",apsw.__file__
print "APSW version",apsw.apswversion()
print "SQLite version",apsw.sqlitelibversion()

Here's how to play with SQLite:

if os.path.exists("dbfile"):
    os.remove("dbfile")
connection = apsw.Connection("dbfile")
cursor = connection.cursor()

cursor.execute("create table foo(x,y,z)")
cursor.execute("insert into foo values(1,2,3)")
cursor.execute("insert into foo values(4, 'five', 6.0)")
for row in cursor.execute("select * from foo"):
    print row
for m,n,o in cursor.execute("select x,y,z from foo"):
    print m,n,o

Another example of using APSW (reading a tab-delimited text file to insert books into SQLite)

import re, apsw
f = open("books.tsv", "r")
textlines = f.readlines()
#Extract ISBN + box
p = re.compile(r'^(.+)\t(\d+)$')
for line in textlines:
        m = p.search(line)
        if m:
                isbn = m.group(1)
                box = m.group(2)
                sql = "SELECT COUNT(isbn) FROM books WHERE isbn='%s'" % isbn
                for row in cursor.execute(sql):
                        #Record not found -> Insert
                        if not row[0]:
                                print "No record found for ISBN " + isbn
                                cursor.execute("INSERT INTO books (id,isbn,box) VALUES (NULL,?,?)", (isbn,box))

Here's how to perform an INSERT and display the values for each column:

cursor.execute("INSERT INTO person (name, address, tel, web, email) "
    "VALUES (:name, :address, :tel, :web, :email)", locals())

Here's how to safely update/insert data and display the resulting query:

sql = 'UPDATE companies SET name=?,address=?,zip=? WHERE id=?;'
try:
        cursor.execute(sql, (name,address,zip,id) )
except apsw.Error:
        print("Failed UPDATING")
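The same placeholder technique works with the sqlite3 module from the standard library. The sketch below uses an in-memory database and made-up rows; the table layout is an assumption based on the column names in the UPDATE statement above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE companies "
            "(id INTEGER PRIMARY KEY, name TEXT, address TEXT, zip TEXT)")

# Positional placeholders: the values travel separately from the SQL text,
# so the quote in O'Reilly needs no escaping
cur.execute("INSERT INTO companies (name, address, zip) VALUES (?, ?, ?)",
            ("O'Reilly", "1 Main St", "12345"))

# Named placeholders: values are looked up by key in a dictionary
values = {"name": "Acme", "zip": "99999", "id": 1}
try:
    cur.execute("UPDATE companies SET name=:name, zip=:zip WHERE id=:id", values)
    con.commit()
except sqlite3.Error as err:
    print("Failed UPDATING:", err)

rows = cur.execute("SELECT id, name, zip FROM companies").fetchall()
print(rows)
```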

Using regular expressions

Here's how to loop through a list of web pages, and check whether a given pattern is found therein:

import sys
import urllib
import re
for i in range(1,10):
        f = urllib.urlopen("" % i)
        #re.I = ignore case
        if re.search('stringtofind', f.read(), re.I):
                print "Found in %s" % i

Another way to do this:

p = re.compile('stringtofind')
if p.search(f.read()):
    print("Found")
else:
    print("Not found")

Here's how to compile a regex, find a pattern, and save it to a file:

p = re.compile('(<some>.+</some>)',re.DOTALL)
m =
if m:
        inputdata =
        print("Pattern not found")
inputdata = inputdata.replace('<other>','<yet>')
with open(output, 'w') as outputfile:
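Since parts of the listing above are missing, here is a complete sketch of the same steps: compile the pattern, search, tweak the match, and save it. The input text is made up, extract_section is a hypothetical helper name, and the output file goes to a temporary directory.

```python
import os
import re
import tempfile


def extract_section(text):
    """Return the first <some>...</some> block in text, or None if there is none."""
    # re.DOTALL lets "." match newlines, so the block may span several lines
    p = re.compile('(<some>.+</some>)', re.DOTALL)
    m = p.search(text)
    if m is None:
        return None
    return m.group(1)


inputdata = "header\n<some>first\n<other>second</some>\nfooter"
section = extract_section(inputdata)
output = os.path.join(tempfile.mkdtemp(), "section.txt")

if section is None:
    print("Pattern not found")
else:
    section = section.replace('<other>', '<yet>')
    with open(output, 'w') as outputfile:
        outputfile.write(section)
```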

Here's how to load a web page, isolate a section, and display it (Note: you cannot call f.read() twice, hence the copying of the page into the 'page' variable):

log = open('found.txt','w')
for i in range(1,10):
        f = urllib.urlopen("" % i)
        print "Checking page %i" % i
        page = f.read()
        if re.search('some text',page, re.I):
                m = re.search('<span class=subject>"(.+?)"</span>',page,re.I)
                if m:
                        log.write("Found in %s\n" % i)

Here's how to read an HTML file, and display the string between the TITLE tags, if any:

import re
f = open('myfile.html', "r")
inputfile = f.read()
m = re.search('<title>(.*?)</title>',inputfile,re.I)
if m:
    print(m.group(1))

... or if you need to extract more than one set of items:

p = re.compile('blabla (.+?) blabla (.+?)')
packed = p.findall(inputfile)
if packed:
    for x in packed:
        print("Item 1 = " + x[0] + " Item 2 = " + x[1])

If you need to call a regex a great number of times, you can increase performance by compiling the search pattern:

p = re.compile('[0-9]+')
m = p.search('tempo999')

To replace an item with another item, use re.sub():

print re.sub('john','jane','john doe')

Note that re.sub() is very much slower than using a string's replace() method:

stuff = stuff.replace('_',' ')

Also, re.sub() interprets backslashes in the replacement string, so they must be escaped before calling it; use the r prefix to write a raw string (ie. one whose backslashes Python treats as regular characters):

toreplace = r"\\"
body = "#"
print re.sub("#",toreplace,body)

If you wish to tell the re module to treat the replace pattern as is even when it contains backslashes, add a call to its escape() function:

toreplace = re.escape(r"\\")
body = "#"
print re.sub("#",toreplace,body)
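Another option, not from the original notes, is to pass a function as the replacement: whatever it returns is inserted as is, with no backslash processing. A minimal sketch, using a made-up Windows-style path as the replacement text:

```python
import re

toreplace = r"C:\new"  # hypothetical replacement text containing a backslash
body = "path=#"

# Given as a string, the replacement is a template: "\n" becomes a newline
as_template = re.sub("#", toreplace, body)

# Given as a function, the returned text is inserted unchanged
as_literal = re.sub("#", lambda match: toreplace, body)

print(repr(as_template))
print(repr(as_literal))
```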

Here's how to rewrite a phone number:

import sys,re
#Turn 0123456789 into 01.23.45.67.89
p = re.compile(r'(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)')
phone = p.sub(r'\1.\2.\3.\4.\5',sys.argv[1])
print phone

Important: In Python 2, the regex library can't by default handle even European characters in byte strings, so you must set a locale, and add the re.LOCALE switch (in Python 3, str patterns match Unicode characters by default, and re.LOCALE is only accepted with bytes patterns):

import locale, re
#BAD: locale.setlocale(locale.LC_ALL, 'FR')
#Let Python handle it
locale.setlocale(locale.LC_ALL, '')
mypattern = re.compile(r"(\d+)\s+(\w+)\s+(\d+)",re.LOCALE)

More infos on using regexes in Python:

Driving a web browser

As of April 2021, there are at least two modules to manage a web browser through a Python script: the webbrowser module, and the Selenium module.

mechanize might be too basic.

webbrowser module


Requires that the Selenium Server be running; Selenium is written in Java

Connecting to a web server

Here's how to use urllib to POST to a script:

import urllib
url = ""
data = {'myfield': somevalue}
urldata = urllib.urlencode(data)
results = urllib.urlopen(url, urldata).read()
print results

Here is an example session that uses the 'GET' method to retrieve a URL containing parameters:

import urllib
params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
f = urllib.urlopen("" % params)

The following example uses the 'POST' method instead:

import urllib
params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
f = urllib.urlopen(" ", params)
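In Python 3, these functions moved to urllib.parse and urllib.request. The sketch below builds the same GET and POST requests without sending them. The address is a placeholder, since the original URLs are not shown, and the last lines show in comments how a request would actually be sent.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

# Placeholder address: ".invalid" never resolves, and nothing is sent below
url = "http://www.example.invalid/cgi-bin/query"

# GET: the parameters go into the URL
get_request = urllib.request.Request(url + "?" + params)

# POST: the parameters go into the request body, which must be bytes
post_request = urllib.request.Request(url, data=params.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)

# To actually send a request:
#     with urllib.request.urlopen(post_request) as f:
#         print(f.read())
```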

The following example uses an explicitly specified HTTP proxy, overriding environment settings:

import urllib
proxies = {'http': ''}
opener = urllib.FancyURLopener(proxies)
f = opener.open("")

The following example uses no proxies at all, overriding environment settings:

import urllib
opener = urllib.FancyURLopener({})
f = opener.open("")

Here's how to use Libcurl to POST to a script:

  1. Install Python and Libcurl (eg.
  2. Install PyCurl
  3. Use this script:

Here's how to log on to a web server through POST with support for cookies:

urllib vs urllib2 vs httplib

cookielib vs. ClientCookie

Playing with date/time

Here's how to display the current date and time:

import time
import locale
#displays '08/20/04 22:05:15'
print time.strftime('%c')
#displays 'French_France.1252'
print locale.setlocale(locale.LC_ALL,'')
#displays '20/08/2004 22:05:15'
print time.strftime('%c')

The time value as returned by gmtime(), localtime(), and strptime(), and accepted by asctime(), mktime() and strftime(), is a sequence of 9 integers. The return values of gmtime(), localtime(), and strptime() also offer attribute names for individual fields.
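A small sketch of that time tuple, using the epoch itself as the sample value so the result does not depend on when or where it is run:

```python
import time

t = time.gmtime(0)  # the epoch (January 1st 1970, UTC) as a time tuple
print(len(t))  # number of fields in the sequence

# The same field, read by index and by attribute name
print(t[0], t.tm_year)

# strftime() accepts the tuple directly
stamp = time.strftime("%Y-%m-%d %H:%M:%S", t)
print(stamp)
```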




Printing the content of each key:

for i in stuff.keys():
    print i + "=" + stuff[i]

Commenting a block of text

"""
This is one
block of text
"""

ie. three double-quotes in a row.

Exiting a script

import sys
sys.exit()

Handling a long line of code

To break a long line of code:

if (somevar) or \

Sending an e-mail

Here's how to send an e-mail through code, passing one parameter to the script:

from email.MIMEText import MIMEText
import smtplib,sys
body='''this text will become the body of the message
Using triple-quotes you can span it easily over multiple lines.
the result of an action'''
msg = MIMEText(body)
From = ""
To = ""
msg['From'] = From
msg['To'] = To
msg['Subject'] = "Call from " + sys.argv[1]
server = smtplib.SMTP("")
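In Python 3, the same message can be built with email.message.EmailMessage and handed to smtplib. The sketch below only builds the message; the addresses are placeholders, the caller name stands in for sys.argv[1], and send() is defined but not called since the server name is not given in the original. build_message and send are hypothetical helper names.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, caller, body):
    """Build the e-mail; caller ends up in the subject line."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = "Call from " + caller
    msg.set_content(body)
    return msg


def send(msg, host):
    """Hand msg to the SMTP server at host (not called in this sketch)."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)


# Placeholder addresses and caller name
msg = build_message("me@example.invalid", "you@example.invalid",
                    "Jane", "this text will become the body of the message")
print(msg['Subject'])
```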

Tips from the Python Tutorial

Calling a non-COM DLL from Python

Calling a COM DLL from Python

Shortcut to the last result

In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example:

>>> price * tax
>>> price + _

Long lines

If a statement or string is too long to fit on a line, use the backslash:

Note that whitespace at the beginning of the line is\

You can also use """ or ''' :

print """
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
"""


Unlike a C string, Python strings cannot be changed. Assigning to an indexed position in the string results in an error.
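A short sketch of what that error looks like, and of the usual workaround, which is to build a new string:

```python
word = "spam"

try:
    word[0] = "S"  # strings do not support item assignment
except TypeError as err:
    print("Cannot modify a string:", err)

# Build a new string from the pieces instead
word = "S" + word[1:]
print(word)
```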


Unlike strings, which are immutable, it is possible to change individual elements of a list:

a = ['spam', 'eggs', 100, 1234]
a[2] = a[2] + 23
['spam', 'eggs', 123, 1234]

Variable number of function parameters

When a final formal parameter of the form **name is present, it receives a dictionary containing all keyword arguments whose keyword doesn't correspond to a formal parameter. This may be combined with a formal parameter of the form *name, which receives a tuple containing the positional arguments beyond the formal parameter list. (*name must occur before **name.) For example, if we define a function like this:

def cheeseshop(kind, *arguments, **keywords):
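The body of the function is not shown above, so the one below is only a sketch with the same signature: it collects what it received into a list, to make it visible which arguments land in *arguments and which in **keywords.

```python
def cheeseshop(kind, *arguments, **keywords):
    """Return one line per argument received (illustrative body, not the original)."""
    lines = ["-- Do you have any %s?" % kind]
    # Extra positional arguments arrive as a tuple
    for arg in arguments:
        lines.append(arg)
    # Extra keyword arguments arrive as a dictionary
    for key in sorted(keywords):
        lines.append("%s: %s" % (key, keywords[key]))
    return lines


result = cheeseshop("Limburger", "It's very runny, sir.",
                    client="John Cleese", shopkeeper="Michael Palin")
for line in result:
    print(line)
```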

Importing modules

There is even a variant to import all names that a module defines:

from fibo import * 

This imports all names except those beginning with an underscore (_).


The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings. Without arguments, dir() lists the names [ie. variables and functions] you have defined currently.
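For instance, here is a sketch that lists the public names of the math module, ie. those not starting with an underscore. public_names is a hypothetical helper name.

```python
import math


def public_names(obj):
    """Return the names reported by dir() that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_names(math)[:5])
print("sqrt" in public_names(math))
```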



An easy and satisfactory way to distribute your Python script on a Windows host is to compile it with Py2exe (which analyses your script, and tries to extract all the required modules into a ZIP file), and combine the different files using either a standard installer like InnoSetup or NSIS, or combine all the files into the main EXE generated by py2exe using PE Bundle which will extract those extra files at runtime transparently:

  1. Install py2exe, and write a setup script (call it

    from distutils.core import setup
    import py2exe

  2. Open a DOS box, and run the following: python py2exe
  3. A directory named ./dist is created by py2exe, and contains all the files that are required to run your script on a bare Windows host. You can remove the ./build directory (temp stuff)
  4. Combine those few files into a single EXE using either your favorite installer, or PE Bundle
  5. More information available on py2exe

An alternative to py2exe is PyInstaller: "PyInstaller is a program that converts (packages) Python programs into stand-alone executables, under Windows, Linux and Irix. [...] PyInstaller is an effort to rescue, maintain and further develop Gordon McMillan's Python Installer (now PyInstaller). Their official website is not longer available and the original package is not longer maintained. Believing that it is still far superior to py2exe, we have setup this site to continue its further development."

First, read the following to understand the issue of compiling and/or distributing Python scripts:




"setuptools () is a collection of enhancements to distutils which let you build .egg files. Once you start using egg files you can include dependencies between package versions and if your product requires a bunch of other packages the installation step will download and install the appropriate versions.

See for instructions on installing packages built in this way, but in short, the user has to run from the EasyInstall page, and then a command like:


would download and install your package and all the other products it depends on. If at a later stage they want to upgrade to a more recent version then all they need to do is to run:

        easy_install --upgrade MyPackage

Installed eggs usually exist in a single file (importable zip) which makes uninstalling especially easy: just one file to delete."


py2exe is a Python distutils extension which converts python scripts into executable windows programs, able to run without requiring a python installation.

  1. Install py2exe
  2. Create a script
  3. Run the script including the -w (Windows) option to hide the DOS box that Python opens even when running a GUI application
  4. Distribute the resulting .EXE and its dependent DLLs, or generate an installer

Note that even a no-frills window developed with the wxPython toolkit, with just a tiny menu bar that displays a dialog box, turns into a 300KB EXE, and requires 4 binaries for a total of 2.5MB (and that's after compressing the four dependencies with UPX).

For information, internally, Python source code is always translated into a "virtual machine code" or "byte code" representation before it is interpreted (by the "Python virtual machine" or "bytecode interpreter").  In order to avoid the overhead of parsing and translating modules that rarely change over and over again, this byte code is written on a file whose name ends in ".pyc" whenever a module is parsed (from a file whose name ends in ".py").

When the corresponding .py file is changed, it is parsed and translated again and the .pyc file is rewritten. There is no performance difference once the .pyc file has been loaded (the bytecode read from the .pyc file is exactly the same as the bytecode created by direct translation).  The only difference is that loading code from a .pyc file is faster than parsing and translating a .py file, so the presence of precompiled .pyc files will generally improve start-up time of Python scripts.

If desired, the Lib/ module/script can be used to force creation of valid .pyc files for a given set of modules. Note that the main script executed by Python, even if its filename ends in .py, is not compiled to a .pyc file.  It is compiled to bytecode, but the bytecode is not saved to a file.
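To compile a single file by hand, the standard library also offers the py_compile module. A minimal sketch, assuming Python 3: it writes a one-line script to a temporary directory, compiles it, and prints where the bytecode file ended up.

```python
import os
import py_compile
import tempfile

# Create a throwaway module to compile
source = os.path.join(tempfile.mkdtemp(), "hello.py")
with open(source, "w") as f:
    f.write('print("hello")\n')

# doraise=True turns compilation problems into exceptions
compiled = py_compile.compile(source, doraise=True)
print(compiled)
```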

McMillan Installer


If you are looking for a way to translate Python programs in order to distribute them in binary form, without the need to distribute the interpreter and library as well, have a look at the script in the Tools/freeze directory (find it in the ActivePython distribution; I guess this refers to the standard Python distro).

This creates a single binary file incorporating your program, the Python interpreter, and those parts of the Python library that are needed by your program.  Of course, the resulting binary will only run on the same type of platform as that used to create it.

"There is a tool called freeze that is included with Python that does this.  I havn't done it on Windows yet and I have heard that there are some tricks or potential problems with it.  Check the back-postings at dejanews for details.  Basically it scans you code for all imported modules and builds a C module that has all the compiled python modules encoded within it.  Then you compile and linke this file against the Python library and you end up with an executable that along with any binary extension modules you may need will be a distributable version of your program."


"The Freeze utility that comes with Python itself requires a source distribution, a C compiler and linker which makes for a complex environment for creating executables. In addition, this method is very slow for creating executables as compared to the other methods. py2exe is intended for development on Windows only and cx_Freeze is intended for cross platform development. Installer uses an import hook which means that the development environment and runtime environment are considerably different."




"The PyPy project aims at producing a simple runtime-system for the Python language. We want to express the basic abstractions within the Python Language itself. We later want to have a minimal core which is not written in Python and doesn't need CPython anymore. We want to take care that technologies such as PSYCO and Stackless will easily integrate."






"SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages. SWIG is primarily used with common scripting languages such as Perl, Python, Tcl/Tk, and Ruby, however the list of supported languages also includes non-scripting languages such as Java, OCAML and C#."



"PyChecker is a tool for finding bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. It is similar to lint."


As of 2018, I tried a few of them (IDLE, MS Visual Studio, PyCharm, Wing, Eric), and found PyScripter to be the simplest to install and use.


Main page; Support

To make the IDE actually… readable, choose View > Styles > Windows 10

To set a keyboard shortcut to toggle comments (Source Code > Toggle Comment), use Tools > Options > IDE Shortcuts; I used CTRL+SHIFT+B (B as "block"), since it was available.

Tools > Configure Tools: To use the current script work directory as the Working directory, use "$[ActiveDoc-Dir]"

Instead of the internal Python interpreter, use the external interpreter, so you can easily kill a rogue application if need be.


What causes "Remote Interpreter Reinitialized"?
How to change encoding, so that UTF-8 strings are displayed correctly (eg. "é"   "é")?
How to remove "greyed out" lines when script stopped running due to error? Can't read code. Nothing in "Run".
How to hide black vertical line in middle of editing window?

Tools > Options > Editor Options : Edge column = 0

How to add items from Tools > Tools into user toolbar?

1. Right-click on User Toolbar

2. Customize

3. Commands tab

4. External Tools

5. Select item, and drag 'n drop it to the User Toolbar.


OLD If you prefer to use an IDE instead of a basic text editor, here are the choices I would recommend:

Bigger list here




Programmer Studio



Visual Python


The eric3 Python IDE

SPE - Stani's Python Editor

Boa Constructor



PythonWin, a.k.a. PyWin32





Writing GUI apps

Some infos

Below is a list of tools to let you build GUI applications. Most are just wrappers around a set of widgets such as Windows' native widgets, wxWidgets, or QT, bringing you back to the days of Windows programming Petzold-style (Mmm...), but some also offer a GUI designer like VB's, ie. you can draw the windows interface with the mouse. You can read more in the page Gui Programming on the Python site.

Note that the WYSIWYG GUI designer that feels most like VB's is QT Designer, which you can get either directly from QT or by buying the BlackAdder IDE.

Alternatively, you could also use a GUI designing tool such as the antiquated MS Dialog Editor or its more modern equivalents, just to draw the interface with the mouse and get the coordinates for each widget, and copy/paste this into code. Here are some suggestions I got:

Here are the widgets and/or Python wrappers those GUI designers may require:

Designing the UI as a resource

"Quick side note: depending on your GUI needs, ctypes can be a pretty easy way to go. Create your GUI as resources (e.g. in MS Visual Studio) and wrap them into a tiny DLL. Then use ctypes to load them at runtime and run CreateDialogIndirect. Most of the work involved is simply looking in header files for the values of various Win32 messages and constants, but once you do it the first time you can re-use much of the code over and over."


PythonWin, a.k.a. PyWin32 is not only an IDE, but also an MFC wrapper so you can build Win32 apps without any extra widgets set. Take a look at the samples under Drive:\Python23\Lib\site-packages\

PythonWin offers the following modules to wrap the Win32 APIs:

Note that Python Win32, a.k.a. Win32all, is part of the ActivePython package, so if you use ActivePython instead of the standard Windows version of Python, Python Win32 is already installed.

More information:

Here's the familiar "Hello, World!" as a dialog box in PyWin32:

from pywin.mfc import dialog, window
import win32con
dlgStatic = 130
dlgButton = 128
class Mydialog(dialog.Dialog):
    def OnInitDialog(self):
        rc = dialog.Dialog.OnInitDialog(self)
        return rc
style = (win32con.DS_MODALFRAME |
    win32con.WS_POPUP |
    win32con.WS_VISIBLE |
    win32con.WS_CAPTION |
    win32con.WS_SYSMENU |
    win32con.DS_SETFONT)
cs = win32con.WS_CHILD | win32con.WS_VISIBLE
s = win32con.WS_TABSTOP | cs
w = 64
h = 64
#1. Let's create a dialog box with a label and a pushbutton
dlg = [["PyWin32",(0, 0, w, h), style,  None,   (8, "MS Sans Serif")],]
dlg.append([dlgButton,"OK",   win32con.IDOK,      (7, h - 18, 50, 14), s | win32con.BS_PUSHBUTTON])
dlg.append([dlgStatic, "Hello, world!", -1, (7, 9, 50, 14), cs | win32con.SS_LEFT])
#2. Let's start the dialog
d = Mydialog(dlg)
#3. Display it
d.DoModal()

Here's how to add a progress bar, set its range, and increment it:

def OnInitDialog(self):
    rc = dialog.Dialog.OnInitDialog(self)
    self.pbar = win32ui.CreateProgressCtrl()
    self.pbar.CreateWindow (win32con.WS_CHILD | win32con.WS_VISIBLE, (7, 30, 270, 50), self, 1001)
#Find out how many *.HTM* in /input, and set range of progress bar
filecount = 0
for file in glob.glob('*.htm*'):
for file in glob.glob('*.htm*'):

Python GUI API Project


More infos here.


.Net (Mono, DotGNU)

This is very early development, but if you like bleeding edge stuff, you could start looking at how to develop applications using either MS' official .Net framework and its tools (VS.Net and the Python add-on, etc.), or the compatible open-source versions that are Mono and DotGNU. Take a look at IronPython, and boo.



"If you like GTK+, you might want to try the glade designer and parse the XML file with libglade and pygtk.  (Generated code is bad). Remember, glade generates XML.  XML is not code, XML is data.  And data is not code.  As long as you stay away from generated code, you will be safe.  Yup, the best of two worlds -- a graphical form designer that stores information in XML data to be parsed by your own python program."

"BTW, there's a python port of glade underway:"


Binding to the TnFox Toolkit?





RipSting’s Blender-Python GUI Designer

Blender GUI Wizard

ActiveState GUI Builder


PythonWorks Pro

EasyDialogs for Windows




sdk32 - Partial Python wrap of the Win32 Platform SDK




QT Designer



"Dabo is a 3-tier, cross-platform application development framework, written in Python atop the wxPython GUI toolkit. And while Dabo is designed to create database-centric apps, that is not a requirement. Lots of people are using Dabo for the GUI tools to create apps that have no need to connect to a database at all."





Boa Constructor

See above




Visual TCL

"Visual Tcl is a freely-available, high-quality application development environment for UNIX, Windows, Macintosh and AS400 platforms. Visual Tcl is written entirely in Tcl/Tk and generates pure Tcl/Tk code. This makes porting your Visual Tcl applications either unnecessary or trivial. Visual Tcl is covered by the GNU General Public License."

PAGE - Python Automatic GUI Generator


Writing GUIs with Tcl/Tk and TKinter


Tkinter is Python's object-oriented layer on top of Tcl/Tk. Tk only offers basic widgets; if you need more, check out wxPython and PyQt.

Three main concepts: Widgets, event handling, and geometry management (pack, grid, place; pack is the simplest for simple layouts, grid is the most commonly used, and place is the least popular but provides the best control).


Layout/Geometry Managers

Historically, Tkinter supports three layout managers:


The container frame is organized into a two-dimensional table where each cell can hold one widget. However, widgets can be made to span multiple cells.


.pack(side=LEFT|RIGHT|TOP|BOTTOM, fill=X|Y|BOTH,expand=YES|NO,anchor=N|NE|E|SE|S|SW|W|NW)

The pack manager is ideally suited for the following two kinds of situation:


Tkinter provides the following widgets:


To check that Python is correctly installed and that Tkinter works, open a terminal window, and run the following command: python -m tkinter

Loading Tkinter

from tkinter import *

import tkinter

import tkinter as tk

Hello, world!

import tkinter as tk
root = tk.Tk()
root.title("My title")
w = tk.Label(root, text="Hello Tkinter!")
#Fit the size of the window to the given text
w.pack()
#Start the event loop; without it, the window never shows
root.mainloop()


A simple OK dialog:

from tkinter import Tk
from tkinter import messagebox
# Hide parent window; in Windows, use ".pyw" as the extension to hide the terminal window as well
root = Tk()
root.withdraw()
messagebox.showinfo("My title", "Hello")

A Yes/No dialog:

from tkinter import *
from tkinter.messagebox import *
def answer():
    showerror("Answer", "Sorry, no answer available")
def callback():
    if askyesno('Verify', 'Really quit?'):
        showwarning('Yes', 'Not yet implemented')
    else:
        showinfo('No', 'Quit has been cancelled')
Button(text='Quit', command=callback).pack(fill=X)
Button(text='Answer', command=answer).pack(fill=X)
mainloop()

To hide the main window:

from tkinter import Tk
from tkinter.filedialog import askopenfilename
# we don't want a full GUI, so keep the root window from appearing
Tk().withdraw()
# show an "Open" dialog box and return the path to the selected file
filename = askopenfilename()
if not filename:
    # an empty result means the dialog was cancelled
    print("No file selected")

Displaying text: Message, and Text

The Message widget has more features than Label, and the Text widget has even more features.


import tkinter as tk
master = tk.Tk()
whatever_you_do = "Whatever you do will be insignificant, but it is very important that you do it.\n(Mahatma Gandhi)"
msg = tk.Message(master, text = whatever_you_do)
msg.config(bg='lightgreen', font=('times', 24, 'italic'))
msg.pack()
master.mainloop()


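import tkinter as tk
import random
def change_label():
    # Assumed body (missing from these notes): show a random number in the label
    label.config(text=str(random.randint(1, 100)))
root = tk.Tk()
root.title("Changing label")
label = tk.Label(root, text="Click the button")
label.pack()
button = tk.Button(root, text='Change', width=25, command=change_label)
button.pack()
root.mainloop()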

Closing an application

Button(master, text='Quit', command=master.quit)


var1 = IntVar()
Checkbutton(master, text="male", variable=var1).grid(row=1, sticky=W)

Radio button



                  padx = 20,





Variable : tell Tkinter what radiobutton should be the default choice

Text = radiobutton label ; Value = ?


For just a single line of text.

import tkinter as tk
def show_entry_fields():
        print("First Name: %s" % (e1.get()))
master = tk.Tk()
tk.Label(master, text="First Name").grid(row=0)
e1 = tk.Entry(master)
e1.grid(row=0, column=1)
tk.Button(master,
        text='Show', command=show_entry_fields).grid(row=3, column=1, sticky=tk.W, pady=4)
master.mainloop()


Multiple lines of text.

import tkinter as tk
root = tk.Tk()
S = tk.Scrollbar(root)
T = tk.Text(root, height=4, width=50)
S.pack(side=tk.RIGHT, fill=tk.Y)
T.pack(side=tk.LEFT, fill=tk.Y)
quote = """HAMLET: To be, or not to be--that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune
Or to take arms against a sea of troubles
And by opposing end them. To die, to sleep--
No more--and by a sleep to say we end
The heartache, and the thousand natural shocks
That flesh is heir to. 'Tis a consummation
Devoutly to be wished."""
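#Link the scrollbar and the text widget to each other
S.config(command=T.yview)
T.config(yscrollcommand=S.set)
T.insert(tk.END, quote)
tk.mainloop()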

Showing picture

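To display a picture, for example. Note that Tkinter's built-in PhotoImage class only reads GIF and PGM/PPM files (plus PNG with Tk 8.6 or later); for other formats such as JPEG, go through Pillow's ImageTk: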

import tkinter as tk
from PIL import ImageTk, Image
master = tk.Tk()
img = ImageTk.PhotoImage(Image.open("IMG_20190522_164109.jpg"))
panel = tk.Label(master, image = img)
panel.pack(side = "bottom", fill = "both", expand = "yes")
master.mainloop()

Variable Classes

Some widgets (like text entry widgets, radio buttons and so on) can be connected directly to application variables by using special options: variable, textvariable, onvalue, offvalue, and value. This connection works both ways: if the variable changes for any reason, the widget it's connected to will be updated to reflect the new value. These Tkinter control variables are used like regular Python variables to keep certain values. It's not possible to hand over a regular Python variable to a widget through a variable or textvariable option. The only kinds of variables for which this works are variables that are subclassed from a class called Variable, defined in the Tkinter module. They are declared like this:

x = StringVar() # Holds a string; default value ""

x = IntVar() # Holds an integer; default value 0

x = DoubleVar() # Holds a float; default value 0.0

x = BooleanVar() # Holds a boolean; get() returns False or True (0 or 1 in Python 2)

To read the current value of such a variable, call the method get(). The value of such a variable can be changed with the set() method.

Web development

More infos here.

Database access


PyGeoj, "a simple Python Geojson file reader and writer."

Encoding JSON = serialization or marshaling; decoding = deserialization.

dumps() serializes an object to a JSON string in memory, while dump() writes the JSON to an open file object (eg. on disk).

Turn JSON into Python objects

Use load() and loads().









number (int)


number (real)









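with open("data_file.json", "r") as read_file:

    data = json.load(read_file)

# For a JSON string already in memory (eg. the body of an HTTP response):

# todos = json.loads(response.text)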




with open("data_file.json", "r") as read_file:

    # loads() takes a string, so read the file's content first

    data = json.loads(read_file.read())

turn dictionary into json

data = {

    "president": {

        "name": "Zaphod Beeblebrox",

        "species": "Betelgeusian"



    }

}

json_string = json.dumps(data)



data = {

    "president": {

        "name": "Zaphod Beeblebrox",

        "species": "Betelgeusian"



    }

}

with open("data_file.json", "w") as write_file:

    json.dump(data, write_file)


What package/library/module is recommended to work with geoJSON files?

c:\>pip install json

Collecting json

  Could not find a version that satisfies the requirement json (from versions: )

No matching distribution found for json

c:\>pip search geojson

geojson (2.4.0)                    - Python bindings and utilities for GeoJSON

geojsontools (0.0.3)               - Functions for manipulating geojsons

geojson_elevation (0.1)            - GeoJSON compatible elevation proxy

geojson_utils (0.0.2)              - Python helper functions for manipulating GeoJSON

PyGeoj (0.22)                      - A simple Python GeoJSON file reader and writer.


pip install geojson



with open('myfile.geojson') as f:

    gj = geojson.load(f)


for feature in gj['features']:

    print(feature['geometry']['type'], feature['properties'])




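from geojson import Point, Feature, FeatureCollection, dump

props = {"name": "My name","country": "Spain"}

point = Point((-115.81, 37.24))

features = []

features.append(Feature(geometry=point, properties=props))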


feature_collection = FeatureCollection(features)


with open('myfile.geojson', 'w') as f:

   dump(feature_collection, f)


"geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources."

Working with XML/HTML

Note: PyXML is deadware

minidom: simplified implementation of DOM

ElementTree (ET): More Pythonic interface than DOM; lxml is an enhanced version of ET

BeautifulSoup uses lxml, if available, and is an easy way to work with HTML/XML

"untangle is a simple library which takes an XML document and returns a Python object which mirrors the nodes and attributes in its structure."

More infos on XML here.


"Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree. […] Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility." Python 3.x should use BeautifulSoup4. Once parsed, BS builds a tree of Python objects (Tag, NavigableString, BeautifulSoup, and Comment.)

"A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to contain these bits of text. A NavigableString is just like a Python Unicode string, except that it also supports some of the features described in Navigating the tree and Searching the tree. You can convert a NavigableString to a Unicode string with str: unicode_string = str(tag.string)"

"If you want to use a NavigableString outside of Beautiful Soup, you should call unicode() on it to turn it into a normal Python Unicode string. If you don’t, your string will carry around a reference to the entire Beautiful Soup parse tree, even when you’re done using Beautiful Soup. This is a big waste of memory."

BS provides two ways to find elements: find()/find_all(), and select(), which allows more sophisticated queries since it takes a CSS selector (handled by the Soup Sieve package).

Different classes, to make it easier to find elements:


.string vs .text? "The string argument is new in Beautiful Soup 4.4.0. In earlier versions it was called text"

soup.select("kml Document") vs. soup.select("kml > Document")? The former finds any "Document" tags below "kml", no matter where in the tree, while the latter looks for it directly under "kml".

CSS: Difference between "#sister" and ".sister"?
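
To answer the two selector questions above, here is a small sketch; the HTML snippet is made up for illustration. In CSS, "#sister" matches the element whose id is "sister", ".sister" matches every element whose class list contains "sister", a space means "any descendant" and ">" means "direct child".

```python
from bs4 import BeautifulSoup

# Made-up markup: one element with id="sister", two with class="sister"
html = (
    '<div id="sister">'
    '<p class="sister">one</p>'
    '<p class="sister">two</p>'
    '<section><p>nested</p></section>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

by_id = soup.select("#sister")       # the element with that id
by_class = soup.select(".sister")    # every element with that class
descendants = soup.select("div p")   # <p> anywhere below <div>
children = soup.select("div > p")    # <p> directly under <div>

print(len(by_id), len(by_class), len(descendants), len(children))
```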

Does find_all() only search tags (elements), or also the strings within?

How to parse and output

from bs4 import BeautifulSoup
soup = BeautifulSoup(open('input.html', 'r'), 'xml')
soup = BeautifulSoup("<html>a web page</html>", 'html.parser')
#To work with each tag before having it prettified
for c in soup.contents:

Parsing XML

By default, BS will use an HTML parser unless you specifically tell it to use an XML parser (which will need to be installed)

soup = BeautifulSoup(data, 'xml') #



link = soup.a
for parent in link.parents:
find_parents() and find_parent() work their way up the tree:
a_string = soup.find(string="Lacie")
sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></a>", 'html.parser')
find_next_siblings() and find_next_sibling(), find_previous_siblings() and find_previous_sibling():
for sibling in soup.a.next_siblings:
.next_elements and .previous_elements #iterators to move forward or backward in the document as it was parsed
The find_all_next() method returns all matches, and find_next() only returns the first match. The find_all_previous() method returns all matches, and find_previous() only returns the first match.

Finding elements

soup.title #Important: every tag already has a .name attribute (the tag's own name), so soup.name does not return a <name> tag. To access a tag named <name>, use find("name")
soup.title.string #element's text
soup.title.get_text() #alternative
soup.get_text("|", strip=True)
soup.body.b #get the first <b> tag below <body>
#get all text within a tree
[text for text in soup.stripped_strings]
title_tag = head_tag.contents[0]
for child in title_tag.children:
soup.p #first paragraph
soup.p['class'] #display value of attribute
soup.find_all('a') #all hyperlinks
soup("a") #shortcut for soup.find_all("a")
soup.title(string=True) #shortcut for soup.title.find_all(string=True)
soup.find(id="link3") #first element with that id
soup.find_all(string="Elsie") #all strings that are exactly "Elsie", eg. from <blah>Elsie</blah>
soup.find_all(string=["Tillie", "Elsie", "Lacie"])
soup.find_all("a", limit=2)
#Grab attributes
tag = soup.find('meta', {'name': 'keywords'})
Note: If find_all() can’t find anything, it returns an empty list. If find() can’t find anything, it returns None:
for link in soup.find_all('a'):
    print(link.get('href')) #get content of href attribute, ie. link
for string in soup.stripped_strings:
    print(repr(string)) #returns a printable representation of the given object
find() and find_all() are the most popular search methods; use filters (string, regex, list, function)
soup.find_all("p", "title")
soup.find_all(id=True) #all tags with an "id" attribute
soup.find_all(href=re.compile("elsie"), id='link1')
name_soup.find_all(attrs={"name": "email"}) #"name" is already find_all()'s first argument, so it can't be used as a keyword filter
soup.find_all("a", class_="sister") #class is a reserved keyword
for tag in soup.find_all(re.compile("^b")):
soup.find_all(string="Elsie") #returns the matching strings (NavigableString), not the tags that contain them
BeautifulSoup has a .select() method which uses the SoupSieve package to run a CSS selector against a parsed document and return all the matching elements.
soup.select("p.strikeout.body") #CSS selector to search for tags that match two or more CSS classes
soup.select("html head title") #same element as soup.title, but returned in a list
soup.select("p > a") #directly under
soup.select_one(".sister") #only first one
soup.select("#link1 ~ .sister")
soup.select("#link1 + .sister")
soup.select(".sister")
soup.select("[class~=sister]")
soup.select("#link1")
soup.select("#link1,#link2")
soup.select('a[href]')
soup.select('a[href=""]')
soup.select('a[href^=""]')
soup.select('a[href$="tillie"]')
soup.select('a[href*=".com/el"]')

Modifying the tree

Note: When calling eg. soup.mytag, BS will look for mytag anywhere in the tree, not just right after soup
tag = soup.b
tag.name = "blockquote"
new_tag.string = "Link text."
tag.string.replace_with("No longer bold")
tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
tag['id'] = 'verybold'
del tag['id']
tag = soup.a
tag.string = "New link text."
append()/insert() to add to an element's string (at the end, at a given location) which can be empty; new_tag() to add a whole tag. There's also insert_before() and insert_after(). Use clear() to empty a tag's string. Use extract()/decompose() to remove a tag from the tree.
replace_with() can be used with more than one argument: a_tag.b.replace_with(bold_tag, ".", i_tag)
soup = BeautifulSoup("<a>Foo</a>", 'html.parser')
soup.a.append("Bar") #<a>FooBar</a>
new_string = NavigableString("ed")
soup.a.append(new_string) #<a>FooBared</a>
Important: append/insert is used to edit the string of a tag, which can include a whole block (ie. to add a new tag), not just the string of a basic tag
original_tag = soup.b
new_tag = soup.new_tag("a", href="")
new_tag.string = "Link text."
original_tag.append(new_tag) # <b><a href="">Link text.</a></b>
To clean up a tag that holds multiple adjacent NavigableString objects after using .append() or .extend(), call smooth():
soup = BeautifulSoup("<a>Soup</a>", 'html.parser')
soup.a.extend(["'s", " ", "on"])
soup # <a>Soup's on</a>
soup.a.contents # ['Soup', "'s", ' ', 'on']
soup.smooth()
soup.a.contents # ["Soup's on"]
from bs4 import Comment
new_comment = Comment("Nice to see you.")
tag.append(new_comment) # <b>Hello there<!--Nice to see you.--></b>
tag = soup.a #<a href="">I linked to <i></i></a>
tag.insert(1, "but did not endorse ") #<a href="">I linked to but did not endorse <i></i></a>
soup #<b>leave</b>
tag = soup.new_tag("i")
tag.string = "Don't"
soup.b.string.insert_before(tag) # <b><i>Don't</i>leave</b>
div = soup.new_tag('div')
div.string = 'ever'
soup.b.i.insert_after(" you ", div) # <b><i>Don't</i> you <div>ever</div> leave</b>
soup.b.contents # [<i>Don't</i>, ' you', <div>ever</div>, 'leave']
soup #<a href="">I linked to <i></i></a>
tag = soup.a
tag.clear() # <a href=""></a>
a_tag = soup.a #<a href="">I linked to <i></i></a>
i_tag = soup.i.extract()
a_tag # <a href="">I linked to</a>
i_tag # <i></i>
a_tag = soup.a #<a href="">I linked to <i></i></a>
i_tag = soup.i
i_tag.decompose() #unlike extract(), removes the tag and destroys it
a_tag # <a href="">I linked to</a>
a_tag = soup.a #<a href="">I linked to <i></i></a>
new_tag = soup.new_tag("b")
new_tag.string = ""
a_tag.i.replace_with(new_tag) # <a href="">I linked to <b></b></a>
bold_tag = soup.new_tag("b")
bold_tag.string = "example"
i_tag = soup.new_tag("i")
i_tag.string = "net"
a_tag.b.replace_with(bold_tag, ".", i_tag) # <a href="">I linked to <b>example</b>.<i>net</i></a>
#<p>I wish I was bold.</p>
# <b>I wish I was bold.</b>
a_tag = soup.a #<a href="">I linked to <i></i></a>
a_tag.i.unwrap() # <a href="">I linked to</a>
#to inject a tree into another
doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document>", "xml")
footer = BeautifulSoup("<footer>Here's the footer</footer>", "xml")
doc.find(string="INSERT FOOTER HERE").replace_with(footer)
header = soup.header
header.string = "blah"


soup.a.prettify() #just a subpart of the tree
#raw output
unicode_string = str(tag.string)
Note: The str() function returns a string encoded in UTF-8. See Encodings for other options. You can also call encode() to get a bytestring, and decode() to get Unicode.
If you need more sophisticated control over your output, you can use Beautiful Soup’s Formatter class:
from bs4.formatter import HTMLFormatter
formatter = HTMLFormatter(uppercase)

To find where elements are located in the source file:

soup = BeautifulSoup(markup, 'html.parser')
for tag in soup.find_all('p'):
    print(repr((tag.sourceline, tag.sourcepos, tag.string)))

To copy an element (which won't be part of the tree):

import copy
p_copy = copy.copy(soup.p)

To only parse and find certain elements:

from bs4 import SoupStrainer
only_a_tags = SoupStrainer("a")

To investigate what BS does:

from bs4.diagnose import diagnose
with open("bad.html") as fp:
    data = fp.read()
diagnose(data)


Regardless of how it's encoded originally, when loaded into Beautiful Soup, it's converted to Unicode. Beautiful Soup uses a sub-library called Unicode, Dammit to detect a document’s encoding and convert it to Unicode.

from bs4 import UnicodeDammit
dammit = UnicodeDammit(b"Sacr\xc3\xa9 bleu!") #in Python 3, pass bytes, not str
print(dammit.unicode_markup)
# Sacré bleu!
print(dammit.original_encoding)
# 'utf-8'

Unicode, Dammit’s guesses will get a lot more accurate if you install one of these Python libraries: charset-normalizer, chardet, or cchardet.

If you have your own suspicions as to what the encoding might be, you can pass them in as a list:

dammit = UnicodeDammit(b"Sacr\xe9 bleu!", ["latin-1", "iso-8859-1"])
print(dammit.unicode_markup)
# Sacré bleu!
print(dammit.original_encoding)
# 'latin-1'

You can check the encoding found by BS using "soup.original_encoding".

If you happen to know a document’s encoding ahead of time, you can avoid mistakes and delays by passing it to the BeautifulSoup constructor as from_encoding:

soup = BeautifulSoup(markup, 'html.parser', from_encoding="iso-8859-8")

When you write out a document from Beautiful Soup, you get a UTF-8 document, even if the document wasn’t in UTF-8 to begin with. If you don’t want UTF-8, you can pass an encoding into prettify():


To read

Internal xml module: minidom and ElementTree

"The ElementTree library was contributed to the standard library by Fredrick Lundh. It includes tools for parsing XML using event-based and document-based APIs, searching parsed documents with XPath expressions, and creating new or modifying existing documents."

"Python has two interfaces — minidom and Element Tree — probably because Element Tree was integrated into the standard library a good deal later after minidom came to be. The reason for this was likely its far more "Pythonic" API compared to the W3C-controlled DOM." (Source)

Python's ElementTree has only limited support for XPath. If you need more, try lxml.

Besides the Python implementation xml.etree.ElementTree, there is also a C implementation, which Python 3 uses automatically when available. The separate xml.etree.cElementTree module was deprecated in Python 3.3 and removed in Python 3.9.

Python's XML module includes…

ET has two classes for this purpose - ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

Issue I had while learning how to use minidom and ET:


Since they're largely compatible, any tutorial about Element(Tree) will do, not just the limited doco from lxml which assumes people already know ET.

"lxml is significantly faster [than ElementTree], can be used to parse HTML, and supports XPath. […] lxml is also easier to use with namespaces." (Source) lxml.etree versus ElementTree

"The lxml toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. lxml.etree follows the ElementTree API as much as possible, building it on top of the native libxml2 tree."

What's the difference between tree and root?

Note: In lxml at least, there's a bug when parsing an HTML through a filename rather than a file handle, with lxml adding "&#13;" before each carriage-return:

#BAD &#13;
tree = et.parse(INPUT,parser)
with open(INPUT) as tempfile:
        tree = et.parse(tempfile, parser=parser)

If need be, encoding/decoding can be specified: print(ET.tostring(root, encoding='utf8').decode('utf8')).
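
To make the bytes versus str distinction concrete, here is a small sketch using the standard library's xml.etree.ElementTree; the sample element is made up. By default tostring() returns bytes, while encoding="unicode" returns a str directly, which avoids the decode() step.

```python
import xml.etree.ElementTree as ET

# Made-up element containing a non-ASCII character
root = ET.fromstring("<greeting>Sacr\u00e9 bleu!</greeting>")

as_bytes = ET.tostring(root)                     # bytes; non-ASCII becomes a character reference
as_utf8 = ET.tostring(root, encoding="utf-8")    # bytes, encoded as UTF-8
as_text = ET.tostring(root, encoding="unicode")  # str, no decoding needed

print(type(as_bytes).__name__, type(as_utf8).__name__, type(as_text).__name__)
print(as_text)
```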

Here's how to find elements and get their parents, which is needed to delete an element:

for movie in root.findall("./foo/bar/[@multiple='Yes']..."):

The difference between iterfind() and findall() is that the former returns an iterator and only searches through the tree as needed, while findall() runs the whole search first and returns the complete list of matches.
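
A minimal sketch of that difference, using the standard library and a made-up three-item document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><item>a</item><item>b</item><item>c</item></root>")

all_items = root.findall("item")     # a list, built in full before returning
lazy_items = root.iterfind("item")   # an iterator, advanced only on demand

print(type(all_items).__name__, len(all_items))
first = next(lazy_items)             # only the first match has been produced so far
print(first.text)
```

iterfind() is handy when only the first few matches are needed, since the loop can stop early.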


It's possible to only get the descendants that have a given tag:

tag_name = "ellipse"
for descendant in root.iter(tag_name):

Dealing with namespaces is more convenient when using .iterfind(), which accepts an optional mapping of prefixes to namespace URIs:

namespaces = {"": "","custom": ""}
for descendant in root.iterfind("g", namespaces):

findtext() and itertext() work on elements' text.
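
As an illustration of the namespace mapping and of findtext()/itertext(), here is a sketch with the standard library. The SVG fragment and the "svg" prefix are chosen for this example only; the prefix in the mapping does not have to match the one used in the file.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
xml = f'<svg xmlns="{SVG}"><g><ellipse/><text>Hello <tspan>world</tspan></text></g></svg>'
root = ET.fromstring(xml)

ns = {"svg": SVG}                                # prefix -> namespace URI
no_ns = root.findall("g")                        # finds nothing: the tag is namespaced
with_prefix = list(root.iterfind("svg:g", ns))   # prefix resolved through the mapping
with_clark = root.findall(f"{{{SVG}}}g")         # same query in Clark notation

label = root.findtext("svg:g/svg:text", namespaces=ns)        # text before the first child
full_text = "".join(root.find(".//svg:text", ns).itertext())  # all text, children included

print(len(no_ns), len(with_prefix), len(with_clark))
print(repr(label), repr(full_text))
```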

There is no move() method: You'll have to find + append + remove:

action = root.find("./genre[@category='Action']")
new_dec = ET.SubElement(action, 'decade')
new_dec.attrib["years"] = '2000s'
xmen = root.find("./genre/decade/movie[@title='X-Men']")
dec2000s = root.find("./genre[@category='Action']/decade[@years='2000s']")
dec1990s = root.find("./genre[@category='Action']/decade[@years='1990s']")
#Append to the new parent, then remove from the old one
dec2000s.append(xmen)
dec1990s.remove(xmen)


pip install lxml

Quick test

import lxml.etree as et
tree = et.parse("input.gpx")
root = tree.getroot()
#Retrieves direct children nodes of the root
for child in root:
    print(child.tag, child.attrib)
    #? print(root[0][1].text)


An XML file is made of elements (or "nodes"). Each element has a tag, and possibly attributes and text.

The ElementTree package consists of two classes: ElementTree (the whole structure) and Element (nodes).

You first need to read the input, either from a file or a string, have ET parse it and return a pointer to either the tree (ET.parse("myfile.xml") followed by tree.getroot()) or the root element directly (ET.fromstring()).

Once you have a pointer to the root element, you can navigate and modify the tree before writing the edited output back to a file.
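
The read / navigate / modify / write cycle can be sketched as follows with the standard library. The sample XML and the "checked" attribute are made up, and an in-memory buffer stands in for the output file.

```python
import io
import xml.etree.ElementTree as ET

xml = "<library><book id='1'><title>Dune</title></book><book id='2'><title>Emma</title></book></library>"

root = ET.fromstring(xml)            # returns the root Element directly
for book in root.findall("book"):
    book.set("checked", "yes")       # add or change an attribute
    book.find("title").text += "!"   # edit the text of a sub-element

tree = ET.ElementTree(root)          # wrap the root to get the document-level API
buffer = io.BytesIO()                # stand-in for open("output.xml", "wb")
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```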

Ways to get/set infos from an element:

To create a tree from code:

a = ET.Element('a')
b = ET.SubElement(a, 'b')
c = ET.SubElement(a, 'c')
d = ET.SubElement(c, 'd')
ET.dump(a) # prints <a><b /><c><d /></c></a>


root = etree.HTML("<p>data</p>")

XPath ("XML Path Language") has more features for finding elements than ElementTree's find*() methods (see ElementTree's "Supported XPath syntax"). "The .find*() methods are usually faster than the full-blown XPath support. They also support incremental tree processing through the .iterfind() method, whereas XPath always collects all results before returning them. They are therefore recommended over XPath for both speed and memory reasons, whenever there is no need for highly selective XPath queries."

"ElementTree objects have a method getpath(element), which returns a structural, absolute XPath expression to find that element:

c  = etree.SubElement(a, "c")
d2 = etree.SubElement(c, "d")
tree = etree.ElementTree(c)


"For ElementTree, the xpath method performs a global XPath query against the document (if absolute) or against the root node (if relative):

r = tree.xpath('/foo/bar')


"The XPath class compiles an XPath expression into a callable function. The compilation takes as much time as in the xpath() method, but it is done only once per class instantiation. This makes it especially efficient for repeated evaluation of the same XPath expression. Just like the xpath() method, the XPath class supports XPath variables:

root = etree.XML("<root><a><b/></a><b/></root>")
find = etree.XPath("//b")


"ElementTree supports a language named ElementPath in its find*() methods. One of the main differences between XPath and ElementPath is that the XPath language requires an indirection through prefixes for namespace support, whereas ElementTree uses the Clark notation ({ns}name) to avoid prefixes completely. The other major difference regards the capabilities of both path languages. Where XPath supports various sophisticated ways of restricting the result set through functions and boolean expressions, ElementPath only supports pure path traversal without nesting or further conditions."

lxml.etree vs lxml.objectify: The two modules provide different ways of handling XML. However, objectify builds on top of lxml.etree and therefore inherits most of its capabilities and a large portion of its API. lxml.etree is a generic API for XML and HTML handling. It aims for ElementTree compatibility and supports the entire XML infoset. It is well suited for both mixed content and data centric XML. Its generality makes it the best choice for most applications. lxml.objectify is a specialized API for XML data handling in a Python object syntax. It provides a very natural way to deal with data fields stored in a structurally well defined XML format. Data is automatically converted to Python data types and can be manipulated with normal Python operators. Look at the examples in the objectify documentation to see what it feels like to use it. Objectify is not well suited for mixed contents or HTML documents. As it is built on top of lxml.etree, however, it inherits the normal support for XPath, XSLT or validation.

Parsing HTML

If BeautifulSoup's UnicodeDammit doesn't solve an incorrect encoding declaration, ElementSoup makes use of the BeautifulSoup parser to build an lxml HTML tree from broken HTML.

E-factory makes it possible to quickly generate HTML pages and fragments:

import lxml.html
from lxml.html import builder as E
html = E.HTML(
        E.HEAD(
                E.LINK(rel="stylesheet", href="great.css", type="text/css"),
                E.TITLE("Best Page Ever")
        ),
        E.BODY(
                E.H1(E.CLASS("heading"), "Top News"),
                E.P("World News only on this page", style="font-size: 200%"),
                "Ah, and here's some more text, by the way.",
                lxml.html.fromstring("<p>... and this is a parsed fragment ...</p>")
        )
)
#tostring() returns bytes, hence decode()
print(lxml.html.tostring(html).decode())

lxml.html.open_in_browser(lxml_doc) writes the document to disk and opens it in the default browser.

lxml.html also supports working with links and forms, and cleaning HTML (removing embedded or script content, special tags, CSS style annotations, etc.)


Namespaces are required in XML files that contain data from different sources that might use elements with the same  name, eg. "name" as a way to know which one is meant each time. In simple XML files, namespaces are not required, and can be removed from the source file to make things easier.

Namespaces can be any string; It's a convention to use a URL, which can point to a document providing information about it.

Namespaces can be either set in the input file, or through ElementTree:

<Author xmlns:xsi="" xsi:type="Application_t">
#Default namespaces follow this format.
#Be sure to replace "URI" with the actual URI in your XML document.
ET.register_namespace('', "URI")
ET.register_namespace('xsi', "")
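
As a rough illustration of what register_namespace() changes, here is a sketch with the standard library. The URI and the "ex" prefix are hypothetical. Without registration, ElementTree invents prefixes such as ns0 when serializing; registration only affects output, not parsing or searching.

```python
import xml.etree.ElementTree as ET

URI = "http://example.com/ns"  # hypothetical namespace URI
root = ET.fromstring(f'<a:doc xmlns:a="{URI}"><a:item/></a:doc>')

# The prefix used in the source file is not kept: tags are stored as {URI}name
print(root.tag)

before = ET.tostring(root, encoding="unicode")  # auto-generated prefix
ET.register_namespace("ex", URI)                # global setting for serialization
after = ET.tostring(root, encoding="unicode")   # now uses the registered prefix

print(before)
print(after)
```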


Some interesting methods:

class lxml.etree._Element

class lxml.etree._ElementTree


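Note: tostring() returns bytes by default, so the output file must be opened in binary mode; writing the result to a file opened in text mode raises a TypeError:

with open(OUTPUTFILE, 'wb') as writer:
       writer.write(et.tostring(tree))
#with open(OUTPUTFILE, 'wt',encoding='utf-8') as writer:
       #TypeError: write() argument must be str, not bytes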

To print the whole tree:

print(ET.tostring(root, encoding='utf8').decode('utf8'))

#makes no difference: Still binary dump

Displaying infos:

tree = etree.ElementTree(root)
tree.docinfo.public_id = '-//W3C//DTD XHTML 1.0 Transitional//EN'
tree.docinfo.system_url = 'file://local.dtd'

lxml also supports indenting:

etree.indent(root, space="    ")
etree.indent(root, space="\t")

Outputting XML, HTML, text:

print(etree.tostring(root)) #Default is XML
print(etree.tostring(root, method='html', pretty_print=True))
print(etree.tostring(root, method='text', encoding="UTF-8"))

To start from a clean plate:

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(filename, parser)

In case there are redundant namespaces:

parser = et.XMLParser(ns_clean=True,remove_blank_text=True)
tree   = et.parse(INPUTFILE, parser)

To write the tree to a file:

with open('doc.xml', 'wb') as f: #tostring() returns bytes, hence binary mode
    f.write(etree.tostring(root, pretty_print=True))

Another way to get pretty printing is to use Tidy:

import tidy
f.write(tidy.parseString(your_xml_str, **{'output_xml':1, 'indent':1, 'input_xml':1}))

If ET complains about encoding, try this:

print(ET.tostring(root, encoding='utf8').decode('utf8'))

Finding elements

There are multiple ways to search for elements

An element acts like a list where the children are items in the list, eg. len(root) returns the number of elements below the root. Attributes are dictionaries.
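
A short sketch of the list-like and dictionary-like behaviour, with the standard library and a made-up GPX-style fragment:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<gpx version="1.1"><wpt lat="48.85" lon="2.35"/><wpt lat="51.5" lon="-0.12"/></gpx>'
)

count = len(root)                     # number of direct children
first, last = root[0], root[-1]       # indexing works like a list
tags = [child.tag for child in root]  # so does iteration

lat = first.attrib["lat"]             # attributes behave like a dictionary
missing = first.get("ele", "n/a")     # get() with a default for absent attributes

print(count, tags, lat, last.get("lon"), missing)
```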

find() returns a single element (or None if nothing matches), while findall() returns a list.

findall() is part of the original ElementTree API. It supports a simple subset of the XPath language, with only basic predicates and none of XPath's functions or other advanced features. For instance, it doesn't allow the use of a leading "/" ("SyntaxError: cannot use absolute path on element"). Given a bare tag name, findall() returns only the matching elements that are direct children of the current element; prefix the path with ".//" to search the whole subtree.
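
The "direct children only" behaviour can be checked with a small made-up tree (standard library):

```python
import xml.etree.ElementTree as ET

# <b> appears at three different depths
root = ET.fromstring("<a><b/><c><b/><d><b/></d></c></a>")

direct = root.findall("b")        # direct children of <a> only
anywhere = root.findall(".//b")   # every <b> in the subtree
via_iter = list(root.iter("b"))   # same elements, through iter()

# iter() also considers the starting element itself, findall(".//...") does not
self_via_iter = list(root.iter("a"))
self_via_findall = root.findall(".//a")

print(len(direct), len(anywhere), len(via_iter))
print(len(self_via_iter), len(self_via_findall))
```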

Likewise, findall() doesn't support the "|" symbol to search for different tags, while xpath() does:

for el in root.xpath('.//tag1/*|.//tag2/*'):
    print(el.tag, el.text)

A simpler alternative (lxml only; the standard library's iter() accepts a single tag):

for el in root.iter('tag1', 'tag2'):
    print(el.tag, el.text)

Note: find/findall/iterfind() methods are recommended over using xpath() because they are faster and support incremental searches, and also simplify namespace usage, ie. only use .xpath() for advanced queries.

Important: If findall() returns nothing although the query looks good, it might be an issue with the namespaces. In that case, either remove all namespaces in the input file, or change the search string.

Important: To check if find() found an element, use: if element.find('...') is not None.
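
The reason for the explicit "is not None" test is that an element behaves like a list of its children, so an element without children has a length of zero and a plain truth test on it would be misleading. A sketch with the standard library and a made-up document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<Document><name>Track</name></Document>")

found = root.find("name")          # exists, but has no child elements
absent = root.find("description")  # does not exist

print(found is not None, len(found))  # found, yet its length is 0
print(absent is None)

if found is not None:
    print(found.text)
```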

lxml also offers two functions to get siblings: getprevious()/getnext(). It also provides getparent().

CHECK Important: Checking if find/findall is not empty requires two different ways:

r = root.find('./Document/name')
#if et.iselement(r):
if r is not None:
tracks = root.findall('.//LineString')
if len(tracks):

This shows the top-level element in the tree:


This will return all the elements right below the root, not any deeper:

for child in root:
    print(child.tag, child.attrib)

To get all the elements in the tree…

for elem in root.iter():

… or only some elements, anywhere in the tree:

for movie in root.iter('movie'):

Find the first element that matches:

wpt = root.find("wpt") #returns the first matching element, or None

Getting the value of the content attribute of the description meta element in the head section:

description = root.xpath('string(//meta[@name="description"]/@content)')
if len(description):

find() and findall() support a subset of XPath, which provides a more powerful way to navigate a tree. Here's how to find all waypoints below the root in a GPX file:

for wpt in root.findall("./wpt"):

Searching with XPath

Note that xpath() returns a list, even if it found only one element:

element = template_tree.xpath('//myelement')
if len(element):
        html_tree = lxml.html.fragment_fromstring("<div>blah</div>", parser=lxml.html.HTMLParser())
        parent = element[0].getparent()


More infos:


"lxml supports an alternative API similar to the Amara bindery or gnosis.xml.objectify through a custom Element implementation. The main idea is to hide the usage of XML behind normal Python objects, sometimes referred to as data-binding. It allows you to use XML as if you were dealing with a normal Python object hierarchy."


Help Archives




Can I get rid of namespace infos while working with data?

The kludgy way to remove namespaces from the source file is to run a regex through the source file, and read the result into the root.

A cleaner way is to parse the XML, and then remove all references to the namespace(s):

# Remove namespace prefixes
tree = et.parse(INPUTFILE)
root = tree.getroot()
for elem in root.iter():
        #ValueError: Invalid input tag of type <class 'cython_function_or_method'>
        #et.tag = et.QName(elem).localname
        # For elements, replace qualified name with localname
        if not(type(elem) == et._Comment):
                elem.tag = et.QName(elem).localname
        # Remove attributes that are in a namespace
        for attr in list(elem.attrib):
                if "{" in attr:
                        elem.attrib.pop(attr)
# Remove unused namespace declarations
et.cleanup_namespaces(root)

How to add text when using append()?

for waypoint in root.findall('gpx:wpt', namespaces=NSMAP):

        #How to set text?

        waypoint.append( ET.Element("dummy"))
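
One way to answer the question above, sketched with the standard library's xml.etree.ElementTree (the same calls exist in lxml): set the element's .text before or after appending it, or use SubElement(). The sample GPX fragment and the "name" element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free GPX fragment
root = ET.fromstring("<gpx><wpt lat='1.0' lon='2.0'/><wpt lat='3.0' lon='4.0'/></gpx>")

for waypoint in root.findall("wpt"):
    # Option 1: create the element, set its text, then append it
    dummy = ET.Element("dummy")
    dummy.text = "some text"
    waypoint.append(dummy)
    # Option 2: SubElement() creates and appends in one call
    ET.SubElement(waypoint, "name").text = "Waypoint"

result = ET.tostring(root, encoding="unicode")
print(result)
```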

Difference between .iter() and .findall()?
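
A small sketch of the difference, using the standard library's xml.etree.ElementTree and a made-up document: findall() with a bare tag name only looks at direct children, while iter() walks the whole subtree.

```python
import xml.etree.ElementTree as ET

# One <b> directly under the root, two more nested inside <c>
root = ET.fromstring("<a><b/><c><b/><b/></c></a>")

direct = root.findall("b")            # direct children only
anywhere = list(root.iter("b"))       # whole subtree
also_anywhere = root.findall(".//b")  # XPath-style descendant search

print(len(direct), len(anywhere), len(also_anywhere))  # prints: 1 3 3
```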

"Element.findall() finds only elements with a tag which are direct children of the current element."

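Difference between doc.write(ET.tostring()) and tree.write()?

with open("removed.time.gpx", 'wb') as doc:

        #ET.tostring() returns the serialized XML as bytes, which you then write yourself;
        #tree.write('output.xml') serializes and writes to the file in one step

        doc.write(ET.tostring(tree, pretty_print = True))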
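
To illustrate the two approaches side by side, here is a sketch with the standard library's xml.etree.ElementTree, writing to in-memory buffers instead of real files (the sample document is an assumption). With default settings both produce the same bytes.

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring("<gpx><wpt/></gpx>")
tree = ET.ElementTree(root)

# tostring() returns the serialized document as bytes; writing it is up to you
data = ET.tostring(root)
manual = io.BytesIO()
manual.write(data)

# ElementTree.write() serializes and writes in one step
direct = io.BytesIO()
tree.write(direct)

same = manual.getvalue() == direct.getvalue()
print(same)
```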

What's the difference between tree and root (parse vs. fromstring)?

parse() returns an ElementTree while fromstring() returns an Element.

What's the point of getroot()?

Needed with functions that return a whole ElementTree instead of a specific Element (node):

tree = et.parse("input.gpx")
root = tree.getroot()
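
A quick way to see the difference for yourself, sketched with the standard library's xml.etree.ElementTree and a made-up document held in memory:

```python
import io
import xml.etree.ElementTree as ET

XML = "<gpx><wpt/></gpx>"

# parse() takes a file name or file object and returns an ElementTree
tree = ET.parse(io.StringIO(XML))
root_from_tree = tree.getroot()

# fromstring() returns the root Element directly
root_from_string = ET.fromstring(XML)

print(type(tree).__name__, type(root_from_tree).__name__, type(root_from_string).__name__)
```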

"fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree." (Source)

"The getroot() method is available on xml.etree.ElementTree.ElementTree objects, not xml.etree.ElementTree.Element objects. ET.fromstring() returns the latter type. You already have the root element."


$ pip install fastkml (or "pip install -r requirements.txt" from the base of the source tree). To build KML files, FastKML requires Shapely, which in turn requires libgeos.

Note: If the input KML has something it doesn't like, fastkml (or lxml2?) might just complain with "ValueError" without saying which line it didn't like.

#Written against the fastkml 0.x API, where features() is a method
from fastkml import kml
k = kml.KML()
with open(kml_file, 'rt', encoding="utf-8") as myfile:
    doc = myfile.read()
k.from_string(doc)
#features() returns a generator object that you can iterate over
for f in k.features():
    print(f.name)
features = list(k.features())
f2 = list(features[0].features())
print(k.to_string(prettyprint=True))

from pykml import parser
#zipcodes (the ZIP codes to keep) is assumed to be defined elsewhere
removed = kept = 0
with open(input) as f:
  doc = parser.parse(f)
root = doc.getroot()
folder = root.Document.Folder # parent of Placemark
for pm in folder.getchildren():
  print(pm.tag, pm.getparent().tag)
  if pm.tag == '{}Placemark':
    keep = False
    #zipcode = ''
    for sd in pm.ExtendedData.SchemaData.getchildren():
      if 'ZCTA5CE10' in sd.values():
        if sd.text in zipcodes:
          #zipcode = sd.text
          keep = True
    if not keep:
      #Assumption: placemarks that are not kept get removed from their parent
      folder.remove(pm)
      removed += 1
    else:
      kept += 1
#doc.write('output.kml', xml_declaration=True, encoding='UTF-8')

More information


"pyKML is based on the lxml.objectify API which provides a Pythonic API for working with XML documents. pyKML adds additional functionality specific to the KML language. pyKML depends on the lxml Python library, which in turn depends on two C libraries: libxml2 and libxslt. Given this, the first step to installing pyKML is to get lxml running on your system."

The XML parser is used to read an existing KML file (pykml.parser.parse), or write a KML object to a file (lxml.etree.tostring).

"For complete stand alone programs that demonstrate how to use pyKML, check out the pyKML Examples."

"This type of attribute-based access is provided by the lxml packages’s objectify API. pyKML users are encouraged to familiarize themselves with the objectify API documentation on the lxml website, because pyKML inherits this functionality."

"KML documents that you create can be validated against XML Schema documents, which define the rules of which elements are acceptible and what ordering can be used. Both the OGC KML schema and the Google Extension schemas are included with pyKML."


Install lxml

To check if lxml is installed, run Python, and type "import lxml"

If not:

Install PyKML

pip install pykml

Run Python, and type "import pykml"

Have PyKML create a ready-to-use script

from pykml.factory import write_python_script_for_kml_document

import urllib.request as urllib2 #urllib2 was used in Python 2

from pykml import parser

url = ''

fileobject = urllib2.urlopen(url)

doc = parser.parse(fileobject).getroot()

script = write_python_script_for_kml_document(doc)



from lxml import etree

from pykml import parser

from pykml.factory import KML_ElementMaker as KML

To validate:

from pykml.parser import Schema

schema_ogc = Schema("ogckml22.xsd")

schema_gx = Schema("kml22gx.xsd")

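from pykml.factory import GX_ElementMaker as GX

doc = KML.kml(GX.Tour())

#The .validate() method only returns True or False

print(schema_ogc.validate(doc))

print(schema_gx.validate(doc))

#More details: assertValid() raises an exception that describes the problem

schema_ogc.assertValid(doc)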


Here's how to read a KML file, and remove an element:

from pykml import parser
from lxml import etree
with open("input.kml") as f:
  doc = parser.parse(f)
root = doc.getroot()
folder = root.Document.Folder.Placemark
for pm in folder.getchildren():
    #To prevent PyKML from prepending {},
    #use regex to remove namespace: <kml xmlns="">
    #if pm.tag=="ExtendedData":
    if pm.tag=="{}ExtendedData":
        folder.remove(pm)
with open("output.kml","wb") as outfile:
    outfile.write(etree.tostring(doc, pretty_print=True))

Here's how to read coordinates:

import lxml
#import pykml
from pykml import parser
with open('dummy.kml') as f:
    doc = parser.parse(f).getroot()
for e in doc.Document.Folder.Placemark:
  coor = e.LineString.coordinates.text.split(',')

Here's how to read from a file, and copy data into a new file:

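from pykml import parser
from lxml import etree
from pykml.factory import KML_ElementMaker as KML
with open('input.kml') as f:
    tree = parser.parse(f)
root = tree.getroot()
coords = root.Document.Folder.Placemark.LineString.coordinates
#One possible layout for the new document (an assumption); adapt the nesting to your needs
doc = KML.kml(
    KML.Document(
        KML.Placemark(
            KML.LineString(
                KML.coordinates(coords.text)
            )
        )
    )
)
with open('output.kml','wb') as outfile:
    outfile.write(etree.tostring(doc, pretty_print=True))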


"Unfortunately, simplekml is just a kml generator, it cannot read and manipulate existing kml, only create it. You will have to use an alternative, such as pyKML." (Source)

pip install simplekml


import simplekml
kml = simplekml.Kml()
#Assumption: "Test" was the document name
kml.document.name = "Test"
kml.save("botanicalgarden.kml")

How to remove id? <Document id="1">

How to read existing KML file, extract needed items (eg. Placemark), edit them, and save everything to a new KML file?

Working with GPX files with gpxpy

pip install gpxpy

Quick code:

import gpxpy
f = open(path_to_gpx_file, 'r')
p = gpxpy.parse(f)
print("{} track(s)".format(len(p.tracks)))

To read from a GPX file:

#gpx is the object returned by gpxpy.parse() (named p in the quick code above)
a = gpx.tracks[0]
b = a.segments[0]
c = b.points[1]
d = [c.longitude, c.latitude, c.elevation, c.time]

To create a new GPX file from scratch:

import gpxpy
import gpxpy.gpx

gpx = gpxpy.gpx.GPX()
# Create first track in our GPX:
gpx_track = gpxpy.gpx.GPXTrack()
gpx.tracks.append(gpx_track)
# Create first segment in our GPX track:
gpx_segment = gpxpy.gpx.GPXTrackSegment()
gpx_track.segments.append(gpx_segment)
# Create points:
gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1234, 5.1234, elevation=1234))
gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1235, 5.1235, elevation=1235))
gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1236, 5.1236, elevation=1236))
# You can add routes and waypoints, too...
print('Created GPX:', gpx.to_xml())


"xmltodict is a Python module that makes working with XML feel like you are working with JSON".

Won't do if you need to add a key, but fine if you just need to read, and possibly change any value.

Notes from John E. Simpson's "XPath and XPointer" (2002)

XPath is used for locating XML content within an XML document; XPointer is the standard for addressing such content, once located.

As support for XPath is integrated into the Document Object Model (DOM), DOM developers may also find XPath a convenient alternative to walking through document trees.

"An XPath" consists of one or more chunks of text, delimited by any of a number of special characters, assembled in any of various formal ways. Each chunk, as well as the assemblage as a whole, is called an XPath expression.

Most XPath expressions, by far, locate a document's contents or portions thereof. These pieces of content are located by way of one or more location steps — discrete units of XPath "meaning" — chained together, usually, into location paths.

An XPath expression can be said to consist of various components: tokens and delimiters. The expression taxcut/* locates all elements that are children of a taxcut element.
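
To try the taxcut/* expression from Python, here is a small sketch that relies on the limited XPath subset built into the standard library's xml.etree.ElementTree. The surrounding document (budget, amount, year, spending) is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document wrapped around the book's taxcut element
doc = ET.fromstring(
    "<budget>"
    "<taxcut><amount>10</amount><year>2002</year></taxcut>"
    "<spending><amount>5</amount></spending>"
    "</budget>"
)

# taxcut/* selects every element that is a child of a taxcut element
matches = [elem.tag for elem in doc.findall("taxcut/*")]
print(matches)  # prints: ['amount', 'year']
```

The amount element under spending is not selected, since its parent is not a taxcut element.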

XPath is capable of processing four data types: string, numeric, Boolean, and nodes (or node-sets).

Most nodes have names. Three important terms:

In a location path, the root node is represented by a leading / (forward slash) character.

There's an XPath function, normalize-space(), that trims all leading and trailing whitespace from a given element's content and collapses each internal run of whitespace into a single space.

Editing with XMLStarlet

Read xmlstarlet-ug.pdf

XMLStarlet is an open-source, command-line application that supports testing XPath queries.

Checking the structure: xml el input.xml



How to find the type of a variable/output?
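
A minimal sketch using the built-in type() and isinstance() functions; the sample values are arbitrary.

```python
values = [42, 3.14, "text", [1, 2], {"a": 1}, None]

# type() returns the class of an object; __name__ gives its short name
names = [type(v).__name__ for v in values]
print(names)

# isinstance() is the usual way to test for a given type
is_text = isinstance(values[2], str)
print(is_text)
```

The same calls work on the return value of any function, which helps when a library hands back an object of unknown type.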


UnicodeDecodeError: 'ascii' codec can't decode byte

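Python uses Unicode internally, and may need some help when it can't figure out which code page was used to encode a string:

#data is a bytes object of unknown encoding; the encodings tried here are only examples
try:
    text = data.decode("utf-8")
except UnicodeDecodeError:
    try:
        text = data.decode("cp1252")
    except UnicodeDecodeError:
        text = data.decode("latin-1")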

More information:

How to check for errors?

"Pylint analyzes Python source code looking for bugs and signs of poor quality."

How to enhance performance?

Why are strings immutable?

Saw that question in a newsgroup. Does it mean a string in Python is read-only?
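
In short, yes: a string object cannot be changed in place, but a name can be rebound to a new string. A minimal sketch:

```python
s = "python"

# Item assignment on a string raises TypeError
try:
    s[0] = "P"
    changed_in_place = True
except TypeError:
    changed_in_place = False

# "Modifying" a string really means building a new one
t = "P" + s[1:]

print(changed_in_place, s, t)  # prints: False python Python
```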

What's the difference between "import mymodule" and "from mymodule import *"?

The former forces you to prepend the module's name to every member, eg. mymodule.mymethod(), while the latter imports all the public names into the current namespace, letting you call the methods without the module name. The latter is easier to use, but make sure the imported names don't clash with names already in your namespace.
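
A short sketch of both styles and of the name-clash risk, using the standard math module. Explicit names are imported here instead of "*" so the effect is easy to follow; "from math import *" would bring in these two names along with every other public name.

```python
import math

# Style 1: names stay behind the module prefix
print(math.sqrt(16.0))

# Style 2: names land directly in the current namespace
from math import sqrt, pow
print(sqrt(16.0))

# Clash: pow now refers to math.pow and shadows the built-in pow()
shadowed = pow is math.pow
print(shadowed, pow(2, 3))
```

After the second import, pow(2, 3) returns a float because math.pow has replaced the built-in.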

Is there a native-code compiler for Windows?

Check out py2exe. Other sources of information are Distributing Python Apps and How can I create a stand-alone binary from a Python script? Also take a look at Psyco.

Py? Pyc? Pyd? Pyo? Pyw?

(From Boudewijn Rempt's book on PyQT): "The translation from Python code to byte-code only happens once: Python saves a compiled version of your code in another file with the extension .pyc, or an optimized compiled version of your code that removes assert statements and line-number tracking in a file with the extension .pyo.

However, that is only done with Python files that are imported from other files: the bootstrap script will be compiled to bytecode every time you run it, but python will create a myapp.pyc from a file (which is not shown here)."


"unindent does not match any outer indentation level": If copy/pasting code from a web page, make sure there are no hidden characters (eg. tabs mixed with spaces) that confuse Python.

Hiding the DOS box when running under Windows?

"Python.exe is used for console mode programs and Pythonw.exe is used for GUI applications that don't need a console window. Python.exe can also be used for GUI programs, but then you get a console window in addition to your GUI window(s)."

How to call a PowerBasic DLL from Python?

How to hide the console window when running an EXE generated by py2exe?


From VB to Python