Introduction to Python

Introduction

Python is an excellent, cross-platform, object-oriented interpreted language. Besides ease of use, its main characteristic is that it enforces indentation (don't indent, and the program won't run.)

As of September 2004, though, there are weaknesses to be aware of if you intend to use Python to write GUI apps for Windows:

Python is originally a command-line, text-mode scripting language, so it requires an add-on to build GUI apps. There is no integrated way to build Windows applications, so you must use and distribute a toolkit (most people use wxPython, which is a wrapper around the famous, open-source, cross-platform wxWidgets toolkit).
Unless you're ready to shell out $ to buy VS.Net ($?) and ActiveState's Visual Python plug-in ($300), you won't find an affordable IDE on par with VB or Delphi, which means you'll have to settle for GUI builders and IDEs that feel somewhat clunky (developing something like VS.Net is quite a lot of work).
Performance can be a problem for heavy-duty programs, in which case you might have to delegate the computing-intensive parts of your program to a DLL compiled into native code.

Setup

At least three distributions of Python are currently available for the Windows platform (PythonWare used to be yet another package, but it's been deprecated):

the version from Python.org, which requires you to download the Win32all extension yourself
ActivePython, which includes the Win32all extension, and an IDE (Note: Looks like their license restricts redistributions...)
the Enthought edition (don't know how active it is)

If you only need a basic distribution, try out Tiny Python.

The "import" statement looks for module files in the directories listed in sys.path, which includes the directories specified in the PYTHONPATH environment variable. If the named module isn't found in these directories, Python raises an ImportError. The first time Python imports a module, it automatically compiles the module and saves it as bytecode; this bytecode file is named after the module file, but ends in a .pyc extension (Python 3 stores it in a __pycache__ subdirectory). These .pyc files are automatically recompiled if the module changes in any way.
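The lookup described above can be observed from a script. This is a minimal sketch for Python 3; the module name is made up on purpose so that the import fails.

```python
import importlib
import sys

# sys.path is the list of directories searched by "import"
for directory in sys.path:
    print(directory)

# Hypothetical module name, chosen so that the import fails
module_name = "no_such_module_for_this_demo"
try:
    importlib.import_module(module_name)
    found = True
except ImportError as err:
    found = False
    print("Import failed:", err)
```

Adding a directory to PYTHONPATH before starting Python makes it show up in this list.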

"On Windows, you can also use extension .pyw and interpreter program pythonw.exe instead of .py and python.exe. The w variants run Python without a text-mode console, and thus without standard input and output. These variants are appropriate for scripts that rely on GUIs. You normally use them only when the script is fully debugged, to keep standard output and error available for information, warnings, and error messages during development."

If you are using UltraEdit as your favorite editor, here's the section to add in UE's wordfile.txt to handle Python documents.

Installing Python 2 and Python 3 on the same Windows host

C:\Python27\python.exe

C:\Users\fred\AppData\Local\Programs\Python\Python37-32\python.exe

Checking script syntax

Concepts

module

library

package

namespace

Wheel, .whl, "allows for binary redistribution of libraries"

pip

Why import a library twice?

eg.

import mylib

import mylib.lib

How to find the list of methods/properties a library offers?

help() and dir() are OK, but they don't show the whole picture.
vars()
The inspect module
The pydoc module (python -m pydoc mymodule)
The ast module ("Abstract Syntax Trees")
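As a rough sketch of two of the options above, the snippet below lists a module's names with dir(), then uses inspect.getmembers() to keep only its public functions. The json module is just a convenient stand-in for whatever library you are exploring.

```python
import inspect
import json  # any importable module would do; json is only an example

# dir() returns every name the module defines, private ones included
all_names = dir(json)

# inspect.getmembers() can filter by kind, here plain functions only
public_functions = [
    name
    for name, member in inspect.getmembers(json, inspect.isfunction)
    if not name.startswith("_")
]
print(public_functions)
```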

How to uninstall a module?

c:\>pip list

c:\>pip uninstall somemodule

Data Structures

Array

All entries must be of the same data type.

import array as arr

a = arr.array("I",[3,6,9])

List

Collection of heterogeneous items. Mutable.

x = []

x1 = [1,'apple',3]

print(x1[1])

List vs. array?

"With arrays, you can perform an operations on all its item individually easily, which may not be the case with lists", eg.

array_char.tobytes() # called tostring() in Python 2; tostring() was removed in Python 3.9

"NumPy arrays are very heavily used in the data science world to work with multidimensional arrays. They are more efficient than the array module and Python lists in general."

Tuples

"Tuples are another standard sequence data type. The difference between tuples and list is that tuples are immutable, which means once defined you cannot delete, add or edit any values inside it."

Tuples are enclosed in parentheses.

x_tuple = (1,2,3,4,5)

y_tuple = ('c','a','k','e')

x_tuple[0]

Dictionary

"Dictionaries are made up of key-value pairs. key is used to identify the item and the value holds as the name suggests, the value of the item."

Dictionaries are built with curly brackets.

x_dict = {'Edward':1, 'Jorge':2, 'Prem':3, 'Joe':4}

del x_dict['Joe']

x_dict

{'Edward': 1, 'Jorge': 2, 'Prem': 3}

x_dict['Edward'] # Prints the value stored with the key 'Edward'.

Sets

'Sets are a collection of distinct (unique) objects. These are useful to create lists that only hold unique values in the dataset. It is an unordered collection but a mutable one, this is very helpful when going through a huge dataset.'

y_set = set('COOKIE')

print(y_set) # Only one 'O' is kept; the display order may vary

{'I', 'O', 'E', 'C', 'K'}

Collections, heapq

Those are additional data structures.

Code Snippets

Running an external program

Leaving for loop early

"break" or "continue"

Operators

Watch out when using shortcuts like += on large strings, as they seem to be much slower than the lengthier "mystring = mystring + something".
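A commonly suggested alternative when building a large string out of many pieces is to collect them in a list and call join() once at the end. This is only a sketch: whether it beats += depends on the Python version and the data, so measure before relying on it.

```python
# Collect the pieces in a list, then join them once at the end
pieces = []
for i in range(1000):
    pieces.append("line %d\n" % i)
mystring = "".join(pieces)

print(len(mystring))
```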

File I/O

Checking if a directory exists

Either...

import os

try:
    os.mkdir("./mydir")
except OSError:
    # The directory already exists (or cannot be created)
    pass

... or

import os

if not os.path.isdir("./mydir"):
    os.mkdir("./mydir")

Writing to a text file

log = open('test.txt','w')

log.write("Some string")

log.close()

Caution: Under Windows, \r\n turns into 0D0D0A. To get the expected 0D0A, just use \n .

Important: Although Python3 uses Unicode internally, under Windows it writes text files in the locale's encoding (eg. cp1252) unless told otherwise:

stuff = "Crème"

# Uses the locale's encoding
with open("cp1252.txt", 'w') as outFile:
    outFile.write(stuff)

# Forces UTF-8
with open("utf8.txt", mode='w', encoding='utf-8') as outFile:
    outFile.write(stuff)
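To check what actually ends up on disk, here is a small sketch that writes the same string with both encodings named explicitly, then reads the raw bytes back. It uses a temporary directory so it leaves nothing behind; the file names are arbitrary.

```python
import os
import tempfile

stuff = "Crème"

# Write the same string with two explicit encodings into a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    cp1252_path = os.path.join(tmpdir, "cp1252.txt")
    utf8_path = os.path.join(tmpdir, "utf8.txt")

    with open(cp1252_path, "w", encoding="cp1252") as out_file:
        out_file.write(stuff)
    with open(utf8_path, "w", encoding="utf-8") as out_file:
        out_file.write(stuff)

    # Read the raw bytes back to see how the accented letter was stored
    with open(cp1252_path, "rb") as in_file:
        cp1252_bytes = in_file.read()
    with open(utf8_path, "rb") as in_file:
        utf8_bytes = in_file.read()

print(cp1252_bytes)  # one byte for the accented letter
print(utf8_bytes)    # two bytes for the accented letter
```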

Reading from a text file in one go

f = open("c:/test.txt", "r")

data = f.read()

print(data)

f.close()

Reading from a text file, line by line

f = open("c:/test.txt", "r")
textlines = f.readlines()
for line in textlines:
    print(line)
f.close()

Reading from a text file, editing each line, and saving into a new file

import re

MAGIC = 10

f = open("C:\\input.txt", "r")
textlines = f.readlines()
f.close()

#rewrite lines to new file
log = open('output.txt','w')

#search for pattern using regex; assumes each line starts with {start}{end}
p = re.compile(r'^\{(\d+)\}\{(\d+)\}')

for line in textlines:
    m = p.search(line)
    if m:
        #update line
        start = int(m.group(1)) + MAGIC
        end = int(m.group(2)) + MAGIC
        new = "{%s}{%s}" % (start, end)
        line = p.sub(new, line)
    #print would add an extra newline, so use write()
    log.write(line)

log.close()

Finding if a file is missing from a directory

We'll read a list of files from a text file, and then check if the file exists:

import os.path

PATH="C:\\MYDIR\\"

f = open(PATH + "files.txt", "r")
textlines = f.readlines()
f.close()

for line in textlines:
    line = line.strip()
    if not os.path.isfile(PATH + line):
        print("%s NOT FOUND" % line)

Append stuff to a text file

A first way is to open a file in "a" mode:

f = open("c:/test.txt", "a")

f.write("This is an appended line.\n")

f.close()

Another way:

import glob

f = open("stuff.to.add.txt", "r")
template = "\n\n" + f.read()
f.close()

for frm in glob.glob('*.txt'):
    f = open(frm, "r+")
    content = f.read()
    if 'my pattern' not in content:
        #move to the end of the file before writing
        f.seek(0,2)
        f.write(template)
    f.close()

Checking that a file exists

Either...

import os

def exists(file):
    if os.path.exists(file):
        return 1
    else:
        return 0

... or

import os

def exists(file):
    return os.access(file, os.F_OK)

Checking the size of a file

import os

print(os.stat(file).st_size)

Displaying the last modified date of a file

os.stat() returns the date a file was last modified as an epoch value, ie. the number of seconds elapsed since January 1st 1970. To turn an epoch into eg. YYYY-MM-DD:

import os, time

filetime = os.stat('myfile.txt').st_mtime

#turns epoch into tuple such as (2004, 8, 13, 2, 35, 2, 4, 226, 0)
filetime = time.gmtime(filetime)

#turns tuple into formatted string
print(time.strftime("%Y-%m-%d",filetime))
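Here is a self-contained sketch of the same conversion. It creates a throwaway file and forces its modification time with os.utime() so that the result is predictable; the epoch value is an arbitrary example.

```python
import os
import tempfile
import time

# Create a throwaway file so the example does not depend on an existing one
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name

# Force a known modification time: 1092364502 is 2004-08-13 02:35:02 UTC
os.utime(path, (1092364502, 1092364502))

# Epoch -> time tuple -> formatted string
mtime = os.stat(path).st_mtime
stamp = time.strftime("%Y-%m-%d", time.gmtime(mtime))
print(stamp)

os.remove(path)
```

Use time.localtime() instead of time.gmtime() to get the date in the local time zone.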

Reading a value from a key in a section of an INI file

import ConfigParser  #named configparser in Python 3

p = ConfigParser.ConfigParser()
p.readfp(open('index.ini'))  #read_file() in Python 3

try:
    print(p.get('files',file))
except ConfigParser.Error:
    print("section 'files' or key not found")
else:
    print("ok")

Reading all the key/value items in a section in an INI file

import ConfigParser

p = ConfigParser.ConfigParser()
p.readfp(open('index.ini'))

for item in p.items('files'):
    print("key = " + item[0] + " value = " + item[1])

Writing data to an INI file

ConfigParser cannot change a single key in place, so you need to read the INI file, make the changes in memory, open the file in write mode, and save the whole configuration with write():

import ConfigParser

def writeini(file,size):
    p = ConfigParser.ConfigParser()
    p.read('index.ini')
    p.set('files', file, size)
    fp = open('index.ini','w')
    p.write(fp)
    fp.close()

writeini("mykey","myvalue")
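In Python 3 the module is named configparser. The following sketch does a full write-then-read round trip on a temporary file; the section and key names are the same placeholders as above.

```python
import configparser
import os
import tempfile

ini_path = os.path.join(tempfile.mkdtemp(), "index.ini")

# Write a section with one key
writer = configparser.ConfigParser()
writer["files"] = {"mykey": "myvalue"}
with open(ini_path, "w") as fp:
    writer.write(fp)

# Read it back
reader = configparser.ConfigParser()
reader.read(ini_path)
value = reader.get("files", "mykey")
items = reader.items("files")
print(value, items)
```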

Setting the current directory

import os

os.chdir('./mydir')

Looping through each file in a directory

import glob

for file in glob.glob('*.htm*'):
    print(file)

Note: On the Windows platform, glob() mixes forward- and backslashes, while open() doesn't allow backslashes altogether ("IOError: [Errno 2] No such file or directory: '.\\mydir\myfile.txt' ".)

Reading information from MS Word files

import win32com.client

app = win32com.client.Dispatch('Word.Application')

doc = app.Documents.Add('c:\\stuff.doc')

for rev in doc.Revisions:
    print(rev.Author)

Using SQLite as file-based database

Python3

When reading data from SQLite3 (which stores text as UTF-8/16) and saving it into a plain text file, Python3 uses the locale's encoding by default, eg. cp1252. To save data as UTF-8, pass the encoding parameter to open():

import sqlite3

con = sqlite3.connect('input.sqlite')
con.row_factory = sqlite3.Row
cur = con.cursor()
cur.execute("SELECT name FROM table1")
results = cur.fetchall()

output = open("output.txt", "w", encoding='UTF-8')
for row in results:
    #write() expects a string
    output.write(row["name"] + "\n")
output.close()

con.close()

Python2

Several wrappers are available to access SQLite from Python, but two stand out: "pysqlite implements Python's DBAPI and was integrated into Python [2.5]. There is another wrapper, APSW ("Another Python SQLite Wrapper"), which is thinner and closer to SQLite's C API."

Note:

"If you choose pysqlite be aware that it will start transactions behind your back and can produce the wrong error message if you start your own transactions.
APSW works with Python 2.5, is usually a bit faster than pysqlite and is more frequently updated - it's not tied to the DBAPI."
Both pysqlite and APSW are hosted at http://initd.org/tracker/pysqlite (or http://pysqlite.org)

Installing APSW: Just run the EXE that matches your version of Python, eg. apsw-3.3.13-r1.win32-py2.5.exe

Here's how to display information:

import os, sys, time

import apsw

print "Using APSW file",apsw.__file__

print "APSW version",apsw.apswversion()

print "SQLite version",apsw.sqlitelibversion()

Here's how to play with SQLite:

if os.path.exists("dbfile"):
    os.remove("dbfile")

connection=apsw.Connection("dbfile")

cursor=connection.cursor()

cursor.execute("begin")

cursor.execute("create table foo(x,y,z)")

cursor.execute("insert into foo values(1,2,3)")

cursor.execute("insert into foo values(4, 'five', 6.0)")

cursor.execute("commit")

for row in cursor.execute("select * from foo"):
    print row

for m,n,o in cursor.execute("select x,y,z from foo"):
    print m,n,o

connection.close(True)

Another example of using APSW (reading a tab-delimited text file to insert books into SQLite)

import re, apsw

connection=apsw.Connection("books.sqlite")
cursor=connection.cursor()

sql = "CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY, isbn VARCHAR, box VARCHAR, title VARCHAR)"
cursor.execute(sql)

f = open("books.tsv", "r")
textlines = f.readlines()
f.close()

#Extract ISBN + box
p = re.compile(r'^(.+)\t(\d+)$')

for line in textlines:
    m = p.search(line)
    if m:
        isbn = m.group(1)
        box = m.group(2)
        #use a placeholder instead of building the query with %
        sql = "SELECT COUNT(isbn) FROM books WHERE isbn=?"
        #read the rows first, since the cursor is reused for the INSERT
        for row in list(cursor.execute(sql, (isbn,))):
            #Record not found -> Insert
            if not row[0]:
                print "No record found for ISBN " + isbn
                cursor.execute("INSERT INTO books (id,isbn,box) VALUES (NULL,?,?)", (isbn,box))

connection.close(True)

Here's how to perform an INSERT and display the values for each column:

cursor.execute("INSERT INTO person (name, address, tel, web, email) "
    "VALUES (:name, :address, :tel, :web, :email)", locals())

Here's how to safely update/insert data and display the resulting query:

sql = 'UPDATE companies SET name=?,address=?,zip=? WHERE id=?;'

try:
    cursor.execute(sql, (name,address,zip,id) )
except:
    print "Failed UPDATING"
    raise
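The same placeholder techniques work with the sqlite3 module that ships with Python. Below is a minimal sketch against an in-memory database; the table layout and the values are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, address TEXT, zip TEXT)")

# Named placeholders take their values from a dictionary
company = {"name": "Acme", "address": "1 Main St", "zip": "12345"}
cur.execute("INSERT INTO companies (name, address, zip) VALUES (:name, :address, :zip)", company)

# Positional placeholders take their values from a tuple;
# the quote in the new name needs no manual escaping
cur.execute("UPDATE companies SET name=? WHERE id=?", ("Acme's", 1))
con.commit()

row = cur.execute("SELECT name, zip FROM companies WHERE id=?", (1,)).fetchone()
print(row)
con.close()
```

Letting the driver bind the values avoids both quoting bugs and SQL injection, which is why it is preferable to building the query with % or +.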

Using regular expressions

Here's how to loop through a list of web pages, and check whether a given pattern is found therein:

import sys

import urllib

import re

for i in range(1,10):
    f = urllib.urlopen("http://www.acme.com/index.asp?page=%s" % i)
    #re.I = ignore case
    if re.search('stringtofind',f.read(), re.I):
        print("Found in %s" % i)

Another way to do this:

p = re.compile('stringtofind')

if p.search(f.read()):
    print("Found")
else:
    print("Not found")

Here's how to compile a regex, find a pattern, and save it to a file:

p = re.compile('(<some>.+</some>)',re.DOTALL)
m = p.search(inputdata)
if m:
    inputdata = m.group(0)
else:
    print("Pattern not found")
    sys.exit()

inputdata = inputdata.replace('<other>','<yet>')

with open(output, 'w') as outputfile:
    outputfile.write(inputdata)

Here's how to load a web page, isolate a section, and display it (Note: you cannot call f.read() twice, hence the copying of the page into the 'page' variable):

log = open('found.txt','w')

for i in range(1,10):
    f = urllib.urlopen("http://www.acme.com/index.asp?page=%s" % i)
    print("Checking page %i" % i)
    page = f.read()
    if re.search('some text',page, re.I):
        m=re.search('"(.+?)"',page,re.I)
        if m:
            log.write("Found in %s\n" % i)
            log.flush()

log.close()

Here's how to read an HTML file, and display the string between the TITLE tags, if any:

import re

f = open('myfile.html', "r")
inputfile = f.read()
f.close()

m = re.search('<title>(.*?)</title>',inputfile,re.I)
if m:
    print(m.group(1))

... or if you need to extract more than one set of items:

p = re.compile('blabla (.+?) blabla (.+?)')
packed = p.findall(inputfile)
if packed:
    for x in packed:
        print("Item 1 = " + x[0] + " Item 2 = " + x[1])

If you need to call a regex a great number of times, you can increase performance by compiling the search pattern:

p = re.compile('[0-9]+')

m = p.search('tempo999')

print m.group(0)

To replace an item with another item, use re.sub():

print re.sub('john','jane','john doe')

Note that re.sub() is very much slower than using a string's replace() method:

stuff = stuff.replace('_',' ')

Also, the replacement string passed to re.sub() must have its backslashes escaped beforehand, using the r prefix to indicate a raw string (ie. with its backslashes treated as regular characters):

toreplace = r"\\"

body = "#"

print re.sub("#",toreplace,body)

If you wish to tell the re module to treat the replace pattern as is even when it contains backslashes, add a call to its escape() function:

toreplace = re.escape(r"\\")

body = "#"

print re.sub("#",toreplace,body)
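Two other ways to get a replacement inserted literally are sketched below for Python 3: doubling the backslashes by hand, or passing a function to re.sub(), whose return value is not scanned for escapes. The path used as replacement text is made up for the example.

```python
import re

body = "path: #"
replacement = r"C:\new\table"  # contains \n and \t, which re.sub() would interpret

# Option 1: double the backslashes so re.sub() emits them literally
escaped = re.sub("#", replacement.replace("\\", "\\\\"), body)

# Option 2: pass a function; its return value is inserted without escape processing
via_function = re.sub("#", lambda match: replacement, body)

print(escaped)
print(via_function)
```

Both calls produce the same string, with the backslashes intact.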

Here's how to rewrite a phone number:

#!/usr/bin/python

import sys,re

#Turn 0123456789 into 01.23.45.67.89

p = re.compile(r'(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)')

phone = p.sub(r'\1.\2.\3.\4.\5',sys.argv[1])

print phone

Important: In Python 2, the regex library by default can't handle even European characters in byte strings, so you must set a locale, and add the re.LOCALE switch (in Python 3, str patterns handle Unicode by default, and re.LOCALE is only accepted with bytes patterns):

import locale, re

#BAD: locale.setlocale(locale.LC_ALL, 'FR')
#Let Python handle it
locale.setlocale(locale.LC_ALL, '')

mypattern = re.compile(r"(\d+)\s+(\w+)\s+(\d+)",re.LOCALE)

More infos on using regexes in Python:

Driving a web browser

As of April 2021, there are at least two modules to manage a web browser through a Python script: the webbrowser module, and the Selenium module. mechanize might be too basic.

Selenium

https://towardsdatascience.com/controlling-the-web-with-python-6fceb22c5f08

#pip3 install -U selenium

#pip3 install webdriver-manager

from selenium import webdriver

from selenium.webdriver.chrome.options import Options

from selenium.webdriver.chrome.service import Service

from webdriver_manager.chrome import ChromeDriverManager

options = Options()

options.add_argument("start-maximized")

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver.get("https://www.google.com")

webbrowser module

https://devtut.github.io/python/webbrowser-module.html

Connecting to a web server

Here's how to use urllib to POST to a script:

import urllib

url = "http://www.acme.com"

data = {'myfield': somevalue}

urldata = urllib.urlencode(data)

results = urllib.urlopen(url, urldata).read()

print results

Here is an example session that uses the 'GET' method to retrieve a URL containing parameters:

import urllib

params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)

print f.read()

The following example uses the 'POST' method instead:

import urllib

params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)

print f.read()

The following example uses an explicitly specified HTTP proxy, overriding environment settings:

import urllib

proxies = {'http': 'http://proxy.example.com:8080/'}

opener = urllib.FancyURLopener(proxies)

f = opener.open("http://www.python.org")

f.read()

The following example uses no proxies at all, overriding environment settings:

import urllib

opener = urllib.FancyURLopener({})

f = opener.open("http://www.python.org/")

f.read()
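The examples above target Python 2. In Python 3, urlencode() lives in urllib.parse and urlopen() in urllib.request. The sketch below only builds the GET and POST requests without sending them, so it runs offline; the host name is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

# GET: the parameters go into the URL
get_request = Request("http://www.example.com/cgi-bin/query?%s" % params)

# POST: the parameters go into the request body, as bytes
post_request = Request("http://www.example.com/cgi-bin/query",
                       data=params.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)

# To actually send a request: urllib.request.urlopen(get_request).read()
```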

Here's how to use Libcurl to POST to a script:

Install Python and Libcurl (eg. libcurl-7.16.2-win32-ssl-sspi.zip)
Install PyCurl
Use this script:

Here's how to log on to a web server through POST with support for cookies:

urllib vs urllib2 vs httplib

cookielib vs. ClientCookie http://www.voidspace.org.uk/python/articles/cookielib.shtml

Playing with date/time

Here's how to display the current date and time:

import time

import locale

#displays '08/20/04 22:05:15'

print time.strftime('%c')

#displays 'French_France.1252'

print locale.setlocale(locale.LC_ALL,'')

#displays '20/08/2004 22:05:15'

print time.strftime('%c')

The time value as returned by gmtime(), localtime(), and strptime(), and accepted by asctime(), mktime() and strftime(), is a sequence of 9 integers. The return values of gmtime(), localtime(), and strptime() also offer attribute names for individual fields.
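A short sketch of both access styles, using an arbitrary epoch value so that the result does not depend on the current date:

```python
import time

# 1092364502 is an arbitrary epoch value (2004-08-13 02:35:02 UTC)
t = time.gmtime(1092364502)

# Access by index, as with a plain tuple...
print(t[0], t[1], t[2])

# ...or by attribute name
print(t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour)

# The sequence always holds 9 integers
print(len(t))
```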

Lists

Tuples

Dictionaries

Printing the content of each key:

for i in stuff.keys():
    print(i + "=" + stuff[i])

Commenting a block of text

"""

This is one

block of text

"""

ie. three double-quotes in a row.

Exiting a script

import sys

sys.exit()

Handling a long line of code

To break a long line of code:

if (somevar) or \
   (someothervar):

Sending an e-mail

Here's how to send an e-mail through code, passing one parameter to the script:

#!/usr/bin/python

from email.mime.text import MIMEText

import smtplib,sys

body='''this text will become the body of the message

Using triple-quotes you can span it easily over multiple lines.

the result of an action'''

msg = MIMEText(body)

From = "me@acme.com"

To = "you@acme.com"

msg['From'] = From

msg['To'] = To

msg['Subject'] = "Call from " + sys.argv[1]

server = smtplib.SMTP("smtp.isp.net")

server.sendmail(From,[To],msg.as_string())

server.quit()

Tips from the Python Tutorial

Calling a non-COM DLL from Python

Calling a COM DLL from Python

Shortcut to the last result

In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example:

>>> price * tax

12.5625

>>> price + _

113.0625

Long lines

If a statement or string is too long to fit on a line, use the backslash:

hello = "Note that whitespace at the beginning of the line is\
 significant."

You can also use """ or ''' :

print """

Usage: thingy [OPTIONS]

-h Display this usage message

-H hostname Hostname to connect to

"""

Strings

Unlike a C string, Python strings cannot be changed. Assigning to an indexed position in the string results in an error.
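A quick illustration for Python 3: the assignment fails with a TypeError, and the usual workaround is to build a new string from slices.

```python
word = "spam"
try:
    word[0] = "S"  # strings do not support item assignment
    changed = True
except TypeError as err:
    changed = False
    print("Error:", err)

# Build a new string instead
new_word = "S" + word[1:]
print(new_word)
```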

Lists

Unlike strings, which are immutable, it is possible to change individual elements of a list:

a = ['spam', 'eggs', 100, 1234]

a[2] = a[2] + 23

['spam', 'eggs', 123, 1234]

Variable number of function parameters

When a final formal parameter of the form **name is present, it receives a dictionary containing all keyword arguments whose keyword doesn’t correspond to a formal parameter. This may be combined with a formal parameter of the form *name (described in the next subsection) which receives a tuple containing the positional arguments beyond the formal parameter list. (*name must occur before **name.) For example, if we define a function like this:

def cheeseshop(kind, *arguments, **keywords):
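The function body is missing from these notes, so here is one possible body, written only to show what each parameter receives. The arguments passed in the call are sample values.

```python
def cheeseshop(kind, *arguments, **keywords):
    # arguments is a tuple of the extra positional arguments,
    # keywords is a dictionary of the extra keyword arguments
    lines = ["-- Do you have any %s?" % kind]
    lines.extend(arguments)
    for key in sorted(keywords):
        lines.append("%s: %s" % (key, keywords[key]))
    return lines

result = cheeseshop("Limburger", "It's very runny, sir.",
                    client="John Cleese", shopkeeper="Michael Palin")
for line in result:
    print(line)
```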

Importing modules

There is even a variant to import all names that a module defines:

from fibo import *

This imports all names except those beginning with an underscore (_).

Modules

The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings. Without arguments, dir() lists the names [ie. variables and functions] you have defined currently.


Compiling

An easy and satisfactory way to distribute your Python script on a Windows host is to compile it with Py2exe (which analyses your script, and tries to extract all the required modules into a ZIP file), and combine the different files using either a standard installer like InnoSetup or NSIS, or combine all the files into the main EXE generated by py2exe using PE Bundle which will extract those extra files at runtime transparently:

Install py2exe, and write a setup script (call it setup.py):

from distutils.core import setup
import py2exe

setup(console=["myapp.py"])
Open a DOS box, and run the following: python setup.py py2exe
A directory named ./dist is created by py2exe, and contains all the files that are required to run your script on a bare Windows host. You can remove the ./build directory (temp stuff)
Combine those few files into a single EXE using either your favorite installer, or PE Bundle
More information available on py2exe

An alternative to py2exe is PyInstaller: "PyInstaller is a program that converts (packages) Python programs into stand-alone executables, under Windows, Linux and Irix. [...] PyInstaller is an effort to rescue, maintain and further develop Gordon McMillan's Python Installer (now PyInstaller). Their official website is not longer available and the original package is not longer maintained. Believing that it is still far superior to py2exe, we have setup this site to continue its further development."

First, read the following to understand the issue of compiling and/or distributing Python scripts:

Distributing Python Modules and Installing Python Modules by Greg Ward
Python Installers
Can Python be compiled to machine code, C or some other language?
[Nick Patavalis in comp.lang.python] "Python needs drastic performance improvement if it is to scrap-off the "scripting language" stigma. The only way to get these improvements is making it possible for a python implementation to produce *efficient* *compiled* code. At the same time the dynamic-typing nature of the language is one of its most valuable characteristics. And this is one of the hardest problems when trying to write a decent python compiler.

If you define a function like:

def sum (a, b):
    return a + b

How can the compiler know what code to produce? It could trace all the applications of sum(), and decide what types of arguments sum() is actually applied on. But this is not easy, and sometimes it is straight-out impossible."
"A possible way to reduce the time required to load the Python interpreter on a Windows box, is to load the DLL during boot time since Windows have a list of "KnownDLLs" that it pre loads at startup time and they stay in memory all the time: Try adding the python23.dll to:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs
Compiling Python Code by Fredrik Lundh
"py2exe will make you a windows native exe to run. and for updates you just need to update it's library.zip. use NSIS for the installer, it can do anything you want an it's free.

One tip: make sure your three top-level scripts are as empty as possible: just import some other module and call a function. All your active code is then in some library.zip shared between the three, and you need never change alice.exe, bob.exe, and carol.exe

To elaborate, this approach has worked well for me when I've needed to release multiple .exe's together:

First make a "dispatching" module like this (call it run.py)

if __name__ == '__main__':
  app = sys.argv[0].lower()

  if app.find('alice') != -1:
    import alice

    alice.main()

  elif app.find('bob') != -1:

    import bob

    bob.main()

  etc...

Next, use py2exe to create an executable for run.py, and then copy run.exe to alice.exe, bob.exe, and carol.exe (the resulting .exe's will be very small). The reasoning behind this is that this way only one copy of all the DLLs, Python std library, and so forth need to be included in your release. "

Pyco

http://www.pythonapocrypha.com/projects/pyco/
"Pyco (pronounced like pico) is a program that creates standalone (single-file) Windows executables from your Python programs, with an emphasis on the finished executable being as small as possible. It does not require you to have a C compiler."

Psyco

Python JIT(Just-In-Time)-like compiler to accelerate Python programs on 386-compatible processors (x2 to x100 speed up !).
http://psyco.sourceforge.net/

setuptools

"setuptools () is a collection of enhancements to distutils which let you build .egg files. Once you start using egg files you can include dependencies between package versions and if your product requires a bunch of other packages the installation step will download and install the appropriate versions.

See http://peak.telecommunity.com/DevCenter/EasyInstall for instructions on installing packages built in this way, but in short, the user has to run ez_setup.py from the EasyInstall page, and then a command like:

easy_install http://example.com/path/to/MyPackage-1.2.3.tgz

would download and install your package and all the other products it depends on. If at a later stage they want to upgrade to a more recent version then all they need to do is to run:

easy_install --upgrade MyPackage

Installed eggs usually exist in a single file (importable zip) which makes uninstalling especially easy: just one file to delete."

py2exe

py2exe is a Python distutils extension which converts python scripts into executable windows programs, able to run without requiring a python installation.

Install py2exe
Create a script
Run the script including the -w (Windows) option to hide the DOS box that Python opens even when running a GUI application
Distribute the resulting .EXE and its dependent DLLs, or generate an installer

Note that even a no-frills window developed with the wxPython toolkit, with just a tiny menu bar that displays a dialog box, turns into a 300KB EXE, and requires 4 binaries for a total of 2.5Meg (and that's after compressing the four dependencies with UPX).

For information, internally, Python source code is always translated into a "virtual machine code" or "byte code" representation before it is interpreted (by the "Python virtual machine" or "bytecode interpreter"). In order to avoid the overhead of parsing and translating modules that rarely change over and over again, this byte code is written on a file whose name ends in ".pyc" whenever a module is parsed (from a file whose name ends in ".py").

When the corresponding .py file is changed, it is parsed and translated again and the .pyc file is rewritten. There is no performance difference once the .pyc file has been loaded (the bytecode read from the .pyc file is exactly the same as the bytecode created by direct translation). The only difference is that loading code from a .pyc file is faster than parsing and translating a .py file, so the presence of precompiled .pyc files will generally improve start-up time of Python scripts.

If desired, the Lib/compileall.py module/script can be used to force creation of valid .pyc files for a given set of modules. Note that the main script executed by Python, even if its filename ends in .py, is not compiled to a .pyc file. It is compiled to bytecode, but the bytecode is not saved to a file.
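As a small sketch of forcing the creation of a .pyc file from Python 3, the snippet below compiles a single throwaway module with py_compile, the module compileall builds on. The module name and its content are invented for the example.

```python
import os
import py_compile
import tempfile

# Create a throwaway module to compile
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "mymodule.py")
with open(source_path, "w") as src:
    src.write("print('hello')\n")

# compile() returns the path of the bytecode file it wrote
pyc_path = py_compile.compile(source_path)
print(pyc_path)
```

To compile a whole directory tree in one go, compileall.compile_dir() or "python -m compileall somedir" does the same job.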

McMillan Installer

Similar to py2exe, but all the exe/dll/pyd files are grouped into a single EXE file.
http://www.mcmillan-inc.com/ : (Posted by PaulBaranowski on Tuesday, April 27) "Unfortunately, Gordon has come under some hard financial times and his web site is not up at the moment. So I have offered to host the program here!"

Freeze

If you are looking for a way to translate Python programs in order to distribute them in binary form, without the need to distribute the interpreter and library as well, have a look at the freeze.py script in the Tools/freeze directory [find it in the ActivePython distribution; guess this refers to the standard Python distro].

This creates a single binary file incorporating your program, the Python interpreter, and those parts of the Python library that are needed by your program. Of course, the resulting binary will only run on the same type of platform as that used to create it.

"There is a tool called freeze that is included with Python that does this. I havn't done it on Windows yet and I have heard that there are some tricks or potential problems with it. Check the back-postings at dejanews for details. Basically it scans you code for all imported modules and builds a C module that has all the compiled python modules encoded within it. Then you compile and linke this file against the Python library and you end up with an executable that along with any binary extension modules you may need will be a distributable version of your program."

cx_Freeze

"The Freeze utility that comes with Python itself requires a source distribution, a C compiler and linker which makes for a complex environment for creating executables. In addition, this method is very slow for creating executables as compared to the other methods. py2exe is intended for development on Windows only and cx_Freeze is intended for cross platform development. Installer uses an import hook which means that the development environment and runtime environment are considerably different."

SQFreeze

"Based upon Fredrik Lundh's famous squeeze module, I added some code to create small stand-alone .exe's which can be used like "Python-applets". It is currently known as SQFREEZE. The system dll files are still required (so your users must stillhave Python installed), but the exe file can run from alone an doesn't need to unpack any modules."
"Latest news (07-Feb-99)" ?

Pyrex

Pyrex lets you write code that mixes Python and C data types any way you want, and compiles it into a C extension for Python
http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

PyPy

"The PyPy project aims at producing a simple runtime-system for the Python language. We want to express the basic abstractions within the Python Language itself. We later want to have a minimal core which is not written in Python and doesn't need CPython anymore. We want to take care that technologies such as PSYCO and Stackless will easily integrate."

PyInline

The PyInline module allows you to put source code from other programming languages directly "inline" in a Python script or module
http://pyinline.sourceforge.net/

Py2Cmod

Designed to aid in the conversion of Python modules to C extension modules while keeping the same interface into the module.
http://sourceforge.net/projects/py2cmod/

Weave

The weave package allows the inclusion of C/C++ within Python code
http://www.scipy.org/documentation/weave/

pyPack

UPX/PKLite-like text-only compression of Python programs for compact and easy distribution over indentation-breaking media (like web forums).
http://sebsauvage.net/python/

SWIG

"SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages. SWIG is primarily used with common scripting languages such as Perl, Python, Tcl/Tk, and Ruby, however the list of supported languages also includes non-scripting languages such as Java, OCAML and C#."

distutils

PyChecker

"PyChecker is a tool for finding bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. It is similar to lint."

IDEs

As of 2018, I tried a few of them (IDLE, MS Visual Studio, PyCharm, Wing, Eric), and found PyScripter to be the simplest to install and use.

PyScripter

A fast editor for Windows written in C++ (Delphi?)
http://code.google.com/p/pyscripter/

Main page; Support

To make the IDE actually… readable, choose View > Styles > Windows 10

To set a keyboard shortcut to toggle comments (Source Code > Toggle Comment), use Tools > Options > IDE Shortcuts; I used CTRL+SHIFT+B (B as "block"), since it was available.

Tools > Configure Tools: To use the current script work directory as the Working directory, use "$[ActiveDoc-Dir]"

Instead of the internal Python interpreter, use the external interpreter, so you can easily kill a rogue application if need be.

Q&A

What causes "Remote Interpreter Reinitialized"?

How to change encoding, so that UTF-8 strings are displayed correctly (eg. "Ã©" shown instead of "é")?

How to remove "greyed out" lines when script stopped running due to error? Can't read code. Nothing in "Run".

How to hide black vertical line in middle of editing window?

Tools > Options > Editor Options : Edge column = 0

How to add items from Tools > Tools into user toolbar?

1. Right-click on User Toolbar

2. Customize

3. Commands tab

4. External Tools

5. Select item, and drag 'n drop it to the User Toolbar.

Others

IDLE, "Python's very own minimal IDE"; It's part of Python, so just open a terminal and type "idle.bat"; Failed running a minimal .py script. Next
MS Visual Studio Code + Python extension; Too slow to start running Python script. Next
MS Visual Studio Community IDE + PTVS: Canceled install after an hour. Next.
PyCharm: 190MB download, 500MB install. Next.
Wing IDE: Slow to start; Failed with "UnicodeEncodeError: 'charmap' codec can't encode characters in position 52-53: character maps to <undefined>". Next.
Komodo IDE: Commercial. Next.
Eric (6-18.08): If running 32-bit Windows, you might have to downgrade using "pip install PyQt5==5.10.1 QScintilla==2.10.4" before running "python install.py". Next.
Thonny:
Eclipse + PyDev:
Sublime Text:
Atom:
Spyder:

OLD If you prefer to use an IDE instead of a basic text editor, here are the choices I would recommend:

If you prefer to use a tool that uses the Windows' native set of widgets, choose ActiveState Komodo, the ActiveState Python plug-in for VS.Net, or PyDev
For wxWidgets, use SPE or Boa
For QT, check out BlackAdder or eric3
For the FOX widget set, Arachno, but as of August 2004, only the version that supports Ruby is available
I hate Tkinter :-)

Bigger list here

PyDev

http://pydev.sourceforge.net

PyPe

Free, based on the Scintilla editor but with auto-completion added
http://pype.sourceforge.net/

DrPython

Free, auto-completion supported
http://drpython.sourceforge.net/

Programmer Studio

http://www.programmerstudio.com/
$299

ActiveState

Komodo

http://www.activestate.com/Products/Komodo/
Commercial. Remember to run the license, which is actually an EXE
Includes the Visual Python plug-in for the VS.Net IDE
Slow on a P3

Visual Python

Python plug-in for VS.Net
http://www.activestate.com/Products/Visual_Python/?_x=1

BlackAdder

Commercial IDE from the KDE consortium to build Python GUI apps using the PyQT wrapper to the QT set of widgets
Includes the QT Designer, but requires that you install Python and PyQT before installing BlackAdder
http://www.thekompany.com/products/blackadder/
Not as polished as VB (not told to install Python and PyQT beforehand, although the FAQ seemed to say it was included in BlackAdder "Do I have to use the versions of PyQt and Qt embedded in BlackAdder?"; The IDE used some non-standard font in 14 points, etc.)
Don't know how integrated the BlackAdder IDE and QT Designer are

The eric3 Python IDE

http://www.die-offenbachs.de/detlev/eric3.html
Free; Written in PyQt using the QScintilla editor widget ("In order to use eric3 under Win... operating systems you need a commercial or educational license of Qt 3.x from Trolltech and the corresponding license of PyQt (which includes QScintilla) from Riverbank.")
"eric is a GPL'ed IDE for Python written using PyQt. eric used to be bundled with PyQt but is now available separately"
Programming Tools: eric3

SPE - Stani's Python Editor

http://spe.pycs.net/
"Spe ships with wxGlade (gui designer), PyChecker (source code doctor) and Kiki (regular expression console). Spe is extensible with wxGlade."
As of Aug 2004, still some bugs to iron out, but is promising

Boa Constructor

http://boa-constructor.sourceforge.net/
wxWidgets-based IDE + GUI designer
While the latest official version is 0.2.3, 0.2.8 can be downloaded here (unzip under drive:\Python2x\Lib\site-packages\wxPython\tools\)
Launched with c:\python22\python c:\Python22\Lib\site-packages\wxPython\tools\boa\Boa.py if you are using the Python from Python.org. If using ActivePython, search for the boa.py script a little deeper

FOX

Arachno

http://www.python-ide.com/
Starting at USD 59
"What is the expected release date of the final version ? We try to get it out in the first quarter of 2005"
Built with the Open Source FOX GUI Toolkit

PythonWin, a.k.a. PyWin32

Free, no-frills editor
PythonWin is a badly-chosen name, as it stands for either a Python wrapper around the MFC framework, or the IDE that ships with it and is (obviously) written using this wrapper. It's been renamed PyWin32
Available either from Mark Hammond's extension to python.org's packaged version of Python, or as part of the ActiveState distribution of Python

WPY

ftp://ftp.python.org/pub/python/wpy/
uses the TK widgets for Unix, but native look-and-feel on Windows and is based on the Microsoft Foundation Classes model
Files dated 14 june 2001

Pmw

http://pmw.sourceforge.net/doc/
Pmw is a toolkit for building high-level compound widgets in Python using the Tkinter module.

Idle

Integrated DeveLopment Environment for Python
http://www.python.org/idle/
Free, GUI written in TK... feature-poor

Wing

http://wingide.com/wingide
GUI in Motif, and possibly the Scintilla source code edit box (great expand/collapse feature)
Commercial

Writing GUI apps

Some infos

Below is a list of tools to let you build GUI applications. Most are just wrappers around a set of widgets such as Windows' native widgets, wxWidgets, or QT, bringing you back to the days of Windows programming Petzold-style (Mmm...), but some also offer a GUI designer like VB, ie. you can draw the windows interface with the mouse. You can read more in the page Gui Programming on the Python site.

Note that the WYSIWYG GUI designer that feels most like VB's is QT Designer, which you can get either directly from QT or by buying the BlackAdder IDE.

Alternatively, you could also use a GUI designing tool such as the antiquated MS Dialog Editor or its more modern equivalents, just to draw the interface with the mouse and get the coordinates for each widget, and copy/paste this into code. Here are some suggestions I got:

Use a drawing software like SmartDraw or Visio, both of which have libraries with GUI widgets so you can build a mock-up
VB, Delphi, or VC++ to draw the interface
The Windows Platform SDK claims to provide a dialog editor
Purebasic has a nice gui designer (Visual Designer), and is quite reasonable at 59 euro. Purevision is a separate tool that costs just $25. I don't think it requires Purebasic
PB Forms from PowerBasic
Choose Your GUI Toolkit
Python can be called from Delphi: "Python for Delphi is a set of components that wrap up the Python Dll into Delphi. They let you easily execute Python scripts, create new Python modules and new Python types. You can create Python extensions as Dlls and much more.", Using Delphi and Python together
wxWindows vs. TclTk

Here are the widgets and/or Python wrappers those GUI designers may require:

Designing the UI as a resource

"Quick side note: depending on your GUI needs, ctypes can be a pretty easy way to go. Create your GUI as resources (e.g. in MS Visual Studio) and wrap them into a tiny DLL. Then use ctypes to load them at runtime and run CreateDialogIndirect. Most of the work involved is simply looking in header files for the values of various Win32 messages and constants, but once you do it the first time you can re-use much of the code over and over."

MFC

PythonWin, a.k.a. PyWin32 is not only an IDE, but also an MFC wrapper so you can build Win32 apps without any extra widgets set. Take a look at the samples under Drive:\Python23\Lib\site-packages\

PythonWin offers the following modules to wrap the Win32 APIs:

win32*, such as win32service, win32api, etc.
pywin*.mfc, such as pywin.mfc, pywintypes, etc.
msvcrt
ntsecuritycon
dde
pythoncom
etc.

Note that Python Win32, a.k.a. Win32all, is part of the ActivePython package, so if you use ActivePython instead of the standard Windows version of Python, Python Win32 is already installed.

More information:

A CHM help file is available in Python2x\Doc\, and scripts can be found in Python2x\Lib\site-packages\
Python Win32
Python support for Windows, and Python Programming on Win32 using PythonWin by Mark Hammond
Python with COM - Get at your Office Data by Christian Tismer
Python and COM
Some demo code in C:\Python25\Lib\site-packages\
An example of a basic dialog/window in Python2x\Lib\site-packages\Pythonwin\pywin\Demos\createwin.py

Here's the familiar "Hello, World!" as a dialog box in PyWin32:

from pywin.mfc import dialog
import win32con

#Window class IDs used in dialog templates
dlgStatic = 130
dlgButton = 128

class Mydialog(dialog.Dialog):
    def OnInitDialog(self):
        rc = dialog.Dialog.OnInitDialog(self)
        return rc

style = (win32con.DS_MODALFRAME |
         win32con.WS_POPUP |
         win32con.WS_VISIBLE |
         win32con.WS_CAPTION |
         win32con.WS_SYSMENU |
         win32con.DS_SETFONT)
cs = win32con.WS_CHILD | win32con.WS_VISIBLE
s = win32con.WS_TABSTOP | cs
w = 64
h = 64

#1. Let's create a dialog box with a label and a pushbutton
dlg = [["PyWin32",(0, 0, w, h), style, None, (8, "MS Sans Serif")],]
#The pushbutton uses the button class, not the static class
dlg.append([dlgButton,"OK", win32con.IDOK, (7, h - 18, 50, 14), s | win32con.BS_PUSHBUTTON])
dlg.append([dlgStatic, "Hello, world!", -1, (7, 9, 50, 14), cs | win32con.SS_LEFT])

#2. Let's start the dialog
d = Mydialog(dlg)

#3. Display it
d.DoModal()

Here's how to add a progress bar, set its range, and increment it:

#Needs "import glob" and "import win32ui" at the top of the script
def OnInitDialog(self):
    rc = dialog.Dialog.OnInitDialog(self)
    self.pbar = win32ui.CreateProgressCtrl()
    self.pbar.CreateWindow (win32con.WS_CHILD | win32con.WS_VISIBLE, (7, 30, 270, 50), self, 1001)

    #Find out how many *.HTM* in /input, and set range of progress bar
    filecount = 0
    for file in glob.glob('*.htm*'):
        filecount+=1
    self.pbar.SetRange(0,filecount)

    for file in glob.glob('*.htm*'):
        self.pbar.SetStep(1)
        self.pbar.StepIt()
        [...]

Python GUI API Project

Python GUI API Project
"Provide implementations of the API for the three major platforms (Unix, Macintosh and Windows) that are small and lightweight, interposing as little code as possible between the Python application and the platform's underlying GUI facilities, and not bloating the Python installations or applications which use them."
Uses GTK, so should run on Windows with the ad hoc version of GTK

wxPython

More infos here.

PyQT

PyQT: GUI Programming with Python: QT Edition by Boudewijn Rempt (online)

.Net (Mono, DotGNU)

This is very early development, but if you like bleeding edge stuff, you could start looking at how to develop applications using either MS' official .Net framework and its tools (VS.Net and the Python add-on, etc.), or the compatible open-source versions that are Mono and DotGNU. Take a look at IronPython, and boo.

pyFLTK

pyFLTK: Wrapper for the Fast Light Tool Kit cross-platform graphical user-interface library; Latest news 12/17/03...

PyGTK

"If you like GTK+, you might want to try the glade designer and parse the XML file with libglade and pygtk. (Generated code is bad). Remember, glade generates XML. XML is not code, XML is data. And data is not code. As long as you stay away from generated code, you will be safe. Yup, the best of two worlds -- a graphical form designer that stores information in XML data to be parsed by your own python program."

"BTW, there's a python port of glade underway: http://gruppy.sicem.biz/componentes#gazpacho"

PyGTK: uses the famous GTK cross-platform framework

FXPy

FXPy: "FXPy is a Python extension module which provides an interface to the FOX GUI library" (site says "Last Update: $Date: 2002/04/02 21:12:59"...)

Binding to the TnFox Toolkit?

http://www.osnews.com/story.php?news_id=9701

PyGUI

http://www.cosc.canterbury.ac.nz/~greg/python_gui/
"This is a project to develop a cross-platform pythonic GUI API."

WAX

WAX: GUI toolkit that sits on top of wxPython. "Frankly, I don't have the time to work on this project very much. That's why I only add new features when I need them."

PyUI

http://pyui.sourceforge.net/
"PyUI is a user interface library written entirely in the high-level language python. It has a modular implementation that allows the drawing and event input to be performed by pluggable "renderers". This makes PyUI very portable and scalable. It can run in environments from hardware accelerated 3D surfaces to regular desktop windows. PyUI was originally targeted as a User Interface for games, but it has evolved into a more general UI toolkit with applicability outside of games."

Windows

RipSting’s Blender-Python GUI Designer

A nifty app apparently written in VB.Net
http://oregonstate.edu/~dennisa/Blender/BPG/

Blender GUI Wizard

http://www.angelfire.com/nt/teklord/GUIWizard.htm

ActiveState GUI Builder

Available in the Pro version of the Komodo IDE
"The GUI Builder supports building applications using TK with an extended widget set that includes BWidgets and IWidgets."

Venster

As of Aug 2004, prealpha
http://venster.sf.net/
"Venster is a highly native Windows GUI toolkit for Python based on the ctypes ffi library. The aim of Venster is to be a very lightweight wrapper around the standard Win32 API, making it easy to write slick windows applications in pure Python. [...] The windows API is called through the use of Thomas Heller's excellent ctypes module"

PythonWorks Pro

http://www.pythonware.com/
No longer developed

EasyDialogs for Windows

http://www.averdevelopment.com/python/EasyDialogs.html
Requires the ctypes package
Only displays a few, set dialogs, ie. won't let you build your own windows

Dabo

"Dabo is a 3-tier, cross-platform application development framework, written in Python atop the wxPython GUI toolkit"

DynWin

"A dynamic GUI class library for Python and Win32"
http://www.nightmare.com/~rushing/dynwin/

PythonWin

Wrapper for the Microsoft Foundation Class library
http://www.python.org/windows/pythonwin/

sdk32 - Partial Python wrap of the Win32 Platform SDK

http://www.object-craft.com.au/projects/sdk32/

GTK

MojoView

http://www.netpromi.com/mojoview.html
"MojoView is a Python package that assists you in building simple PyGTK2 database applications."

QT

QT Designer

Obviously uses the QT widget set instead of Windows' or wxWidgets'
Comes with the QT set, but can be bought with the BlackAdder IDE

wxWidgets

Dabo

"Dabo is a 3-tier, cross-platform application development framework, written in Python atop the wxPython GUI toolkit. And while Dabo is designed to create database-centric apps, that is not a requirement. Lots of people are using Dabo for the GUI tools to create apps that have no need to connect to a database at all."

wxDesigner

commercial dialog editor and RAD tool for the free wxWidgets GUI library
http://www.roebling.de/
To add widgets, include a sizer (wxWidgets concept of a group of widgets)

wxGlade

GUI designer written in Python with the popular GUI toolkit wxPython
It is not (and will never be) a full featured IDE, but simply a "designer": the generated code does nothing apart from displaying the created widgets
http://wxglade.sourceforge.net/
To add widgets, include a sizer (wxWidgets concept of a group of widgets)

XRCed

http://xrced.sourceforge.net/
XRCed is a simple resource editor for wxWindows/wxPython GUI development which supports creating and editing files in XRC format. It is written in Python and uses wxPython GUI toolkit.

VisualWx

http://visualwx.altervista.org/
Visualwx aims to be the ultimate RAD tool, designed for c++,python,perl,ruby, lua and wxWidgets.
As of Aug 2004, still alpha but under active development

Boa Constructor

See above

PythonCard

http://pythoncard.sourceforge.net/
Improved features to deal with database servers. It doesn't have absolutely everything you require, but it's pretty good for alpha software. In particular, have a look at the dbBrowser sample application which shows how to interact with MySQL. Here's an article on working with the ADO interface, and here's a tutorial on working with COM objects.
Ready for prime time? "The first goal of the project is to produce a prototype. This will prove the applicability of the tools, concepts and high level design of PythonCard. To see how this is progressing;"
Building GUI Applications with PythonCard and PyCrust by Patrick O'Brien

Dialogblocks

Commercial RAD tool
http://www.anthemion.co.uk/dialogblocks/
Only handles C++ and XRC, not Python? (XRC = the wxWidgets resource format)
Like the other designers, doesn't let you draw windows like VB
DialogBlocks is programmed with the popular open source (free) GUI toolkit, wxWidgets, created in 1992 by DialogBlocks' author, Julian Smart.

Tcl/Tk

Visual TCL

"Visual Tcl is a freely-available, high-quality application development environment for UNIX, Windows, Macintosh and AS400 platforms. Visual Tcl is written entirely in Tcl/Tk and generates pure Tcl/Tk code. This makes porting your Visual Tcl applications either unnecessary or trivial. Visual Tcl is covered by the GNU General Public License."

PAGE - Python Automatic GUI Generator

http://page.sourceforge.net/
PAGE is an automatic GUI generator which bears a resemblance to Visual Basic.

Resources

Writing GUIs with Tcl/Tk and TKinter

Notes:

Using pythonw.exe instead of python.exe hides the console, so should only be used once an application is bug-free, or error messages can be displayed through the GUI. With the .pyw extension, a tkinter script won't show a terminal window either, but the script will also be unable to output text to the console, even if the script was launched manually through a terminal window
Starting from Python 3, the Tkinter module was renamed to tkinter (with a lowercase t). This also affects the tkinter.ttk and (deprecated) tkinter.tix extensions
Don't use "from … import *", as it mixes everything into a single namespace

Tkinter is Python's object-oriented layer on top of Tcl/Tk. Tk only offers basic widgets; If you need more, check out wxPython and PyQt.

Three main concepts: Widgets, event handling, and geometry management (pack, grid, place; pack is the simplest for simple layouts, grid is the most commonly used, and place is the least popular but provides the best control).

Books

NOK "Python GUI Programming with Tkinter" (2018) by Alan D. Moore
READING "Tkinter GUI Application Development Cookbook" (2018) by Alejandro Rodas de Paz
"TKinter GUI Application Development Blueprints" (2015) by Bhaskar Chaudhary
"Modern Tkinter for Busy Python Developers" (2012) by Mark Roseman
"Tkinter 8.5 reference: a GUI for Python" (2013) by John W. Shipman, New Mexico Tech Computer Center
"Python and Tkinter Programming" (2000) by John E. Grayson

Layout/Geometry Managers

Historically, Tkinter supports three layout managers:

pack: Easiest to use; "For simple layouts, such as when you want to place all the widgets on top of each other or side by side"
grid: "To prevent unexpected results in complex layouts, it is common to group widgets with frames so that when you pack all the widgets within a frame, you do not interfere with the arrangement of the other ones. In these cases, use the Grid geometry manager since it allows you to directly set the position of each widget with one call to the geometry manager and avoids the need for additional frames."; "considered the more versatile of the three geometry managers. It directly resembles the grid concept that is commonly used in user interface design—a two-dimensional table divided into rows and columns, where each cell represents the space available for a widget"
place: "allows you to set the position and size of a widget in absolute terms, or in relative terms to another one. Of the three geometry managers, it is the least commonly used one. On the other hand, it can fit some complex scenarios where you want to freely position a widget or overlap a previously placed one."

Note: pack and grid shouldn't be mixed within the same container (eg. the same frame or window).

Grid

The container frame is organized into a two-dimensional table where each cell can hold one widget. However, widgets can be made to span multiple cells.
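As a rough illustration of the table idea, here is a minimal sketch of a login form laid out with grid. The form itself (labels, field names, the build_login_form and main helpers) is made up for the example; only the grid() options row, column, columnspan and sticky are standard Tkinter.

```python
import tkinter as tk

def build_login_form(parent):
    """Lay out two label/entry rows plus a button spanning both columns."""
    # Row 0: label in column 0, entry in column 1
    tk.Label(parent, text="User").grid(row=0, column=0, sticky=tk.W)
    tk.Entry(parent).grid(row=0, column=1)
    # Row 1: same structure, with the typed characters masked
    tk.Label(parent, text="Password").grid(row=1, column=0, sticky=tk.W)
    tk.Entry(parent, show="*").grid(row=1, column=1)
    # Row 2: one widget occupying two cells, stretched east-west
    tk.Button(parent, text="Log in").grid(row=2, column=0, columnspan=2, sticky=tk.EW)

def main():
    root = tk.Tk()
    root.title("Grid demo")
    build_login_form(root)
    root.mainloop()
```

Call main() to display the window; it is left uncalled here so that the snippet can be imported without opening anything.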

Pack

.pack(side=LEFT|RIGHT|TOP|BOTTOM, fill=X|Y|BOTH,expand=YES|NO,anchor=N|NE|E|SE|S|SW|W|NW)

The pack manager is ideally suited for the following two kinds of situation:

Placing widgets in a top-down manner
Placing widgets side by side

Widgets

Tkinter provides the following widgets:

Toplevel
Canvas
PanedWindow
Frame
Label
LabelFrame
Entry
Text
Button
Listbox
Checkbutton
OptionMenu
Radiobutton
Menu
Menubutton
Message
Scale
Scrollbar
Spinbox
Bitmap Class
Image Class

Test

To check that Python is correctly installed and that Tkinter works, open a terminal window, and run the following command: python -m tkinter

Loading Tkinter

#Bad
from tkinter import *

#Better
import tkinter

#Best
import tkinter as tk

Hello, world!

import tkinter as tk

root = tk.Tk()

root.title("My title")

w = tk.Label(root, text="Hello Tkinter!")

#Fit the size of the window to the given text

w.pack()

root.mainloop()

Dialog

A simple OK dialog:

from tkinter import Tk

from tkinter import messagebox

# Hide parent window; in Windows, use ".pyw" as the extension to hide the terminal window as well

Tk().withdraw()

messagebox.showinfo("My title", "Hello")

A Yes/No dialog:

import tkinter as tk
from tkinter import messagebox

def answer():
    messagebox.showerror("Answer", "Sorry, no answer available")

def callback():
    if messagebox.askyesno('Verify', 'Really quit?'):
        messagebox.showwarning('Yes', 'Not yet implemented')
    else:
        messagebox.showinfo('No', 'Quit has been cancelled')

tk.Button(text='Quit', command=callback).pack(fill=tk.X)
tk.Button(text='Answer', command=answer).pack(fill=tk.X)

tk.mainloop()

To hide the main window:

from tkinter import Tk

from tkinter.filedialog import askopenfilename

# we don't want a full GUI, so keep the root window from appearing

Tk().withdraw()

# show an "Open" dialog box and return the path to the selected file

filename = askopenfilename()

if not filename:
    exit()

print(filename)

Displaying text: Message, and Text

The Message widget has more features than Label, and the Text widget has even more features.

Message

import tkinter as tk

master = tk.Tk()

whatever_you_do = "Whatever you do will be insignificant, but it is very important that you do it.\n(Mahatma Gandhi)"

msg = tk.Message(master, text = whatever_you_do)

msg.config(bg='lightgreen', font=('times', 24, 'italic'))

msg.pack()

tk.mainloop()

Buttons

import tkinter as tk

import random

def change_label():
    button.config(text=str(random.randint(1,101)))

root = tk.Tk()

root.title("Changing label")

button = tk.Button(root, text='Change', width=25, command=change_label)

button.pack()

root.mainloop()

Closing an application

Button(master, text='Quit', command=master.quit)

Checkboxes

var1 = IntVar()

Checkbutton(master, text="male", variable=var1).grid(row=1, sticky=W)

Radio button

tk.Radiobutton(root,

text=language,

padx = 20,

variable=v,

command=ShowChoice,

value=val).pack(anchor=tk.W)

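variable: the control variable (eg. an IntVar) shared by all the radiobuttons of a group; its initial value tells Tkinter which radiobutton is the default choice

text: the radiobutton's label; value: what is stored in the variable when this radiobutton is selected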
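To make the roles of variable, text and value concrete, here is a small sketch of a complete radiobutton group. The language list and the label_for and main helpers are invented for the example; the fragment above does not say what language, v or ShowChoice looked like in the original script.

```python
import tkinter as tk

# Hypothetical choices: (label shown to the user, value stored in the variable)
LANGUAGES = [("Python", 1), ("Perl", 2), ("Java", 3)]

def label_for(value):
    """Return the label whose value matches, or None if there is none."""
    for text, val in LANGUAGES:
        if val == value:
            return text
    return None

def main():
    root = tk.Tk()
    # One IntVar for the whole group; starting at 1 pre-selects "Python"
    v = tk.IntVar(value=1)

    def show_choice():
        print("Selected:", label_for(v.get()))

    for text, val in LANGUAGES:
        tk.Radiobutton(root, text=text, padx=20, variable=v,
                       command=show_choice, value=val).pack(anchor=tk.W)
    root.mainloop()
```

Call main() to try it. Because every button shares the same variable, selecting one automatically deselects the others.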

Entry

For just a single line of text.

import tkinter as tk

def show_entry_fields():
    print("First Name: %s" % (e1.get()))

master = tk.Tk()

tk.Label(master, text="First Name").grid(row=0)

e1 = tk.Entry(master)

e1.grid(row=0, column=1)

tk.Button(master,

text='Show', command=show_entry_fields).grid(row=3, column=1, sticky=tk.W, pady=4)

tk.mainloop()

Text

Multiple lines of text.

import tkinter as tk

root = tk.Tk()

S = tk.Scrollbar(root)

T = tk.Text(root, height=4, width=50)

S.pack(side=tk.RIGHT, fill=tk.Y)

T.pack(side=tk.LEFT, fill=tk.Y)

S.config(command=T.yview)

T.config(yscrollcommand=S.set)

quote = """HAMLET: To be, or not to be--that is the question:

Whether 'tis nobler in the mind to suffer

The slings and arrows of outrageous fortune

Or to take arms against a sea of troubles

And by opposing end them. To die, to sleep--

No more--and by a sleep to say we end

The heartache, and the thousand natural shocks

That flesh is heir to. 'Tis a consummation

Devoutly to be wished."""

T.insert(tk.END, quote)

tk.mainloop()

Showing picture

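To display a picture in a Label. Note that Tkinter's own PhotoImage class can only read GIF and PGM/PPM files (plus PNG with Tk 8.6 and later); for other formats such as JPEG, use ImageTk from the Pillow package (PIL):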

import tkinter as tk

from PIL import ImageTk, Image

master = tk.Tk()

master.title("Join")

master.geometry("300x300")

master.configure(background='grey')

img = ImageTk.PhotoImage(Image.open("IMG_20190522_164109.jpg"))

panel = tk.Label(master, image = img)

panel.pack(side = "bottom", fill = "both", expand = "yes")

tk.mainloop()

Variable Classes

Some widgets (like text entry widgets, radio buttons and so on) can be connected directly to application variables by using special options: variable, textvariable, onvalue, offvalue, and value. This connection works both ways: if the variable changes for any reason, the widget it's connected to will be updated to reflect the new value. These Tkinter control variables are used like regular Python variables to keep certain values. It's not possible to hand over a regular Python variable to a widget through a variable or textvariable option. The only kinds of variables for which this works are variables that are subclassed from a class called Variable, defined in the Tkinter module. They are declared like this:

x = StringVar() # Holds a string; default value ""

x = IntVar() # Holds an integer; default value 0

x = DoubleVar() # Holds a float; default value 0.0

x = BooleanVar() # Holds a boolean, returns 0 for False and 1 for True

To read the current value of such a variable, call the method get(). The value of such a variable can be changed with the set() method.
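Here is a hedged sketch of the two-way link, assuming Python 3.6 or later for trace_add(). An Entry and a Label each get their own StringVar, and a write trace copies a greeting from one to the other. The greeting and main helpers are made up for the example.

```python
import tkinter as tk

def greeting(name):
    """Text shown in the label for a given entry content."""
    return "Hello, %s!" % name if name else "Type your name"

def main():
    root = tk.Tk()
    name = tk.StringVar()                       # bound to the Entry
    message = tk.StringVar(value=greeting(""))  # bound to the Label

    tk.Entry(root, textvariable=name).pack()
    tk.Label(root, textvariable=message).pack()

    # Called each time `name` is written to, ie. on every keystroke;
    # set() on `message` refreshes the Label without touching the widget
    name.trace_add("write", lambda *args: message.set(greeting(name.get())))
    root.mainloop()
```

Call main() to run it. The code never calls a method on the Label itself: changing the variable is enough.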

Web development

More infos here.

Database access

http://www.python.org/sigs/db-sig/

json

https://realpython.com/python-json/

https://jsonplaceholder.typicode.com/

PyGeoj, "a simple Python Geojson file reader and writer."

Encoding JSON = serialization or marshaling; decoding = deserialization.

dumps() returns the JSON as a string in RAM, while dump() writes it to a file object, eg. a file on disk.
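A minimal sketch of the difference, using only the standard library. The temporary directory is just a convenience for the example, so that nothing is left behind in the working directory.

```python
import json
import os
import tempfile

data = {"president": {"name": "Zaphod Beeblebrox", "species": "Betelgeusian"}}

# dumps(): serialize to a str held in memory
json_string = json.dumps(data)

# dump(): serialize straight into an open file object
path = os.path.join(tempfile.mkdtemp(), "data_file.json")
with open(path, "w") as write_file:
    json.dump(data, write_file)

# Read the file back to compare it with the in-memory string
with open(path) as read_file:
    on_disk = read_file.read()

print(json_string)
print(on_disk == json_string)
```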

turn json into Python objects

Use load() and loads().

JSON	Python
object	dict
array	list
string	str
number (int)	int
number (real)	float
true	True
false	False
null	None
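The table can be checked with a short sketch: loads() is fed a made-up JSON document containing one value of each kind, and the resulting Python types are printed.

```python
import json

# Made-up document with one value per JSON type
text = ('{"name": "Zaphod", "heads": 2, "ratio": 0.5, '
        '"tags": ["a", "b"], "alive": true, "dead": false, "ship": null}')

obj = json.loads(text)

# Show the Python type each JSON value was decoded to
for key, value in obj.items():
    print(key, type(value).__name__)
```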

read

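import json

#From a file: load() takes a file object
with open("data_file.json", "r") as read_file:
    data = json.load(read_file)

#From a string: loads() takes a str, eg. the body of an HTTP response
#("response" is assumed to be what eg. requests.get() returned)
todos = json.loads(response.text)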

turn dictionary into json

import json

data = {
    "president": {
        "name": "Zaphod Beeblebrox",
        "species": "Betelgeusian"
    }
}

json_string = json.dumps(data)
print(json_string)

write

import json

data = {
    "president": {
        "name": "Zaphod Beeblebrox",
        "species": "Betelgeusian"
    }
}

with open("data_file.json", "w") as write_file:
    json.dump(data, write_file)

geoJSON

https://pypi.org/search/?q=geojson

What package/library/module is recommended to work with geoJSON files?

c:\>pip install json

Collecting json

Could not find a version that satisfies the requirement json (from versions: )

No matching distribution found for json

c:\>pip search geojson

geojson (2.4.0) - Python bindings and utilities for GeoJSON

geojsontools (0.0.3) - Functions for manipulating geojsons

geojson_elevation (0.1) - GeoJSON compatible elevation proxy

geojson_utils (0.0.2) - Python helper functions for manipulating GeoJSON

PyGeoj (0.22) - A simple Python GeoJSON file reader and writer.

Setup

pip install geojson

Features

Read

import geojson

with open('myfile.geojson') as f:
    gj = geojson.load(f)

#First feature
print(gj['features'][0])

#All features
for feature in gj['features']:
    print(feature)

print(gj)

Write

from geojson import Point, Feature, FeatureCollection, dump

props = {"name": "My name","country": "Spain"}
point = Point((-115.81, 37.24))

features = []
features.append(Feature(properties=props,geometry=point))
feature_collection = FeatureCollection(features)

with open('myfile.geojson', 'w') as f:
    dump(feature_collection, f)

geopy

"geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources."

https://geopy.readthedocs.io/en/stable/

https://programminghistorian.org/en/lessons/mapping-with-python-leaflet

Working with XML/HTML

https://stackabuse.com/reading-and-writing-xml-files-in-python/

Note: PyXML is deadware

minidom: simplified implementation of DOM

ElementTree (ET): More Pythonic interface than DOM; lxml is an enhanced version of ET

BeautifulSoup uses lxml, if available, and is an easy way to work with HTML/XML

"untangle is a simple library which takes an XML document and returns a Python object which mirrors the nodes and attributes in its structure."

More infos on XML here.
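For comparison, here is a small sketch that reads the same made-up XML snippet with the two standard-library options mentioned above, ElementTree and minidom. The document and its tag names are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Made-up sample document
XML = ('<library>'
       '<book id="1"><title>Dune</title></book>'
       '<book id="2"><title>Emma</title></book>'
       '</library>')

# ElementTree: elements behave like Python containers
root = ET.fromstring(XML)
titles = [book.findtext("title") for book in root.iter("book")]
print(titles)

# minidom: DOM-style API, text lives in child text nodes
dom = minidom.parseString(XML)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]
print(dom_titles)
```

Both lists hold the same titles; the ElementTree version simply needs less navigation to get at the text.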

BeautifulSoup

"Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree. […] Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility." Python 3.x should use BeautifulSoup4. Once parsed, BS builds a tree of Python objects (Tag, NavigableString, BeautifulSoup, and Comment.)

"A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to contain these bits of text. A NavigableString is just like a Python Unicode string, except that it also supports some of the features described in Navigating the tree and Searching the tree. You can convert a NavigableString to a Unicode string with str: unicode_string = str(tag.string)"

"If you want to use a NavigableString outside of Beautiful Soup, you should call unicode() on it to turn it into a normal Python Unicode string. If you don’t, your string will carry around a reference to the entire Beautiful Soup parse tree, even when you’re done using Beautiful Soup. This is a big waste of memory."

BS provides two ways to find elements: find()/find_all(), and select(), which is more powerful since it takes a CSS selector (handled by Soup Sieve).

Different classes, to make it easier to find elements:

NavigableString
CData
ProcessingInstruction
Declaration
Doctype
Comment
etc.

Note: If the input data isn't in UTF-8, BS will silently convert it, and edit the relevant meta line in the header if there is one, but won't add one if there isn't.

Questions

.string vs .text? "The string argument is new in Beautiful Soup 4.4.0. In earlier versions it was called text"

soup.select("kml Document") vs. soup.select("kml > Document")? The former finds any "Document" tag below "kml", no matter where in the tree, while the latter only looks for it directly under "kml".

CSS: Difference between "#sister" and ".sister"?

find_all() only searches tags (elements), or also strings within? By default it matches tags; pass string=... to match strings instead.
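A small sketch to make the selector questions above concrete. The markup, ids and class names are made up for illustration; it only assumes that bs4 (with its Soup Sieve dependency) is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <p> directly under <div>, one nested deeper.
markup = (
    '<div id="box">'
    '<p class="sister" id="link1">one</p>'
    '<section><p class="sister">two</p></section>'
    '</div>'
)
soup = BeautifulSoup(markup, "html.parser")

descendants = soup.select("div p")    # any <p> below <div>, at any depth
children = soup.select("div > p")     # only <p> tags directly under <div>
by_id = soup.select("#link1")         # "#" matches the id attribute
by_class = soup.select(".sister")     # "." matches the class attribute

print(len(descendants), len(children), len(by_id), len(by_class))
```

An id is supposed to be unique in a document, so "#link1" normally matches one element, while a class such as ".sister" can match many.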

How to parse and output

from bs4 import BeautifulSoup

#open in binary and let BS convert data to utf-8 if needed

soup = BeautifulSoup(open('input.html', 'rb'), 'html.parser')

#OR

soup = BeautifulSoup("<html>a web page</html>", 'html.parser')

print("Orig encod:",soup.original_encoding)

print(soup.prettify())

#To work with each tag before having it prettified

for c in soup.contents:

    print(c.prettify())

Since BS doesn't add it if none is found in the header, here's how to add encoding information:

import re

#match http-equiv="Content-Type" regardless of case
meta = soup.head.find("meta", attrs={"http-equiv": re.compile("^content-type$", re.I)})

if meta is None:

    metatag = soup.new_tag('meta')

    metatag.attrs['http-equiv'] = 'Content-Type'

    metatag.attrs['content'] = 'text/html; charset=utf-8'

    soup.head.append(metatag)

else:

    print("Found")

If you know how a file is (or isn't) encoded, you can help BS by providing this information before it runs its Unicode, Dammit sub-library:

soup = BeautifulSoup(markup, 'html.parser', from_encoding="iso-8859-8")

soup = BeautifulSoup(markup, 'html.parser', exclude_encodings=["iso-8859-7"])

Parsing XML

By default, BS will use an HTML parser unless you specifically tell it to use an XML parser (which will need to be installed)

soup = BeautifulSoup(data, 'xml') #https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser

Performance

Install lxml
Speed up encoding detection by installing the cchardet library
Parsing only part of a document won’t save you much time parsing the document, but it can save a lot of memory, and it’ll make searching the document much faster.

Navigating

link = soup.a

for parent in link.parents:

    print(parent.name)

find_parents() and find_parent() work their way up the tree:

a_string = soup.find(string="Lacie")

a_string.find_parents("a")

sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></a>", 'html.parser')

print(sibling_soup.prettify())

sibling_soup.b.next_sibling

sibling_soup.c.previous_sibling

find_next_siblings() and find_next_sibling(), find_previous_siblings() and find_previous_sibling():

for sibling in soup.a.next_siblings:

    print(repr(sibling))

.next_elements and .previous_elements #iterators to move forward or backward in the document as it was parsed

The find_all_next() method returns all matches, and find_next() only returns the first match. The find_all_previous() method returns all matches, and find_previous() only returns the first match.

len(list(soup.children))

len(list(soup.descendants))

Finding elements

soup.head

soup.title

soup.title.name #Important: .name is the tag's own name ("title"), so a child tag called <name> can't be reached as .name; use eg. wpt.find("name").string

soup.title.string #element's text

soup.title.get_text() #alternative

soup.get_text("|", strip=True)

soup.body.b #get the first <b> tag below <body>

#get all text within a tree

[text for text in soup.stripped_strings]

soup.title.parent.name

soup.head.contents

soup.head.contents[0].name

title_tag = soup.head.contents[0] #assumes <title> is the first child of <head>

for child in title_tag.children:

    print(child)

soup.p #first paragraph

soup.p['class'] #display value of attribute

soup.find_all('a') #all hyperlinks

soup("a") #shortcut for soup.find_all("a")

soup.title(string=True) #shortcut for soup.title.find_all(string=True)

soup.find(id="link3") #first element with that id

soup.find_all(string="Elsie") #all strings that are exactly "Elsie", eg. <blah>Elsie</blah>

soup.find_all(string=["Tillie", "Elsie", "Lacie"])

soup.find_all(string=re.compile("Dormouse")) #requires "import re"

soup.find_all("a", limit=2) #stop after the first two matches

#Grab attributes

tag = soup.find('meta', {'name': 'keywords'})

print(tag)

print(tag.attrs)

print(tag.attrs.get('content'))

Note: If find_all() can’t find anything, it returns an empty list. If find() can’t find anything, it returns None.

#get content of a.href

for link in soup.find_all('a'):

    print(link.get('href')) #get content of href attribute, ie. link

for string in soup.stripped_strings:

    print(repr(string)) #returns a printable representation of the given object

find() and find_all() are the most popular search methods; Use filters (string, regex, list, function)

soup.find_all("p", "title")

soup.find_all(id=True) #all tags with an "id" attribute

soup.find_all(href=re.compile("elsie"), id='link1')

name_soup.find_all(attrs={"name": "email"}) #"name" is already find_all()'s first argument, so use attrs for an attribute called name

soup.find_all("a", class_="sister") #class is a reserved keyword

soup.find_all('b')

#regex

for tag in soup.find_all(re.compile("^b")):

    print(tag.name)

soup.find_all(string="Elsie") #all strings that are exactly "Elsie" (returns strings, not tags)

BeautifulSoup has a .select() method which uses the SoupSieve package to run a CSS selector against a parsed document and return all the matching elements.

css_soup.select("p.strikeout.body") #CSS selector to search for tags that match two or more CSS classes

soup.select("html head title") #same as soup.title

soup.select("p > a") #directly under

soup.select_one(".sister") #only first one

soup.select("#link1 ~ .sister")

soup.select("#link1 + .sister")

soup.select(".sister")

soup.select("[class~=sister]")

soup.select("#link1")

soup.select("#link1,#link2")

soup.select('a[href]')

soup.select('a[href="http://example.com/elsie"]')

soup.select('a[href^="http://example.com/"]')

soup.select('a[href$="tillie"]')

soup.select('a[href*=".com/el"]')

Modifying the tree

Note: When calling eg. soup.mytag, BS will look for mytag anywhere in the tree, not just right below soup

tag = soup.b

tag.name = "blockquote"

tag.string.replace_with("No longer bold")

tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b

tag['id']

tag['id'] = 'verybold'

del tag['id']

tag = soup.a

tag.string = "New link text."

append()/insert() to add to an element's string (at the end, at a given location) which can be empty; new_tag() to add a whole tag. There's also insert_before() and insert_after(). Use clear() to empty a tag's string. Use extract()/decompose() to remove a tag from the tree.

replace_with() can be used with more than one argument: a_tag.b.replace_with(bold_tag, ".", i_tag)

soup = BeautifulSoup("<a>Foo</a>", 'html.parser')

soup.a.append("Bar") #<a>FooBar</a>

#alternative

from bs4 import NavigableString

new_string = NavigableString("ed")

soup.a.append(new_string) #<a>FooBared</a>

Important: append()/insert() add to the contents of a tag, which can include a whole block (ie. a new tag), not just a string.

original_tag = soup.b

new_tag = soup.new_tag("a", href="http://www.example.com")

new_tag.string = "Link text."

original_tag.append(new_tag) #soup.b now ends with <a href="http://www.example.com">Link text.</a>

To clean a tag that holds multiple NavigableString objects after using .append():

soup.smooth()

print(soup.p.prettify())

soup = BeautifulSoup("<a>Soup</a>", 'html.parser')

soup.a.extend(["'s", " ", "on"])

soup # <a>Soup's on</a>

soup.a.contents # ['Soup', "'s", ' ', 'on']

from bs4 import Comment

new_comment = Comment("Nice to see you.")

tag.append(new_comment) # <b>Hello there<!--Nice to see you.--></b>

tag = soup.a #<a href="http://example.com/">I linked to <i>example.com</i></a>

tag.insert(1, "but did not endorse ") #<a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a>

soup #<b>leave</b>

tag = soup.new_tag("i")

tag.string = "Don't"

soup.b.string.insert_before(tag) # <b><i>Don't</i>leave</b>

div = soup.new_tag('div')

div.string = 'ever'

soup.b.i.insert_after(" you ", div) # <b><i>Don't</i> you <div>ever</div>leave</b>

soup.b.contents # [<i>Don't</i>, ' you ', <div>ever</div>, 'leave']

soup #<a href="http://example.com/">I linked to <i>example.com</i></a>

tag = soup.a

tag.clear() # <a href="http://example.com/"></a>

a_tag = soup.a #<a href="http://example.com/">I linked to <i>example.com</i></a>

i_tag = soup.i.extract()

a_tag # <a href="http://example.com/">I linked to</a>

i_tag # <i>example.com</i>

a_tag = soup.a #<a href="http://example.com/">I linked to <i>example.com</i></a>

i_tag = soup.i

i_tag.decompose()

a_tag # <a href="http://example.com/">I linked to</a>

a_tag = soup.a #<a href="http://example.com/">I linked to <i>example.com</i></a>

new_tag = soup.new_tag("b")

new_tag.string = "example.com"

a_tag.i.replace_with(new_tag) # <a href="http://example.com/">I linked to <b>example.com</b></a>

bold_tag = soup.new_tag("b")

bold_tag.string = "example"

i_tag = soup.new_tag("i")

i_tag.string = "net"

a_tag.b.replace_with(bold_tag, ".", i_tag) # <a href="http://example.com/">I linked to <b>example</b>.<i>net</i></a>

#<p>I wish I was bold.</p>

soup.p.string.wrap(soup.new_tag("b"))

# <p><b>I wish I was bold.</b></p>

a_tag = soup.a #<a href="http://example.com/">I linked to <i>example.com</i></a>

a_tag.i.unwrap() # <a href="http://example.com/">I linked to example.com</a>

#to inject a tree into another

doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document>", "xml")

footer = BeautifulSoup("<footer>Here's the footer</footer>", "xml")

doc.find(string="INSERT FOOTER HERE").replace_with(footer)

#<header/>

header = soup.header

header.string = "blah"

Output

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#output

soup.prettify()

soup.a.prettify() #just a subpart of the tree

#raw output

str(soup)

str(soup.a)

unicode_string = str(tag.string)

Note: In Python 3, str() returns a regular (Unicode) string. Call encode() on a tag or on the soup to get a bytestring (UTF-8 by default); see Encodings for other options.

If you need more sophisticated control over your output, you can use Beautiful Soup’s Formatter class:

from bs4.formatter import HTMLFormatter

#function applied to each string and attribute value on output
def uppercase(s):

    return s.upper()

formatter = HTMLFormatter(uppercase)

print(soup.prettify(formatter=formatter))

To find where elements are located in the source file:

soup = BeautifulSoup(markup, 'html.parser')

for tag in soup.find_all('p'):

    print(repr((tag.sourceline, tag.sourcepos, tag.string)))

To copy an element (which won't be part of the tree):

import copy

p_copy = copy.copy(soup.p)

To only parse and find certain elements:

from bs4 import SoupStrainer

only_a_tags = SoupStrainer("a")
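The strainer only has an effect once it is handed to the constructor through parse_only. A minimal sketch, with made-up markup, assuming bs4 is installed and the html.parser builder is used:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical page: two links plus content we do not care about.
markup = (
    '<html><body><p>Intro</p>'
    '<a href="/one">One</a>'
    '<div><a href="/two">Two</a></div>'
    '</body></html>'
)

only_a_tags = SoupStrainer("a")

# Only the <a> tags (and their contents) end up in the tree.
soup = BeautifulSoup(markup, "html.parser", parse_only=only_a_tags)

hrefs = [a["href"] for a in soup.find_all("a")]
paragraph = soup.find("p")   # the <p> was never added to the tree

print(hrefs, paragraph)
```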

To investigate what BS does:

from bs4.diagnose import diagnose

with open("bad.html") as fp:

    data = fp.read()

diagnose(data)

Encoding

Regardless of how it's encoded originally, when loaded into Beautiful Soup, it's converted to Unicode. Beautiful Soup uses a sub-library called Unicode, Dammit to detect a document’s encoding and convert it to Unicode.

from bs4 import UnicodeDammit

dammit = UnicodeDammit(b"Sacr\xc3\xa9 bleu!") #pass bytes; a str is returned unchanged

print(dammit.unicode_markup)

# Sacré bleu!

dammit.original_encoding

# 'utf-8'

Unicode, Dammit’s guesses will get a lot more accurate if you install one of these Python libraries: charset-normalizer, chardet, or cchardet.

If you have your own suspicions as to what the encoding might be, you can pass them in as a list:

dammit = UnicodeDammit(b"Sacr\xe9 bleu!", ["latin-1", "iso-8859-1"])

print(dammit.unicode_markup)

# Sacré bleu!

dammit.original_encoding

# 'latin-1'

You can check the encoding found by BS using "soup.original_encoding".

If you happen to know a document’s encoding ahead of time, you can avoid mistakes and delays by passing it to the BeautifulSoup constructor as from_encoding:

soup = BeautifulSoup(markup, 'html.parser', from_encoding="iso-8859-8")

When you write out a document from Beautiful Soup, you get a UTF-8 document, even if the document wasn’t in UTF-8 to begin with. If you don’t want UTF-8, you can pass an encoding into prettify():

print(soup.prettify("latin-1"))

#alternatively

soup.p.encode("utf-8")

To read

https://www.crummy.com/software/BeautifulSoup/
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
To print out a report showing you how different parsers handle the document: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#diagnose
mailing list: https://groups.google.com/g/beautifulsoup

Internal xml module: minidom and ElementTree

"The ElementTree library was contributed to the standard library by Fredrick Lundh. It includes tools for parsing XML using event-based and document-based APIs, searching parsed documents with XPath expressions, and creating new or modifying existing documents."

"Python has two interfaces — minidom and Element Tree — probably because Element Tree was integrated into the standard library a good deal later after minidom came to be. The reason for this was likely its far more "Pythonic" API compared to the W3C-controlled DOM." (Source)

Python's ElementTree has only limited support for XPath. If you need more, try lxml.

Besides the Python implementation xml.etree.ElementTree, there is also a C implementation, which Python 3 uses automatically when available. The separate xml.etree.cElementTree module was deprecated in Python 3.3 and removed in Python 3.9.

Python's XML module includes…

xml.dom: the DOM API definition
xml.dom.minidom: a minimal DOM implementation
xml.dom.pulldom: support for building partial DOM trees
xml.etree.ElementTree: the ElementTree API, a simple and lightweight XML processor
xml.sax: SAX2 base classes and convenience functions
xml.parsers.expat: the Expat parser binding

ET has two main classes: ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

Issues I had while learning how to use minidom and ET:

minidom removes the encoding info: <?xml version="1.0" encoding="UTF-8"?> -> <?xml version="1.0" ?>

To keep it, use pretty = doc.toxml(encoding='utf-8') or prettyxml = doc.toprettyxml(encoding='utf-8'), which return bytes instead of a string, and save the file in binary mode. The accents look wrong (Ã© instead of é) typically when those UTF-8 bytes are later read as Latin-1/cp1252.

doc = minidom.parseString(inputdata)
prettyxml = doc.toprettyxml(encoding='utf-8') #Adds the encoding to the XML header; returns bytes, so accents look wrong if the bytes are not handled as UTF-8
prettyxml = doc.toprettyxml() #Accents OK but missing encoding="UTF-8" in XML header; returns a string

Work-around when working with strings: prettyxml = prettyxml.replace('<?xml version="1.0" ?>', '<?xml version="1.0" encoding="UTF-8"?>')
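A sketch of the bytes route, which avoids patching the header by hand. The sample document is made up, and the file name in the final comment is hypothetical; the last line only illustrates where "Ã©" probably comes from (UTF-8 bytes read as Latin-1).

```python
from xml.dom import minidom

# Hypothetical input, passed as UTF-8 bytes.
source = '<?xml version="1.0" encoding="UTF-8"?><gpx><name>café</name></gpx>'
doc = minidom.parseString(source.encode("utf-8"))

# With an encoding, toprettyxml() returns bytes and keeps the encoding in the header.
pretty_bytes = doc.toprettyxml(indent="  ", encoding="utf-8")

# Decode with the same encoding whenever a string is needed (printing, searching).
pretty_text = pretty_bytes.decode("utf-8")
print(pretty_text)

# To save: open the file in binary mode and write the bytes as-is, e.g.
#   with open("out.xml", "wb") as f: f.write(pretty_bytes)

# What the wrong accents look like when UTF-8 bytes are decoded as Latin-1:
mojibake = "é".encode("utf-8").decode("latin-1")
print(mojibake)
```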

CHECK ElementTree insists on prepending namespace to output data ("ns0:"); The ET.register_namespace() trick did nothing; Had to find code to remove those from the source file before parsing
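For what it's worth, here is a sketch where register_namespace() does remove the "ns0:" prefix with the standard library's ElementTree. It assumes the URI passed to register_namespace() is exactly the one used in the document and that the call happens before serializing; the GPX namespace is only used as an example.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"   # example namespace URI
source = '<gpx xmlns="' + GPX_NS + '"><wpt lat="1" lon="2"/></gpx>'

root = ET.fromstring(source)

# Without registration, ET invents a prefix such as ns0 when writing.
before = ET.tostring(root, encoding="unicode")

# Register the URI as the default namespace (empty prefix), then serialize again.
ET.register_namespace("", GPX_NS)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

Note that the registration is global to the process and only affects output; searches still need the namespace (or a prefix mapping).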

lxml

Since they're largely compatible, any tutorial about Element(Tree) will do, not just the limited doco from lxml which assumes people already know ET.

"lxml is significantly faster [than ElementTree], can be used to parse HTML, and supports XPath. […] lxml is also easier to use with namespaces." (Source) lxml.etree versus ElementTree

"The lxml toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. lxml.etree follows the ElementTree API as much as possible, building it on top of the native libxml2 tree."

What's the difference between tree and root?

Note: In lxml 4.6.3.0 at least, there's a bug when parsing an HTML through a filename rather than a file handle, with lxml adding "" before each carriage-return:

#BAD 

tree = et.parse(INPUT,parser)

#OK

with open(INPUT) as tempfile:

    tree = et.parse(tempfile, parser=parser)

If need be, encoding/decoding can be specified: print(ET.tostring(root, encoding='utf8').decode('utf8')).

Here's how to find elements and get their parents, which is needed to delete an element:

for movie in root.findall("./foo/bar[@multiple='Yes']/.."):

    print(movie.attrib)

The difference between iterfind() and findall() is that the former returns an iterator, and only searches through the tree as needed, while findall() first returns all the data.
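A quick sketch of that difference with the standard library's ElementTree (the five-item document is made up):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root>" + "<item/>" * 5 + "</root>")

all_items = root.findall("item")     # a list, built in full right away
lazy_items = root.iterfind("item")   # an iterator, advanced on demand

first = next(lazy_items)                  # only the first match has been produced
remaining = sum(1 for _ in lazy_items)    # consume the rest

print(len(all_items), first.tag, remaining)
```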

iter()? https://docs.python.org/3/library/xml.etree.elementtree.html#elementinclude-functions

It's possible to only get descendants under a given tag:

tag_name = "ellipse"

for descendant in root.iter(tag_name):

    print(descendant)

Dealing with namespaces is more convenient when using .iterfind(), which accepts an optional mapping of prefixes to namespace URIs:

namespaces = {"": "http://www.w3.org/2000/svg","custom": "http://www.w3.org/2000/svg"}

for descendant in root.iterfind("g", namespaces):

    print(descendant)
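A self-contained sketch of the prefix mapping, using the standard library and a made-up SVG snippet. The prefix "svg" is arbitrary: it only has to match the key in the mapping, not anything in the file.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
svg = ET.fromstring(
    '<svg xmlns="' + SVG_NS + '"><g><ellipse rx="1"/></g><g/></svg>'
)

namespaces = {"svg": SVG_NS}   # prefix of our choosing -> namespace URI

groups = list(svg.iterfind("svg:g", namespaces))       # direct children
ellipses = svg.findall(".//svg:ellipse", namespaces)   # any depth
no_ns = svg.findall("g")                               # no namespace given: no match

print(len(groups), len(ellipses), len(no_ns))
```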

findtext() and itertext() work on elements' text.

There is no move() method: You'll have to find + append + remove:

action = root.find("./genre[@category='Action']")

new_dec = ET.SubElement(action, 'decade')

new_dec.attrib["years"] = '2000s'

xmen = root.find("./genre/decade/movie[@title='X-Men']")

dec2000s = root.find("./genre[@category='Action']/decade[@years='2000s']")

dec2000s.append(xmen)

dec1990s = root.find("./genre[@category='Action']/decade[@years='1990s']")

dec1990s.remove(xmen)
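The same move as a runnable sketch with the standard library's ElementTree, on a made-up two-movie document. One caveat worth checking if you use lxml instead: there an element has a single parent, so append() already moves it and the remove() step is not needed.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<collection>
  <genre category="Action">
    <decade years="1990s">
      <movie title="X-Men"/>
      <movie title="Batman Returns"/>
    </decade>
  </genre>
</collection>
""")

action = root.find("./genre[@category='Action']")
dec2000s = ET.SubElement(action, "decade", years="2000s")   # new, empty decade

xmen = root.find("./genre/decade/movie[@title='X-Men']")
dec1990s = root.find("./genre[@category='Action']/decade[@years='1990s']")

dec2000s.append(xmen)   # with stdlib ET the element is now under both parents...
dec1990s.remove(xmen)   # ...so drop it from the old one

titles_2000s = [m.get("title") for m in dec2000s.findall("movie")]
titles_1990s = [m.get("title") for m in dec1990s.findall("movie")]
print(titles_2000s, titles_1990s)
```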

Install

pip install lxml

Quick test

import lxml.etree as et

tree = et.parse("input.gpx")
root = tree.getroot()

#Retrieves direct children nodes of the root

for child in root:
    print(child.tag, child.attrib)
#? print(root[0][1].text)

Logic

An XML file is made of elements (or "nodes"). Each element has a tag, and possibly attributes and text.

The ElementTree package consists of two classes: ElementTree (the whole structure) and Element (nodes).

You first need to read the input, either from a file or a string, have ET parse it and return a pointer to either the tree (ET.parse("myfile.xml") followed by tree.getroot()) or the root element directly (ET.fromstring()).

Once you have a pointer to the root element, you can navigate and modify the tree before writing the edited output back to a file.

Ways to get/set infos from an element:

myElement.tag
myElement.text
myElement.attrib: Returns a dictionary
myElement.get('myAttr')
myElement.set('myAttr', myValue) #Alternative: myElement.attrib['myAttr']=myValue
root.append(myElement)
root.insert(index, myElement)
root.remove(myElement)
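Put together, the read / edit / write-back cycle described above could look like this sketch. It uses the standard library's ElementTree and a made-up waypoint; the same calls should work with lxml.etree.

```python
import xml.etree.ElementTree as ET

# fromstring() returns the root element directly.
root = ET.fromstring(
    '<gpx><wpt lat="48.85" lon="2.35"><name>Paris</name></wpt></gpx>'
)

wpt = root.find("wpt")
wpt.set("lat", "48.86")                  # change an attribute
wpt.find("name").text = "Paris centre"   # change an element's text

ele = ET.SubElement(wpt, "ele")          # add a new child element
ele.text = "35"

# encoding="unicode" gives a str instead of bytes.
output = ET.tostring(root, encoding="unicode")
print(output)
```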

To create a tree from code:

a = ET.Element('a')

b = ET.SubElement(a, 'b')

c = ET.SubElement(a, 'c')

d = ET.SubElement(c, 'd')

ET.dump(a) -> <a><b /><c><d /></c></a>

Alternatively:

root = etree.HTML("data")

print(etree.tostring(root))

XPath (XML Path Language) has more features for finding elements than ElementTree's find*() methods (see "Supported XPath syntax" in the ElementTree documentation). "The .find*() methods are usually faster than the full-blown XPath support. They also support incremental tree processing through the .iterfind() method, whereas XPath always collects all results before returning them. They are therefore recommended over XPath for both speed and memory reasons, whenever there is no need for highly selective XPath queries."

"ElementTree objects have a method getpath(element), which returns a structural, absolute XPath expression to find that element:

a = etree.Element("a")
b = etree.SubElement(a, "b")
c = etree.SubElement(a, "c")
d1 = etree.SubElement(c, "d")
d2 = etree.SubElement(c, "d")
tree = etree.ElementTree(c)
print(tree.getpath(d2)) #/c/d[2]

"For ElementTree, the xpath method performs a global XPath query against the document (if absolute) or against the root node (if relative):

r = tree.xpath('/foo/bar')
print(r[0].tag)

"The XPath class compiles an XPath expression into a callable function. The compilation takes as much time as in the xpath() method, but it is done only once per class instantiation. This makes it especially efficient for repeated evaluation of the same XPath expression. Just like the xpath() method, the XPath class supports XPath variables:

root = etree.XML("<root><a><b/></a></root>")
find = etree.XPath("//b")
print(find(root)[0].tag)

"ElementTree supports a language named ElementPath in its find*() methods. One of the main differences between XPath and ElementPath is that the XPath language requires an indirection through prefixes for namespace support, whereas ElementTree uses the Clark notation ({ns}name) to avoid prefixes completely. The other major difference regards the capabilities of both path languages. Where XPath supports various sophisticated ways of restricting the result set through functions and boolean expressions, ElementPath only supports pure path traversal without nesting or further conditions."

lxml.etree vs lxml.objectify: The two modules provide different ways of handling XML. However, objectify builds on top of lxml.etree and therefore inherits most of its capabilities and a large portion of its API. lxml.etree is a generic API for XML and HTML handling. It aims for ElementTree compatibility and supports the entire XML infoset. It is well suited for both mixed content and data centric XML. Its generality makes it the best choice for most applications. lxml.objectify is a specialized API for XML data handling in a Python object syntax. It provides a very natural way to deal with data fields stored in a structurally well defined XML format. Data is automatically converted to Python data types and can be manipulated with normal Python operators. Look at the examples in the objectify documentation to see what it feels like to use it. Objectify is not well suited for mixed contents or HTML documents. As it is built on top of lxml.etree, however, it inherits the normal support for XPath, XSLT or validation.

Parsing HTML

https://lxml.de/lxmlhtml.html

document_fromstring(string): Parses a document from the given string. This always creates a correct HTML document, which means the parent node is <html>, and there is a body and possibly a head.
fragment_fromstring(string, create_parent=False): Returns an HTML fragment from a string. The fragment must contain just a single element, unless create_parent is given; e.g., fragment_fromstring(string, create_parent='div') will wrap the element in a <div>.
fragments_fromstring(string): Returns a list of the elements found in the fragment.
fromstring(string): Returns document_fromstring or fragment_fromstring, based on whether the string looks like a full document, or just a fragment.

If BeautifulSoup's UnicodeDammit doesn't solve an incorrect encoding declaration, ElementSoup makes use of the BeautifulSoup parser to build an lxml HTML tree from broken HTML.

E-factory makes it possible to quickly generate HTML pages and fragments:

import lxml.html

from lxml.html import builder as E

html = E.HTML(

    E.HEAD(

        E.LINK(rel="stylesheet", href="great.css", type="text/css"),

        E.TITLE("Best Page Ever")

    ),

    E.BODY(

        E.H1(E.CLASS("heading"), "Top News"),

        E.P("World News only on this page", style="font-size: 200%"),

        "Ah, and here's some more text, by the way.",

        lxml.html.fromstring("<p>... and this is a parsed fragment ...</p>")

    )

)

#tostring() returns bytes
print(lxml.html.tostring(html).decode())

lxml.html.open_in_browser(lxml_doc) writes the document to disk and opens it in the default browser.

lxml.html also supports working with links and forms, and cleaning HTML (removing embedded or script content, special tags, CSS style annotations, etc.)

Namespaces

Namespaces are needed in XML files that combine data from different sources which might use elements with the same name, eg. "name"; the namespace tells which one is meant each time. In simple XML files, namespaces are not required, and can be removed from the source file to make things easier.

Namespaces can be any string; It's a convention to use a URL, which can point to a document providing information about it.

Namespaces can be either set in the input file, or through ElementTree:

#Default namespaces follow this format.

xmlns="namespaceURI"

#Be sure to replace "URI" with the actual URI in your XML document.

ET.register_namespace('', "URI")

ET.register_namespace('xsi', "http://www.w3.org/2001/XMLSchema-instance")
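Whatever prefix is used in the file, ElementTree stores each tag in Clark notation ({namespaceURI}tag), which can also be used directly in searches. A minimal sketch with the standard library; the GPX namespace and the waypoint are just examples.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"   # example namespace URI
root = ET.fromstring(
    '<gpx xmlns="' + GPX_NS + '"><wpt lat="1" lon="2"><name>A</name></wpt></gpx>'
)

# The default namespace ends up inside every tag name.
print(root.tag)

# Clark notation in a search: no prefix mapping needed.
names = [el.text for el in root.findall(".//{" + GPX_NS + "}name")]

# A plain tag name does not match a namespaced element.
plain = root.findall(".//name")

print(names, plain)
```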

Classes

Some interesting methods:

class lxml.etree._Element

addnext(self, element)
addprevious(self, element)
append(self, element)
clear(self, keep_tail=False)
find(self, path, namespaces=None)
findall(self, path, namespaces=None)
findtext(self, path, default=None, namespaces=None)
get(self, key, default=None)
list(element) (was: getchildren(self))
getnext(self)
getparent(self)
getprevious(self)
getroottree(self)
insert(self, index, element)
iter(self, tag=None, *tags)
remove(self, element)
replace(self, old_element, new_element)
xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables)

class lxml.etree._ElementTree

find(self, path, namespaces=None)
findall(self, path, namespaces=None)
findtext(self, path, default=None, namespaces=None)
getelementpath(self, element)
getpath(self, element)
iter(self, tag=None, *tags)
iterfind(self, path, namespaces=None)
write(file, *, encoding, method, pretty_print, xml_declaration, with_tail, standalone, doctype, compression, exclusive, inclusive_ns_prefixes, with_comments, strip_text, docstring)
xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables)

Output

Note: tostring() returns bytes by default, so the output file must be opened in binary mode:

with open(OUTPUTFILE, 'wb') as writer:

    #In text mode ('wt'): TypeError: write() argument must be str, not bytes

    writer.write(et.tostring(root,pretty_print=True))

To print the whole tree:

print(ET.tostring(root, encoding='utf8').decode('utf8'))

#tostring() returns bytes, so printing it directly shows a b'...' dump even with pretty_print; decode it first
print(ET.tostring(root,pretty_print=True).decode())

Displaying infos:

tree = etree.ElementTree(root)

print(tree.docinfo.xml_version)

print(tree.docinfo.doctype)

tree.docinfo.public_id = '-//W3C//DTD XHTML 1.0 Transitional//EN'

tree.docinfo.system_url = 'file://local.dtd'

print(tree.docinfo.doctype)

lxml also supports indenting:

etree.indent(root)

print(etree.tostring(root))

etree.indent(root, space=" ")

print(etree.tostring(root))

etree.indent(root, space="\t")

etree.tostring(root)

Outputting XML, HTML, text:

print(etree.tostring(root)) #Default is XML

print(etree.tostring(root, method='html', pretty_print=True))

print(etree.tostring(root, method='text', encoding="UTF-8"))

To start from a clean slate:

parser = etree.XMLParser(remove_blank_text=True)

tree = etree.parse(filename, parser)

print(etree.tostring(tree.getroot(),pretty_print=True).decode())

In case there are redundant namespaces:

parser = et.XMLParser(ns_clean=True,remove_blank_text=True)

tree = et.parse(INPUTFILE, parser)

print(et.tostring(tree.getroot()))

To write the tree to a file:

#tostring() returns bytes, hence binary mode
with open('doc.xml', 'wb') as f:

    f.write(etree.tostring(root, pretty_print=True))
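The bytes/str distinction is the same in the standard library's ElementTree, so here is a sketch that can be run without lxml (pretty_print is lxml-only and left out). The temporary path is only there to keep the example self-contained.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc><title>café</title></doc>")

as_bytes = ET.tostring(root, encoding="utf-8")    # bytes: write in binary mode
as_text = ET.tostring(root, encoding="unicode")   # str: fine for print() or text mode

# Hypothetical output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "doc.xml")

# Option 1: write the bytes yourself.
with open(path, "wb") as f:
    f.write(as_bytes)

# Option 2: let ElementTree open the file and add the XML declaration.
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

with open(path, "rb") as f:
    saved = f.read()
print(saved.decode("utf-8"))
```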

Another way to get pretty printing is to use Tidy:

import tidy

f.write(tidy.parseString(your_xml_str, **{'output_xml':1, 'indent':1, 'input_xml':1}))

If ET complains about encoding, try this:

print(ET.tostring(root, encoding='utf8').decode('utf8'))

Finding elements

There are multiple ways to search for elements

Direct children of the root node can be found with a simple "for child in root"
find/findall: find() returns the first hit, while findall() returns all of them; it either takes a single element, or an XPath string; note that find() called with a simple element won't search deeper than the level right below (use XPath for a deeper search)
xpath: For advanced queries
iter/iterfind: Both are recommended when reading lots of data. The former returns all the elements that match the tag; The latter supports a path, so only elements within a given section of the tree will be read

An element acts like a list where the children are items in the list, eg. len(root) returns the number of elements below the root. Attributes are dictionaries.

find() returns a single element (or None), while findall() returns a list.

findall() is part of the original ElementTree API. It supports a simple subset of the XPath language: basic predicates such as [@attrib='value'] work, but functions, comparisons and other advanced features do not. For instance, it doesn't allow an absolute path starting with "/" when called on an element ("SyntaxError: cannot use absolute path on element"). Called with a bare tag name, findall() only returns the direct children of the current element that have this tag.

Likewise, findall() doesn't support the "|" symbol to search for different tags, while xpath() does:

for el in root.xpath('.//tag1/*|.//tag2/*'):

    print(el.tag, el.text)

A simpler alternative:

for el in root.iter('tag1', 'tag2'):

    print(el.tag, el.text)

Note: find/findall/iterfind() methods are recommended over using xpath() because they are faster and support incremental searches, and also simplify namespace usage, ie. only use .xpath() for advanced queries.

Important: If findall() returns nothing although the query looks good, it might be an issue with the namespaces. In that case, either remove all namespaces in the input file, or change the search string.

Important: To check if find() found an element, use: if element.find('...') is not None.

lxml also offers two methods to get siblings: getprevious()/getnext(). It also provides getparent().

CHECK Important: Checking whether find() or findall() found something requires two different tests:

r = root.find('./Document/name')

#if et.iselement(r):
if r is not None:
    print(r.text)

tracks = root.findall('.//LineString')

if len(tracks):
    print(len(tracks), "track(s) found")
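A runnable sketch of both checks with the standard library's ElementTree on a made-up KML-like document. It also shows why a bare "if r:" is not a safe test: an element without children has length 0.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<kml><Document><name>Trip</name>"
    "<Placemark><LineString/></Placemark></Document></kml>"
)

name = root.find("./Document/name")            # an Element
missing = root.find("./Document/description")  # None: nothing matched

tracks = root.findall(".//LineString")         # list with one element
no_tracks = root.findall(".//Polygon")         # empty list

# <name> exists but has no children, so its length is 0.
children_of_name = len(name)

if name is not None:
    print("name:", name.text)
if tracks:
    print(len(tracks), "track(s) found")
```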

This shows the top-level element in the tree:

print(root.tag,root.attrib)

This will return all the elements right below the root, not any deeper:

for child in root:

    print(child.tag, child.attrib)

To get all the elements in the tree…

for elem in root.iter():

    print(elem.tag,elem.attrib,elem.text)

… or only some elements, anywhere in the tree:

for movie in root.iter('movie'):

    print(movie.attrib)

Find the first element that matches:

wpt = root.find("wpt") #first match, or None

if wpt is not None:

    print(wpt.tag,wpt.attrib,wpt.text)

Getting the content attribute of the description meta element in the head section:

description = root.xpath('string(//meta[@name="description"]/@content)')

if len(description):

    print("Description=",description)

find() and findall() support a subset of XPath, which provides a more powerful way to navigate a tree. Here's how to find all waypoints below the root in a GPX file:

for wpt in root.findall("./wpt"):

    print(wpt.tag,wpt.attrib,wpt.text)

Searching with XPath

"somenode" or "somenode/somesubnode"
/ : Start searching from the root node (also works with a pointer to the root node); Absolute search
// : Searches the whole tree, regardless of the location of the node that calls xpath
.//: Searches within the descendants below the node that calls xpath
. : Current node
.. : Parent of the current node
* : All elements
@ : Attribute
@*: All the attributes
node(): Any node
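element(): Any element node (XPath 2.0; lxml only implements XPath 1.0, where * is used instead)
last() -1: Next-to-last node in sequence
attribute(src): Any attribute node named "src" (XPath 2.0; @src in XPath 1.0)
count(//property) gt 1: True if there is more than one property node (XPath 2.0; count(//property) > 1 in XPath 1.0)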
p/img[@src="icon.png"]: Element that has src set to "icon.png"
div[3]: Third div element (1 is first)
Other functions supported: Strings; Dates, Times, and Durations; Nodes and Documents; Numbers

Note that xpath() returns a list, even if it found only one element:

element = template_tree.xpath('//myelement')

if len(element):

    html_tree = lxml.html.fragment_fromstring("<div>blah</div>", parser=lxml.html.HTMLParser())

    parent = element[0].getparent()

    #insert the new fragment where the old element is, then remove the old one
    parent.insert(parent.index(element[0]),html_tree)

    parent.remove(element[0])

et.dump(template_root) #dump() prints by itself and returns None

Examples:

/class/student[1]: First node in the set (XPath positions start at 1)
/class/student[last()]: Last node in the set; Other functions include position() and count()
/class/student[@rollno = 493]: Nodes whose "rollno" attribute is = 493
/class/student[marks>85]: Nodes whose element "marks" is > 85
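Part of these examples can be tried with the standard library's ElementTree, which supports positions, last() and attribute tests, but only relative paths on an element and no comparisons. A sketch with made-up student data; the marks filter is done in Python here, whereas lxml's xpath() could evaluate marks>85 directly.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<class>
  <student rollno="393"><name>Ann</name><marks>85</marks></student>
  <student rollno="493"><name>Bob</name><marks>95</marks></student>
  <student rollno="593"><name>Cid</name><marks>90</marks></student>
</class>
""")

first = root.find("./student[1]")                 # positions start at 1
last = root.find("./student[last()]")
by_attr = root.find("./student[@rollno='493']")

# Comparison done in Python, since ElementTree's path subset has no ">" operator.
high = [
    s.findtext("name")
    for s in root.findall("./student")
    if int(s.findtext("marks")) > 85
]

print(first.findtext("name"), last.findtext("name"), by_attr.findtext("name"), high)
```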

More infos:

https://en.wikipedia.org/wiki/XPath
DONE Beginning XSLT and XPath: Transforming XML Documents and Data by Ian Williams
XPath and XPointer by John Simpson (2002)
Xpath: Navigating Xml With Xpath 1.0 and 2.0 by Steven Holzner
https://developer.mozilla.org/en-US/docs/Web/XPath
https://www.w3.org/TR/xpath-31/
XPath Expression Testbed (and XPath in the browser.js)

lxml.objectify

"lxml supports an alternative API similar to the Amara bindery or gnosis.xml.objectify through a custom Element implementation. The main idea is to hide the usage of XML behind normal Python objects, sometimes referred to as data-binding. It allows you to use XML as if you were dealing with a normal Python object hierarchy."

https://lxml.de/objectify.html

TO READ

DONE https://docs.python.org/3/library/xml.etree.elementtree.html
DONE https://lxml.de/parsing.html
DONE https://lxml.de/FAQ.html
DONE Beginning XSLT and XPath - Transforming XML Documents and Data (Wrox, 2009).pdf
XPath and XPointer (O'Reilly, 2002) - Simpson.pdf
DONE https://www.datacamp.com/community/tutorials/python-xml-elementtree
DONE https://docs.python.org/3/library/xml.etree.elementtree.html#elementtree-xpath
DONE https://www.datacamp.com/tutorial/python-xml-elementtree
MIA http://www.nmt.edu/tcc/help/pubs/pylxml/
DONE https://realpython.com/python-xml-parser/
https://lxml.de/tutorial.html
https://lxml.de/xpathxslt.html
MIA http://effbot.org/zone/element.htm

Help

https://mailman-mail5.webfaction.com/listinfo/lxml Archives https://mailman-mail5.webfaction.com/pipermail/lxml/

DEAD http://blog.gmane.org/gmane.comp.python.lxml.devel

DEAD https://www.google.com/webhp?q=site:comments.gmane.org%2Fgmane.comp.python.lxml.devel+

Q&A

Can I get rid of namespace infos while working with data?

The kludgy way to remove namespaces is to run a regex through the source file, and parse the result into the root.

A cleaner way is to parse the XML, and then remove all references to the namespace(s):

# Remove namespace prefixes
#Source: https://stackoverflow.com/questions/60486563/
tree = et.parse(INPUTFILE)
root = tree.getroot()
#iter() replaces the deprecated getiterator()
for elem in root.iter():
    #Comments and processing instructions have no string tag; passing them to QName() raises
    #ValueError: Invalid input tag of type <class 'cython_function_or_method'>
    if not isinstance(elem.tag, str):
        continue
    # For elements, replace qualified name with localname
    elem.tag = et.QName(elem).localname
    # Remove attributes that are in a namespace
    #list() takes a copy of the names, so popping inside the loop is safe
    for attr in list(elem.attrib):
        if "{" in attr:
            elem.attrib.pop(attr)
# Remove unused namespace declarations
et.cleanup_namespaces(root)

How to add text when using append()?

for waypoint in root.findall('gpx:wpt', namespaces=NSMAP):
    #How to set text?
    waypoint.append( ET.Element("dummy"))
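
One way to answer this, sketched with xml.etree.ElementTree on a made-up, namespace-free document: the text lives in the element's .text attribute, so either set it before appending, or use SubElement(), which creates and appends in one call. The "name" and "desc" tags are just examples:

```python
import xml.etree.ElementTree as ET

# Hypothetical GPX-like document without namespaces, to keep the example short
root = ET.fromstring("<gpx><wpt lat='1' lon='2'/><wpt lat='3' lon='4'/></gpx>")

for waypoint in root.findall("wpt"):
    # Option 1: build the element, set its text, then append it
    name = ET.Element("name")
    name.text = "dummy"
    waypoint.append(name)

    # Option 2: SubElement() creates the child and appends it in one call
    ET.SubElement(waypoint, "desc").text = "created with SubElement"

print(ET.tostring(root, encoding="unicode"))
```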

Difference between .iter() and .findall()?

"Element.findall() finds only elements with a tag which are direct children of the current element."

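A small sketch of the difference, using xml.etree.ElementTree and an invented document: findall() with a bare tag only looks at direct children, while iter() walks the whole subtree (starting with the element itself). findall() can also descend if the path starts with ".//".

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a <name> at two different depths
root = ET.fromstring(
    "<trk><name>outer</name><trkseg><name>inner</name></trkseg></trk>"
)

# Direct children only
direct = [e.text for e in root.findall("name")]

# Whole subtree, in document order
everywhere = [e.text for e in root.iter("name")]

# findall() with .// also searches all the descendants
with_path = [e.text for e in root.findall(".//name")]

print(direct, everywhere, with_path)
```
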
Difference between doc.write(ET.tostring()) and tree.write()?

with open("removed.time.gpx", 'wb') as doc:
    #Diff with tree.write('output.xml') ?
    #write() belongs to the file object (doc), not to the root element
    doc.write(ET.tostring(tree, pretty_print = True))
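
As far as the output goes, there is no difference when the same options are used: tostring() hands back bytes that you write yourself, while write() serializes and writes in one step. A sketch with xml.etree.ElementTree (Python 3.8 or later for xml_declaration in tostring()), where an in-memory buffer stands in for the output file and the document is made up:

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring("<gpx><wpt lat='1' lon='2'/></gpx>")
tree = ET.ElementTree(root)

# Way 1: tostring() returns bytes; writing them to a file is up to you
as_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Way 2: write() serializes straight into a file name or a binary file object
buffer = io.BytesIO()  # stands in for open("output.xml", "wb")
tree.write(buffer, encoding="utf-8", xml_declaration=True)

same_bytes = as_bytes == buffer.getvalue()
print(same_bytes)
```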

What's the difference between tree and root (parse vs. fromstring)?

parse() returns an ElementTree while fromstring() returns an Element.

https://stackoverflow.com/questions/32620254/python-elementtree-elementtree-vs-root-element

What's the point of getroot()?

Needed with functions that return a whole ElementTree instead of a specific Element (node):

tree = et.parse("input.gpx")
root = tree.getroot()

"fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree." (Source)

"The getroot() method is available on xml.etree.ElementTree.ElementTree objects, not xml.etree.ElementTree.Element objects. ET.fromstring() returns the latter type. You already have the root element."

https://stackoverflow.com/questions/32620254/python-elementtree-elementtree-vs-root-element
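
The two return types side by side, as a sketch with xml.etree.ElementTree; io.StringIO stands in for a file on disk, and the document is made up:

```python
import io
import xml.etree.ElementTree as ET

XML = "<gpx><wpt lat='1' lon='2'/></gpx>"

# parse() reads a file (or file-like object) and returns an ElementTree
tree = ET.parse(io.StringIO(XML))
root_from_tree = tree.getroot()

# fromstring() returns the root Element directly: no getroot() needed
root = ET.fromstring(XML)

# An Element can be wrapped in an ElementTree when tree-level methods are needed
wrapped = ET.ElementTree(root)

print(type(tree).__name__, type(root).__name__)
```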

fastkml

$ pip install fastkml (or "pip install -r requirements.txt" from the base of the source tree; To build KML files, FastKML requires Shapely which requires libgeos).

Note: If the input KML has something it doesn't like, fastkml (or lxml2?) might just complain with "ValueError" without saying which line it didn't like.

from fastkml import kml
k = kml.KML()
with open(kml_file, 'rt', encoding="utf-8") as myfile:
    doc=myfile.read()
k.from_string(doc)
#features() returns a generator object that you can iterate over
for f in k.features():
    print(f.name)
features = list(k.features())
f2 = list(features[0].features())
print(f2[0].name)
print(k.to_string(prettyprint=True))

#Note: "parser" here is presumably pykml's (from pykml import parser), not fastkml
with open(input) as f:
    doc = parser.parse(f)
root = doc.getroot()
folder = root.Document.Folder # parent of Placemark
for pm in folder.getchildren():
    print(pm.tag, pm.getparent().tag)
    """
    if pm.tag == '{http://www.opengis.net/kml/2.2}Placemark':
        keep = False
        #zipcode = ''
        for sd in pm.ExtendedData.SchemaData.getchildren():
            if 'ZCTA5CE10' in sd.values():
                if sd.text in zipcodes:
                    #zipcode = sd.text
                    keep = True
                    break
        if not keep:
            removed += 1
            folder.remove(pm)
        else:
            kept += 1
    """
#doc.write('output.kml', xml_declaration=True, encoding='UTF-8')

More information

C:\Python38\Lib\site-packages\fastkml\test_main.py
https://github.com/cleder/fastkml/tree/master/examples
collective.geo.fastkml
https://gis.stackexchange.com/questions/tagged/fastkml
https://stackoverflow.com/questions/tagged/python+kml
https://stackoverflow.com/search?q=%5Bpython%5D+%5Bkml%5D+fastkml

pykml

"pyKML is based on the lxml.objectify API which provides a Pythonic API for working with XML documents. pyKML adds additional functionality specific to the KML language. pyKML depends on the lxml Python library, which in turn depends on two C libraries: libxml2 and libxslt. Given this, the first step to installing pyKML is to get lxml running on your system."

The XML parser is used to read an existing KML file (pykml.parser.parse), or write a KML object to a file (lxml.etree.tostring).

"For complete stand alone programs that demonstrate how to use pyKML, check out the pyKML Examples."

"This type of attribute-based access is provided by the lxml packages’s objectify API. pyKML users are encouraged to familiarize themselves with the objectify API documentation on the lxml website, because pyKML inherits this functionality."

"KML documents that you create can be validated against XML Schema documents, which define the rules of which elements are acceptible and what ordering can be used. Both the OGC KML schema and the Google Extension schemas are included with pyKML."

Resources

Install lxml

To check if lxml is installed, run Python, and type "import lxml"

If not: http://lxml.de/installation.html

Install PyKML

pip install pykml

Run Python, and type "import pykml"

Have PyKML create a ready-to-use script

from pykml.factory import write_python_script_for_kml_document

import urllib.request as urllib2 #urllib2 was used in Python 2

from pykml import parser

url = 'http://code.google.com/apis/kml/documentation/kmlfiles/altitudemode_reference.kml'

fileobject = urllib2.urlopen(url)

doc = parser.parse(fileobject).getroot()

script = write_python_script_for_kml_document(doc)

print(script)

Later

from lxml import etree

from pykml import parser

from pykml.factory import KML_ElementMaker as KML

To validate:

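from pykml.parser import Schema
#GX is needed for GX.Tour() below
from pykml.factory import GX_ElementMaker as GX
schema_ogc = Schema("ogckml22.xsd")
schema_gx = Schema("kml22gx.xsd")
doc = KML.kml(GX.Tour())
#The .validate() method only returns True or False
schema_ogc.validate(doc)
schema_gx.validate(doc)
#For more details, assertValid() raises an exception that describes the problem
schema_ogc.assertValid(doc)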

Here's how to read a KML file, and remove an element:

from pykml import parser
from lxml import etree
with open("input.kml") as f:
    doc = parser.parse(f)
root = doc.getroot()
#Attribute access returns the first Placemark only
placemark = root.Document.Folder.Placemark
for pm in placemark.getchildren():
    #print(pm.tag)
    #To prevent PyKML from prepending {http://earth.google.com/kml/2.0},
    #use regex to remove namespace: <kml xmlns="http://earth.google.com/kml/2.0">
    #if pm.tag=="ExtendedData":
    if pm.tag=="{http://earth.google.com/kml/2.0}ExtendedData":
        placemark.remove(pm)
with open("output.kml","wb") as outfile:
    outfile.write(etree.tostring(doc, pretty_print=True))

Here's how to read coordinates:

import lxml
#import pykml
from pykml import parser
doc=None
with open('dummy.kml') as f:
    doc = parser.parse(f).getroot()
for e in doc.Document.Folder.Placemark:
    coor = e.LineString.coordinates.text.split(',')
    print(coor)

Here's how to read from a file, and copy data into a new file:

from pykml import parser
from lxml import etree
from pykml.factory import KML_ElementMaker as KML
with open('input.kml') as f:
    tree = parser.parse(f)
root = tree.getroot()
coords = root.Document.Folder.Placemark.LineString.coordinates
doc = KML.kml(
    KML.Placemark(
        KML.name("test"),
        KML.Style(KML.LineStyle(KML.color("FF0000FF"))),
        KML.LineString(
            #Pass the text: passing the element itself would nest it inside the new <coordinates>
            KML.coordinates(coords.text)
        )
    )
)
with open('output.kml','wb') as outfile:
    outfile.write(etree.tostring(doc, pretty_print=True))

simpleKML

https://simplekml.readthedocs.io/en/latest/

"Unfortunately, simplekml is just a kml generator, it cannot read and manipulate existing kml, only create it. You will have to use an alternative, such as pyKML." (Source)

pip install simplekml

Example:

import simplekml

kml = simplekml.Kml()

kml.document.name = "Test"

kml.save("botanicalgarden.kml")

How to remove id? <Document id="1">

How to read existing KML file, extract needed items (eg. Placemark), edit them, and save everything to a new KML file?

Working with GPX files with gpxpy

https://github.com/tkrajina/gpxpy

http://witkowskibartosz.com/blog/gpx-file-reader.html

https://ocefpaf.github.io/python4oceanographers/blog/2014/08/18/gpx/

pip install gpxpy

Quick code:

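import gpxpy
with open(path_to_gpx_file, 'r') as f:
    gpx = gpxpy.parse(f)
print("{} track(s)".format(len(gpx.tracks)))

To read from a GPX file:

#gpx is the object returned by gpxpy.parse()
track = gpx.tracks[0]
segment = track.segments[0]
#points[1] is the second point of the segment
point = segment.points[1]
d = [point.longitude, point.latitude, point.elevation, point.time]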

To create a new GPX file from scratch:

import gpxpy
import gpxpy.gpx
# Create the empty GPX object that holds everything:
gpx = gpxpy.gpx.GPX()
# Create first track in our GPX:
gpx_track = gpxpy.gpx.GPXTrack()
gpx.tracks.append(gpx_track)
# Create first segment in our GPX track:
gpx_segment = gpxpy.gpx.GPXTrackSegment()
gpx_track.segments.append(gpx_segment)
# Create points:
gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1234, 5.1234, elevation=1234))
gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1235, 5.1235, elevation=1235))
gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1236, 5.1236, elevation=1236))
# You can add routes and waypoints, too...
print('Created GPX:', gpx.to_xml())

xmltodict

"xmltodict is a Python module that makes working with XML feel like you are working with JSON".

It won't do if you need to add a key, but it's fine if you just need to read, and possibly change, a value.

Notes from John E. Simpson's "XPath and XPointer" (2002)

XPath is used for locating XML content within an XML document; XPointer is the standard for addressing such content, once located.

As support for XPath is integrated into the Document Object Model (DOM), DOM developers may also find XPath a convenient alternative to walking through document trees.

"An XPath" consists of one or more chunks of text, delimited by any of a number of special characters, assembled in any of various formal ways. Each chunk, as well as the assemblage as a whole, is called an XPath expression.

Most XPath expressions, by far, locate a document's contents or portions thereof. These pieces of content are located by way of one or more location steps — discrete units of XPath "meaning" — chained together, usually, into location paths.

An XPath expression can be said to consist of various components: tokens and delimiters. The expression taxcut/* locates all elements that are children of a taxcut element.

XPath is capable of processing four data types: string, numeric, Boolean, and nodes (or node-sets).

Most nodes have names. Three important terms:

qualified name: Almost always contracted to "QName"; Usually the identifier for the node as it actually appears in an instance document, including any namespace prefix. For example, an element whose start tag is <concerto> has a QName of "concerto"; if the start tag were <mml:concerto>, the QName would be "mml:concerto."
local-name: its QName, sans any namespace prefix. If an element's QName is "mml:concerto," its local-name is "concerto." If there's no namespace in effect for a given node, its QName and local-name are identical.
expanded-name: If the node is associated with a particular namespace, its expanded-name is a pair, consisting of the URI associated with that namespace and the local-name. Because the expanded-name doesn't consider the namespace prefix at all, two elements, for example, can have the same expanded-name even if their QNames are different, as long as both their associated namespace URIs (possibly null) and their local-names are identical.
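
In Python, xml.etree.ElementTree (and lxml) store a tag as its expanded-name, written "{URI}local-name"; the prefix from the QName is not kept in the tag. A small sketch, where the namespace URI and the helper name split_expanded_name() are invented for the example:

```python
import xml.etree.ElementTree as ET

# The QName in the source is "mml:concerto"; the URI is a made-up example
doc = ET.fromstring(
    '<mml:concerto xmlns:mml="http://example.com/mml">'
    "<movement/>"
    "</mml:concerto>"
)

def split_expanded_name(tag):
    """Split an ElementTree tag into (namespace URI, local-name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # no namespace in effect

print(doc.tag)                          # the expanded-name, prefix gone
print(split_expanded_name(doc.tag))
print(split_expanded_name(doc[0].tag))  # <movement> has no namespace
```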

In a location path, the root node is represented by a leading / (forward slash) character.

There's an XPath function, normalize-space(), that trims leading and trailing whitespace from a string (such as an element's content) and collapses each internal run of whitespace into a single space.
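
For illustration only, a rough Python equivalent of what normalize-space() does to a string. The function name is made up, and this is an approximation written by hand, not a call into an XPath engine:

```python
import re

# XPath treats space, tab, carriage return and line feed as whitespace
XPATH_WHITESPACE = " \t\r\n"

def normalize_space(text):
    """Approximate XPath normalize-space() for a Python string."""
    # Drop leading and trailing whitespace
    stripped = text.strip(XPATH_WHITESPACE)
    # Collapse every internal run of whitespace into one space
    return re.sub("[ \t\r\n]+", " ", stripped)

print(repr(normalize_space("  hello \n\t world  ")))
```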

Editing with XMLStarlet

Read xmlstarlet-ug.pdf

XMLStarlet is an open-source, command-line application that supports testing XPath queries.

Checking the structure: xml el input.xml

Networking

"Pyro is short for PYthon Remote Objects. It is an advanced and powerful Distributed Object Technology system written entirely in Python, that is designed to be very easy to use."

Q&A

How to find the type of a variable/output?

print(type(blah))

UnicodeDecodeError: 'ascii' codec can't decode byte

Python uses Unicode internally, and may need some help when it can't successfully figure out which code page was used to encode a string:

try:
    cursor.execute(sql.decode('utf-8'))
except UnicodeDecodeError:
    try:
        cursor.execute(sql.decode('iso8859-15'))
    except UnicodeDecodeError:
        #Never reached in practice: iso8859-15 maps every byte value, so it cannot fail
        cursor.execute(sql.decode('cp1252'))
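
The snippet above is Python 2 style: in Python 3, str is already Unicode and only bytes objects have decode(). Here's a sketch of the same fallback idea for Python 3. The helper name and the default list of encodings are assumptions; a single-byte encoding that accepts every byte (latin-1) has to come last, or the ones after it are never tried:

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return data (bytes) decoded with the first encoding that accepts it."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the encodings could decode the data")

print(decode_with_fallback(b"caf\xc3\xa9"))  # valid UTF-8
print(decode_with_fallback(b"caf\xe9"))      # invalid UTF-8, decoded as cp1252
```

Note that a successful decode only means the bytes were acceptable to that encoding, not that it was the encoding actually used to produce them.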

More information:

Unicode HOWTO
Python Unicode
A Crash Course in Character Encoding By Michael Foord

How to check for errors?

"Pylint analyzes Python source code looking for bugs and signs of poor quality."

How to enhance performance?

Why are strings immutable?

Saw that question in a newsgroup. Does it mean a string in Python is read-only?
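
In short, yes: a string object can't be changed in place, and anything that looks like a modification builds a new string. A quick demonstration:

```python
s = "python"

try:
    s[0] = "P"  # in-place modification is not allowed
except TypeError as err:
    error_message = str(err)

t = s.capitalize()  # string methods return a new string instead
u = "P" + s[1:]     # so does slicing plus concatenation

print(error_message)
print(s, t, u)      # s itself is unchanged
```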

What's the difference between "import mymodule" and "from mymodule import *"?

The former forces you to prepend the module's name to every member, eg. mymodule.mymethod(), while the latter imports all the public names into the current namespace, letting you call the methods without the module name. The latter is easier to use, but make sure the imported names don't clash with names already in your namespace...
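
A sketch of the forms, using the standard math module in place of mymodule. To make the clash visible without polluting the current namespace, the star import is run with exec() inside a throwaway dict that already holds a name called "pow"; that dict and its content are invented for the demonstration:

```python
import math                # names stay behind the module prefix
area = math.pi * 2 ** 2

from math import sqrt      # imports one chosen name only
root_of_16 = sqrt(16)

# "from math import *" copies every public name of the module.
# Here it silently replaces the "pow" that was already defined.
namespace = {"pow": "my own pow"}
exec("from math import *", namespace)
clobbered = namespace["pow"] is math.pow

print(root_of_16, clobbered)
```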

Is there a native-code compiler for Windows?

Check out py2exe. Other sources of information are Distributing Python Apps and How can I create a stand-alone binary from a Python script? Also take a look at Psyco.

Py? Pyc? Pyd? Pyo? Pyw?

(From Boudewijn Rempt's book on PyQT): "The translation from Python code to byte-code only happens once: Python saves a compiled version of your code in another file with the extension .pyc, or an optimized compiled version of your code that removes assert statements and line-number tracking in a file with the extension .pyo.

However, that is only done with Python files that are imported from other files: the bootstrap script will be compiled to bytecode every time you run it, but python will create a myapp.pyc from a file myapp.py (which is not shown here)."
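
The quote describes older Python versions. In Python 3, the compiled file goes into a __pycache__ directory next to the source (by default), with the interpreter version in its name, and .pyo files were dropped in Python 3.5. A sketch that compiles a throwaway myapp.py to see where the bytecode lands:

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "myapp.py")
    with open(source, "w") as f:
        f.write("print('hello')\n")

    # Where Python 3 would cache the bytecode for this source file
    expected = importlib.util.cache_from_source(source)

    # Compile explicitly, as importing myapp would do implicitly
    compiled = py_compile.compile(source)
    exists = os.path.exists(compiled)

# Typically "__pycache__" and something like "myapp.cpython-XY.pyc"
print(os.path.basename(os.path.dirname(expected)), os.path.basename(expected))
```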

IndentationError

"unindent does not match any outer indentation level" : If copy/pasting code from a web page, make sure there are no hidden characters (eg. tabs mixed with spaces, or non-breaking spaces) that confuse Python.

Hiding the DOS box when running under Windows?

"Python.exe is used for console mode programs and Pythonw.exe is used for GUI applications that don't need a console window. Python.exe can also be used for GUI programs, but then you get a console window in addition to your GUI window(s)."

How to call a PowerBasic DLL from Python?

http://www.talkaboutprogramming.com/group/alt.lang.powerbasic/messages/7219.html

How to hide the console window when running an EXE generated by py2exe?

Books

Python Crash Course: A Hands-On, Project-Based Introduction to Programming by Eric Matthes (No Starch Press, 2020)
Fluent Python by Luciano Ramalho (O'Reilly Media, 2015, ISBN 9781491946008)
Programming Python, 4th Edition by Mark Lutz (2014)
Learning Python, 5th Edition by Mark Lutz
Automate the Boring Stuff with Python: Practical Programming for Total Beginners by Al Sweigart (No Starch Press, 2015)

From VB to Python

vb2py (requires the Python interpreter, PythonCard, Simpleparse, and possibly the mxTextTools libraries)
Python Programming on Win32 by Mark Hammond, Andy Robinson
Python support for Windows
ActivePython, including its mailing lists

Resources

https://python-forum.io
Python Programming
Instant Python
Thinking in Python
DiveIntoPython
Python Tutorial
Python FAQTS
Python Apocrypha
How to Think Like a Computer Scientist - Learning with Python
Python Editors
Integrated Development Environments
OnLAMP Python Center
Graphical Programming with Python: QT Edition
Transactional Persistence for Python
Review: theKompany Black Adder Python IDE
Power Python: Do More With Less Code
IronPython (Python implementation targeting the .NET and Mono platforms)
Charming Python - Series of articles from IBM, including A review of Python IDEs, Review of Python IDEs, and Preview of the [anygui] project
When Pythons Attack - Common Mistakes of Python Programmers by Mark Lutz, coauthor of Learning Python, 2nd Edition
Python Gotchas
Python Cookbook (ActiveState)
Vaults of Parnassus : Python Resources
ActiveState Programmer Network - Python
PEAK - Python Enterprise Application Kit
Python Is Middleware by Tim Daneliuk
WATSUP - Windows Application Test System Using Python
Driving win32 GUIs with Python, part 1
Artima Articles About Python
Charming Python: Pyrex extends and speeds Python apps by David Mertz
Enhanced Interactive Python with IPython by Jeremy Jones
"Movable Python is a way of building portable, standalone, Python environments. The result is a version of Python that can be used without having to install python on the target machine. [...] This means you can carry around a prebuilt python environment on a USB memory stick - Python on a stick. Come and get your gannet ripple, python on a stick."
Python + .NET = IronPython by Paul Ferrill
Python in the enterprise by Daniel G. Shafer
Introducing IPython by Conrad Koziol ("an enhanced interactive Python shell with improvements such as history caching, profiles, object information, and session logging, as a replacement for the default interpreter.")
A review of 6 Python IDEs
Python IDE Review
A First Look at IronPython: Where Python meets .NET By Jeff Cogswell
IPython, An enhanced Interactive Python shell
Python for Newbies – Tutorial

Introduction to Python

Introduction

Setup

Installing Python 2 and Python 3 on the same Windows host

Checking script syntax

Concepts

Why import a library twice?

How to find the list of methods/properties a library offers?

How to uninstall a module?

Data Structures

Array

List

List vs. array?

Tuples

Dictionary

Sets

Collections, heapq

Code Snippets

Running an external program

Leaving for loop early

Operators

File I/O

Checking if a directory exists

Writing to a text file

Reading from a text file in one go

Reading from a text file, line by line

Reading for a text file, edit each line, save into new file

Finding if a file is missing from a directory

Append stuff to a text file

Checking that a file exists

Checking the size of a file

Displaying the last modified date of a file

Reading a value from a key in a section of an INI file

Reading all the key/value items in a section in an INI file

Writing data to an INI file

Setting the current directory

Looping through each file in a directory

Reading information from MS Word files

Using SQLite as file-based database

Python3

Python2

Using regular expressions

Driving a web browser

Selenium

webbrowser module

Connecting to a web server

Playing with date/time

Lists

Tuples

Dictionaries

Commenting a block of text

Exiting a script

Handling a long line of code

Sending an e-mail

Tips from the Python Tutorial

Calling a non-COM DLL from Python

Calling a COM DLL from Python

Shortcut to the last result

Long lines

Strings

Lists

Variable number of function parameters

Importing modules

Modules

Compiling

Pyco

Psyco

setuptools

py2exe

McMillan Installer

Freeze

cx_Freeze

SQFreeze

Pyrex

PyPy

PyInline

Py2Cmod

Weave

pyPack