Introduction to Python
Introduction
Python is an excellent, cross-platform, object-oriented interpreted language.
Besides ease of use, its main characteristic is that it enforces indentation
(don't indent, and the program won't run.)
As of September 2004, there are weaknesses to be aware of if you intend to
use Python to write GUI apps for Windows, though:
- Python is originally a command-line, text-mode scripting language, so
requires some add-on to build GUI apps. There is no integrated way to build
Windows applications, so you must use and distribute a toolkit (most people
use wxPython, which is a wrapper around the famous, open-source, cross-platform
wxWidgets toolkit)
- Unless you're ready to pay for VS.Net ($?) and ActiveState's
Visual Python plug-in ($300), you won't find an affordable IDE on par with
VB or Delphi, which means you'll have to settle for GUI builders and IDEs
that feel somewhat clunky (developing something like VS.Net is quite a lot
of work)
- Performance can be a problem for heavy-duty programs, in which case
you might have to delegate the computing-intensive parts of your program
to a DLL compiled into native code
Setup
At least three distributions of Python are currently available for the Windows platform (PythonWare
used to be yet another package, but it's been deprecated):
If you only need a basic distribution, try out Tiny
Python.
The "import" statement looks for module files in the directories listed in
sys.path, which includes those specified in the PYTHONPATH environment variable.
If the named module isn't found in these directories, Python raises an
ImportError. The first time Python imports a module, it automatically compiles
the module and saves it as bytecode; this bytecode file has the same base name
as the module file, but ends in a .pyc extension (Python 3 stores it in a
__pycache__ subdirectory, with the interpreter version added to the file name).
These .pyc files are automatically recompiled if the module changes in any way.
"On Windows, you can also use extension .pyw and interpreter program pythonw.exe
instead of .py and python.exe. The w variants run Python without a text-mode
console, and thus without standard input and output. These variants are
appropriate for scripts that rely on GUIs. You normally use them only when the
script is fully debugged, to keep standard output and error available for
information, warnings, and error messages during development."
If you are using UltraEdit as your favorite editor, here's
the section to add in UE's wordfile.txt to handle Python documents.
Installing Python 2 and Python 3 on the same Windows host
C:\Python27\python.exe
C:\Users\fred\AppData\Local\Programs\Python\Python37-32\python.exe
Checking script syntax
Concepts
module
library
package
namespace
Wheel, .whl, "allows for binary redistribution of libraries"
pip
Why import a library twice?
eg.
import mylib
import mylib.lib
How to find the list of methods/properties a library offers?
- help() and dir() are OK, but they don't show the whole picture.
- vars()
- The inspect module
- The pydoc module (python -m pydoc mymodule)
- The ast module ("Abstract Syntax Trees")
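As a minimal sketch of the options listed above, dir() and inspect can be combined to list what a module offers. The json module is only a stand-in here for whatever library you are exploring.

```python
import inspect
import json  # stand-in for the library being explored

# Public names only: dir() also lists private and dunder names
public_names = [name for name in dir(json) if not name.startswith("_")]

# inspect.getmembers() can filter by kind, here plain functions
functions = [name for name, obj in inspect.getmembers(json, inspect.isfunction)]

print(public_names)
print(functions)
# Signature of one function
print(inspect.signature(json.dumps))
```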
How to uninstall a module?
c:\>pip list
c:\>pip uninstall somemodule
Data Structures
Array
All entries must be of the same data type.
import array as arr
a = arr.array("I",[3,6,9])
List
Collection of heterogeneous items. Mutable.
x = []
x1 = [1,'apple',3]
print(x1[1])
List vs. array?
"With arrays, you can perform an operation on all its items individually
easily, which may not be the case with lists", eg.
array_char.tobytes() (named tostring() in older versions; the old name was removed in Python 3.9)
"NumPy arrays are very heavily used in the data science world to work
with multidimensional arrays. They are more efficient than the array module
and Python lists in general."
Tuples
"Tuples are another standard sequence data type. The difference between
tuples and list is that tuples are immutable, which means once defined you cannot
delete, add or edit any values inside it."
Tuples are enclosed in parentheses.
x_tuple = (1,2,3,4,5)
y_tuple = ('c','a','k','e')
x_tuple[0]
Dictionary
"Dictionaries are made up of key-value pairs. key is used to identify
the item and the value holds as the name suggests, the value of the item."
Dictionaries are built with curly brackets.
x_dict = {'Edward':1, 'Jorge':2, 'Prem':3, 'Joe':4}
del x_dict['Joe']
x_dict
{'Edward': 1, 'Jorge': 2, 'Prem': 3}
x_dict['Edward'] # Prints the value stored with the key 'Edward'.
Sets
'Sets are a collection of distinct (unique) objects. These are useful to
create lists that only hold unique values in the dataset. It is an unordered
collection but a mutable one, this is very helpful when going through a huge
dataset.'
y_set = set('COOKIE')
print(y_set) # Single unique 'o'
{'I', 'O', 'E', 'C', 'K'}
Collections, heapq
Those are additional data structures.
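A short sketch of what these modules add, using made-up sample data: Counter tallies items, deque is a double-ended queue, and heapq keeps the smallest item of a plain list at index 0.

```python
import heapq
from collections import Counter, deque

# Counter: tally hashable items
letters = Counter("COOKIE")

# deque: fast appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)

# heapq: keep the smallest item at index 0 of a plain list
heap = [5, 1, 4]
heapq.heapify(heap)
smallest = heapq.heappop(heap)

print(letters["O"], list(queue), smallest)
```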
Code Snippets
Running an external program
Leaving for loop early
"break" or "continue"
Operators
Watch out when using shortcuts like += on large strings, as they seem to
be much slower than the lengthier "mystring = mystring + something".
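When many pieces have to be concatenated, a common alternative is to collect them in a list and join them once at the end. This is a minimal sketch with made-up data; timings vary between Python versions, so measure before relying on either form.

```python
pieces = []
for number in range(5):
    # Collect fragments instead of growing a string step by step
    pieces.append(str(number))

# One single concatenation at the end
mystring = "".join(pieces)
print(mystring)
```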
File I/O
Checking if a directory exists
Either...
- import os
- try:
-     os.mkdir("./mydir")
- except OSError:
-     #directory already exists (or cannot be created)
-     pass
... or
- import os
- if not os.path.isdir("./mydir"):
-     os.mkdir("./mydir")
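In Python 3, os.makedirs() with exist_ok=True covers both variants in one call. The sketch below works in a temporary directory, and the directory names are placeholders.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves no trace elsewhere
base = tempfile.mkdtemp()
target = os.path.join(base, "mydir", "subdir")

# Creates intermediate directories too; no error if they already exist
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)  # second call is a no-op

print(os.path.isdir(target))
```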
Writing to a text file
- log = open('test.txt','w')
- log.write("Some string")
- log.close()
Caution: Under Windows, \r\n turns into 0D0D0A. To get the expected
0D0A, just use \n .
Important: Although Python3 uses Unicode, it happily writes data in Latin1
under Windows unless told otherwise:
- stuff = "Crème"
- with open("cp1252.txt", 'w') as outFile:
-     outFile.write(stuff)
- with open("utf8.txt", mode='w',encoding='utf-8') as outFile:
-     outFile.write(stuff)
Reading from a text file in one go
- f = open("c:/test.txt", "r")
- data = f.read()
- print(data)
- f.close()
Reading from a text file, line by line
- f = open("c:/test.txt", "r")
- textlines = f.readlines()
- for line in textlines:
-     print(line)
- f.close()
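A variant of the same idea uses a with block, which closes the file automatically, and iterates over the file object instead of calling readlines(). The file name and content below are made up for the example.

```python
import os
import tempfile

# Create a small sample file (hypothetical content) in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\n")

lines = []
# The with block closes the file automatically, even if an error occurs
with open(path, "r", encoding="utf-8") as f:
    for line in f:  # reads one line at a time, without loading the whole file
        lines.append(line.rstrip("\n"))

print(lines)
```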
Reading from a text file, editing each line, and saving into a new file
- import re
-
- MAGIC = 10
-
- f = open("C:\\input.txt", "r")
- textlines = f.readlines()
- f.close()
-
- #rewrite lines to new file
- log = open('output.txt','w')
-
- #search for a leading {number} using a regex
- p = re.compile(r'^\{(\d+)\}')
- for line in textlines:
-     m = p.search(line)
-     if m:
-         nugget = int(m.group(1))
-         nugget += MAGIC
-         #update line
-         line = p.sub("{%s}" % nugget, line)
-     #each line already ends with a newline, so write it as is
-     log.write("%s" % line)
- log.close()
Finding if a file is missing from a directory
We'll read a list of files from a text file, and then check if the file exists:
- import os.path
-
- PATH="C:\\MYDIR\\"
-
- f = open(PATH + "files.txt", "r")
- textlines = f.readlines()
- for line in textlines:
-     line = line.strip()
-     if not os.path.isfile(PATH + line):
-         print("%s NOT FOUND" % line)
- f.close()
Append stuff to a text file
A first way is to open a file in "a" mode:
- f = open("c:/test.txt", "a")
- f.write("This is an appended line.\n")
- f.close()
Another way:
- import glob
-
- f = open("stuff.to.add.txt", "r")
- template = "\n\n" + f.read()
- f.close()
-
- for frm in glob.glob('*.txt'):
-     f = open(frm, "r+")
-     content = f.read()
-     if 'my pattern' not in content:
-         f.seek(0,2)
-         f.write(template)
-     f.close()
Checking that a file exists
Either...
- import os
-
- def exists(file):
-     return os.path.exists(file)
... or
- import os
-
- def exists(file):
-     return os.access(file, os.F_OK)
Checking the size of a file
- import os
- from stat import ST_SIZE
-
- print(os.stat(file)[ST_SIZE])
Displaying the last modified date of a file
os.stat() returns the date a file was last modified as an epoch value, ie. the
number of seconds elapsed since January 1st 1970 (UTC). To turn an epoch
into eg. YYYY-MM-DD:
- import os, time
- from stat import ST_MTIME
-
- filetime = os.stat('myfile.txt')[ST_MTIME]
-
- #turns epoch into tuple such as (2004, 8, 13, 2, 35, 2, 4, 226, 0)
- filetime = time.gmtime(filetime)
-
- #turns tuple into formatted string
- print(time.strftime("%Y-%m-%d",filetime))
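The same conversion can be sketched with os.path.getmtime() and the datetime module. The example creates a throwaway file and forces its modification time to epoch 0, purely so that the result is predictable.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file and set a known modification time (epoch 0)
fd, path = tempfile.mkstemp()
os.close(fd)
os.utime(path, (0, 0))

# getmtime() returns the epoch value; datetime formats it
mtime = os.path.getmtime(path)
stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")
print(stamp)
os.remove(path)
```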
Reading a value from a key in a section of an INI file
- import ConfigParser
-
- p = ConfigParser.ConfigParser()
- p.readfp (open('index.ini'))
- try:
-     print p.get('files',file)
- except:
-     print "section 'files' not found"
- else:
-     print "ok"
Reading all the key/value items in a section in an INI file
- import ConfigParser
-
- p = ConfigParser.ConfigParser()
- p.readfp (open('index.ini'))
- for item in p.items('files'):
-     print("key = " + item[0] + " value = " + item[1])
Writing data to an INI file
The ConfigParser has no method to change a file in place, so you need
to read the INI file, make the changes in memory, open the file in write mode,
and pass the file object to write():
- import ConfigParser
-
- def writeini(file,size):
-     p = ConfigParser.ConfigParser()
-     p.read('index.ini')
-     p.set('files', file, size)
-
-     fp = open('index.ini','w')
-     p.write(fp)
-     fp.close()
-
- writeini("mykey","myvalue")
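In Python 3 the module is named configparser (lowercase). The following is a rough equivalent of the read-modify-write cycle, assuming a [files] section and using a temporary file path chosen for the example.

```python
import configparser
import os
import tempfile

# Hypothetical INI file created in a temporary directory
ini_path = os.path.join(tempfile.mkdtemp(), "index.ini")

def writeini(key, value):
    """Set key=value in the [files] section, creating file and section if needed."""
    p = configparser.ConfigParser()
    p.read(ini_path)            # silently ignores a missing file
    if not p.has_section("files"):
        p.add_section("files")
    p.set("files", key, value)  # values must be strings in Python 3
    with open(ini_path, "w") as fp:
        p.write(fp)

writeini("mykey", "myvalue")

# Read the file back to confirm the value was saved
check = configparser.ConfigParser()
check.read(ini_path)
print(check.get("files", "mykey"))
```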
Setting the current directory
- import os
-
- os.chdir('./mydir')
Looping through each file in a directory
- import glob
-
- for file in glob.glob('*.htm*'):
Note: On the Windows platform, glob() mixes forward- and backslashes,
while open() doesn't allow backslashes altogether ("IOError: [Errno 2]
No such file or directory: '.\\mydir\myfile.txt' ".)
Reading information from MS Word files
- import win32com.client
- app = win32com.client.Dispatch('Word.Application')
- doc = app.Documents.Add('c:\\stuff.doc')
- for rev in doc.Revisions:
- print rev.Author
Using SQLite as file-based database
Python3
When reading data from SQLite3 (which is saved in UTF-8/16), and saving them
into a plain text file, Python3 uses the locale as default, eg. cp1252. To save
data as UTF-8, make sure you use the following switch:
- import sqlite3
-
- con = sqlite3.connect('input.sqlite')
-
- con.row_factory = sqlite3.Row
- cur = con.cursor()
- cur.execute("SELECT name FROM table1")
- results = cur.fetchall()
- output = open("output.txt", "w", encoding='UTF-8')
- for row in results:
-     #write() only accepts strings
-     output.write(str(row["name"]) + "\n")
-
- output.close()
- con.close()
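To experiment without an existing database file, the sqlite3 module can also work in memory. This sketch uses a made-up table and rows, and shows "?" placeholders, which leave the quoting of values (accents, apostrophes) to the module.

```python
import sqlite3

# In-memory database with made-up sample data
con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row
con.execute("CREATE TABLE table1 (name TEXT, number INTEGER)")

# "?" placeholders let sqlite3 handle quoting of the values
rows = [("Crème", 1), ("O'Brien", 2)]
con.executemany("INSERT INTO table1 (name, number) VALUES (?, ?)", rows)
con.commit()

query = "SELECT name FROM table1 WHERE number >= ? ORDER BY number"
names = [row["name"] for row in con.execute(query, (1,))]
print(names)
con.close()
```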
Python2
Several wrappers are available to access SQLite from Python, but two stand out:
"pysqlite implements Python's DBAPI and was integrated into Python [2.5]. There
is another wrapper, APSW ("Another Python SQLite Wrapper"), which is thinner and
closer to SQLite's C API."
Note:
- "If you choose pysqlite be aware that it will start transactions behind your
back and can produce the wrong error message if you start your own
transactions.
- APSW works with Python 2.5, is usually a bit faster than pysqlite and is
more frequently updated - it's not tied to the DBAPI."
- Both pysqlite and APSW are hosted at http://initd.org/tracker/pysqlite
(or http://pysqlite.org)
Installing APSW: Just run the EXE that matches your version of Python, eg.
apsw-3.3.13-r1.win32-py2.5.exe
Here's how to display information:
- import os, sys, time
- import apsw
-
- print "Using APSW file",apsw.__file__
- print "APSW version",apsw.apswversion()
- print "SQLite version",apsw.sqlitelibversion()
Here's how to play with SQLite:
- if os.path.exists("dbfile"):
-     os.remove("dbfile")
-
- connection=apsw.Connection("dbfile")
- cursor=connection.cursor()
- cursor.execute("begin")
- cursor.execute("create table foo(x,y,z)")
- cursor.execute("insert into foo values(1,2,3)")
- cursor.execute("insert into foo values(4, 'five', 6.0)")
- cursor.execute("commit")
-
- for row in cursor.execute("select * from foo"):
-     print row
-
- for m,n,o in cursor.execute("select x,y,z from foo"):
-     print m,n,o
-
- connection.close(True)
Another example of using APSW (reading a tab-delimited text file to insert
books into SQLite)
- import re, apsw
-
- connection=apsw.Connection("books.sqlite")
- cursor=connection.cursor()
-
- sql = "CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY, isbn VARCHAR, box VARCHAR, title VARCHAR)"
- cursor.execute(sql)
-
- f = open("books.tsv", "r")
- textlines = f.readlines()
- f.close()
-
- #Extract ISBN + box
- p = re.compile(r'^(.+)\t(\d+)$')
- for line in textlines:
-     m = p.search(line)
-     if m:
-         isbn = m.group(1)
-         box = m.group(2)
-
-         #use a binding instead of string formatting to avoid quoting problems
-         sql = "SELECT COUNT(isbn) FROM books WHERE isbn=?"
-         #list() reads the result before the cursor is reused for the INSERT
-         for row in list(cursor.execute(sql, (isbn,))):
-             #Record not found -> Insert
-             if not row[0]:
-                 print "No record found for ISBN " + isbn
-                 cursor.execute("INSERT INTO books (id,isbn,box) VALUES (NULL,?,?)", (isbn,box))
-
- connection.close(True)
Here's how to perform an INSERT using named bindings, with the value for each
column taken from the local variable of the same name:
- cursor.execute("INSERT INTO person (name, address, tel, web, email) VALUES (:name, :address, :tel, :web, :email)", locals())
Here's how to safely update/insert data and display the resulting query:
- sql = 'UPDATE companies SET name=?,address=?,zip=? WHERE id=?;'
- try:
-     cursor.execute(sql, (name,address,zip,id) )
- except:
-     print "Failed UPDATING"
-     raise
Using regular expressions
Here's how to loop through a list of web pages, and check whether a given
pattern is found therein:
- import sys
- import urllib
- import re
-
- for i in range(1,10):
-     f = urllib.urlopen("http://www.acme.com/index.asp?page=%s" % i)
-     #re.I = ignore case
-     if re.search('stringtofind',f.read(), re.I):
-         print "Found in %s" % i
Another way to do this:
- p = re.compile('stringtofind')
- if p.search(f.read()):
-     print "Found"
- else:
-     print "Not found"
Here's how to compile a regex, find a pattern, and save it to a file:
- p = re.compile('(<some>.+</some>)',re.DOTALL)
-
- m = p.search(inputdata)
- if m:
-     inputdata = m.group(0)
- else:
-     print("Pattern not found")
-     sys.exit()
-
- inputdata = inputdata.replace('<other>','<yet>')
-
- with open(output, 'w') as outputfile:
-     outputfile.write(inputdata)
Here's how to load a web page, isolate a section, and display it (Note: you
cannot call f.read() twice, hence the copying of the page into the 'page' variable):
- log = open('found.txt','w')
- for i in range(1,10):
-     f = urllib.urlopen("http://www.acme.com/index.asp?page=%s" % i)
-     print "Checking page %i" % i
-     page = f.read()
-     if re.search('some text',page, re.I):
-         m=re.search('<span class=subject>"(.+?)"</span>',page,re.I)
-         if m:
-             log.write("Found in %s\n" % i)
-             log.flush()
- log.close()
Here's how to read an HTML file, and display the string between the TITLE
tags, if any:
- import re
-
- f = open('myfile.html', "r")
- inputfile = f.read()
- f.close()
-
- m = re.search('<title>(.*?)</title>',inputfile,re.I)
- if m:
- print m.group(1)
... or if you need to extract more than one set of items:
- p = re.compile('blabla (.+?) blabla (.+?)')
- packed = p.findall(inputfile)
- if packed:
-     for x in packed:
-         print "Item 1 = " + x[0] + " Item 2 = " + x[1]
If you need to call a regex a great number of times, you can increase performance
by compiling the search pattern:
- p = re.compile('[0-9]+')
- m = p.search('tempo999')
- print m.group(0)
To replace an item with another item, use re.sub():
- print re.sub('john','jane','john doe')
Note that re.sub() is very much slower than using a string's replace() method:
- stuff = stuff.replace('_',' ')
Also, re.sub() interprets backslashes in the replacement string, so a literal
backslash must be written as a doubled backslash, using the r prefix to indicate
a raw string (ie. with its backslashes treated as regular characters by Python itself):
- toreplace = r"\\"
- body = "#"
- print re.sub("#",toreplace,body)
If you wish to tell the re module to treat the replace pattern as is even
when it contains backslashes, add a call to its escape() function:
- toreplace = re.escape(r"\\")
- body = "#"
- print re.sub("#",toreplace,body)
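Another way to keep re.sub() from interpreting backslashes in the replacement is to pass a function instead of a string, since the function's return value is inserted as is. This is a small sketch; replace_literally is a hypothetical helper name, and the sample path is made up.

```python
import re

def replace_literally(pattern, replacement, text):
    """Replace every match of pattern with replacement, taken verbatim."""
    # A callable replacement bypasses backslash processing in re.sub()
    return re.sub(pattern, lambda match: replacement, text)

# As a plain replacement string, \n and \f would be read as escape sequences
result = replace_literally("#", r"C:\new\folder", "path=#")
print(result)
```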
Here's how to rewrite a phone number:
- #!/usr/bin/python
-
- import sys,re
-
- #Turn 0123456789 into 01.23.45.67.89
- p = re.compile(r'(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)')
- phone = p.sub(r'\1.\2.\3.\4.\5',sys.argv[1])
-
- print phone
Important: In Python 2, the regex library by default can't handle even European
characters in classes such as \w, so you must set a locale, and add the re.LOCALE
switch (in Python 3, str patterns match Unicode characters by default, and
re.LOCALE is only accepted with bytes patterns):
- import locale
-
- #BAD: locale.setlocale(locale.LC_ALL, 'FR')
- #Let Python handle it
- locale.setlocale(locale.LC_ALL, '')
- mypattern = re.compile("(\d+)\s+(\w+)\s+(\d+)",re.LOCALE)
More infos on using regexes in Python:
Driving a web browser
As of April 2021, there are at least two modules to manage a web browser
through a Python script: the webbrowser module, and the Selenium module. mechanize
might be too basic.
Selenium
https://towardsdatascience.com/controlling-the-web-with-python-6fceb22c5f08
- #pip3 install -U selenium
- #pip3 install webdriver-manager
- from selenium import webdriver
- from selenium.webdriver.chrome.options import Options
- from selenium.webdriver.chrome.service import Service
- from webdriver_manager.chrome import ChromeDriverManager
-
- options = Options()
- options.add_argument("start-maximized")
- driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
- driver.get("https://www.google.com")
webbrowser module
https://devtut.github.io/python/webbrowser-module.html
Connecting to a web server
Here's how to use urllib to POST to a script:
- import urllib
-
- url = "http://www.acme.com"
- data = {'myfield': somevalue}
- urldata = urllib.urlencode(data)
- results = urllib.urlopen(url, urldata).read()
- print results
Here is an example session that uses the 'GET' method to retrieve a URL containing
parameters:
- import urllib
- params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
- f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
- print f.read()
The following example uses the 'POST' method instead:
- import urllib
- params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
- f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
- print f.read()
The following example uses an explicitly specified HTTP proxy, overriding
environment settings:
- import urllib
- proxies = {'http': 'http://proxy.example.com:8080/'}
- opener = urllib.FancyURLopener(proxies)
- f = opener.open("http://www.python.org")
- f.read()
The following example uses no proxies at all, overriding environment settings:
- import urllib
- opener = urllib.FancyURLopener({})
- f = opener.open("http://www.python.org/")
- f.read ()
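The urllib snippets above target Python 2. In Python 3 the same functions live in urllib.parse and urllib.request. The sketch below only builds a GET and a POST request without sending anything, and the URL is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration; nothing is sent
URL = "http://www.example.com/cgi-bin/query"

params = urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

# GET: parameters go into the URL
get_request = Request(URL + "?" + params)

# POST: parameters go into the body, which must be bytes in Python 3
post_request = Request(URL, data=params.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
# To actually send a request: urllib.request.urlopen(post_request).read()
```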
Here's how to use Libcurl to POST to a script:
- Install Python and Libcurl (eg. libcurl-7.16.2-win32-ssl-sspi.zip)
- Install PyCurl
- Use this script:
Here's how to log on to a web server through POST with support for cookies:
urllib vs urllib2 vs httplib
cookielib vs. ClientCookie http://www.voidspace.org.uk/python/articles/cookielib.shtml
Playing with date/time
Here's how to display the current date and time:
- import time
- import locale
-
- #displays '08/20/04 22:05:15'
- print time.strftime('%c')
-
- #displays 'French_France.1252'
- print locale.setlocale(locale.LC_ALL,'')
-
- #displays '20/08/2004 22:05:15'
- print time.strftime('%c')
The time value as returned by gmtime(), localtime(), and strptime(), and
accepted by asctime(), mktime() and strftime(), is a sequence of 9 integers.
The return values of gmtime(), localtime(), and strptime() also offer attribute
names for individual fields.
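A quick illustration of that 9-field time value, using epoch 0 so the result does not depend on the current date:

```python
import time

# The epoch (0 seconds) expressed as a UTC struct_time
t = time.gmtime(0)

# Fields can be read by position or by attribute name
print(t[0], t.tm_year)
print(len(t))
print(time.strftime("%Y-%m-%d", t))
```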
Lists
Tuples
Dictionaries
Printing the content of each key:
- for i in stuff.keys():
- print i + "=" + stuff[i]
Commenting a block of text
- """
- This is one
- block of text
- """
ie. three double-quotes in a row.
Exiting a script
- import sys
-
- sys.exit()
Handling a long line of code
To break a long line of code:
- if (somevar) or \
- (someothervar):
Sending an e-mail
Here's how to send an e-mail through code, passing one parameter to the script:
- #!/usr/bin/python
-
- from email.mime.text import MIMEText
- import smtplib,sys
-
- body='''this text will become the body of the message
- Using triple-quotes you can span it easily over multiple lines.
- the result of an action'''
-
- msg = MIMEText(body)
- From = "me@acme.com"
- To = "you@acme.com"
- msg['From'] = From
- msg['To'] = To
- msg['Subject'] = "Call from " + sys.argv[1]
-
- server = smtplib.SMTP("smtp.isp.net")
- server.sendmail(From,[To],msg.as_string())
- server.quit()
Tips from the Python Tutorial
Calling a non-COM DLL from Python
Calling a COM DLL from Python
Shortcut to the last result
In interactive mode, the last printed expression is assigned to the variable
_. This means that when you are using Python as a desk calculator, it is somewhat
easier to continue calculations, for example:
- >>> price * tax
- 12.5625
- >>> price + _
- 113.0625
Long lines
If a statement or string is too long to fit on a line, use the backslash:
- hello = "Note that whitespace at the beginning of the line is\
-  significant."
You can also use """ or ''' :
- print """
- Usage: thingy [OPTIONS]
- -h Display this usage message
- -H hostname Hostname to connect to
- """
Strings
Unlike a C string, Python strings cannot be changed. Assigning to an indexed
position in the string results in an error.
Lists
Unlike strings, which are immutable, it is possible to change individual
elements of a list:
- a = ['spam', 'eggs', 100, 1234]
- a[2] = a[2] + 23
- a
- ['spam', 'eggs', 123, 1234]
Variable number of function parameters
When a final formal parameter of the form **name is present, it receives
a dictionary containing all keyword arguments whose keyword doesn't correspond
to a formal parameter. This may be combined with a formal parameter of the form
*name, which receives a tuple containing the
positional arguments beyond the formal parameter list. (*name must occur before
**name.) For example, if we define a function like this:
- def cheeseshop(kind, *arguments, **keywords):
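A complete, runnable version of such a function might look like this. The body and the sample call are an illustration, not the tutorial's exact code.

```python
def cheeseshop(kind, *arguments, **keywords):
    """Return a summary of how each argument was received."""
    return {
        "kind": kind,            # regular positional parameter
        "arguments": arguments,  # tuple of extra positional arguments
        "keywords": keywords,    # dict of extra keyword arguments
    }

order = cheeseshop("Limburger", "It's very runny, sir.",
                   shopkeeper="Michael Palin", client="John Cleese")
print(order["arguments"])
print(order["keywords"])
```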
Importing modules
There is even a variant to import all names that a module defines:
- from fibo import *
This imports all names except those beginning with an underscore (_).
Modules
The built-in function dir() is used to find out which names a module defines.
It returns a sorted list of strings. Without arguments, dir() lists the names
[ie. variables and functions] you have defined currently.
Compiling
An easy and satisfactory way to distribute your Python script on a Windows
host is to compile it with Py2exe
(which analyses your script, and tries to extract all the required modules into
a ZIP file), and combine the different files using either a standard installer
like InnoSetup or NSIS,
or combine all the files into the main EXE generated by py2exe using PE
Bundle which will extract those extra files at runtime transparently:
- Install py2exe, and write a setup script (call it setup.py):
from distutils.core import setup
import py2exe
setup(console=["myapp.py"])
- Open a DOS box, and run the following: python setup.py py2exe
- A directory named ./dist is created by py2exe, and contains all the files
that are required to run your script on a bare Windows host. You can remove
the ./build directory (temp stuff)
- Combine those few files into a single EXE using either your favorite
installer, or PE Bundle
- More information available on py2exe
An alternative to py2exe is PyInstaller:
"PyInstaller is a program that converts (packages) Python programs into
stand-alone executables, under Windows, Linux and Irix. [...] PyInstaller is
an effort to rescue, maintain and further develop Gordon McMillan's Python Installer
(now PyInstaller). Their official website is not longer available and the original
package is not longer maintained. Believing that it is still far superior to
py2exe, we have setup this site to continue its further development."
First, read the following to understand the issue of compiling and/or distributing
Python scripts:
- Distributing Python Modules and Installing Python Modules by Greg Ward
- Python Installers
- Can Python be compiled to machine code, C or some other language?
- [Nick Patavalis in comp.lang.python] "Python needs drastic performance
improvement if it is to scrap-off the "scripting language" stigma.
The only way to get these improvements is making it possible for a python
implementation to produce *efficient* *compiled* code. At the same time
the dynamic-typing nature of the language is one of its most valuable characteristics.
And this is one of the hardest problems when trying to write a decent python
compiler.
If you define a function like:
def sum (a, b):
    return a + b
How can the compiler know what code to produce? It could trace
all the applications of sum(), and decide what types of arguments sum()
is actually applied on. But this is not easy, and sometimes it is straight-out
impossible."
- "A possible way to reduce the time required to load the Python interpreter
on a Windows box, is to load the DLL during boot time since Windows have a list
of "KnownDLLs" that it pre loads at startup time and they stay in
memory all the time: Try adding the python23.dll to:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs
- Compiling Python Code by Fredrik Lundh
- "py2exe will make you a windows native exe to run. and for updates
you just need to update it's library.zip. use NSIS for the installer, it
can do anything you want an it's free.
One tip: make sure your three top-level scripts are as empty as possible:
just import some other module and call a function. All your active code is
then in some library.zip shared between the three, and you need never change
alice.exe, bob.exe, and carol.exe
To elaborate, this approach has worked well for me when I've needed to
release multiple .exe's together:
First make a "dispatching" module like this (call it run.py)
if __name__ == '__main__':
    app = sys.argv[0].lower()
    if app.find('alice') != -1:
        import alice
        alice.main()
    elif app.find('bob') != -1:
        import bob
        bob.main()
    etc...
Next, use py2exe to create an executable for run.py, and then copy run.exe to
alice.exe, bob.exe, and carol.exe (the resulting .exe's will be very small).
The reasoning behind this is that this way only one copy of all the DLLs,
Python std library, and so forth need to be included in your release. "
Pyco
- http://www.pythonapocrypha.com/projects/pyco/
- "Pyco (pronounced like pico) is a program that creates standalone
(single-file) Windows executables from your Python programs, with an emphasis
on the finished executable being as small as possible. It does not require
you to have a C compiler."
Psyco
- Python JIT(Just-In-Time)-like compiler to accelerate Python programs
on 386-compatible processors (x2 to x100 speed up !).
- http://psyco.sourceforge.net/
setuptools
"setuptools is a collection of enhancements to distutils which let you build .egg files.
Once you start using egg files you can include dependencies between package
versions and if your product requires a bunch of other packages the installation
step will download and install the appropriate versions.
See http://peak.telecommunity.com/DevCenter/EasyInstall for
instructions on installing packages built in this way, but in short, the user
has to run ez_setup.py from the EasyInstall page, and then a command like:
easy_install http://example.com/path/to/MyPackage-1.2.3.tgz
would download and install your package and all the other products it depends
on. If at a later stage they want to upgrade to a more recent version then all
they need to do is to run:
easy_install --upgrade MyPackage
Installed eggs usually exist in a single file (importable zip) which makes
uninstalling especially easy: just one file to delete."
py2exe
py2exe is a Python distutils
extension which converts python scripts into executable windows programs, able
to run without requiring a python installation.
- Install py2exe
- Create a script
- Run the script including the -w (Windows) option to hide the DOS box
that Python opens even when running a GUI application
- Distribute the resulting .EXE and its dependent DLLs, or generate an
installer
Note that even a no-frills window developed with the wxPython toolkit, with
just a tiny menu bar that displays a dialog box, turns into a 300KB EXE, and
requires 4 binaries for a total of 2.5MB (and that's after compressing the
four dependencies with UPX).
For information, internally, Python source code is always translated into a "virtual
machine code" or "byte code" representation before it is interpreted
(by the "Python virtual machine" or "bytecode interpreter").
In order to avoid the overhead of parsing and translating modules that
rarely change over and over again, this byte code is written on a file whose
name ends in ".pyc" whenever a module is parsed (from a file whose
name ends in ".py").
When the corresponding .py file is changed, it
is parsed and translated again and the .pyc file is rewritten. There is no performance
difference once the .pyc file has been loaded (the bytecode read from the .pyc
file is exactly the same as the bytecode created by direct translation). The
only difference is that loading code from a .pyc file is faster than parsing
and translating a .py file, so the presence of precompiled .pyc files will generally
improve start-up time of Python scripts.
If desired, the Lib/compileall.py
module/script can be used to force creation of valid .pyc files for a given
set of modules. Note that the main script executed by Python, even if its filename
ends in .py, is not compiled to a .pyc file. It is compiled to bytecode,
but the bytecode is not saved to a file.
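Besides compileall, a single file can be compiled with the standard py_compile module. This minimal sketch writes a throwaway module, whose name and content are made up, and compiles it to a .pyc file.

```python
import os
import py_compile
import tempfile

# Write a throwaway module to a temporary directory (hypothetical file name)
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "hello.py")
with open(source, "w") as f:
    f.write("print('hello')\n")

# Force compilation to bytecode; returns the path of the .pyc file
compiled = py_compile.compile(source)
print(compiled)
```

In Python 3 the returned path points inside a __pycache__ subdirectory next to the source file.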
McMillan Installer
- Similar to py2exe, but all the exe/dll/pyd files are grouped into a
single EXE file.
- http://www.mcmillan-inc.com/ : (Posted by PaulBaranowski on Tuesday, April 27) "Unfortunately, Gordon
has come under some hard financial times and his web site is not up at the
moment. So I have offered to host the program here!"
Freeze
If you are looking for a way to translate
Python programs in order to distribute them in binary form, without the need
to distribute the interpreter and library as well, have a look at the freeze.py
script in the Tools/freeze directory [find it in the ActivePython distribution;
guess this refers to the standard Python distro.]
This creates a single binary file incorporating
your program, the Python interpreter, and those parts of the Python library
that are needed by your program. Of course, the resulting binary will
only run on the same type of platform as that used to create it.
"There is a tool called freeze that is included with Python that does
this. I havn't done it on Windows yet and I have heard that there are
some tricks or potential problems with it. Check the back-postings at
dejanews for details. Basically it scans you code for all imported modules
and builds a C module that has all the compiled python modules encoded within
it. Then you compile and linke this file against the Python library and
you end up with an executable that along with any binary extension modules you
may need will be a distributable version of your program."
cx_Freeze
"The Freeze utility that comes with Python itself requires a source
distribution, a C compiler and linker which makes for a complex environment
for creating executables. In addition, this method is very slow for creating
executables as compared to the other methods. py2exe is intended for development
on Windows only and cx_Freeze
is intended for cross platform development. Installer uses an import hook which
means that the development environment and runtime environment are considerably
different."
SQFreeze
- "Based upon Fredrik Lundh's famous squeeze module, I added some
code to create small stand-alone .exe's which can be used like "Python-applets".
It is currently known as SQFREEZE.
The system dll files are still required (so your users must still have Python
installed), but the exe file can run from alone an doesn't need to unpack
any modules."
- "Latest news (07-Feb-99)" ?
Pyrex
- Pyrex lets you write code that mixes Python and C data types any way
you want, and compiles it into a C extension for Python
- http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
PyPy
"The PyPy project aims at producing
a simple runtime-system for the Python language. We want to express the basic
abstractions within the Python Language itself. We later want to have a minimal
core which is not written in Python and doesn't need CPython anymore. We want
to take care that technologies such as PSYCO and Stackless will easily integrate."
PyInline
- The PyInline module allows you to put source code from other programming
languages directly "inline" in a Python script or module
- http://pyinline.sourceforge.net/
Py2Cmod
- Designed to aid in the conversion of Python modules to C extension modules
while keeping the same interface into the module.
- http://sourceforge.net/projects/py2cmod/
Weave
- The weave package allows the inclusion of C/C++ within Python code
- http://www.scipy.org/documentation/weave/
pyPack
- UPX/PKLite-like text-only compression of Python programs for compact
and easy distribution over indentation-breaking media (like web forums).
- http://sebsauvage.net/python/
SWIG
"SWIG is a software development tool
that connects programs written in C and C++ with a variety of high-level programming
languages. SWIG is primarily used with common scripting languages such as Perl,
Python, Tcl/Tk, and Ruby, however the list of supported languages also includes
non-scripting languages such as Java, OCAML and C#."
distutils
PyChecker
"PyChecker is a tool
for finding bugs in python source code. It finds problems that are typically
caught by a compiler for less dynamic languages, like C and C++. It is similar
to lint."
IDEs
As of 2018, I tried a few of them (IDLE, MS Visual Studio, PyCharm, Wing,
Eric), and found PyScripter to be the simplest to install and use.
PyScripter
Main page; Support
To make the IDE actually… readable, choose View > Styles > Windows 10
To set a keyboard shortcut to toggle comments (Source Code > Toggle Comment), use Tools > Options >
IDE Shortcuts; I used CTRL+SHIFT+B (B as "block"), since it was available.
Tools > Configure Tools: To use the current script work directory as the
Working directory, use "$[ActiveDoc-Dir]"
Instead of the internal Python interpreter, use the external interpreter,
so you can easily kill a rogue application if need be.
Q&A
What causes "Remote Interpreter Reinitialized"?
How to change encoding, so that UTF-8 strings are displayed correctly (eg.
"é" "é")?
How to remove "greyed out" lines when script stopped running due
to error? Can't read code. Nothing in "Run".
How to hide black vertical line in middle of editing window?
Tools > Options > Editor Options : Edge column = 0
How to add items from Tools > Tools into user toolbar?
1. Right-click on User Toolbar
2. Customize
3. Commands tab
4. External Tools
5. Select item, and drag 'n drop it to the User Toolbar.
Others
- IDLE, "Python's very own minimal IDE"; It's part of Python,
so just open a terminal and type "idle.bat"; Failed running a
minimal .py script. Next
- MS Visual Studio Code + Python extension; Too slow to start running
Python script. Next
- MS Visual Studio Community IDE + PTVS: Canceled install after an hour.
Next.
- PyCharm: 190MB download, 500MB install. Next.
- Wing IDE: Slow to start; Failed with "UnicodeEncodeError: 'charmap'
codec can't encode characters in position 52-53: character maps to <undefined>".
Next.
- Komodo IDE: Commercial. Next.
- Eric (6-18.08): If running 32-bit Windows, you might have to downgrade
using "pip install PyQt5==5.10.1 QScintilla==2.10.4" before running
"python install.py". Next.
- Thonny:
- Eclipse + PyDev:
- Sublime Text:
- Atom:
- Spyder:
OLD If you prefer to use an IDE instead of a basic text editor, here are the
choices I would recommend:
- If you prefer to use a tool that uses the Windows' native set of widgets,
choose ActiveState Komodo, the ActiveState Python plug-in for
VS.Net, or PyDev
- For wxWidgets, use SPE or Boa
- QT, check out BlackAdder or eric3
- For the FOX widget set, Arachno, but as of August 2004, only the
version that supports Ruby is available
- I hate Tkinter :-)
Bigger list here
PyDev
http://pydev.sourceforge.net
PyPe
DrPython
Programmer Studio
ActiveState
Komodo
Visual Python
BlackAdder
- Commercial IDE from the KDE consortium to build Python GUI apps using
the PyQT wrapper to the QT set of widgets
- Includes the QT Designer, but requires that you install Python and PyQT
before installing BlackAdder
- http://www.thekompany.com/products/blackadder/
- Not as polished as VB (not told to install Python and PyQT beforehand,
although the FAQ seemed to say it was included in BlackAdder "Do I
have to use the versions of PyQt and Qt embedded in BlackAdder?"; The
IDE used some non-standard font in 14 points, etc.)
- Don't know how integrated the BlackAdder IDE and QT Designer are
The eric3 Python IDE
- http://www.die-offenbachs.de/detlev/eric3.html
- Free; Written in PyQt using the QScintilla editor widget ("In order
to use eric3 under Win... operating systems you need a commercial or educational
license of Qt 3.x from Trolltech and the corresponding license of PyQt (which
includes QScintilla) from Riverbank.")
- "eric is a GPL'ed IDE for Python written using PyQt. eric used
to be bundled with PyQt but is now available separately"
- Programming
Tools: eric3
SPE - Stani's Python Editor
- http://spe.pycs.net/
- "Spe ships with wxGlade (gui designer), PyChecker (source code
doctor) and Kiki (regular expression console). Spe is extensible with wxGlade."
- As of Aug 2004, still some bugs to iron out, but is promising
Boa Constructor
- http://boa-constructor.sourceforge.net/
- wxWidgets-based IDE + GUI designer
- While the latest official version is 0.2.3, 0.2.8 can be downloaded
here
(unzip under drive:\Python2x\Lib\site-packages\wxPython\tools\)
- Launched with c:\python22\python c:\Python22\Lib\site-packages\wxPython\tools\boa\Boa.py
if you are using the Python from Python.org. If using ActivePython, search
for the boa.py script a little deeper
FOX
Arachno
- http://www.python-ide.com/
- Starting at USD 59
- "What is the expected release date of the final version ? We try
to get it out in the first quarter of 2005"
- Built with the Open Source FOX GUI Toolkit
PythonWin, a.k.a. PyWin32
- Free, no-frills editor
- PythonWin is a badly-chosen name, as it stands
for either a Python wrapper around the MFC framework, or the IDE that ships
with it and is (obviously) written using this wrapper. It's been renamed
PyWin32
- Available either from Mark Hammond's extension to python.org's packaged version
of Python, or as part of the ActiveState distribution of Python
WPY
- ftp://ftp.python.org/pub/python/wpy/
- Uses the TK widgets on Unix, but has a native look-and-feel on Windows and is based on the Microsoft Foundation Classes model
- Files dated 14 June 2001
Pmw
Idle
Wing
Writing GUI apps
Some infos
Below is a list of tools to let you build GUI applications. Most are just
wrappers around a set of widgets such as Windows' native widgets, wxWidgets,
or QT, bringing you back to the days of Windows programming Petzold-style (Mmm...),
but some also offer a GUI designer like VB, ie. you can draw the window's
interface with the mouse. You can read more in the page Gui
Programming on the Python site.
Note that the WYSIWYG GUI designer that feels most like VB's is
QT Designer, which you can get either directly from QT or by buying the BlackAdder
IDE.
Alternatively, you could also use a GUI designing tool such as the antiquated
MS Dialog Editor or its more modern equivalents, just to draw the interface
with the mouse and get the coordinates for each widget, and copy/paste this
into code. Here are some suggestions I got:
- Use a drawing software like SmartDraw or Visio, both of which have libraries
with GUI widgets so you can build a mock-up
- VB, Delphi, or VC++ to draw the interface
- The Windows Platform SDK claims to provide a dialog editor
- Purebasic has a nice GUI designer (Visual Designer), and is quite reasonable at 59 euro. Purevision is a separate tool that costs just $25. I don't think it requires Purebasic
- PB Forms from PowerBasic
- Choose Your GUI Toolkit
- Python can be called from Delphi: "Python
for Delphi is a set of components that wrap up the Python Dll into Delphi.
They let you easily execute Python scripts, create new Python modules and
new Python types. You can create Python extensions as Dlls and much more.",
Using Delphi
and Python together
- wxWindows
vs. TclTk
Here are the widgets and/or Python wrappers those GUI designers may require:
Designing the UI as a resource
"Quick side note: depending on your GUI needs, ctypes can be a pretty
easy way to go. Create your GUI as resources (e.g. in MS Visual Studio) and
wrap them into a tiny DLL. Then use ctypes to load them at runtime and run CreateDialogIndirect.
Most of the work involved is simply looking in header files for the values of
various Win32 messages and constants, but once you do it the first time you
can re-use much of the code over and over."
MFC
PythonWin, a.k.a. PyWin32 is not only an IDE, but also an MFC wrapper so you can build Win32
apps without any extra widgets set. Take a look at the samples under Drive:\Python23\Lib\site-packages\
PythonWin offers the following modules to wrap the Win32 APIs:
- win32*, such as win32service, win32api, etc.
- pywin*.mfc, such as pywin.mfc, pywintypes, etc.
- msvcrt
- ntsecuritycon
- dde
- pythoncom
- etc.
Note that Python Win32, a.k.a. Win32all, is part of the ActivePython
package, so if you use ActivePython instead of the standard Windows version
of Python, Python Win32 is already installed.
More information:
Here's the familiar "Hello, World!" as a dialog box in PyWin32:
- from pywin.mfc import dialog, window
- import win32con
-
- dlgStatic = 130 #class ID of a static (label) control
- dlgButton = 128 #class ID of a button control
-
- class Mydialog(dialog.Dialog):
-     def OnInitDialog(self):
-         rc = dialog.Dialog.OnInitDialog(self)
-         return rc
-
- style = (win32con.DS_MODALFRAME |
-     win32con.WS_POPUP |
-     win32con.WS_VISIBLE |
-     win32con.WS_CAPTION |
-     win32con.WS_SYSMENU |
-     win32con.DS_SETFONT)
- cs = win32con.WS_CHILD | win32con.WS_VISIBLE
- s = win32con.WS_TABSTOP | cs
- w = 64
- h = 64
-
- #1. Let's create a dialog box with a label and a pushbutton
- dlg = [["PyWin32", (0, 0, w, h), style, None, (8, "MS Sans Serif")],]
- #The OK button uses the button class, the text uses the static class
- dlg.append([dlgButton, "OK", win32con.IDOK, (7, h - 18, 50, 14), s | win32con.BS_PUSHBUTTON])
- dlg.append([dlgStatic, "Hello, world!", -1, (7, 9, 50, 14), cs | win32con.SS_LEFT])
-
- #2. Let's start the dialog
- d = Mydialog(dlg)
-
- #3. Display it
- d.DoModal()
Here's how to add a progress bar, set its range, and increment it:
- #Needs "import glob, win32con, win32ui" at the top of the script
- def OnInitDialog(self):
-     rc = dialog.Dialog.OnInitDialog(self)
-     self.pbar = win32ui.CreateProgressCtrl()
-     self.pbar.CreateWindow(win32con.WS_CHILD | win32con.WS_VISIBLE, (7, 30, 270, 50), self, 1001)
-
-     #Count the *.htm* files in the current directory, and set the range of the progress bar
-     files = glob.glob('*.htm*')
-     self.pbar.SetRange(0, len(files))
-     self.pbar.SetStep(1)
-
-     for file in files:
-         self.pbar.StepIt()
-         [...]
Python GUI API Project
- Python GUI API Project
- "Provide implementations of the API for the three major platforms
(Unix, Macintosh and Windows) that are small and lightweight, interposing
as little code as possible between the Python application and the platform's
underlying GUI facilities, and not bloating the Python installations
or applications which use them."
- Uses GTK, so should run on Windows with the ad hoc version of GTK
wxPython
More infos here.
PyQT
.Net (Mono, DotGNU)
This is very early development, but if you like bleeding edge stuff, you
could start looking at how to develop applications using either MS' official
.Net framework and its tools (VS.Net and the Python add-on, etc.), or the compatible
open-source versions that are Mono and DotGNU. Take a look at IronPython,
and boo.
pyFLTK
PyGTK
"If you like GTK+, you might want to try the glade designer and parse
the XML file with libglade and pygtk. (Generated code is bad). Remember,
glade generates XML. XML is not code, XML is data. And data is not
code. As long as you stay away from generated code, you will be safe.
Yup, the best of two worlds -- a graphical form designer that stores information
in XML data to be parsed by your own python program."
"BTW, there's a python port of glade underway: http://gruppy.sicem.biz/componentes#gazpacho"
- PyGTK: uses the famous GTK cross-platform framework
FXPy
- FXPy: "FXPy is a Python
extension module which provides an interface to the FOX
GUI library" (site says "Last Update: $Date: 2002/04/02 21:12:59"...)
Binding to the TnFox Toolkit?
http://www.osnews.com/story.php?news_id=9701
PyGUI
WAX
- WAX:
GUI toolkit that sits on top of wxPython. "Frankly, I don't have the time to work on this project very much.
That's why I only add new features when I need them."
PyUI
- http://pyui.sourceforge.net/
- "PyUI is a user interface library written entirely in the high-level
language python. It has a modular implementation that allows the drawing
and event input to be performed by pluggable "renderers". This
makes PyUI very portable and scalable. It can run in environments from hardware
accelerated 3D surfaces to regular desktop windows. PyUI was originally
targeted as a User Interface for games, but it has evolved into a more general
UI toolkit with applicability outside of games."
Windows
RipSting’s Blender-Python GUI Designer
Blender GUI Wizard
http://www.angelfire.com/nt/teklord/GUIWizard.htm
ActiveState GUI Builder
- Available in the Pro version of the Komodo IDE
- "The GUI
Builder supports building applications using TK with an extended widget set that
includes BWidgets
and
IWidgets."
Venster
- As of Aug 2004, prealpha
- http://venster.sf.net/
- "Venster is a highly native Windows GUI toolkit for Python based
on the ctypes ffi library. The aim of Venster is to be a very lightweight
wrapper around the standard Win32 API, making it easy to write slick windows
applications in pure Python. [...] The windows API is called through the
use of Thomas Heller's excellent ctypes
module"
PythonWorks Pro
EasyDialogs for Windows
Dabo
- "Dabo is a 3-tier, cross-platform
application development framework, written in Python atop the wxPython GUI
toolkit"
DynWin
- "A dynamic GUI class library for Python and Win32"
- http://www.nightmare.com/~rushing/dynwin/
PythonWin
sdk32 - Partial Python wrap of the Win32 Platform SDK
- http://www.object-craft.com.au/projects/sdk32/
GTK
MojoView
QT
QT Designer
- Obviously uses the QT widget set instead of Windows' or wxWidgets'
- Comes with the QT set, but can be bought with the BlackAdder IDE
wxWidgets
Dabo
"Dabo is a 3-tier, cross-platform
application development framework, written in Python atop the wxPython GUI toolkit.
And while Dabo is designed to create database-centric apps, that is not a requirement.
Lots of people are using Dabo for the GUI tools to create apps that have no
need to connect to a database at all."
wxDesigner
- commercial dialog editor and RAD tool for the free wxWidgets GUI library
- http://www.roebling.de/
- To add widgets, include a sizer (wxWidgets concept of a group of widgets)
wxGlade
- GUI designer written in Python with the popular GUI toolkit wxPython
- It is not (and will never be) a full featured IDE, but simply a "designer":
the generated code does nothing apart from displaying the created widgets
- http://wxglade.sourceforge.net/
- To add widgets, include a sizer (wxWidgets concept of a group of widgets)
XRCed
- http://xrced.sourceforge.net/
- XRCed is a simple resource editor for wxWindows/wxPython GUI development
which supports creating and editing files in XRC format. It is written in
Python and uses wxPython GUI toolkit.
VisualWx
- http://visualwx.altervista.org/
- Visualwx aims to be the ultimate RAD tool, designed for c++,python,perl,ruby,
lua and wxWidgets.
- As of Aug 2004, still alpha but under active development
Boa Constructor
See above
PythonCard
- http://pythoncard.sourceforge.net/
- Improved features to deal with database servers. It doesn't
have absolutely everything you require, but it's pretty good for alpha software.
In particular, have a look at the dbBrowser sample application which shows how
to interact with MySQL. Here's
an article on working with the ADO interface, and here's
a tutorial on working with COM objects.
- Ready for prime time? "The first goal of the project is to produce
a prototype. This will prove the applicability of the tools, concepts and
high level design of PythonCard. To see how this is progressing;"
- Building GUI Applications
with PythonCard and PyCrust by Patrick O'Brien
Dialogblocks
- Commercial RAD tool
- http://www.anthemion.co.uk/dialogblocks/
- Only handles C++ and XRC, not Python? (XRC = the
wxWidgets resource format)
- Like the other designers, doesn't let you draw windows like VB
- DialogBlocks is programmed with the popular open source (free) GUI toolkit,
wxWidgets, created in 1992 by DialogBlocks' author, Julian Smart.
Tcl/Tk
Visual TCL
"Visual Tcl is a freely-available,
high-quality application development environment for UNIX, Windows, Macintosh
and AS400 platforms. Visual Tcl is written entirely in Tcl/Tk and generates
pure Tcl/Tk code. This makes porting your Visual Tcl applications either unnecessary
or trivial. Visual Tcl is covered by the GNU General Public License."
PAGE - Python Automatic GUI Generator
Resources
Writing GUIs with Tcl/Tk and TKinter
Notes:
- Using pythonw.exe instead of python.exe hides the console, so it should
only be used once an application is bug-free, or once error messages are displayed
through the GUI. With the .pyw extension, a tkinter script won't show a terminal
window either, but the script will also be unable to output text to the
console, even if it was launched manually from a terminal window
- Starting from Python 3, the Tkinter module was renamed to tkinter (with
a lowercase t). The rename also affects the ttk and (deprecated) tix
extensions, now tkinter.ttk and tkinter.tix
- Don't use "from … import *", as it mixes everything into
a single namespace
Tkinter is Python's object-oriented layer on top of Tcl/Tk. Tk only offers
basic widgets; If you need more, check out wxPython and PyQt.
Three main concepts: Widgets, event handling, and geometry management (pack,
grid, place; pack is the simplest for simple layouts, grid is
the most commonly used, and place is the least popular but provides the
best control).
Books
- NOK "Python GUI Programming with Tkinter" (2018) by Alan D.
Moore
- READING "Tkinter GUI Application Development Cookbook" (2018) by Alejandro
Rodas de Paz (Packt Publishing)
- "TKinter GUI Application Development Blueprints" (2015) by
Bhaskar Chaudhary
- "Modern Tkinter for Busy Python Developers" (2012) by Mark Roseman
- "Tkinter 8.5 reference: a GUI for Python" (2013) by John W. Shipman (New Mexico Tech Computer
Center)
- "Python and Tkinter Programming" (2000) by John E. Grayson
Layout/Geometry Managers
Historically, Tkinter supports three layout managers:
- pack: Easiest to use; "For simple layouts, such as when you want
to place all the widgets on top of each other or side by side"
- grid: "To prevent unexpected results in complex layouts, it is
common to group widgets with frames so that when you pack all the widgets
within a frame, you do not interfere with the arrangement of the other ones.
In these cases, use the Grid geometry manager since it allows you to directly
set the position of each widget with one call to the geometry manager and
avoids the need for additional frames."; "considered the more
versatile of the three geometry managers. It directly resembles the grid
concept that is commonly used in user interface design—a two-dimensional
table divided into rows and columns, where each cell represents the space
available for a widget"
- place: "allows you to set the position and size of a widget in
absolute terms, or in relative terms to another one. Of the three geometry
managers, it is the least commonly used one. On the other hand, it can fit
some complex scenarios where you want to freely position a widget or overlap
a previously placed one."
Note: pack and grid shouldn't be mixed within the same container (ie. on widgets that share the same parent).
Grid
The container frame is organized into a two-dimensional table where each
cell can hold one widget. However, widgets can be made to span multiple cells.
Pack
.pack(side=LEFT|RIGHT|TOP|BOTTOM, fill=X|Y|BOTH,expand=YES|NO,anchor=N|NE|E|SE|S|SW|W|NW)
The pack manager is ideally suited for the following two kinds of situation:
- Placing widgets in a top-down manner
- Placing widgets side by side
Widgets
Tkinter provides the following widgets:
- Toplevel
- Canvas
- PanedWindow
- Frame
- Label
- LabelFrame
- Entry
- Text
- Button
- Listbox
- Checkbutton
- OptionMenu
- Radiobutton
- Menu
- Menubutton
- Message
- Scale
- Scrollbar
- Spinbox
- Bitmap Class
- Image Class
Test
To check that Python is correctly installed and that Tkinter works, open
a terminal window, and run the following command: python -m tkinter
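The same check can be made from inside a script, for instance before deciding whether to show a GUI at all. This is only a minimal sketch: the helper name tk_version is made up for the illustration, and it merely tries to import the module without opening any window.

```python
def tk_version():
    """Return the Tk version as a float, or None if tkinter cannot be imported."""
    try:
        import tkinter
    except ImportError:
        return None
    # TkVersion is a module-level constant; reading it opens no window
    return tkinter.TkVersion


version = tk_version()
if version is None:
    print("tkinter is not available in this Python installation")
else:
    print("tkinter is available, Tk version", version)
```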
Loading Tkinter
#Bad
from tkinter import *
#Better
import tkinter
#Best
import tkinter as tk
Hello, world!
- import tkinter as tk
-
- root = tk.Tk()
- root.title("My title")
-
- w = tk.Label(root, text="Hello Tkinter!")
- #Fit the size of the window to the given text
- w.pack()
-
- root.mainloop()
Dialog
A simple OK dialog:
- from tkinter import Tk
- from tkinter import messagebox
-
- # Hide parent window; in Windows, use ".pyw" as the extension to hide the terminal window as well
- Tk().withdraw()
-
- messagebox.showinfo("My title", "Hello")
A Yes/No dialog:
- import tkinter as tk
- from tkinter import messagebox
-
- def answer():
-     messagebox.showerror("Answer", "Sorry, no answer available")
-
- def callback():
-     if messagebox.askyesno('Verify', 'Really quit?'):
-         messagebox.showwarning('Yes', 'Not yet implemented')
-     else:
-         messagebox.showinfo('No', 'Quit has been cancelled')
-
- root = tk.Tk()
- tk.Button(root, text='Quit', command=callback).pack(fill=tk.X)
- tk.Button(root, text='Answer', command=answer).pack(fill=tk.X)
-
- root.mainloop()
To hide the main window:
- from tkinter import Tk
- from tkinter.filedialog import askopenfilename
-
- # we don't want a full GUI, so keep the root window from appearing
- Tk().withdraw()
-
- # show an "Open" dialog box and return the path to the selected file
- filename = askopenfilename()
- if not filename:
-     # the dialog was cancelled
-     raise SystemExit
-
- print(filename)
Displaying text: Message, and Text
The Message widget has more features than Label, and the Text widget has
even more features.
Message
- import tkinter as tk
-
- master = tk.Tk()
-
- whatever_you_do = "Whatever you do will be insignificant, but it is very important that you do it.\n(Mahatma Gandhi)"
- msg = tk.Message(master, text = whatever_you_do)
- msg.config(bg='lightgreen', font=('times', 24, 'italic'))
- msg.pack()
-
- tk.mainloop()
Buttons
- import tkinter as tk
- import random
-
- def change_label():
-     button.config(text=str(random.randint(1,101)))
-
- root = tk.Tk()
- root.title("Changing label")
-
- button = tk.Button(root, text='Change', width=25, command=change_label)
- button.pack()
-
- root.mainloop()
Closing an application
- tk.Button(master, text='Quit', command=master.quit)
Checkboxes
- var1 = tk.IntVar()
- tk.Checkbutton(master, text="male", variable=var1).grid(row=1, sticky=tk.W)
Radio button
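- tk.Radiobutton(root,
-     text=language,
-     padx=20,
-     variable=v,
-     command=ShowChoice,
-     value=val).pack(anchor=tk.W)
variable: the control variable (eg. an IntVar) shared by all the radio buttons of a group; setting it before the buttons are displayed tells Tkinter which radio button is the default choice
text: the radio button's label; value: what gets stored in the variable when this button is selected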
Entry
For just a single line of text.
- import tkinter as tk
-
- def show_entry_fields():
-     print("First Name: %s" % (e1.get()))
-
- master = tk.Tk()
-
- tk.Label(master, text="First Name").grid(row=0)
- e1 = tk.Entry(master)
- e1.grid(row=0, column=1)
-
- tk.Button(master, text='Show', command=show_entry_fields).grid(row=3, column=1, sticky=tk.W, pady=4)
-
- tk.mainloop()
Text
Multiple lines of text.
- import tkinter as tk
-
- root = tk.Tk()
-
- S = tk.Scrollbar(root)
- T = tk.Text(root, height=4, width=50)
- S.pack(side=tk.RIGHT, fill=tk.Y)
- T.pack(side=tk.LEFT, fill=tk.Y)
- S.config(command=T.yview)
- T.config(yscrollcommand=S.set)
-
- quote = """HAMLET: To be, or not to be--that is the question:
- Whether 'tis nobler in the mind to suffer
- The slings and arrows of outrageous fortune
- Or to take arms against a sea of troubles
- And by opposing end them. To die, to sleep--
- No more--and by a sleep to say we end
- The heartache, and the thousand natural shocks
- That flesh is heir to. 'Tis a consummation
- Devoutly to be wished."""
-
- T.insert(tk.END, quote)
-
- tk.mainloop()
Showing picture
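To display a picture, put it in a Label (or a Canvas). Note that Tkinter's built-in
PhotoImage class only reads GIF and PGM/PPM files (plus PNG with Tk 8.6 and later);
for other formats such as JPEG, load the file with Pillow's ImageTk module.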
- import tkinter as tk
- from PIL import ImageTk, Image
-
- master = tk.Tk()
- master.title("Join")
- master.geometry("300x300")
- master.configure(background='grey')
-
- img = ImageTk.PhotoImage(Image.open("IMG_20190522_164109.jpg"))
- panel = tk.Label(master, image = img)
- panel.pack(side = "bottom", fill = "both", expand = "yes")
-
- tk.mainloop()
Variable Classes
Some widgets (like text entry widgets, radio buttons and so on) can be connected
directly to application variables by using special options: variable, textvariable,
onvalue, offvalue, and value. This connection works both ways: if the variable
changes for any reason, the widget it's connected to will be updated to reflect
the new value. These Tkinter control variables are used like regular Python
variables to keep certain values. It's not possible to hand over a regular Python
variable to a widget through a variable or textvariable option. The only kinds
of variables for which this works are variables that are subclassed from a class
called Variable, defined in the tkinter module. They are declared like this:
x = StringVar() # Holds a string; default value ""
x = IntVar() # Holds an integer; default value 0
x = DoubleVar() # Holds a float; default value 0.0
x = BooleanVar() # Holds a boolean; default value False (get() returns True or False)
To read the current value of such a variable, call the method get(). The
value of such a variable can be changed with the set() method.
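Here is a small sketch of get(), set() and a write trace on these variables. To keep it runnable without opening a window, it assumes a bare Tcl interpreter created with tk.Tcl() as the master; in a real GUI the master would be the root window returned by tk.Tk(). The names interp, changes and on_write are made up for the illustration.

```python
import tkinter as tk

# A bare Tcl interpreter, no window; a GUI program would use tk.Tk() instead
interp = tk.Tcl()

name = tk.StringVar(master=interp)   # starts as ""
count = tk.IntVar(master=interp)     # starts as 0
flag = tk.BooleanVar(master=interp)  # starts as False

changes = []

def on_write(var_name, index, mode):
    # Called each time the traced variable is written to
    changes.append(count.get())

count.trace_add("write", on_write)

name.set("Zaphod")
count.set(41)
count.set(count.get() + 1)
flag.set(True)

print(name.get(), count.get(), flag.get(), changes)
```

A widget connected to count through its variable or textvariable option would be refreshed at the same moments the trace fires.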
Web development
More infos here.
Database access
http://www.python.org/sigs/db-sig/
json
https://realpython.com/python-json/
https://jsonplaceholder.typicode.com/
PyGeoj, "a simple Python Geojson file reader and writer."
Encoding JSON = serialization or marshaling; decoding = deserialization.
dumps() serializes data to a string in memory, while dump() writes them to a file.
Turn JSON into Python objects
Use load() (from a file) and loads() (from a string).
JSON | Python
object | dict
array | list
string | str
number (int) | int
number (real) | float
true | True
false | False
null | None
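To see the mapping in action, here is a minimal sketch that decodes a made-up JSON string and prints the Python type of each value. The sample data is invented for the illustration.

```python
import json

# Made-up sample covering every JSON type in the table
text = (
    '{"name": "Zaphod", "heads": 2, "height": 1.9, "tags": ["a", "b"], '
    '"president": true, "retired": false, "successor": null}'
)

decoded = json.loads(text)
for key, value in decoded.items():
    print(key, type(value).__name__)

# The mapping also applies the other way around when encoding
encoded = json.dumps({"ok": True, "missing": None})
print(encoded)
```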
read
import json
#From a file
with open("data_file.json", "r") as read_file:
    data = json.load(read_file)
#From a string, eg. the body of an HTTP response (response is assumed to come from the requests library)
todos = json.loads(response.text)
turn dictionary into json
import json
data = {
    "president": {
        "name": "Zaphod Beeblebrox",
        "species": "Betelgeusian"
    }
}
json_string = json.dumps(data)
print(json_string)
write
import json
data = {
    "president": {
        "name": "Zaphod Beeblebrox",
        "species": "Betelgeusian"
    }
}
with open("data_file.json", "w") as write_file:
    json.dump(data, write_file)
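As a quick sanity check, the write and read steps can be chained into a round trip. This is only a sketch: it assumes a temporary directory so that nothing is left on disk, and the indent=4 setting is just there to make the file readable.

```python
import json
import os
import tempfile

data = {"president": {"name": "Zaphod Beeblebrox", "species": "Betelgeusian"}}

# Work in a throwaway directory so the sketch leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data_file.json")

    # dump() writes to a file object
    with open(path, "w", encoding="utf-8") as write_file:
        json.dump(data, write_file, indent=4)

    # load() reads it back from a file object
    with open(path, "r", encoding="utf-8") as read_file:
        restored = json.load(read_file)

# dumps()/loads() do the same with a string in memory
as_string = json.dumps(data)

print(restored == data)
print(json.loads(as_string) == data)
```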
geoJSON
https://pypi.org/search/?q=geojson
What package/library/module is recommended to work with geoJSON files?
c:\>pip install json
Collecting json
Could not find a version that satisfies the requirement json (from versions: )
No matching distribution found for json
(The json module is part of Python's standard library, so there is nothing to install.)
c:\>pip search geojson
geojson (2.4.0) - Python bindings and utilities for GeoJSON
geojsontools (0.0.3) - Functions for manipulating geojsons
geojson_elevation (0.1) - GeoJSON compatible elevation proxy
geojson_utils (0.0.2) - Python helper functions for manipulating GeoJSON
PyGeoj (0.22) - A simple Python GeoJSON file reader and writer.
Setup
pip install geojson
Features
Read
import geojson
with open('myfile.geojson') as f:
    gj = geojson.load(f)
#First feature
print(gj['features'][0])
#All features
for feature in gj['features']:
    print(feature)
print(gj)
Write
from geojson import Feature, FeatureCollection, Point, dump
props = {"name": "My name","country": "Spain"}
point = Point((-115.81, 37.24))
features = []
features.append(Feature(properties=props,geometry=point))
feature_collection = FeatureCollection(features)
with open('myfile.geojson', 'w') as f:
    dump(feature_collection, f)
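Since a GeoJSON file is plain JSON, the standard json module is enough for simple reading when installing the geojson package isn't an option. Note that this sketch swaps in json for geojson, so it does no GeoJSON validation; the sample document is made up and reuses the point and properties from the Write example.

```python
import json

# Made-up sample document, same point and properties as the Write example
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-115.81, 37.24]},
      "properties": {"name": "My name", "country": "Spain"}
    }
  ]
}
"""

collection = json.loads(geojson_text)

names = []
for feature in collection["features"]:
    # GeoJSON stores coordinates as longitude first, then latitude
    lon, lat = feature["geometry"]["coordinates"]
    names.append(feature["properties"]["name"])
    print(feature["properties"]["name"], lat, lon)
```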
geopy
"geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources."
https://geopy.readthedocs.io/en/stable/
https://programminghistorian.org/en/lessons/mapping-with-python-leaflet
Working with XML/HTML
https://stackabuse.com/reading-and-writing-xml-files-in-python/
Note: PyXML is deadware
minidom: simplified implementation of DOM
ElementTree (ET): More Pythonic interface than DOM; lxml is an enhanced version
of ET
BeautifulSoup uses lxml, if available, and is an easy way to work with HTML/XML
"untangle
is a simple library which takes an XML document and returns a Python object
which mirrors the nodes and attributes in its structure."
More infos on XML here.
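For a feel of the ElementTree interface mentioned above, here is a minimal sketch using the version that ships with Python (xml.etree.ElementTree). The sample document and its tag names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document
xml_text = """<library>
  <book id="1"><title>Python and Tkinter Programming</title><year>2000</year></book>
  <book id="2"><title>Modern Tkinter for Busy Python Developers</title><year>2012</year></book>
</library>"""

root = ET.fromstring(xml_text)

# Read: loop over child elements, get attributes and text
titles = []
for book in root.findall("book"):
    titles.append(book.findtext("title"))
    print(book.get("id"), book.findtext("title"), book.findtext("year"))

# Write: add an element and serialize the tree back to a string
new_book = ET.SubElement(root, "book", id="3")
ET.SubElement(new_book, "title").text = "Tkinter 8.5 reference"
output = ET.tostring(root, encoding="unicode")
print(output)
```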
BeautifulSoup
"Beautiful Soup provides a few simple methods and Pythonic idioms for
navigating, searching, and modifying a parse tree. […] Beautiful Soup sits on
top of popular Python parsers like lxml and html5lib, allowing you to try out
different parsing strategies or trade speed for flexibility." Python 3.x should use BeautifulSoup4.
Once parsed, BS builds a tree of Python objects (Tag, NavigableString, BeautifulSoup,
and Comment.)
"A string corresponds to a bit of text within a tag. Beautiful Soup
uses the NavigableString class to contain these bits of text. A NavigableString
is just like a Python Unicode string, except that it also supports some of the
features described in Navigating the tree and Searching the tree. You can convert
a NavigableString to a Unicode string with str: unicode_string = str(tag.string)"
"If you want to use a NavigableString outside of Beautiful Soup, you
should call unicode() on it to turn it into a normal Python Unicode string.
If you don’t, your string will carry around a reference to the entire Beautiful
Soup parse tree, even when you’re done using Beautiful Soup. This is a big waste
of memory."
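With Python 3 and BeautifulSoup4, the conversion is done with str(). A minimal sketch, assuming the bs4 package is installed and using a made-up snippet of HTML:

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser")

bold_text = soup.b.string    # a NavigableString, still tied to the parse tree
plain_text = str(bold_text)  # a plain str, independent of the tree

print(type(bold_text).__name__, type(plain_text).__name__)
# Navigation still works on the NavigableString, not on the plain str
print(bold_text.parent.name)
```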
BS provides two ways to find elements: find()/find_all(), and select(), which
offers more sophisticated matching through CSS selectors (implemented by the
Soup Sieve package).
Different classes, to make it easier to find elements:
- NavigableString
- CData
- ProcessingInstruction
- Declaration
- Doctype
- Comment
- etc.
Note: If the input data isn't in utf-8, BS will silently convert them, and
edit the relevant meta line in the header if it's there — but won't add one
if it isn't.
Questions
.string vs .text? "The string argument is new in Beautiful Soup 4.4.0.
In earlier versions it was called text"
soup.select("kml Document") vs. soup.select("kml > Document")?
The former finds any "Document" tags below "kml", no matter
where in the tree, while the latter looks for it directly under "kml".
CSS: Difference between "#sister" and ".sister"? The former matches on the id attribute, the latter on the class attribute.
Does find_all only search tags (elements), or also strings within? By default it matches tags; pass string=... to match text instead.
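These questions are easy to check on a small document. The sketch below assumes bs4 is installed and uses made-up HTML with plain div/p tags rather than KML, so that it runs with the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Made-up sample markup
html = """
<div id="sister">
  <p class="sister">direct child</p>
  <section><p>nested deeper</p></section>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div p": any <p> below <div>; "div > p": only <p> directly under <div>
descendants = soup.select("div p")
children = soup.select("div > p")
print(len(descendants), len(children))

# "#sister" matches on id, ".sister" matches on class
by_id = soup.select("#sister")
by_class = soup.select(".sister")
print(by_id[0].name, by_class[0].name)

# find_all() matches tags by default, and strings when string= is given
tags = soup.find_all("p")
strings = [str(s) for s in soup.find_all(string="nested deeper")]
print(len(tags), strings)
```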
How to parse and output
- from bs4 import BeautifulSoup
-
- #open in binary and let BS convert data to utf-8 if needed
- soup = BeautifulSoup(open('input.html', 'rb'), 'xml')
- #OR
- soup = BeautifulSoup("<html>a web page</html>", 'html.parser')
-
- print("Orig encod:",soup.original_encoding)
- print(soup.prettify())
-
- #To work with each tag before having it prettified
- for c in soup.contents:
-     print(c.prettify())
Since BS doesn't add it if none is found in the header, here's how to add
encoding information:
- import re
-
- #Attribute values are matched case-sensitively, hence the case-insensitive regex
- meta = soup.head.find("meta", attrs={"http-equiv": re.compile("^content-type$", re.I)})
- if meta is None:
-     metatag = soup.new_tag('meta')
-     metatag.attrs['http-equiv'] = 'Content-Type'
-     metatag.attrs['content'] = 'text/html; charset=utf-8'
-     soup.head.append(metatag)
- else:
-     print("Found")
If you know how a file is (not) encoded, you can help BS by providing this
information before it runs its Unicode, Damnit sub-library:
- soup = BeautifulSoup(markup, 'html.parser', from_encoding="iso-8859-8")
- soup = BeautifulSoup(markup, 'html.parser', exclude_encodings=["iso-8859-7"])
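Here is a small sketch of from_encoding at work, assuming bs4 is installed. The markup is a made-up Latin-1 byte string; BS decodes it to Unicode, and the result can then be written out as UTF-8.

```python
from bs4 import BeautifulSoup

# "café" encoded as ISO-8859-1 (Latin-1): the é is the single byte 0xE9
markup = "<p>café</p>".encode("iso-8859-1")

soup = BeautifulSoup(markup, "html.parser", from_encoding="iso-8859-1")
print(soup.original_encoding)
print(soup.p.string)

# Text is Unicode internally; encode explicitly when bytes are needed
utf8_bytes = soup.p.encode("utf-8")
print(utf8_bytes)
```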
Parsing XML
By default, BS will use an HTML parser unless you specifically tell it to
use an XML parser (lxml, which needs to be installed)
- soup = BeautifulSoup(data, 'xml') #https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
Performance
- Install lxml
- Speed up encoding detection by installing the cchardet library
- Parsing only part of a document won’t save you much time parsing the
document, but it can save a lot of memory, and it’ll make searching the
document much faster.
Navigating
- link = soup.a
- for parent in link.parents:
-     print(parent.name)
-
- find_parents() and find_parent() work their way up the tree:
- a_string = soup.find(string="Lacie")
- a_string.find_parents("a")
- sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></a>",
'html.parser')
- print(sibling_soup.prettify())
- sibling_soup.b.next_sibling
- sibling_soup.c.previous_sibling
-
- find_next_siblings() and find_next_sibling(), find_previous_siblings()
and find_previous_sibling():
- for sibling in soup.a.next_siblings:
-     print(repr(sibling))
-
- .next_elements and .previous_elements #iterators to move forward or
backward in the document as it was parsed
-
- The find_all_next() method returns all matches, and find_next() only
returns the first match. The find_all_previous() method returns all matches,
and find_previous() only returns the first match.
-
- len(list(soup.children))
- len(list(soup.descendants))
Finding elements
- soup.head
- soup.title
- soup.title.name
#Important: .name holds the tag's own name, so soup.name won't find a tag named
<name>. To access such a tag, use eg. wpt.find("name").string
- soup.title.string #element's text
- soup.title.get_text() #alternative
- soup.get_text("|", strip=True)
- soup.body.b #get the first <b> tag below <body>
-
- #get all text within a tree
- [text for text in soup.stripped_strings]
-
- soup.title.parent.name
-
- soup.head.contents
- soup.head.contents[0].name
-
- title_tag = soup.head.contents[0]
- for child in title_tag.children:
-     print(child)
- soup.p #first paragraph
- soup.p['class'] #display value of attribute
-
- soup.find_all('a') #all hyperlinks
- soup("a") #shortcut for soup.find_all("a")
- soup.title(string=True) #shortcut for soup.title.find_all(string=True)
- soup.find(id="link3") #first element with that id
- soup.find_all(string="Elsie") #all strings equal to "Elsie", eg. in <blah>Elsie</blah>
- soup.find_all(string=["Tillie", "Elsie", "Lacie"])
- soup.find_all(string=re.compile("Dormouse"))
- soup.find_all("a", limit=2)
#bad?
-
- #Grab attributes
- tag = soup.find('meta', {'name': 'keywords'})
- print(tag)
- print(tag.attrs)
- print(tag.attrs.get('content'))
-
- Note: If find_all() can’t find anything, it returns an empty list. If
find() can’t find anything, it returns None:
- #get content of a.href
- for link in soup.find_all('a'):
-     print(link.get('href')) #get content of href attribute, ie. the link
-
- for string in soup.stripped_strings:
-     print(repr(string)) #repr() returns a printable representation of the given object
- find() and find_all() are the most popular search methods; Use filters
(string, regex, list, function)
- soup.find_all("p", "title")
- soup.find_all(id=True) #all tags with an "id" attribute
- soup.find_all(href=re.compile("elsie"), id='link1')
- name_soup.find_all(attrs={"name": "email"}) #"name" is already the first argument of find_all() (the tag name), so a "name" attribute must go through attrs
- soup.find_all("a", class_="sister") #class
is a reserved keyword
- soup.find_all('b')
-
- #regex
- for tag in soup.find_all(re.compile("^b")):
-     print(tag.name)
-
- soup.find_all(string="Elsie") #all strings that are exactly "Elsie" (strings, not tags)
- BeautifulSoup has a .select() method which uses the SoupSieve
package to run a CSS
selector against a parsed document and return all the matching elements.
-
- css_soup.select("p.strikeout.body") #CSS selector to search
for tags that match two or more CSS classes
- soup.select("html head title") #same as soup.title
- soup.select("p > a") #directly under
-
- soup.select_one(".sister") #only first one
- soup.select("#link1 ~ .sister")
- soup.select("#link1 + .sister")
- soup.select(".sister")
- soup.select("[class~=sister]")
- soup.select("#link1")
- soup.select("#link1,#link2")
- soup.select('a[href]')
- soup.select('a[href="http://example.com/elsie"]')
- soup.select('a[href^="http://example.com/"]')
- soup.select('a[href$="tillie"]')
- soup.select('a[href*=".com/el"]')
Modifying the tree
- Note: When calling eg. soup.mytag, BS will look for mytag anywhere in
the tree, not just right after soup
-
- tag = soup.b
- tag.name = "blockquote"
- new_tag.string = "Link text."
-
- tag.string.replace_with("No longer bold")
- tag = BeautifulSoup('<b id="boldest">bold</b>',
'html.parser').b
- tag['id']
-
- tag['id'] = 'verybold'
- del tag['id']
-
- tag = soup.a
- tag.string = "New link text."
-
- append()/insert() to add to an element's string (at the end, at a given
location) which can be empty; new_tag() to add a whole tag. There's also
insert_before() and insert_after(). Use clear() to empty a tag's string.
Use extract()/decompose() to remove a tag from the tree.
-
- replace_with() can be used with more than one argument: a_tag.b.replace_with(bold_tag,
".", i_tag)
-
- soup = BeautifulSoup("<a>Foo</a>", 'html.parser')
- soup.a.append("Bar") #<a>FooBar</a>
- #alternative
- new_string = NavigableString("ed")
- soup.a.append(new_string)
#<a>FooBared</a>
- Important: append/insert is used to edit the string of a tag,
which can include a whole block (ie. to add a new tag), not just the string
of a basic tag
-
- original_tag = soup.b
- new_tag = soup.new_tag("a", href="http://www.example.com")
- new_tag.string = "Link text."
- original_tag.append(new_tag) # <b><a href="http://www.example.com">Link
text.</a></b>
-
- To clean a tag that holds multiple NavigableString objects after using
.append():
- soup.smooth()
- print(soup.p.prettify())
- soup = BeautifulSoup("<a>Soup</a>", 'html.parser')
- soup.a.extend(["'s", " ", "on"])
- soup # <a>Soup's on</a>
- soup.a.contents # ['Soup', "'s", ' ', 'on']
-
- from bs4 import Comment
- new_comment = Comment("Nice to see you.")
- tag.append(new_comment) # <b>Hello there<!--Nice to see you.--></b>
-
- tag = soup.a #<a href="http://example.com/">I linked
to <i>example.com</i></a>
- tag.insert(1, "but did not endorse ") #<a href="http://example.com/">I
linked to but did not endorse <i>example.com</i></a>
-
- soup #<b>leave</b>
- tag = soup.new_tag("i")
- tag.string = "Don't"
- soup.b.string.insert_before(tag) # <b><i>Don't</i>leave</b>
- div = soup.new_tag('div')
- div.string = 'ever'
- soup.b.i.insert_after(" you ", div) # <b><i>Don't</i>
you <div>ever</div> leave</b>
- soup.b.contents # [<i>Don't</i>, ' you', <div>ever</div>,
'leave']
-
- soup #<a href="http://example.com/">I linked to <i>example.com</i></a>
- tag = soup.a
- tag.clear() # <a href="http://example.com/"></a>
-
- a_tag = soup.a #<a href="http://example.com/">I linked
to <i>example.com</i></a>
- i_tag = soup.i.extract()
- a_tag # <a href="http://example.com/">I linked to</a>
- i_tag # <i>example.com</i>
-
- a_tag = soup.a #<a href="http://example.com/">I linked
to <i>example.com</i></a>
- i_tag = soup.i
- i_tag.decompose()
- a_tag # <a href="http://example.com/">I linked to</a>
-
- a_tag = soup.a #<a href="http://example.com/">I linked
to <i>example.com</i></a>
- new_tag = soup.new_tag("b")
- new_tag.string = "example.com"
- a_tag.i.replace_with(new_tag) # <a href="http://example.com/">I
linked to <b>example.com</b></a>
- bold_tag = soup.new_tag("b")
- bold_tag.string = "example"
- i_tag = soup.new_tag("i")
- i_tag.string = "net"
- a_tag.b.replace_with(bold_tag, ".", i_tag) # <a href="http://example.com/">I
linked to <b>example</b>.<i>net</i></a>
-
- #<p>I wish I was bold.</p>
- soup.p.string.wrap(soup.new_tag("b"))
- # <b>I wish I was bold.</b>
-
- a_tag = soup.a #<a href="http://example.com/">I linked
to <i>example.com</i></a>'
- a_tag.i.unwrap() # <a href="http://example.com/">I linked
to example.com</a>
-
- #to inject a tree into another
- doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document>", "xml")
- footer = BeautifulSoup("<footer>Here's the footer</footer>", "xml")
- doc.find(string="INSERT FOOTER HERE").replace_with(footer) #"string" replaces the older "text" argument
-
- #<header/>
- header = soup.header
- header.string = "blah"
Output
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#output
- soup.prettify()
- soup.a.prettify() #just a subpart of the tree
-
- #raw output
- str(soup)
- str(soup.a)
- unicode_string = str(tag.string)
-
- Note: The str() function returns a string encoded in UTF-8. See Encodings
for other options. You can also call encode() to get a bytestring, and decode()
to get Unicode.
-
- If you need more sophisticated control over your output, you can use
Beautiful Soup’s Formatter class:
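- from bs4.formatter import HTMLFormatter
-
- #function applied to every string and attribute value on output
- def uppercase(text):
-     return text.upper()
-
- formatter = HTMLFormatter(uppercase)
- print(soup.prettify(formatter=formatter))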
To find where elements are located in the source file:
- soup = BeautifulSoup(markup, 'html.parser')
- for tag in soup.find_all('p'):
-     print(repr((tag.sourceline, tag.sourcepos, tag.string)))
To copy an element (which won't be part of the tree):
- import copy
- p_copy = copy.copy(soup.p)
To only parse and find certain elements:
- from bs4 import SoupStrainer
- only_a_tags = SoupStrainer("a")
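The strainer only has an effect once it is handed to the constructor through parse_only. A minimal sketch, assuming bs4 with the built-in html.parser; the markup is made up:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical input: two links plus content we do not care about
html = '<p>Intro</p><a href="/one">One</a><div><a href="/two">Two</a></div>'

only_a_tags = SoupStrainer("a")
soup = BeautifulSoup(html, "html.parser", parse_only=only_a_tags)

# Only the <a> tags made it into the tree
links = [a["href"] for a in soup.find_all("a")]
print(links)
print(soup.find("p"))
```

Note that, according to the BS documentation, parse_only is not supported by the html5lib parser.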
To investigate what BS does:
- from bs4.diagnose import diagnose
- with open("bad.html") as fp:
-     data = fp.read()
- diagnose(data)
Encoding
Regardless of how it's encoded originally, when loaded into Beautiful Soup,
it's converted to Unicode. Beautiful Soup uses a sub-library called Unicode,
Dammit to detect a document’s encoding and convert it to Unicode.
- from bs4 import UnicodeDammit
-
- dammit = UnicodeDammit(b"Sacr\xc3\xa9 bleu!") #input must be bytes for detection to happen
- print(dammit.unicode_markup)
- # Sacré bleu!
- dammit.original_encoding
- # 'utf-8'
Unicode, Dammit’s guesses will get a lot more accurate if you install one
of these Python libraries: charset-normalizer, chardet, or cchardet.
If you have your own suspicions as to what the encoding might be, you can
pass them in as a list:
- dammit = UnicodeDammit(b"Sacr\xe9 bleu!", ["latin-1", "iso-8859-1"])
- print(dammit.unicode_markup)
- # Sacré bleu!
- dammit.original_encoding
- # 'latin-1'
You can check the encoding found by BS using "soup.original_encoding".
If you happen to know a document’s encoding ahead of time, you can avoid
mistakes and delays by passing it to the BeautifulSoup constructor as from_encoding:
- soup = BeautifulSoup(markup, 'html.parser', from_encoding="iso-8859-8")
When you write out a document from Beautiful Soup, you get a UTF-8 document,
even if the document wasn’t in UTF-8 to begin with. If you don’t want UTF-8,
you can pass an encoding into prettify():
- print(soup.prettify("latin-1"))
-
- #alternatively
- soup.p.encode("utf-8")
To read
Internal xml module: minidom and ElementTree
"The ElementTree
library was contributed to the standard library by Fredrick Lundh. It includes
tools for parsing XML using event-based and document-based APIs, searching parsed
documents with XPath expressions, and creating new or modifying existing documents."
"Python has two interfaces — minidom and Element Tree — probably because Element Tree was integrated
into the standard library a good deal later after minidom came to be. The reason
for this was likely its far more "Pythonic" API compared to the W3C-controlled
DOM." (Source)
Python's ElementTree has only limited support for XPath. If you need more,
try lxml.
Besides the Python implementation xml.etree.ElementTree, there used to be a separate
C implementation in xml.etree.cElementTree. Since Python 3.3, xml.etree.ElementTree uses
the C accelerator automatically; xml.etree.cElementTree was deprecated in 3.3 and removed in 3.9.
Python's XML
module includes…
- xml.dom: the DOM API definition
- xml.dom.minidom: a minimal DOM implementation
- xml.dom.pulldom: support for building partial DOM trees
- xml.etree.ElementTree: the ElementTree API, a simple and lightweight
XML processor
- xml.sax: SAX2 base classes and convenience functions
- xml.parsers.expat: the Expat parser binding
ET has two classes for this purpose - ElementTree represents the whole XML
document as a tree, and Element represents a single node in this tree. Interactions
with the whole document (reading and writing to/from files) are usually done
on the ElementTree level. Interactions with a single XML element and its sub-elements
are done on the Element level.
Issue I had while learning how to use minidom and ET:
- minidom removes encoding info: <?xml version="1.0" encoding="UTF-8"?>
-> <?xml version="1.0" ?>
doc = minidom.parseString(inputdata)
prettyxml = doc.toprettyxml(encoding='utf-8') #Adds the encoding to the XML header, but returns bytes: the file must be saved as bytes instead of string, and I got wrong accents (eg. "Ã©" instead of "é")
prettyxml = doc.toprettyxml() #Accents OK but missing encoding="UTF-8" in XML header, and returns a string
(doc.toxml() behaves the same way with and without encoding='utf-8')
Work-around:
prettyxml = prettyxml.replace('<?xml version="1.0" ?>', '<?xml version="1.0" encoding="UTF-8"?>')
- CHECK ElementTree insists on prepending namespace to output data ("ns0:");
The ET.register_namespace() trick did nothing; Had to find code to remove
those from the source file before parsing
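The minidom behaviour noted above can be reproduced with the standard library alone. A minimal sketch with a made-up one-element document; it suggests that the bytes themselves are fine, and that the wrong accents appear when those bytes are later decoded with the wrong codec:

```python
from xml.dom import minidom

# Hypothetical input, passed as UTF-8 bytes
source = '<?xml version="1.0" encoding="UTF-8"?><name>café</name>'.encode("utf-8")
doc = minidom.parseString(source)

as_text = doc.toprettyxml()                   # str, header without encoding
as_bytes = doc.toprettyxml(encoding="utf-8")  # bytes, header declares the encoding

# Bytes are meant to be written with open(path, "wb").
# Decoding them with the wrong codec is what produces mojibake:
garbled = as_bytes.decode("latin-1")

print(as_text.splitlines()[0])
print(as_bytes.splitlines()[0])
print("cafÃ©" in garbled)
```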
lxml
Since they're largely compatible, any tutorial about Element(Tree) will do,
not just the limited doco from lxml which assumes people already know ET.
"lxml is significantly faster [than ElementTree], can be used to parse HTML, and supports
XPath. […] lxml is also easier to use with namespaces." (Source)
lxml.etree versus ElementTree
"The lxml toolkit is a Pythonic binding
for the C libraries libxml2 and libxslt. It is unique in that it combines the
speed and XML feature completeness of these libraries with the simplicity of
a native Python API, mostly compatible but superior to the well-known ElementTree
API. lxml.etree follows the ElementTree API as much as possible, building it
on top of the native libxml2 tree."
What's
the difference between tree and root?
Note: In lxml 4.6.3.0 at least, there's a bug when parsing an HTML through
a filename rather than a file handle, with lxml adding " "
before each carriage-return:
- #BAD
- tree = et.parse(INPUT, parser)
- #OK
- with open(INPUT) as tempfile:
-     tree = et.parse(tempfile, parser=parser)
If need be, encoding/decoding can be specified: print(ET.tostring(root, encoding='utf8').decode('utf8')).
Here's how to find elements and get their parents, which is needed to delete
an element (".." selects the parent of the matching element):
- for movie in root.findall("./foo/bar[@multiple='Yes']/.."):
-     print(movie.attrib)
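Once the parent is known, the matching child can be removed from it. A minimal sketch with the standard library ElementTree; the foo/bar document is made up to mirror the query above:

```python
import xml.etree.ElementTree as ET

# Hypothetical document
root = ET.fromstring(
    "<root><foo><bar multiple='Yes'/><bar multiple='No'/></foo></root>"
)

# ".." selects the parent of each matching <bar>
for parent in root.findall(".//bar[@multiple='Yes']/.."):
    # remove() only works on direct children, hence the need for the parent
    for child in parent.findall("bar[@multiple='Yes']"):
        parent.remove(child)

print(ET.tostring(root, encoding="unicode"))
```

With lxml, element.getparent() gives the parent directly, so the ".." step isn't needed there.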
The difference between iterfind() and findall() is that the former returns
an iterator and only searches through the tree as needed, while findall()
collects all the matches into a list before returning.
iter()? https://docs.python.org/3/library/xml.etree.elementtree.html#elementinclude-functions
It's possible to only get the descendants that have a given tag:
- tag_name = "ellipse"
- for descendant in root.iter(tag_name):
-     print(descendant)
Dealing with namespaces is more convenient when using .iterfind(), which
accepts an optional mapping of prefixes to namespace URIs:
- namespaces = {"": "http://www.w3.org/2000/svg", "custom": "http://www.w3.org/2000/svg"}
- for descendant in root.iterfind("g", namespaces):
-     print(descendant)
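To see what the prefix mapping buys you, here is a minimal sketch using the standard library ElementTree on a tiny made-up SVG document; the "svg" prefix is an arbitrary choice and only has to match the key in the mapping:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
svg = ET.fromstring(f'<svg xmlns="{SVG_NS}"><g><ellipse/></g><g/></svg>')

# Prefix of our choosing mapped to the namespace URI
ns = {"svg": SVG_NS}
groups = list(svg.iterfind("svg:g", ns))

# Same search in Clark notation, without any mapping
clark = svg.findall("{%s}g" % SVG_NS)

# A bare tag name does not match namespaced elements
bare = svg.find("g")

print(len(groups), groups[0].tag, len(clark), bare)
```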
findtext() and itertext() work on elements' text.
There is no move() method: You'll have to find + append + remove:
- action = root.find("./genre[@category='Action']")
- new_dec = ET.SubElement(action, 'decade')
- new_dec.attrib["years"] = '2000s'
-
- xmen = root.find("./genre/decade/movie[@title='X-Men']")
- dec2000s = root.find("./genre[@category='Action']/decade[@years='2000s']")
- dec2000s.append(xmen)
- dec1990s = root.find("./genre[@category='Action']/decade[@years='1990s']")
- dec1990s.remove(xmen)
Install
pip install lxml
Quick test
- import lxml.etree as et
-
- tree = et.parse("input.gpx")
- root = tree.getroot()
-
- #Retrieves direct children nodes of the root
- for child in root:
-     print(child.tag, child.attrib)
- #Text of the second child of the root's first child
- print(root[0][1].text)
Logic
An XML file is made of elements (or "nodes"). Each element has
a tag, and possibly attributes and text.
The ElementTree package consists of two classes: ElementTree (the whole structure) and
Element (nodes).
You first need to read the input, either from a file or a string, have ET
parse it and return a pointer to either the tree (ET.parse("myfile.xml")
followed by tree.getroot()) or the root element directly (ET.fromstring()).
Once you have a pointer to the root element, you can navigate and modify
the tree before writing the edited output back to a file.
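The read/navigate/modify/write cycle described above can be sketched with the standard library ElementTree. The GPX-like snippet is made up, and in-memory buffers stand in for real files:

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<gpx><wpt lat='1.0' lon='2.0'>Home</wpt></gpx>"

# parse() takes a file name or file object and returns an ElementTree
tree = ET.parse(io.StringIO(xml_text))
root_from_tree = tree.getroot()

# fromstring() returns the root Element directly
root_direct = ET.fromstring(xml_text)

# Modify through the root, then write through the tree
root_from_tree.find("wpt").set("lat", "3.5")
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)

print(type(tree).__name__, type(root_direct).__name__)
print(buffer.getvalue().decode("utf-8"))
```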
Ways to get/set infos from an element:
- myElement.tag
- myElement.text
- myElement.attrib: Returns a dictionary
- myElement.get('myAttr')
- myElement.set('myAttr', myValue) #Alternative: myElement.attrib['myAttr']=myValue
- root.append(myElement)
- root.insert(index, myElement)
- root.remove(myElement)
To create a tree from code:
- a = ET.Element('a')
- b = ET.SubElement(a, 'b')
- c = ET.SubElement(a, 'c')
- d = ET.SubElement(c, 'd')
- ET.dump(a) -> <a><b /><c><d /></c></a>
Alternatively:
- root = etree.HTML("<p>data</p>")
- print(etree.tostring(root))
XPath ("XML Path Language") has more features than ElementTree
to find elements (ElementTree's Supported
XPath syntax). "The .find*() methods are usually faster than the full-blown
XPath support. They also support incremental tree processing through the .iterfind()
method, whereas XPath always collects all results before returning them. They
are therefore recommended over XPath for both speed and memory reasons, whenever
there is no need for highly selective XPath queries."
"ElementTree objects have a method getpath(element), which returns a
structural, absolute XPath expression to find that element:
a = etree.Element("a")
b = etree.SubElement(a, "b")
c = etree.SubElement(a, "c")
d1 = etree.SubElement(c, "d")
d2 = etree.SubElement(c, "d")
tree = etree.ElementTree(c)
print(tree.getpath(d2))
/c/d[2]
"
"For ElementTree, the xpath method performs a global XPath query against
the document (if absolute) or against the root node (if relative):
r = tree.xpath('/foo/bar')
print(r[0].tag)
"
"The XPath class compiles an XPath expression into a callable function.
The compilation takes as much time as in the xpath() method, but it is done
only once per class instantiation. This makes it especially efficient for repeated
evaluation of the same XPath expression. Just like the xpath() method, the XPath
class supports XPath variables:
root = etree.XML("<root><a><b/></a><b/></root>")
find = etree.XPath("//b")
print(find(root)[0].tag)
"
"ElementTree supports a language named ElementPath in its find*() methods.
One of the main differences between XPath and ElementPath is that the XPath
language requires an indirection through prefixes for namespace support, whereas
ElementTree uses the Clark notation ({ns}name) to avoid prefixes completely.
The other major difference regards the capabilities of both path languages.
Where XPath supports various sophisticated ways of restricting the result set
through functions and boolean expressions, ElementPath only supports pure path
traversal without nesting or further conditions."
lxml.etree vs lxml.objectify: The two modules provide different ways of handling
XML. However, objectify builds on top of lxml.etree and therefore inherits most
of its capabilities and a large portion of its API. lxml.etree is a generic
API for XML and HTML handling. It aims for ElementTree compatibility and supports
the entire XML infoset. It is well suited for both mixed content and data centric
XML. Its generality makes it the best choice for most applications. lxml.objectify
is a specialized API for XML data handling in a Python object syntax. It provides
a very natural way to deal with data fields stored in a structurally well defined
XML format. Data is automatically converted to Python data types and can be
manipulated with normal Python operators. Look at the examples in the objectify
documentation to see what it feels like to use it. Objectify is not well suited
for mixed contents or HTML documents. As it is built on top of lxml.etree, however,
it inherits the normal support for XPath, XSLT or validation.
Parsing HTML
https://lxml.de/lxmlhtml.html
- document_fromstring(string): Parses a document from the given string.
This always creates a correct HTML document, which means the parent node
is <html>, and there is a body and possibly a head.
- fragment_fromstring(string, create_parent=False): Returns an HTML fragment
from a string. The fragment must contain just a single element, unless create_parent
is given; e.g., fragment_fromstring(string, create_parent='div') will wrap
the element in a <div>.
- fragments_fromstring(string): Returns a list of the elements found in
the fragment.
- fromstring(string): Returns document_fromstring or fragment_fromstring,
based on whether the string looks like a full document, or just a fragment.
If BeautifulSoup's UnicodeDammit
doesn't solve an incorrect encoding declaration, ElementSoup makes use of the
BeautifulSoup parser to build an lxml HTML tree from broken HTML.
E-factory
makes it possible to quickly generate HTML pages and fragments:
- import lxml.html
- from lxml.html import builder as E
- #("from lxml.html import usedoctest" is only meant to be used inside doctests)
-
- html = E.HTML(
-     E.HEAD(
-         E.LINK(rel="stylesheet", href="great.css", type="text/css"),
-         E.TITLE("Best Page Ever")
-     ),
-     E.BODY(
-         E.H1(E.CLASS("heading"), "Top News"),
-         E.P("World News only on this page", style="font-size: 200%"),
-         "Ah, and here's some more text, by the way.",
-         lxml.html.fromstring("<p>... and this is a parsed fragment ...</p>")
-     )
- )
-
- print(lxml.html.tostring(html))
lxml.html.open_in_browser(lxml_doc) writes the document to disk and opens
it in the default browser.
lxml.html also supports working with links and forms, and cleaning HTML (removing
embedded or script content, special tags, CSS style annotations, etc.)
Namespaces
Namespaces are
required in XML files that contain data from different sources that might use
elements with the same name, eg. "name": the namespace tells which
one is meant each time. In simple XML files, namespaces are not required, and
can be removed from the source file to make things easier.
Namespaces can be any string; It's a convention to use a URL, which can point
to a document providing information about it.
Namespaces
can be either set in the input file, or through ElementTree:
- <Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="Application_t">
-
- #Default namespaces follow this format.
- xmlns="namespaceURI"
-
- #Be sure to replace "URI" with the actual URI in your XML
document.
- ET.register_namespace('', "URI")
- ET.register_namespace('xsi', "http://www.w3.org/2001/XMLSchema-instance")
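register_namespace() only affects serialization: it tells ElementTree which prefix to write instead of an auto-generated one such as "ns0". A minimal sketch with the standard library; the namespace URI below is a made-up placeholder:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, for illustration only
DEMO_NS = "http://example.com/ns/demo"
root = ET.fromstring(f'<gpx xmlns="{DEMO_NS}"><wpt/></gpx>')

# Without registration, an auto-generated prefix is written
before = ET.tostring(root, encoding="unicode")

# Registering the URI with an empty prefix makes it the default namespace on output
ET.register_namespace("", DEMO_NS)
after = ET.tostring(root, encoding="unicode")

print(before)
print(after)
```

The registry is global to the process, so the call has to be made before serializing, and the URI has to match the one in the document exactly.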
Classes
Some interesting methods:
class lxml.etree._Element
- addnext(self, element)
- addprevious(self, element)
- append(self, element)
- clear(self, keep_tail=False)
- find(self, path, namespaces=None)
- findall(self, path, namespaces=None)
- findtext(self, path, default=None, namespaces=None)
- get(self, key, default=None)
- list(element) (was: getchildren(self))
- getnext(self)
- getparent(self)
- getprevious(self)
- getroottree(self)
- insert(self, index, element)
- iter(self, tag=None, *tags)
- remove(self, element)
- replace(self, old_element, new_element)
- xpath(self, _path, namespaces=None, extensions=None, smart_strings=True,
**_variables)
class lxml.etree._ElementTree
- find(self, path, namespaces=None)
- findall(self, path, namespaces=None)
- findtext(self, path, default=None, namespaces=None)
- getelementpath(self, element)
- getpath(self, element)
- iter(self, tag=None, *tags)
- iterfind(self, path, namespaces=None)
- write(file, *, encoding, method, pretty_print, xml_declaration, with_tail,
standalone, doctype, compression, exclusive, inclusive_ns_prefixes, with_comments,
strip_text, docstring)
- xpath(self, _path, namespaces=None, extensions=None, smart_strings=True,
**_variables)
Output
Note: tostring() returns bytes by default (pass encoding="unicode" to get a string), so the
output file must be opened in binary mode:
- #Opening in text mode ('wt') fails with "TypeError: write() argument must be str, not bytes"
- with open(OUTPUTFILE, 'wb') as writer:
-     writer.write(et.tostring(root, pretty_print=True))
To print the whole tree:
- print(ET.tostring(root, encoding='utf8').decode('utf8'))
- #makes no difference: still a binary dump, since tostring() returns bytes
- print(ET.tostring(root, pretty_print=True))
Displaying infos:
- tree = etree.ElementTree(root)
- print(tree.docinfo.xml_version)
- print(tree.docinfo.doctype)
-
- tree.docinfo.public_id = '-//W3C//DTD XHTML 1.0 Transitional//EN'
- tree.docinfo.system_url = 'file://local.dtd'
- print(tree.docinfo.doctype)
lxml also supports indenting:
- etree.indent(root)
- print(etree.tostring(root))
-
- etree.indent(root, space=" ")
- print(etree.tostring(root))
-
- etree.indent(root, space="\t")
- etree.tostring(root)
Outputting XML, HTML, text:
- print(etree.tostring(root)) #Default is XML
- print(etree.tostring(root, method='html', pretty_print=True))
- print(etree.tostring(root, method='text', encoding="UTF-8"))
To start from a clean slate:
- parser = etree.XMLParser(remove_blank_text=True)
- tree = etree.parse(filename, parser)
- print(etree.tostring(tree.getroot(), pretty_print=True))
In case there are redundant namespaces:
- parser = et.XMLParser(ns_clean=True,remove_blank_text=True)
- tree = et.parse(INPUTFILE, parser)
- print(et.tostring(tree.getroot()))
To write the tree to a file:
- #tostring() returns bytes, hence the binary mode
- with open('doc.xml', 'wb') as f:
-     f.write(etree.tostring(root, pretty_print=True))
Another way to get pretty printing is to use Tidy:
- import tidy
-
- f.write(tidy.parseString(your_xml_str, **{'output_xml':1, 'indent':1,
'input_xml':1}))
If ET complains about encoding, try this:
- print(ET.tostring(root, encoding='utf8').decode('utf8'))
Finding elements
There are multiple ways to search for elements
- Direct children of the root node can be found with a simple "for
child in root"
- find/findall: find() returns the first hit, while findall() returns
all of them; it either takes a single element, or an XPath string; note
that find() called with a simple element won't search deeper than the level
right below (use XPath for a deeper search)
- xpath: For advanced queries
- iter/iterfind: Both are recommended when reading lots of data. The former
returns all the elements that match the tag; The latter supports a path,
so only elements within a given section of the tree will be read
An element acts like a list where the children are items in the list, eg.
len(root) returns the number of elements directly below the root. Attributes are stored in a dictionary (attrib).
find() returns a single element, if any, while findall() returns a list.
findall() is part of the original ElementTree API. It supports a simple subset
of the XPath language, with only basic predicates and none of the advanced features
such as functions or boolean conditions.
For instance, it doesn't allow the use of a leading "/" ("SyntaxError:
cannot use absolute path on element"). Given a plain tag name, findall() returns only
the elements with that tag which are direct children of the current element.
Likewise, findall() doesn't support the "|" symbol to search for
different tags, while xpath() does:
- for el in root.xpath('.//tag1/*|.//tag2/*'):
-     print(el.tag, el.text)
A simpler alternative:
- for el in root.iter('tag1', 'tag2'):
-     print(el.tag, el.text)
Note: find/findall/iterfind() methods are recommended over using xpath() because
they are faster and support incremental searches, and also simplify namespace
usage, ie. only use .xpath() for advanced queries.
Important: If findall() returns nothing although the query looks good, it
might be an issue with the namespaces. In that case, either remove all namespaces in the input file,
or change the search string.
Important: To check if find() found an element, use: if element.find('...')
is not None.
lxml also offers two functions to get siblings: getprevious()/getnext().
It also provides getparent().
CHECK Important: Checking if find/findall is not empty requires two different
ways:
- r = root.find('./Document/name')
- #if et.iselement(r):
if r is not None:
-
- tracks = root.findall('.//LineString')
- if len(tracks):
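The two checks above differ because find() returns an element or None, while findall() always returns a list. A minimal sketch with the standard library ElementTree and a made-up KML-like document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<kml><Document><name>Trip</name></Document></kml>")

name = root.find("./Document/name")            # an Element
missing = root.find("./Document/description")  # None: nothing matched
tracks = root.findall(".//LineString")         # empty list: nothing matched

# "is not None" is the reliable test for find(): an element without children
# has a length of 0, so testing the element itself would be misleading
if name is not None:
    print("name:", name.text, "children:", len(name))

if missing is None:
    print("no description")

if not tracks:
    print("no tracks")
```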
This shows the top-level element in the tree:
- print(root.tag,root.attrib)
This will return all the elements right below the root, not any deeper:
- for child in root:
-     print(child.tag, child.attrib)
To get all the elements in the tree…
- for elem in root.iter():
-     print(elem.tag, elem.attrib, elem.text)
… or only some elements, anywhere in the tree:
- for movie in root.iter('movie'):
-     print(movie.attrib)
Find the first element that matches:
- #find() returns a single element (or None), not a list
- wpt = root.find("wpt")
- if wpt is not None:
-     print(wpt.tag, wpt.attrib, wpt.text)
Getting the value of the "content" attribute of the "description" meta element in the head section:
- description = root.xpath('string(//meta[@name="description"]/@content)')
- if len(description):
-     print("Description=", description)
find() and findall() support a sub-set of XPath, which provides a more powerful way to navigate a tree. Here's how to find
all waypoints below the root in a GPX file:
- for wpt in root.findall("./wpt"):
-     print(wpt.tag, wpt.attrib, wpt.text)
Searching with XPath
- "somenode" or "somenode/somesubnode"
- / : Start searching from the root node (also works with a pointer to
the root node); Absolute search
- // : Searches the whole tree, regardless of the location of the node
that calls xpath
- .//: Searches within the descendants below the node that calls xpath
- . : Current node
- .. : Parent of the current node
- * : All elements
- @ : Attribute
- @*: All the attributes
- node(): Any node
- element(): Any element node
- last() -1: Next-to-last node in sequence
- attribute(src): Any attribute node named "src"
- count(//property) gt 1: True if more than one property nodes
- p/img[@src="icon.png"]: Element that has src set to "icon.png"
- div[3]: Third div element (1 is first)
- Other functions supported: Strings; Dates, Times, and Durations; Nodes
and Documents; Numbers
Note that xpath() returns a list, even if it found only one element:
- element = template_tree.xpath('//myelement')
- if len(element):
-     html_tree = lxml.html.fragment_fromstring("<div>blah</div>", parser=lxml.html.HTMLParser())
-     parent = element[0].getparent()
-     parent.insert(parent.index(element[0]), html_tree)
-     parent.remove(element[0])
- et.dump(template_root) #dump() prints by itself
Examples:
- /class/student[1]: First node in the set (XPath positions start at 1)
- /class/student[last()]: Last node in the set; Other functions include
position() and count()
- /class/student[@rollno = 493]: Nodes whose "rollno" attribute
is = 493
- /class/student[marks>85]: Nodes whose element "marks" is
> 85
More infos:
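The positional and attribute predicates listed above can be tried with the standard library ElementTree, which supports that part of the syntax. A minimal sketch with a made-up class/student document; since the root here is <class> itself, the paths are relative:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<class>"
    "<student rollno='393'><marks>85</marks></student>"
    "<student rollno='493'><marks>90</marks></student>"
    "</class>"
)

first = root.find("student[1]")             # positions start at 1
last = root.find("student[last()]")
by_attr = root.find("student[@rollno='493']")
by_child_text = root.find("student[marks='90']")

print(first.get("rollno"), last.get("rollno"))
print(by_attr.get("rollno"), by_child_text.get("rollno"))
```

Numeric comparisons such as marks>85 are not part of ElementTree's subset; they need the full XPath support of lxml's xpath().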
lxml.objectify
"lxml supports an alternative API similar to the Amara bindery or gnosis.xml.objectify
through a custom Element implementation. The main idea is to hide the usage
of XML behind normal Python objects, sometimes referred to as data-binding.
It allows you to use XML as if you were dealing with a normal Python object
hierarchy."
https://lxml.de/objectify.html
TO READ
Help
https://mailman-mail5.webfaction.com/listinfo/lxml
Archives https://mailman-mail5.webfaction.com/pipermail/lxml/
DEAD http://blog.gmane.org/gmane.comp.python.lxml.devel
DEAD https://www.google.com/webhp?q=site:comments.gmane.org%2Fgmane.comp.python.lxml.devel+
Q&A
Can I get rid of namespace infos while working with data?
The kludgy way to remove namespaces is to run a regex
through the source file, and read the result into the root.
A cleaner way is to parse the XML, and then remove
all references to the namespace(s):
- # Remove namespace prefixes
- #Source: https://stackoverflow.com/questions/60486563/
- tree = et.parse(INPUTFILE)
- root = tree.getroot()
- for elem in root.iter():
-     # Comments and processing instructions have a function as tag, which makes et.QName(elem) fail with
-     # "ValueError: Invalid input tag of type <class 'cython_function_or_method'>"
-     if not isinstance(elem.tag, str):
-         continue
-     # For elements, replace qualified name with localname
-     elem.tag = et.QName(elem).localname
-     # Remove attributes that are in a namespace (iterate over a copy of the keys)
-     for attr in list(elem.attrib):
-         if "{" in attr:
-             elem.attrib.pop(attr)
-
- # Remove unused namespace declarations
- et.cleanup_namespaces(root)
How to add text when using append()?
for waypoint in root.findall('gpx:wpt', namespaces=NSMAP):
    #How to set text?
    waypoint.append(ET.Element("dummy"))
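One possible answer: create the element first, set its .text, then append it. A minimal sketch with the standard library ElementTree; namespaces are left out and the document is made up:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<gpx><wpt/><wpt/></gpx>")

for waypoint in root.findall("wpt"):
    dummy = ET.Element("dummy")
    dummy.text = "some text"   # set the text before (or after) appending
    waypoint.append(dummy)

print(ET.tostring(root, encoding="unicode"))
```

ET.SubElement(waypoint, "dummy") creates and appends in one step and returns the new element, so its .text can be set the same way.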
Difference between .iter() and .findall()?
"Element.findall() finds only elements with a tag which are direct children
of the current element."
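A small sketch of the difference, using the standard library's ElementTree and a made-up document: findall("item") only looks at direct children, iter("item") walks the whole subtree, and findall() can reach descendants too if given a ".//" path.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <item> directly under the root, two nested deeper
root = ET.fromstring(
    "<root>"
    "<item>a</item>"
    "<group><item>b</item><item>c</item></group>"
    "</root>"
)

direct = [e.text for e in root.findall("item")]          # direct children only
nested = [e.text for e in root.iter("item")]             # whole subtree, document order
also_nested = [e.text for e in root.findall(".//item")]  # descendant path

print(direct, nested, also_nested)
```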
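Difference between writing ET.tostring() to a file and tree.write()?
with open("removed.time.gpx", 'wb') as doc:
    # How does this differ from tree.write('output.xml')?
    # Note: write() belongs to the file object (doc), not to the root Element
    doc.write(ET.tostring(tree, pretty_print = True))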
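A possible answer, sketched with the standard library's ElementTree: tostring() returns bytes that you write to a file yourself, while tree.write() opens and writes the file for you. With default arguments the two produce the same output here. The file names and the sample document are made up, and pretty_print is left out because it is an lxml-only option.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.fromstring("<gpx><wpt lat='1' lon='2'/></gpx>")
tree = ET.ElementTree(root)

outdir = tempfile.mkdtemp()
path_a = os.path.join(outdir, "a.xml")
path_b = os.path.join(outdir, "b.xml")

# 1. Serialize to bytes yourself, then write them to a file opened in binary mode
with open(path_a, "wb") as doc:
    doc.write(ET.tostring(root))

# 2. Let the ElementTree object open and write the file
tree.write(path_b)

# Compare the two files byte for byte
with open(path_a, "rb") as a, open(path_b, "rb") as b:
    same = a.read() == b.read()
print(same)
```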
What's the difference between tree and root (parse vs. fromstring)?
parse() returns an ElementTree while fromstring() returns an Element.
https://stackoverflow.com/questions/32620254/python-elementtree-elementtree-vs-root-element
What's the point of getroot()?
Needed with functions that return a whole ElementTree instead of a specific
Element (node):
tree = et.parse("input.gpx")
root = tree.getroot()
"fromstring() parses XML from a string directly into an Element, which
is the root element of the parsed tree. Other parsing functions may create an
ElementTree." (Source)
"The getroot() method is available on xml.etree.ElementTree.ElementTree
objects, not xml.etree.ElementTree.Element objects. ET.fromstring() returns
the latter type. You already have the root element."
https://stackoverflow.com/questions/32620254/python-elementtree-elementtree-vs-root-element
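The same point as a short sketch with the standard library's ElementTree (the XML is a made-up sample): parse() gives an ElementTree, which needs getroot(), while fromstring() hands back the root Element directly.

```python
import io
import xml.etree.ElementTree as ET

XML = b"<gpx><wpt/></gpx>"

tree = ET.parse(io.BytesIO(XML))       # file or file-like object -> ElementTree
root_from_tree = tree.getroot()        # ElementTree -> root Element

root_from_string = ET.fromstring(XML)  # string/bytes -> root Element directly

print(type(tree).__name__, type(root_from_string).__name__)
print(root_from_tree.tag == root_from_string.tag)
print(hasattr(root_from_string, "getroot"))  # an Element has no getroot()
```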
fastkml
$ pip install fastkml (or "pip install -r requirements.txt"
from the base of the source tree; to build KML files, FastKML requires Shapely,
which requires libgeos).
Note: If the input KML contains something it doesn't like, fastkml (or lxml?)
might just raise "ValueError" without saying which line it
didn't like.
- from fastkml import kml
-
- k = kml.KML()
- with open(kml_file, 'rt', encoding="utf-8") as myfile:
-     doc = myfile.read()
- k.from_string(doc)
-
- # features() returns a generator object that you can iterate over
- for f in k.features():
-     print(f.name)
- features = list(k.features())
- f2 = list(features[0].features())
- print(f2[0].name)
- print(k.to_string(prettyprint=True))
-
- --
- # The "parser" used below is assumed to be pykml's parser module
- from pykml import parser
-
- with open(input) as f:
-     doc = parser.parse(f)
- root = doc.getroot()
-
- folder = root.Document.Folder # parent of Placemark
- for pm in folder.getchildren():
-     print(pm.tag, pm.getparent().tag)
-     """
-     if pm.tag == '{http://www.opengis.net/kml/2.2}Placemark':
-         keep = False
-         #zipcode = ''
-         for sd in pm.ExtendedData.SchemaData.getchildren():
-             if 'ZCTA5CE10' in sd.values():
-                 if sd.text in zipcodes:
-                     #zipcode = sd.text
-                     keep = True
-                 break
-         if not keep:
-             removed += 1
-             folder.remove(pm)
-         else:
-             kept += 1
-     """
- #doc.write('output.kml', xml_declaration=True, encoding='UTF-8')
More info
pykml
"pyKML is based on the lxml.objectify
API which provides a Pythonic API for working with XML documents. pyKML adds additional functionality specific to the KML language.
pyKML depends on the lxml Python library, which in turn depends
on two C libraries: libxml2 and libxslt. Given this, the first step to installing
pyKML is to get lxml running on your system."
pykml.parser.parse reads an existing KML file; to write a KML object
to a file, serialize it with lxml.etree.tostring.
"For complete stand alone programs that demonstrate how to use pyKML,
check out the pyKML Examples."
"This type of attribute-based access is provided by the lxml packages’s
objectify API. pyKML users are encouraged to familiarize themselves with the
objectify API documentation on the
lxml website, because pyKML inherits this functionality."
"KML documents that you create can be validated against XML Schema documents,
which define the rules of which elements are acceptable and what ordering can
be used. Both the OGC KML schema and the Google Extension schemas are included
with pyKML."
Resources
Install lxml
To check if lxml is installed, run Python, and type "import lxml"
If not: http://lxml.de/installation.html
Install PyKML
pip install pykml
Run Python, and type "import pykml"
Have PyKML create a ready-to-use script
from pykml.factory import write_python_script_for_kml_document
import urllib.request as urllib2 #urllib2 was used in Python 2
from pykml import parser
url = 'http://code.google.com/apis/kml/documentation/kmlfiles/altitudemode_reference.kml'
fileobject = urllib2.urlopen(url)
doc = parser.parse(fileobject).getroot()
script = write_python_script_for_kml_document(doc)
print(script)
Later
from lxml import etree
from pykml import parser
from pykml.factory import KML_ElementMaker as KML
from pykml.factory import GX_ElementMaker as GX #needed for GX.Tour() below
To validate:
from pykml.parser import Schema
schema_ogc = Schema("ogckml22.xsd")
schema_gx = Schema("kml22gx.xsd")
doc = KML.kml(GX.Tour())
#The .validate() method only returns True or False
schema_ogc.validate(doc)
schema_gx.validate(doc)
#More details
schema_ogc.assertValid(doc)
Here's how to read a KML file, and remove an element:
- from pykml import parser
- from lxml import etree
-
- with open("input.kml") as f:
-     doc = parser.parse(f)
- root = doc.getroot()
-
- folder = root.Document.Folder.Placemark # first Placemark in the Folder
- for pm in folder.getchildren():
-     #print(pm.tag)
-     #To prevent PyKML from prepending {http://earth.google.com/kml/2.0},
-     #use regex to remove namespace: <kml xmlns="http://earth.google.com/kml/2.0">
-     #if pm.tag=="ExtendedData":
-     if pm.tag=="{http://earth.google.com/kml/2.0}ExtendedData":
-         folder.remove(pm)
-
- with open("output.kml","wb") as outfile:
-     outfile.write(etree.tostring(doc, pretty_print=True))
Here's how to read coordinates:
- from pykml import parser
-
- with open('dummy.kml') as f:
-     doc = parser.parse(f).getroot()
-
- # Iterating over Placemark yields every Placemark in the Folder
- for e in doc.Document.Folder.Placemark:
-     coor = e.LineString.coordinates.text.split(',')
-     print(coor)
Here's how to read from a file, and copy data into a new file:
- from pykml import parser
- from lxml import etree
- from pykml.factory import KML_ElementMaker as KML
-
- with open('input.kml') as f:
-     tree = parser.parse(f)
- root = tree.getroot()
- coords = root.Document.Folder.Placemark.LineString.coordinates
-
- doc = KML.kml(
-     KML.Placemark(
-         KML.name("test"),
-         KML.Style(KML.LineStyle(KML.color("FF0000FF"))),
-         KML.LineString(
-             # Pass the text: passing the element itself would nest it
-             KML.coordinates(coords.text)
-         )
-     )
- )
-
- with open('output.kml','wb') as outfile:
-     outfile.write(etree.tostring(doc, pretty_print=True))
simpleKML
https://simplekml.readthedocs.io/en/latest/
"Unfortunately, simplekml is just a kml generator, it cannot read and
manipulate existing kml, only create it. You will have to use an alternative,
such as pyKML." (Source)
pip install simplekml
Example:
- import simplekml
- kml = simplekml.Kml()
- kml.document.name = "Test"
- kml.save("botanicalgarden.kml")
How to remove id? <Document id="1">
How to read existing KML file, extract needed items (eg. Placemark), edit
them, and save everything to a new KML file?
Working with GPX files with gpxpy
https://github.com/tkrajina/gpxpy
http://witkowskibartosz.com/blog/gpx-file-reader.html
https://ocefpaf.github.io/python4oceanographers/blog/2014/08/18/gpx/
pip install gpxpy
Quick code:
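- import gpxpy
-
- with open(path_to_gpx_file, 'r') as f:
-     gpx = gpxpy.parse(f)
- print("{} track(s)".format(len(gpx.tracks)))
To read from a GPX file:
- # gpx is the object returned by gpxpy.parse() above
- a = gpx.tracks[0]   # first track
- b = a.segments[0]   # first segment of that track
- c = b.points[1]     # second point of that segment
- d = [c.longitude, c.latitude, c.elevation, c.time]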
To create a new GPX file from scratch:
- import gpxpy
- import gpxpy.gpx
-
- gpx = gpxpy.gpx.GPX()
-
- # Create first track in our GPX:
- gpx_track = gpxpy.gpx.GPXTrack()
- gpx.tracks.append(gpx_track)
-
- # Create first segment in our GPX track:
- gpx_segment = gpxpy.gpx.GPXTrackSegment()
- gpx_track.segments.append(gpx_segment)
-
- # Create points:
- gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1234, 5.1234, elevation=1234))
- gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1235, 5.1235, elevation=1235))
- gpx_segment.points.append(gpxpy.gpx.GPXTrackPoint(2.1236, 5.1236, elevation=1236))
-
- # You can add routes and waypoints, too...
-
- print('Created GPX:', gpx.to_xml())
xmltodict
"xmltodict
is a Python module that makes working with XML feel like you are working with
JSON".
Won't do if you need to add a key, but fine if you just need to read, and
possibly change any value.
Notes from John E. Simpson's "XPath and XPointer" (2002)
XPath is used for locating XML content within an XML document; XPointer is
the standard for addressing such content, once located.
As support for XPath is integrated into the Document Object Model (DOM),
DOM developers may also find XPath a convenient alternative to walking through
document trees.
"An XPath" consists of one or more chunks of text, delimited by
any of a number of special characters, assembled in any of various formal ways.
Each chunk, as well as the assemblage as a whole, is called an XPath expression.
Most XPath expressions, by far, locate a document's contents or portions
thereof. These pieces of content are located by way of one or more location
steps — discrete units of XPath "meaning" — chained together, usually,
into location paths.
An XPath expression can be said to consist of various components: tokens
and delimiters. The expression taxcut/* locates all elements that are children
of a taxcut element.
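A quick illustration of taxcut/* with the standard library's ElementTree, which understands this part of XPath. The tokens are "taxcut" and "*", the delimiter is "/". The document below is a made-up sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <taxcut> has two children, <note> is its sibling
doc = ET.fromstring(
    "<return>"
    "<taxcut><year>2002</year><amount>300</amount></taxcut>"
    "<note>not a child of taxcut</note>"
    "</return>"
)

# "*" matches any element, so this selects every child of <taxcut>
children = [e.tag for e in doc.findall("taxcut/*")]
print(children)
```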
XPath is capable of processing four data types: string, numeric, Boolean,
and nodes (or node-sets).
Most nodes have names. Three important terms:
- qualified name: Almost always contracted to "QName"; Usually
the identifier for the node as it actually appears in an instance document,
including any namespace prefix. For example, an element whose start tag
is <concerto> has a QName of "concerto"; if the start tag
were <mml:concerto>, the QName would be "mml:concerto."
- local-name: its QName, sans any namespace prefix. If an element's QName
is "mml:concerto," its local-name is "concerto." If
there's no namespace in effect for a given node, its QName and local-name
are identical.
- expanded-name: If the node is associated with a particular namespace,
its expanded-name is a pair, consisting of the URI associated with that
namespace and the local-name. Because the expanded-name doesn't consider
the namespace prefix at all, two elements, for example, can have the same
expanded-name even if their QNames are different, as long as both their
associated namespace URIs (possibly null) and their local-names are identical.
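These three terms can be seen in ElementTree, which stores each tag as "{namespace URI}local-name", i.e. close to the expanded-name, and forgets the prefix. A minimal sketch, assuming a made-up namespace URI and a hypothetical helper called expanded_name():

```python
import xml.etree.ElementTree as ET

# Two different prefixes (mml, m2) bound to the same hypothetical URI,
# plus one element without any namespace
XML = (
    '<works xmlns:mml="http://example.com/mml" xmlns:m2="http://example.com/mml">'
    "<mml:concerto/><m2:concerto/><concerto/>"
    "</works>"
)
root = ET.fromstring(XML)


def expanded_name(elem):
    """Return (namespace URI or None, local-name) for an element."""
    if elem.tag.startswith("{"):
        uri, local = elem.tag[1:].split("}", 1)
        return uri, local
    return None, elem.tag


names = [expanded_name(child) for child in root]
print(names)
# Different QNames (mml:concerto, m2:concerto), same expanded-name
print(names[0] == names[1])
```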
In a location path, the root node is represented by a leading / (forward
slash) character.
There's an XPath function, normalize-space(), that trims all leading and trailing
whitespace from a given element's content, and collapses each internal run of
whitespace into a single space.
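For comparison, a rough Python equivalent of normalize-space(), written as a hypothetical helper. In XPath 1.0 the function also collapses internal runs of whitespace to a single space, and "whitespace" means space, tab, CR and LF.

```python
import re


def normalize_space(text):
    """Mimic XPath normalize-space() for XML whitespace (space, tab, CR, LF)."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)  # collapse every run to one space
    return collapsed.strip(" ")                   # then trim both ends


print(repr(normalize_space("  Piano \n  Concerto\tNo. 1  ")))
```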
Editing with XMLStarlet
Read xmlstarlet-ug.pdf
XMLStarlet is an open-source,
command-line application that supports testing XPath queries.
Checking the structure: xml el input.xml
Networking
- "Pyro is short for PYthon
Remote Objects. It is an advanced and powerful Distributed Object Technology
system written entirely in Python, that is designed to be very easy to use."
Q&A
How to find the type of a variable/output?
print(type(blah))
UnicodeDecodeError: 'ascii' codec can't decode byte
Python uses Unicode internally, and may need some help when it can't
figure out which code page was used to encode a string (here, sql is a byte string):
- try:
-     cursor.execute(sql.decode('utf-8'))
- except UnicodeDecodeError:
-     try:
-         cursor.execute(sql.decode('iso8859-15'))
-     except UnicodeDecodeError:
-         cursor.execute(sql.decode('cp1252'))
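The same idea as a reusable sketch for Python 3, where only bytes objects have decode(). The function name and the default list of encodings are assumptions, not part of any library. Note that single-byte encodings which map every byte (latin-1, iso8859-15) never raise UnicodeDecodeError, so such an encoding only makes sense as the last fallback.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn and return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte, so this cannot fail
    return data.decode("latin-1"), "latin-1"


print(decode_with_fallback("café".encode("utf-8")))
print(decode_with_fallback("café".encode("cp1252")))
print(decode_with_fallback(b"\x81"))  # byte undefined in Python's cp1252 codec
```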
More information:
How to check for errors?
"Pylint analyzes
Python source code looking for bugs and signs of poor quality."
How to enhance performance?
Why are strings immutable?
Read that question in a newsgroup. Does it mean a string in Python is read-only?
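In short, yes: a str object cannot be changed in place, but a name can always be rebound to a new string. A small sketch:

```python
s = "python"

try:
    s[0] = "P"          # item assignment is not allowed on str
except TypeError as exc:
    error = str(exc)

t = "P" + s[1:]         # build a new string instead; s itself is untouched

print(error)
print(s, t)
```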
What's the difference between "import mymodule" and "from
mymodule import *"?
The former forces you to prepend the module's name to every member, eg. mymodule.mymethod(),
while the latter imports all the module's public names into the current namespace, letting
you call them without the module name. Although easier to use, make sure
those new names don't clash with names already in your current namespace...
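A sketch of such a clash, using the math module as a stand-in for mymodule. The star-import is run with exec() in a scratch dictionary (called scope here, an arbitrary name) only so that its effect can be inspected without touching the current namespace.

```python
import math

# "import math": every name is reached through the module prefix,
# and the built-in pow() is left alone
root_with_prefix = math.sqrt(16)
builtin_result = pow(2, 3)

# "from math import *": executed in a scratch namespace to inspect the effect
scope = {}
exec("from math import *", scope)

root_without_prefix = scope["sqrt"](16)   # callable without the prefix
shadowed = scope["pow"] is math.pow       # math.pow now hides the built-in pow

print(root_with_prefix, root_without_prefix)
print(builtin_result, scope["pow"](2, 3), shadowed)
```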
Is there a native-code compiler for Windows?
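Check out py2exe (it bundles the Python interpreter and your bytecode into an
EXE rather than compiling to native code).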
Other sources of information are Distributing
Python Apps and How can
I create a stand-alone binary from a Python script? Also take a look at
Psyco.
Py? Pyc? Pyd? Pyo? Pyw?
(From Boudewijn Rempt's book on PyQT): "The translation from Python
code to byte-code only happens once: Python saves a compiled version of your
code in another file with the extension .pyc, or an optimized compiled version
of your code that removes assert statements and line-number tracking in a file
with the extension .pyo.
However, that is only done with Python files that are imported from other
files: the bootstrap script will be compiled to bytecode every time you run
it, but python will create a myapp.pyc from a file myapp.py (which is not shown
here)."
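In Python 3, the compiled file no longer sits next to the source: it goes into a __pycache__ directory with the interpreter version in its name, and .pyo files were dropped in Python 3.5. A minimal sketch that compiles a throwaway myapp.py (file name and contents are made up) and shows where the .pyc ends up:

```python
import importlib.util
import os
import py_compile
import tempfile

# Create a throwaway source file in a temporary directory
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "myapp.py")
with open(source, "w") as f:
    f.write("print('hello')\n")

# Where the import system would cache the bytecode for this source file
expected = importlib.util.cache_from_source(source)

# Compile explicitly; importing the module would write the same file
compiled = py_compile.compile(source)
print(compiled)
```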
IndentationError
"unindent does not match any outer indentation level" : If copy/pasting
code from a web page, make sure there are no hidden characters that confuse
Python, such as tabs mixed with spaces or non-breaking spaces.
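A sketch that reproduces the error and looks for the usual suspects. hidden_whitespace() is a hypothetical helper, and it only checks for tabs and non-breaking spaces.

```python
# The last line is indented with 4 spaces, which matches neither
# the "if" (0 spaces) nor its body (8 spaces)
bad_source = (
    "if True:\n"
    "        x = 1\n"
    "    y = 2\n"
)
try:
    compile(bad_source, "<pasted>", "exec")
except IndentationError as exc:
    message = exc.msg
print(message)


def hidden_whitespace(source):
    """Return (line number, repr of line) for lines with tabs or non-breaking spaces."""
    return [
        (number, repr(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if "\t" in line or "\xa0" in line
    ]


print(hidden_whitespace("a = 1\n\tb = 2\n\xa0c = 3\n"))
```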
Hiding the DOS box when running under Windows?
"Python.exe is used for console mode programs and Pythonw.exe
is used for GUI applications that don't need a console window. Python.exe can
also be used for GUI programs, but then you get a console window in addition
to your GUI window(s)."
How to call a PowerBasic DLL from Python?
http://www.talkaboutprogramming.com/group/alt.lang.powerbasic/messages/7219.html
How to hide the console window when running an EXE generated by py2exe?
Books
- Eric Matthes - Python Crash Course: A Hands-On, Project-Based Introduction
to Programming (No Starch Press, 2020)
- Luciano Ramalho - Fluent Python (O'Reilly Media, 2015, ISBN 9781491946008)
- Mark Lutz - Programming Python, 4th Edition (2014)
- Mark Lutz - Learning Python, 5th Edition
- Al Sweigart - Automate the Boring Stuff with Python: Practical Programming for Total
Beginners (No Starch Press, 2015)
From VB to Python
Resources
- https://python-forum.io
- Python Programming
- Instant Python
- Thinking in Python
- DiveIntoPython
- Python Tutorial
- Python
FAQTS
- Python Apocrypha
- How to Think Like a
Computer Scientist - Learning with Python
- Python
Editors
- Integrated
Development Environments
- OnLAMP Python Center
- Graphical Programming
with Python: QT Edition
- Transactional Persistence
for Python
- Review: theKompany
Black Adder Python IDE
- Power
Python: Do More With Less Code
- IronPython (Python implementation
targeting the .NET and Mono platforms)
- Charming
Python - Series of articles from IBM, including A
review of Python IDEs, Review
of Python IDEs, and Preview
of the [anygui] project
- When Pythons Attack - Common
Mistakes of Python Programmers by Mark Lutz, coauthor of Learning Python,
2nd Edition
- Python Gotchas
- Python Cookbook
(ActiveState)
- Vaults of Parnassus : Python Resources
- ActiveState Programmer
Network - Python
- PEAK - Python Enterprise
Application Kit
- Python
Is Middleware by Tim Daneliuk
- WATSUP - Windows Application
Test System Using Python
- Driving
win32 GUIs with Python, part 1
- Artima
Articles About Python
- Charming
Python: Pyrex extends and speeds Python apps by David Mertz
- Enhanced Interactive Python
with IPython by Jeremy Jones
- "Movable Python
is a way of building portable, standalone, Python environments. The result
is a version of Python that can be used without having to install python
on the target machine. [...] This means you can carry around a prebuilt
python environment on a USB memory stick - Python on a stick. Come and get
your gannet ripple, python on a stick."
- Python
+ .NET = IronPython by Paul Ferrill
- Python
in the enterprise by Daniel G. Shafer
- Introducing IPython by Conrad Koziol ("an enhanced interactive
Python shell with improvements such as history caching, profiles, object
information, and session logging, as a replacement for the default interpreter.")
- A
review of 6 Python IDEs
- Python
IDE Review
- A
First Look at IronPython: Where Python meets .NET By Jeff Cogswell
- IPython, An enhanced Interactive
Python shell
- Python for Newbies
– Tutorial