I am a big fan of Python. It’s a fast and flexible language with an easy-to-read syntax. The standard library is massive, and there are third-party libraries for pretty much everything. It’s an awesome tool for administrative tasks. Personally I’ve used it for a variety of tasks, from scientific computing to ASCII art.
Last year I heard that some Finnish Pythonists were setting up a conference as a part of the international Python community. Unfortunately, I wasn’t able to attend. This year was different and I went ahead and participated.
I’m off to Turku
PyCon Finland 2011 was held on October 17-18 in Turku. The venue was the University of Turku’s ICT building in Kupittaa. This time PyCon was organized by Python Suomi ry, a non-profit organization that was founded in May. The first day of the conference was for talks and the second for sprints, workshops and the biannual Python Suomi ry meeting. I was there on the first day only.
There were two conference rooms. One was an auditorium and the other one a normal classroom.
These are the talks I attended. The full schedule can be found here. All talks were livestreamed and recorded, so they will appear on the PyCon website some day. The spreadsheets will be up as well.
“from __magic__ import wtf” – Tommie Gannert, Spotify
The organizers couldn’t get Jonathan to talk. They tried.
Tommie Gannert is a team leader in backend and infrastructure development at Spotify. His talk started off with a brief history of Spotify and how things are now. Currently they are running Debian and old Python versions (2.4/2.6) on the backend systems with Twisted. They are switching to gevent because of the complexity of Twisted. During the talk it became apparent how much Python is used in the Spotify infrastructure, and it surprised me. The Spotify clients are written in C++ and Java (depending on the platform), and there is some PHP on the websites and some Java on the backend as well. Python is the most-used language, not by lines of code but by number of projects.
He said that they started using Python “because it’s fun”. A part of his talk was about the nature of the language and how it should be used. He used Spotify’s backend as an example. Spotify is only five years old and yet there’s lots of code which needs a rewrite, mainly because it uses Twisted or because the code is “too clever”, or as he said, twisted. Tommie reminded us to have empathy for other developers.
“Python for Data Science” – Harri Hämäläinen
Harri is a doctoral student at Aalto University. His talk was about using Python in data science, which is roughly a combination of computer science, mathematics and visualization. It is widely used in social media these days, for example in Twitter’s “trends”.
Python comes in really handy when gathering the data. Some services provide an API for fetching data, but usually it’s not that easy. Web scraping means extracting the data from a document, usually HTML or XML. Regular expressions are not a reliable way to parse such markup. The Python libraries BeautifulSoup and lxml are great tools for parsing it. Harri also presented Scrapy, a framework for web scraping. This was something I was already familiar with.
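To give an idea of what parsing markup with a real parser looks like, here is a minimal sketch. It uses the standard library’s html.parser instead of BeautifulSoup or lxml so that it runs without installing anything; the class name LinkCollector and the sample page are made up for illustration and are not from Harri’s talk.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <p>Talks: <a href="/talks/magic">magic</a>
  and <a class="ext" href="/talks/hpc">HPC</a>.</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links
print(links)
```

The parser copes with extra attributes and line breaks inside the markup, which is exactly where hand-written regular expressions tend to fall apart. BeautifulSoup and lxml offer friendlier search APIs on top of the same idea.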
Ethics play a big role in web scraping. Harri had some advice for us, and probably the best piece was “dumps over APIs, APIs over scraping”. If it’s possible to download a dump of the data you are interested in, it’s the most reliable way to get it. APIs can require some extra work, but using one is definitely better than scraping. Scraping is the last option, which (at least in my case) turns out to be the most common method for data gathering.
Harri introduced a couple of cool Python libraries for computing and visualizing the data. These libraries were SciPy, NumPy and matplotlib. Combined with IPython you get a Matlab-like environment, which is something I must try. For network graphs and graph theory Harri presented networkx and igraph. Interesting stuff.
“High performance computing in Python” – Jussi Enkovaara, CSC
I was really looking forward to this talk as I thought it would be the most scientific talk of them all. Turned out to be my favourite.
High performance computing means complex computing which requires large computational clusters or supercomputers for the calculation. Jussi used GPAW, a tool for computational nanoscience, as an example. They’ve chosen Python because it enables fast development and provides useful (and fast) libraries for scientific computing (such as SciPy and NumPy, which I’ve already mentioned). Python’s data structures (for example dictionaries) and dynamic typing (handy with complex numbers) are also suitable for such computing. These would require a lot of extra work if they used C or C++.
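As a small illustration of the dynamic typing point, the sketch below uses one function for integer, real and complex data alike. The function inner_product is my own toy example, not code from GPAW.

```python
def inner_product(xs, ys):
    """Sum of conj(x) * y over both sequences."""
    total = 0
    for x, y in zip(xs, ys):
        # int, float and complex all provide .conjugate()
        total += x.conjugate() * y
    return total


real_result = inner_product([1, 2], [3, 4])
complex_result = inner_product([1j, 2], [1j, 1])
print(real_result, complex_result)
```

In C you would typically end up writing (or generating) a separate version of the routine for each numeric type.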
NumPy is widely used in GPAW. Plain Python is too slow, especially with huge list structures. NumPy provides a new array type whose operations are implemented in the compiled NumPy module. It also provides a set of mathematical operations which take an array as a parameter, for example sin(), cos(), exp(), log() and so on. Apparently, with these optimizations Python can be as fast as equivalent C code. Jussi presented a benchmark comparing pure Python, C and NumPy. The benchmark consisted of a matrix multiplication (C=A*B) where A and B are 200×200 matrices. The results were:
- Pure Python: 5.3 seconds
- C: 0.09 seconds
- NumPy: 0.01 seconds
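The figures above are from Jussi’s slides. If you want to try something similar yourself, here is a rough sketch of such a comparison. It is my own reconstruction, not the benchmark code from the talk, and the timings depend entirely on your machine and Python version. The NumPy part only runs if NumPy happens to be installed.

```python
import random
import time


def matmul(a, b):
    """Multiply two matrices stored as lists of lists."""
    # Transpose b once so the inner loop pairs a row with a column
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


N = 200  # matrix size mentioned in the talk
A = [[random.random() for _ in range(N)] for _ in range(N)]
B = [[random.random() for _ in range(N)] for _ in range(N)]

start = time.perf_counter()
C = matmul(A, B)
pure_seconds = time.perf_counter() - start
print(f"pure Python: {pure_seconds:.3f} s")

try:
    import numpy as np  # third-party; skipped if not installed
except ImportError:
    np = None

if np is not None:
    a_np, b_np = np.array(A), np.array(B)
    start = time.perf_counter()
    c_np = a_np @ b_np  # matrix product; `*` would multiply element-wise
    numpy_seconds = time.perf_counter() - start
    print(f"NumPy:       {numpy_seconds:.3f} s")
    # Both approaches should agree up to floating-point rounding
    assert np.allclose(C, c_np)
```

Note that with NumPy arrays the matrix product is written with `@` (or np.dot), since `*` multiplies element by element.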
I was happy to hear that it wasn’t always fun and games with Python. Jussi listed a couple of challenges, including the overhead of import statements and the lack of debugging and profiling tools for parallel Python code. Python initialization generates loads and loads of small file I/O, which in GPAW’s case turned out pretty extreme as there were thousands of parallel Python interpreters running: one per CPU, as they were running it on a supercomputer with thousands of CPUs. They resolved this problem by creating a custom Python interpreter which would handle all the imports and then populate the other interpreters via MPI. At this point my non-existent nerdy-looking glasses were fogged with excitement.
My favourite talk.
“Understanding Encodings” – Ezio Melotti
Ezio Melotti is a CPython Core Developer. CPython is the most common implementation of Python.
Ezio started the talk by covering ASCII, an early character set consisting of 128 characters in 7 bits. He then moved on to ISO-8859-1, where the 8th bit was put to use for “special” characters. That wasn’t enough, so ISO-8859-2 and over a dozen other variants were introduced, each using the 8th bit for a different set of special characters. The problem, of course, was that you couldn’t mix these. And this is how Unicode was born. It aims to include all characters, even the most obscure ones, and it consists of 17 planes and 1,114,112 code points (U+0000 to U+10FFFF).
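The mixing problem is easy to see in a Python 3 shell. This is a small sketch of my own (not from Ezio’s slides): the very same byte means a different character depending on which ISO-8859 variant you assume.

```python
raw = b"\xa3"  # one byte with the 8th bit set

as_latin1 = raw.decode("iso-8859-1")  # Western European: pound sign
as_latin2 = raw.decode("iso-8859-2")  # Central European: L with stroke

print(repr(as_latin1), repr(as_latin2))

# Unicode gives each character its own code point, so both fit in one string
both = as_latin1 + as_latin2
code_points = [f"U+{ord(ch):04X}" for ch in both]
print(code_points)
```

A text file in one of these encodings carries no hint about which table was used, so the reader has to know or guess.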
Encoding maps a character to a byte sequence, and decoding maps the byte sequence back to a character. Ezio showed this at quite a low level. Python 3 uses UTF-8 as its default source encoding, and Ezio gave some advice for handling different kinds of character sets. Encodings and the problems related to them aren’t exactly fun while programming. A useful talk for everyone.
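Here is a minimal sketch of encoding and decoding in Python 3, again my own example rather than something shown in the talk. It also shows what happens when bytes are decoded with the wrong codec.

```python
text = "Hämäläinen"

encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str, round trip

# Each "ä" takes two bytes in UTF-8, so the byte string is longer
print(len(text), len(encoded))

# Decoding with the wrong codec does not fail here, it silently garbles
mojibake = encoded.decode("iso-8859-1")
print(mojibake)

# A stricter codec refuses bytes it cannot interpret
try:
    encoded.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)
```

The usual advice is to decode bytes as early as possible, work with text inside the program, and encode again only when writing output.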
Keynote – Mike Bradshaw
This was the last talk of the day. Mike’s keynote was about building a community and how to keep it active, which is relevant to Python Suomi ry as it’s fairly new. One of the funniest talks of the day. Hopefully they recorded this talk as well.
This was a great experience for me and I met some cool people. Next year the conference will probably be held somewhere other than the ICT building, as it got a bit crowded during the day. I expect there will be even more people next year.
Thanks to the organizers. I will be there next year, hopefully.