For some of us October means Oktoberfest, but for others it’s the PyCon Finland time of the year. PyCon Finland is a Finnish Python conference. I attended PyCon Finland in 2011 and 2013. Honestly, I’ve tried to attend every year since 2010, but I’ve always run out of luck on even years (year modulo 2 equals zero).
This year the event was held on the 19th of October at HTC Helsinki. I don’t know how many attendees there were at previous events, but this year there were 150. I could be wrong, but it feels like the number of attendees has doubled in four years. The event was split into two tracks without a specific theme. All talks were also recorded. I’ll add a link to the talks once the recordings have been published.
Here’s a brief summary of some of the talks I attended and what I thought about them.
Keynote: Jyrki Pulliainen (Spotify) – Solid data structures in Python with logs
As in previous years, there was a talk by someone from Spotify. Jyrki works there as a software engineer. These talks have always been the best ones at PyCon, and this one was no exception. The talk presented how Spotify has moved from the classic application-centric scaling to a more distributed model.
In an application-centric model more and more bells and whistles are implemented in the application as you grow. A simple web application with a single database can easily become a monster application with separate systems for searching, caching, analytics and notifications – all connected to your application. Complexity grows enormously.
Spotify solved this problem by using Apache Kafka. Kafka is a distributed messaging system developed and used by LinkedIn. The infrastructure and terminology are very similar to those of any other message queue, such as Apache Qpid or RabbitMQ. You have a broker, which maintains and persists messages under different topics. Then you have producers that create the messages and consumers that consume them. Topics can be partitioned (or sharded, if you fancy the term) across several nodes. So, instead of connecting separate systems to your application, you just integrate everything with Kafka and do your deeds in message queues. Just don’t put binary data, like cat pictures, in there.
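To make the terminology a bit more concrete, here is a toy in-memory sketch of a broker with partitioned topics. It is not Kafka's API and not what Spotify runs: the `ToyBroker` class, the `playlist-changes` topic name and the CRC32-based partition choice are all my own assumptions for illustration.

```python
import zlib
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: each topic is split into append-only partitions."""

    def __init__(self, partitions_per_topic=3):
        self.partitions_per_topic = partitions_per_topic
        # topic name -> list of partitions, each partition is an append-only list
        self.topics = defaultdict(
            lambda: [[] for _ in range(self.partitions_per_topic)]
        )

    def produce(self, topic, key, value):
        """Append a message; the key decides the partition. Returns (partition, offset)."""
        partitions = self.topics[topic]
        # Assumption: a CRC32 hash of the key picks the partition (illustration only).
        index = zlib.crc32(key.encode("utf-8")) % len(partitions)
        partitions[index].append((key, value))
        return index, len(partitions[index]) - 1

    def consume(self, topic, partition, offset=0):
        """Return all messages of one partition, starting at the given offset."""
        return self.topics[topic][partition][offset:]


broker = ToyBroker()
p1, o1 = broker.produce("playlist-changes", "user-42", "added track A")
p2, o2 = broker.produce("playlist-changes", "user-42", "removed track B")

# Same key, same partition, so a consumer sees both messages in production order.
messages = broker.consume("playlist-changes", p1)
```

Messages with the same key always land in the same partition here, which is what lets a consumer rely on their relative order.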
The cool thing about Kafka is that it preserves message order (strictly speaking, within each partition of a topic). This is why Spotify chose Kafka. For example, when a new search index node is introduced, it can just drain all messages from the queue, starting from the very first message. When the drain is completed, the search node is already warmed up and ready to serve.
It was interesting to hear how things have developed at Spotify. It appears they are shifting from hack-ish Python/Twisted systems to Scala-based systems running on the JVM, apparently (and not surprisingly) for performance reasons. See my post from 2011 about their previous infrastructure model if you’re curious.
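The warm-up idea can be sketched in a few lines of plain Python. This is only my illustration of replaying an ordered log, not Spotify's code: the event format, the `SearchNode` class and the sample documents are all made up.

```python
# Hypothetical ordered log of index events; the list position acts as the offset.
event_log = [
    {"op": "add", "doc_id": 1, "text": "daft punk discovery"},
    {"op": "add", "doc_id": 2, "text": "punk rock classics"},
    {"op": "delete", "doc_id": 1},
]


class SearchNode:
    """A search node that builds its state purely by replaying the log."""

    def __init__(self):
        self.docs = {}
        self.next_offset = 0  # first log entry this node has not applied yet

    def apply(self, event):
        if event["op"] == "add":
            self.docs[event["doc_id"]] = event["text"]
        elif event["op"] == "delete":
            self.docs.pop(event["doc_id"], None)

    def catch_up(self, log):
        """Apply every event the node has not seen yet, in log order."""
        for event in log[self.next_offset:]:
            self.apply(event)
            self.next_offset += 1

    def search(self, word):
        return sorted(
            doc_id for doc_id, text in self.docs.items() if word in text.split()
        )


node = SearchNode()        # a brand new, empty node
node.catch_up(event_log)   # drain the log from the beginning
hits = node.search("punk")
```

The order matters: if the delete for document 1 were applied before its add, the node would end up serving a document that no longer exists.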
Matteo Cafasso (F-Secure) – Hunting Malware with Python
Matteo works as a software engineer at F-Secure Labs. He presented how F-Secure uses Python-based automation to analyse malware. The core component of the automation is called SEE (Sandboxed Execution Environment), which supports multiple virtualisation platforms. Given the definition of an execution environment, the tool creates a clone of a virtual machine, runs the configured so-called scanning engines and finally deletes the virtual machine. This framework enables the use of different behavioural scanning engines. New engines can be quickly built to match a specific need. Matteo said that the framework will be open-sourced on F-Secure’s GitHub soon. At the time of writing it’s not available yet.
But before becoming a lone-wolf malware researcher, a couple of things should be considered. Matteo used a “red laptop” in his talk, meaning that the laptop must never be connected to any other device or network. If a USB stick, for example, is connected to the laptop, it becomes “red” too. Extra precautions are needed when working with malware, since it’s impossible to tell what the malware actually does before it has been researched. Typically malware is examined in a standalone environment without Internet access, as external connectivity is irrelevant in most cases. However, these days malware programmers know that their code is being examined in sandboxed environments. The malware may then, for example, create intentional memory leaks in an effort to crash the environment. The virtualisation platform might have vulnerabilities too, which the malware could then use to escape from the virtual machine to the actual host machine. A prime example of a vulnerability like this is the VENOM vulnerability in QEMU.
The talk included an extremely interesting demo. Matteo showed how real malware, in this case Cozyduke, ZeroAccess and CTB-locker, can be researched with the framework. SEE was configured to use the Volatility framework for memory forensics, Wireshark for traffic capture and libguestfs to determine which files were changed between the virtual machine snapshots. It was very interesting to see how different the three were, how they injected themselves into the machine and how they were even programmed to support multiple processor architectures.
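Since SEE isn't available yet, I can only guess what its API looks like, but the clone, scan, delete lifecycle could be sketched roughly like this. Everything here is hypothetical: the `FakeHypervisor` class, the `win7` image name and the two trivial "engines" are placeholders of mine, not part of SEE.

```python
import itertools
from contextlib import contextmanager

_clone_ids = itertools.count(1)


class FakeHypervisor:
    """Hypothetical stand-in for a virtualisation platform; it only tracks VM names."""

    def __init__(self):
        self.running = set()

    def clone(self, base_image):
        name = f"{base_image}-clone-{next(_clone_ids)}"
        self.running.add(name)
        return name

    def destroy(self, name):
        self.running.discard(name)


@contextmanager
def sandbox(hypervisor, base_image):
    """Clone a VM for one analysis run and always delete it afterwards."""
    vm = hypervisor.clone(base_image)
    try:
        yield vm
    finally:
        hypervisor.destroy(vm)


def analyse(hypervisor, base_image, sample, engines):
    """Run every scanning engine against the sample inside a throwaway clone."""
    results = {}
    with sandbox(hypervisor, base_image) as vm:
        for engine in engines:
            results[engine.__name__] = engine(vm, sample)
    return results


# Two placeholder engines; real ones would observe the VM's behaviour instead.
def file_size_engine(vm, sample):
    return len(sample)


def header_engine(vm, sample):
    return sample[:2] == b"MZ"  # checks for the DOS/PE executable magic bytes


hypervisor = FakeHypervisor()
report = analyse(hypervisor, "win7", b"MZ\x90\x00", [file_size_engine, header_engine])
```

The context manager is the point of the sketch: the clone is deleted even if an engine raises an exception, so a run never leaves an infected machine behind.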
This talk was probably the most popular one as the venue was pretty much full. It’s rare to see talks that go this deep.
Alexander Bokovoy (Red Hat) – How to become enterprise-friendly with free software
Alexander gave a presentation at PyCon Finland 2011 about Samba 4 Python Bindings. He works for Red Hat. This time Alexander’s talk presented a way of handling federated identities in the enterprise world. Typical enterprise-approved protocols for authentication/authorization are Kerberos and SAML-based solutions, probably due to high-grade security and single sign-on support. Kerberos can be used for pretty much anything, ranging from logging into your workstation to SSH authentication or signing into a web application. SAML is a different kind of beast, mainly used by browsers when authenticating to a web service.
Kerberos is well supported by Apache httpd (krb5 + mod_auth_kerb) and so is SAML (Shibboleth + mod_shib). However, Alexander argued that configuring and installing these components is too tricky. This is why the Fedora Project has started a project called Ipsilon, which is a “dead simple” Python-based Identity Provider (IdP) / Service Provider (SP). I’ve configured quite a few Shibboleth instances (both SP and IdP), and I think Ipsilon could be worth a try.
The talk included a demo. Alexander used his own laptop to demonstrate how he connected to a Linux server via SSH by using his Kerberos ticket. He then used the same Kerberos ticket to sign into a SAML-secured website. A SAML authentication request was generated and the browser was redirected to the IdP. The Kerberos ticket was used to authenticate to the Ipsilon-based IdP, which then generated the necessary SAML assertions to sign into the original website. The webserver then passed all SAML attributes to the sample web application by using mod_auth_mellon.
Doing things this way is pretty much the simplest approach, compared with implementing all authentication handling in the application itself. A web application can simply read the SAML attributes passed on by the web server in front of it, which makes the actual user authentication part trivial to implement in the application. For example, this is how we do this with ADFS and Atlassian Confluence.
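As a rough sketch of how little the application has to do, here is a minimal WSGI application that trusts the web server for authentication. It assumes that mod_auth_mellon exposes the SAML attributes as `MELLON_`-prefixed environment variables and sets `REMOTE_USER`, which as far as I know is its default behaviour. The `mail` attribute name is an assumption that depends on what your IdP releases.

```python
# Assumption: mod_auth_mellon exports SAML attributes with this prefix.
ATTRIBUTE_PREFIX = "MELLON_"


def saml_attributes(environ):
    """Collect the attributes the web server passed on, without the prefix."""
    return {
        key[len(ATTRIBUTE_PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(ATTRIBUTE_PREFIX)
    }


def application(environ, start_response):
    """WSGI app that relies on the web server for the actual authentication."""
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    user = environ.get("REMOTE_USER")
    if not user:
        # The web server should never let this happen for a protected location.
        start_response("401 Unauthorized", headers)
        return [b"Not authenticated\n"]

    attributes = saml_attributes(environ)
    mail = attributes.get("mail", "no mail attribute")  # hypothetical attribute name
    start_response("200 OK", headers)
    return [f"Hello {user} ({mail})\n".encode("utf-8")]
```

There is no password handling, session logic or SAML parsing in the application at all. Of course this only holds if the application is reachable exclusively through the protected web server.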
Great talks, great people, great venue and great food. Definitely the best Monday in a while.