pippi longstrings


Here's a new project that some of you might have already heard about: Pippi Longstrings. Just like Bonobo (currently also off-line), it is part of le(n)x, a set of tools to empower citizens in the legislative process. We're awaiting replacement of unstable hardware, but hope to have the problems sorted out soon so that we can start operations on http://pippi.euwiki.org. Until then there are some cached results from various stages of development (the newest ACTA is a good example), so please excuse the varying quality of the docs in that cache.


The original idea came from our team-member Erik Josefsson. Unfortunately he is not a lawyer, so he came up with the idea of making legislative texts (laws) more comprehensible for non-politicians by looking for text-blocks that are copied from one document to another. These text-fragments act as memes that carry the most important legislation into new laws and other legal documents. By classifying these fragments and 'translating' them into short summaries, it is possible to ease the burden of reading such documents. Reverse-engineering the EU is akin to having only the legal code of a Creative Commons license, but neither the deed (a simple one-line explanation - see an example for a deed) nor the icons. Using Pippi we try to reduce the code into deeds, and possibly also icons, for easier understanding by citizens, activists, advisors and politicians themselves.

Just to give you an example of the above, the deed of the previous paragraph could say: "we look for copy/pasted texts (the longer the more interesting - hence the name: Pippi Longstrings) and try to translate this into short, comprehensible summaries".
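To make the idea of hunting for copy/pasted text a bit more concrete, here is a minimal sketch in Python. It is not the actual Pippi code: the function name shared_fragments and the min_words cut-off are made up for this illustration, and it leans on the standard library's difflib, which only reports fragments that appear in the same order in both texts. A real run against the whole corpus would need something more scalable.

```python
from difflib import SequenceMatcher


def shared_fragments(doc_a, doc_b, min_words=5):
    """Return word sequences of at least min_words found in both texts.

    Hypothetical helper for illustration only. Matching is done on
    lower-cased, whitespace-separated words.
    """
    a = doc_a.lower().split()
    b = doc_b.lower().split()
    # autojunk=False keeps frequent words from being ignored in long texts
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    fragments = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            fragments.append(" ".join(a[block.a:block.a + block.size]))
    # the longer the more interesting, so longest fragments come first
    return sorted(fragments, key=lambda f: len(f.split()), reverse=True)


directive = ("Member States shall ensure that the measures are fair and "
             "equitable and shall not be unnecessarily complicated")
agreement = ("Each Party shall ensure that the measures are fair and "
             "equitable in its territory")

for fragment in shared_fragments(directive, agreement):
    print(fragment)
```

The two sample sentences are invented for the example; with them the sketch reports the one long string they have in common, which is the kind of fragment that would then get a short human-written summary.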

Tracing sources

Among other things, this can also be used to track the sources of such fragments. Interesting pippies have previously been discovered by manual inspection:

  • the FSFE found out that parts of the European Interoperability Framework have been "authored" by the Business Software Alliance.
  • parts of the EU Telecom Package have been written by Telecom Italia (Amendment 542)
  • and parts of EU laws (like the IPRED directive) have been included in Trade Agreements with Canada, Korea, scores of Caribbean countries and possibly also India (more on this later).

As the above examples show, there is a benefit for policy advocates. For non-legislative purposes Pippi Longstrings can also be used for tracking and translating memes in contracts, terms of service agreements and EULAs, similarly to the EFF's great TOSBack service.

National implementations

Another important use-case for Pippi Longstrings with regard to EU laws is the analysis of how these laws are adopted into member states' law. All EU laws are automatically translated and published in all 23 languages of the member states. So when a member state adopts a law, we can check whether it adopted the verbatim translation or changed bits and pieces while adopting it into national law. In the latter case it is definitely interesting to analyze the reasons for the deviation from the original EU translation.
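The check for verbatim adoption can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name deviations is hypothetical, the comparison is a plain word-level diff from the standard library's difflib, and the two sample sentences are invented rather than taken from any real directive or national law.

```python
from difflib import SequenceMatcher


def deviations(eu_text, national_text):
    """List the places where the national text departs from the EU text.

    Hypothetical helper for illustration. Each entry is a tuple of
    (kind of change, words in the EU text, words in the national text).
    """
    eu = eu_text.split()
    national = national_text.split()
    matcher = SequenceMatcher(None, eu, national, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replaced, deleted or inserted words
            changes.append((tag, " ".join(eu[i1:i2]), " ".join(national[j1:j2])))
    return changes


eu_version = "Member States shall provide for effective remedies"
national_version = "Member States may provide for effective remedies"

for change in deviations(eu_version, national_version):
    print(change)
```

An empty list means the national text matches the EU translation word for word; anything else is a candidate for a closer look at why the wording was changed.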

Current status

Pippi Longstrings is currently running as a closed beta. If you're interested in analyzing a certain document against the current European corpus of regulations and directives, please mail it to longstrings on ctrlc.hu.

We have about 40 documents on our list waiting to be processed. These are related to Internet, privacy and copyright topics and to trade agreements, but we are looking for more docs to analyze.

Currently, processing a doc against the whole EU corpus of law takes a couple of hours. The list of docs to be processed is prioritized by us until we succeed in adding a feature for user-initiated processing and/or get donations of lots of powerful hardware. As an alternative, you can get the code, which is completely free under the GNU Affero General Public License, and operate a Pippi service yourself.

Document formats

Pippi Longstrings is part of a set of tools to reduce entry-barriers to participation in the European legislative process. Even though the European legislative process is obliged to be transparent, some issues - some of the most important ones - are shrouded in secrecy, so we need to rely on low-quality PDF leaks for analyzing and reacting to them. These leaks are usually scanned PDFs, which do not lend themselves to automated analysis. There is a grave need for solutions that are able to transform these PDFs into high-quality machine-processable documents (preferably semantically correct HTML or ODF). Currently we rely on crowd-sourced transcriptions, mostly done by the Telecomix crew and La Quadrature du Net, but the EU-India trade agreement leak has not been transcribed by anyone yet, although it is surely a very interesting document. If anyone can help us get these transcriptions more effectively, please share your tools, resources or whatever. It would be nice to have transcriptions done by recaptcha for example. Google, are you reading this?

If we have such a transcription or the original document is not a PDF, then we are able to produce some nifty diffs between different versions of these texts (think ACTA) so that we can track the negotiations without being admitted to them. Generating these diffs is not as easy as you might expect from software development though. Producing them has involved a lot of manual labor to align the paragraphs properly for the most comprehensible results. The preprocessing prior to a diff for a typical ACTA leak or release takes between 5 and 10 days for a single person.
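To show why paragraph alignment is the hard part, here is a rough Python sketch of how a first automatic pass could look. It is an assumption-laden illustration, not the process described above, which is manual: the function name align_paragraphs, the greedy strategy and the 0.5 similarity threshold are all made up here, and the sample paragraphs are invented. The result of such a pass would still need a human to check it.

```python
from difflib import SequenceMatcher


def align_paragraphs(old, new, threshold=0.5):
    """Pair each paragraph of the new version with its closest old one.

    Hypothetical helper for illustration. Returns (old_index, new_index)
    tuples; old_index is None when no unused old paragraph scores above
    the threshold, i.e. the paragraph looks newly added.
    """
    pairs = []
    used = set()
    for n_idx, n_par in enumerate(new):
        best_idx, best_score = None, threshold
        for o_idx, o_par in enumerate(old):
            if o_idx in used:
                continue
            score = SequenceMatcher(
                None, o_par.split(), n_par.split(), autojunk=False
            ).ratio()
            if score > best_score:
                best_idx, best_score = o_idx, score
        if best_idx is not None:
            used.add(best_idx)  # each old paragraph is matched at most once
        pairs.append((best_idx, n_idx))
    return pairs


old_leak = [
    "Article 1 Each Party shall provide for civil remedies",
    "Article 2 Each Party shall provide for border measures",
]
new_leak = [
    "Article 2 Each Party may provide for border measures",
    "Article 3 This is an entirely new provision on damages",
]

print(align_paragraphs(old_leak, new_leak))
```

Once paragraphs are paired up like this, each pair can be diffed word by word, and unpaired paragraphs can be shown as additions. A greedy pass like this one gets confused by split, merged or heavily rewritten paragraphs, which is exactly where the manual work comes in.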


Our nearest-term goals are the introduction of commenting on (translating/summarizing) pippies, and possibly also integrating marked-up texts into the fantastic co-ment.org service. We are also going to start a dedicated blog for pippies (longstrings.soup.io), where we will give summaries of the results of pippifications. If you ask us to process a doc, you should also be ready to write a blog-post on the results in return.

We also intend to integrate Pippi with Erik's other very nice tool Tratten, which tracks issues from the beginning of the legislative pipeline.

If you want to support the ongoing work, consider signing up and donating via . We're also participating in Mozilla's Drumbeat project, go and vote for us.

Big plans, lots of things to do, let's not waste time. Please submit interesting docs to longstrings on ctrlc.hu.

Thanks go to a lot of supporters: amelia for setting us up, jaywalk for hosting us, asciimoo for his coding, erik for the idea and general support, jz for useful criticism, and the telecomix guys.

