Why Python / Pandas?

If you do Search PPC in e-commerce or for clients in the travel industry you know how fast you get to account sizes that go far beyond Excel’s 1 Million rows limit. And even if you are way below that number regarding your keyword portfolio, at some point you might want to analyze data with different segments for device, search network, over weeks, days or even on hourly basis. If your account has only 10.000 keywords with Impressions and you want to do a timeseries analysis on this data for just over the last 10 days, segmented by day then you already get to Excel’s limit’s.

Besides from that, Excel tends to slow down even with less data. Simple calculations can take minutes and Search & Replace runs cell by cell to do a simple operation like “Replace X with Y”. We got used to this taking time just because we don’t know other ways. Many PPC managers complain that their computer is just too slow.

But it is not the computer to blame. It is Excel and its algorithms. The tool is just not made for large-scale PPC operations which could actually be done much better in a real database environment. Excel’s default algorithms are made for an interactive GUI frontend that Excel is. The truth is: If your computer used a different algorithm of working with the data the calculation would speed up by  10 to 1000 times.

The combination of the Python programming language combined with the Pandas library is a very good alternative to working with Excel on large datasets. Python alone would not really help you as you would need to write the necessary code to do all the usual tasks that you handle in Excel now completely from scratch. Probably these would be even slower than Excel. This is where Pandas enters the stage:

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. (Pandas Website)

Critical parts of the code that usually lead to slow performance are witten in “C”. The library is extremely fast. Fast enough for all the tasks you struggle in Excel with.

The other argument for Python / Pandas in PPC besides from speed is repetitiveness. What you do in Pandas is real programming. Your can write complete programs doing your daily boring tasks. In the beginning you might need more time than doing all the steps once in Excel. But as soon as you have finished your program in Pandas, you are able to run in on another input file in a week again without doing any of the steps before. Additionally you can send the script to your colleague and he can also run the script on his computer to do the task on his accounts.

With Pandas you can do all the tasks a PPC manager has to deal with:

  • Build full accounts from scratch.
  • Create, remove and change ads and keywords in existing accounts.
  • Aggregate and analyze large sets of data.
  • Create Graphs and tables.

Install Anaconda

For this tutorial we are going to use the easy-to-setup Python platform Anaconda. Anaconda includes all necessary libraries we need to use Python for PPC. Download the platform for Python 2.7 from the website and install it on your computer. There are versions for Windows, OSX and Linux available.

The iPython interface

After installing Anaconda, open a terminal. In Windows you can find the Termial by pressing “Alt + R” and entering “cmd”.

Run iPython Notebook

Your browser will open a new tab with an iPython notebook. This is our development environment.

Pandas Hello World

The Notebook consists of “cells”. A cell defines one block of code. Each cell can be run either completely from top to the bottom of your program or individually one by one. You can copy, paste, modify, merge, split or move cells around in your program. Enter ‘hello world’ (with Apostrophs!) into the first cell and press “Shift + Enter”. Python runs the cell and shows the output. Below a new cell is created. Enter the code shown in the screenshot and hit “Ctrl + Enter”. Now this cell gets run without creating a new cell.

The possibilities of manipulating cells can lead to much faster development of scripts. Imagine this:

You want to write a small script which creates a new column in your downloaded data from AdWords. This column contains some label you want to set according to the campaign name.

Cell 1: Code to import the necessary libraries.
Cell 2: Code to import the CSV you downloaded from AdWords.
Cell 3: Function to add the new column and put a label according to the campaign name.
Cell 4: Export the data to a CSV file for importing into AdWords Editor.

After running the program and importing into AdWords Editor you notice a mistake. You accidentially included some campaigns into your labeling that are Test Campaigns. But these should not get the label.

In iPython you do not need to run the full program again in that case. You can just change Cell 2 and add one line of code to it, which excludes the Test campaigns. Alternatively you might add a new cell between Cell 2 and Cell 3 with the code to exclude. Then you just run Cell 2 up to the last cell. Done.

This is just a very simple example. If you create bigger scripts that do many different operations to your data, you might find this kind of development in iPython much faster than the traditional method with just a text file.

In the next part of this series learn how to import AdWords data into your iPython notebook.

(Title Image: Creative Commons 3.0)