[Updated 06/11/2015 - new method!]

At some point in 2011, I decided to start reading the Linux Kernel Mailing List (LKML), so I subscribed to the list from my gmail account.

Those who know the LKML will understand why my reading streak didn't last long: the sheer number of e-mails and their technical nature. The list carries all the patches and discussions related to Linux kernel development, which can be very heavy going!

Fast-forward to 2015 and I realised I was running low on google mail storage...

[Screenshots: 80% gmail storage used; 50 e-mails selected]

... and after finding I had over 140k e-mails from LKML I thought I'd delete those and free up some space. This turned out to be a little harder than I had expected!

Method 1 - Clickity-Click

I figured out pretty quickly that I could use the mouse to click 'Select All' on the mail with the label "LKML" and then hit delete:

[Screenshot: 50 e-mails selected]

However, that only deleted 50 e-mails at a time:

[Screenshot: 50 e-mails deleted]

With that approach this would take a lot of time!

But wait - you can select all e-mails/conversations with that label and hit delete:

[Screenshot: gmail timeout]

Not so fast... this resulted in a timeout from gmail.

Method 2 - Automate it!

I knew at this point that I should probably dig out an e-mail client or script to connect via IMAP and mark the e-mails as deleted, as I was hitting a limitation of the gmail web interface.

However, I thought I'd have some fun before that! How about I enable keyboard shortcuts in gmail and then automate the keypresses?!? That sounded like fun and made me feel like a kid in school.

Enabling the advanced keyboard shortcuts via the gmail settings is required:

[Screenshot: 80% gmail storage used]

After which I looked up the keyboard shortcuts and then looked for a tool to do the job.

Xdotool

It turns out 'xdotool' was the tool for the job. The man page puts it best:

"xdotool lets you programatically (or manually) simulate keyboard input and mouse activity, move and resize windows, etc."

With a little work I found out the key presses I needed were:

shift + 8 - types '*', the prefix key for gmail's selection shortcuts
a - select all mails in current window
# - delete all the selected mail!

I only had to do that approximately 2,800 times (140k e-mails at 50 per pass) to remove all the mail in the label 'lkml'.

So, I ended up with a little shell script:

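#!/bin/bash -x
# Find the first Iceweasel window on desktop 0 and bring it to the foreground.
WID=$(xdotool search --desktop 0 "Iceweasel" | head -1)
xdotool windowactivate "$WID"

# 2800 passes of 50 e-mails each covers roughly 140k messages.
for i in $(seq 1 2800); do
  echo "$i"
  xdotool key --delay 10 shift+8     # '*'
  xdotool key --delay 10 a           # select all mail on screen
  xdotool key --delay 10 numbersign  # '#': delete the selection
  sleep 8s                           # wait for gmail to catch up
done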

It took me a little while to work out that # (hash, or pound in the US) has to be written as 'numbersign'. xdotool takes X11 keysym names, and 'numbersign' is the keysym for that character.

I also had to make sure the keypresses went to the correct window, which is what the search for Iceweasel does (Iceweasel is the Debian name for Firefox); the script then activates that window, bringing it to the foreground. I then iterate the key press sequence 2,800 times. There is a dodgy sleep at the end because the gmail interface was responding more slowly after a while: it took a good 5 seconds before I could reselect the mail on screen.

I left this running for an hour and found that it deleted a fair amount of e-mail but even at 50 e-mails every 8 seconds or so, that's still 6.2 hrs to delete them all!
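The 6.2 hour figure comes from simple arithmetic, sketched here in Python. The function name and parameters are only illustrative; the numbers are the ones quoted above (roughly 140k e-mails, 50 per pass, 8 seconds per pass).

```python
import math


def estimate_run(total_mails, batch_size, seconds_per_batch):
    """Return (number of passes, total hours) for a batched delete."""
    batches = math.ceil(total_mails / batch_size)
    hours = batches * seconds_per_batch / 3600
    return batches, hours


batches, hours = estimate_run(140_000, 50, 8)
print(f"{batches} passes, about {hours:.1f} hours")
# prints "2800 passes, about 6.2 hours"
```

The estimate assumes every pass takes the full 8 seconds and never fails, so it is a lower bound on the real run time.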

In addition to that, I found that after an hour or so I'd hit the same problem as before, with the "Oops... the system encountered a problem (#007)" error.

Method 3 - Do it properly

Of course, I knew before automating key presses that I could enable gmail's IMAP support in the settings, connect a mail client like Thunderbird to the server and mark all the mails as deleted, but that didn't really sound like fun.

In my opinion, the best way to deal with this problematic folder is to use Python's imaplib to connect to my gmail account and delete the mail that way. Simple.
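For illustration, a rough and untested sketch of the imaplib idea might look like the following. Several things here are assumptions rather than anything from a working setup: that the label shows up as an IMAP folder called "lkml", that the Trash folder is named "[Gmail]/Trash" (the name varies with the account's language), and that copying a message into Trash is what makes gmail delete it instead of just removing the label. The helper names chunked and trash_label are made up for this example.

```python
import imaplib


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def trash_label(conn, label, trash_folder="[Gmail]/Trash", batch_size=500):
    """Copy every message under `label` into the Trash folder, in batches.

    `conn` is a logged-in imaplib.IMAP4_SSL object (or anything with the
    same select/uid methods). Returns the number of UIDs found.
    """
    status, _ = conn.select(label)
    if status != "OK":
        raise RuntimeError(f"could not select {label!r}")
    # Work with UIDs: they stay valid while messages disappear from the
    # folder, unlike plain sequence numbers.
    status, data = conn.uid("SEARCH", None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    uids = data[0].split() if data and data[0] else []
    for chunk in chunked(uids, batch_size):
        message_set = b",".join(chunk).decode("ascii")
        conn.uid("COPY", message_set, trash_folder)  # assumed to trash the mail
    return len(uids)


def main():
    # Placeholder credentials; with two-factor authentication an
    # application-specific password would be needed here.
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    conn.login("someone@gmail.com", "app-password-here")
    try:
        print(trash_label(conn, "lkml"), "messages sent to trash")
    finally:
        conn.logout()
```

Working in batches keeps each command small, which seems sensible given how badly the web interface coped with one huge delete. main() is deliberately not called here.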

Thankfully a quick google search revealed someone had already done the work for me with a script called "Gmail Label Purge".

Gmail-label-purge

"gmaillabelpurge allows you to delete all messages from given gmail labels when they reach a certain age. It uses GMail's IMAP interface, which means you have to enable IMAP on your account, and speaks Google's IMAP extensions."

"Gmail Label Purge" by Jürgen Geuter - https://github.com/tante/gmaillabelpurge

The script did exactly that: it purged labelled e-mails based on a config file, and the repository had both a Perl and a Python version.

I cloned the repo onto a local system:

$ git clone https://github.com/tante/gmaillabelpurge.git

Created a simple configuration file, ~/.config/com.github.tante.gmaillabelpurge:

[DEFAULT]
username=<my-email-address>@gmail.com
password=<password>
[set1]
maxage=0
labels=lkml

For this configuration file I first had to create a gmail application password, as I use two-factor authentication. When you use 2FA with gmail you can also generate an application- or script-specific password that doesn't need your 2FA credentials, which is useful for scripts or apps that don't support it. I created a temporary one, which I removed after using the script.

Then all you need to do is run the script:

$ python gmaillabelpurge.py -v

This took a little while to complete - you could say a similar amount of time to automating the keyboard - but at least the results were now reliable. The step that seemed to take the longest was searching through all the e-mail for the label. Perhaps if it checked the IMAP folder that represents the label instead, it wouldn't have to spend that time searching.

Does anyone know of a more efficient way to mass delete e-mail within gmail?

[ Updated on 06/11/2015 ]

Method 4 - Google Script

[Screenshot: Google Script UI]

Pierre Tardy mentioned in the comments that he couldn't get the gmaillabelpurge script working, so I went back and tested it again. It still works for me, but it seems to spend a lot longer searching for the label and purging the mail than it had done previously.

Pierre suggested Google script, which I'd never heard of! It looks very useful!

Google Script is a JavaScript interface which lets you write scripts around all of the Google Apps products.

Here's Pierre's suggestion:

function myFunction() {
    while(true){
        var threads = GmailApp.search('label:lkml', 10, 10);
        GmailApp.moveThreadsToTrash(threads);
    }
}

Breaking that down: the function loops forever, and on each pass it sets a variable called threads to the result of GmailApp.search. The three arguments are the query (here the label lkml), the index of the first thread to return (10) and the maximum number of threads to return (10). Note that a start index of 10 skips the first ten matching threads on every pass; 0 would start from the first match. The threads are then passed to GmailApp.moveThreadsToTrash, which does exactly what you think it does - moves them to the trash.

The key function is GmailApp.search which seems to be very well documented.

Obviously, this is only 10 e-mails at a time, but as we've seen with the Gmail UI, large scans time out. I played with the batch size a bit and hit an error very similar to the one from the Gmail UI in Method 1, although 50 seemed to work. After leaving it running, I thought all was good... but 6 minutes in, it failed.

Google Script has some built-in protection in the form of a maximum execution time - 6 minutes, it seems. There is also a triggers feature that lets you run the script on a schedule, much like cron, so I could, for example, call the script every 10 minutes and have it run again and again. I'm going to have a play with that and see how far I get...

Thanks Pierre!

