Exporting Evernote notes to plain text files – Part 1

Evernote. I’ve loved it. I’ve hated it. I’ve been ambivalent toward it. Currently, I’m disenchanted with it for a number of reasons.

When I started working with Evernote, I was looking for something to simply keep notes in that would sync across platforms. It does this very well. Originally, I was attracted to it because it was cross-platform and relatively lightweight, and the clipping tools seemed interesting. But I never felt like I could easily put something into Evernote as a quick note on any of the platforms.

I know that it’s got quick add functionality on both the Mac and on Windows and that there are tons of options for getting things into Evernote on iOS, but I never took to any of these options. Maybe it’s me, but I just don’t use menu bar quick entry very often because I’m not just adding snippets, but need to take notes in meetings etc.

But getting data into Evernote is not my beef. My concern with Evernote is getting my data out of it. It’s becoming increasingly important to me to have that data accessible to other applications and, more importantly, to me in the long term. I’ve long been a fanboy of ASCII text. I can’t stand it when someone sends me an email with an attachment that contains nothing but words that should have been in the body of the message.

Evernote holds my data hostage in its local database and on its servers. Sure, I can access it with their website and the clients, but what if I want to get all my data out of it? Evernote provides two options for export: HTML or an .enex file. Inconveniently, there’s no option to export as ASCII text files.

The HTML export is really of no use for a couple of reasons. First, it’s fucking HTML. That means it’s a pain in the ass to edit or read without a browser. Second, you lose a ton of data surrounding the note, most importantly the note’s tags and the note’s create date. Even if I strip out all the HTML, I’m left with a text file without any context such as date, time, or where I was when I put the note into Evernote.

The .enex export at first looks daunting. It’s one file with all my notes. Evernote registers itself as the application to open this file. I wondered if that would be a problem: Is it binary? Is it proprietary? Quick investigation with the terminal revealed that while it is proprietary, it’s not binary. It’s a proprietary xml file.

katahdin:Desktop damien$ file "My Notes.enex"
My Notes.enex: XML  document text
katahdin:Desktop damien$

Okay, this is something that could be useful. I decided to look at it and see if I could break it up based on some of the xml tags. It turned out that each note is contained within a set of tags, <note>…data…</note>, delimiting the beginning and the end of the note. So, if I could split the file based upon these tags into multiple files, I’d be well on my way to getting my notes out of Evernote in a plain text format.
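
Before splitting anything, it can help to confirm that the opening and closing tags pair up and that their count matches the number of notes Evernote reported. The sketch below is only an illustration: count_tag is a hypothetical helper name, and the file name in the usage comment is an assumption.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original workflow): count how many
# times a literal tag such as <note> or </note> occurs in a file.
count_tag() {
    # -F treats the tag as a fixed string; -o prints each match on its own
    # line, so wc -l counts occurrences even when tags share a line.
    grep -oF -- "$1" "$2" | wc -l | tr -d '[:space:]'
}

# Example usage (assumes the export was saved as "My Notes.enex"):
#   count_tag '<note>' "My Notes.enex"
#   count_tag '</note>' "My Notes.enex"
```

If the two counts differ from each other, or from the number Evernote reported, the export is worth a closer look before going further.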

I work on a Mac, so this post is geared toward a Mac user. Given that OS X is based on a unix kernel, there are tons of great tools available on the command line (terminal). If you’re a Windows user, many of the tools that I’m using are available in the cygwin distribution. If you’re a Linux user, you should find all these tools in your distribution of choice.

One such command is csplit which is useful for splitting large files based on context. So, I did the following to get to a point where I had a number of discrete notes files:

  1. Export the Evernote notes as an .enex file. Make a note of the number of entries Evernote reports as exported.
  2. Rename the Evernote export to 1.xml
    mv "My Notes.enex" 1.xml
  3. Run the csplit command on the file with the options to create files named file0000, file0001, and so on. In my case I had exported 241 notes, so I also added the option to repeat the split up to 240 times. (You need to do this one less than the number of exported notes or you won’t end up with any files because the last csplit operation will fail and no files will be written to disk.)
     csplit -f file -n 4 1.xml '/<note>/' '{240}'
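
The repeat count in the last step can also be worked out from the file rather than typed by hand. What follows is a rough sketch, not a tested replacement for the steps above: split_notes is a hypothetical name, and it assumes every note starts on its own line, since csplit splits on lines.

```shell
#!/bin/bash
# Hypothetical wrapper around the csplit step.
# Usage: split_notes 1.xml
split_notes() {
    local export_file=$1
    local count
    # csplit works line by line, so count the lines that contain <note>.
    count=$(grep -c '<note>' "$export_file" || true)
    if [ "$count" -lt 1 ]; then
        echo "no <note> tags found in $export_file" >&2
        return 1
    fi
    if [ "$count" -eq 1 ]; then
        # A single note needs one split and no repeat argument.
        csplit -s -f file -n 4 "$export_file" '/<note>/'
    else
        # Repeat the split one time fewer than the number of matching lines.
        csplit -s -f file -n 4 "$export_file" '/<note>/' "{$((count - 1))}"
    fi
}
```

The -s flag only suppresses the byte counts csplit normally prints for each file; the output files are named the same way as in the manual command.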

This left me with 242 distinct files named file0000 … file0241. The file “file0000” contains everything up to the first opening <note> tag – basically all the Evernote xml file definition headers.

katahdin:temp damien$ more file0000
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export3.dtd">
<en-export export-date="20130619T130845Z" application="Evernote" version="Evernote Mac 5.1.4 (401297)">

 

The first note starts in file0001 and the last note of the export was in file0241. Now I had individual xml files for each of the notes in my archive, but each file had a create time equal to the time the csplit command was run. I wanted to find a way to change the create date of each file to match the create date of the note. I knew that I could do this with the unix tool touch, but I needed to find a way to get the time stamp from the notes to pass to touch.

I found that there was a <created> tag in the files that listed the time in the format of 20130617T175528Z, but it was embedded within a line that included a number of other things.

I’m not a perl or python guru, so I wanted to find a way to get the tag on a single line which I could easily grab with grep. This was easy to do with the tidy command. Tidy cleans up and reformats a number of different markup languages including html and xml. One of the things it is really good at is placing each tag on its own line in a file. Tidy is included in OSX.

So I ran tidy on the files with the flags to identify each file as an xml file and to rewrite each file as part of the tidy clean up.

tidy -xml -m file*

This placed the <created> tag on its own line in the file. The next challenge was to reformat the date string into the format that touch could use to reset the create time on each file. To do this, I used grep and a few instances of the tool tr. This tool translates or deletes matched characters on a single line. I crafted the following bash script that creates a variable $CDATE which is passed to touch as the date string to be used to reset the create date on each file in the directory. $CDATE is defined by matching on the <created> tags, using tr to cut out all the xml, using fold to split the resulting line at the 12th character and using sed to rejoin the resulting 2 lines with a . in between the 12th and 13th characters.

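#!/bin/bash
# Set each split file's timestamp from the note's <created> tag.
for f in file*
do
    # Strip letters and punctuation, leaving 14 digits, then insert a "."
    # before the seconds: 20130617T175528Z -> 201306171755.28
    CDATE=$(grep created "$f" | tr -d '[:alpha:]' | tr -d '[:punct:]' | fold -w 12 | sed '$!N;s/\n/./')
    echo "$CDATE"
    touch -t "$CDATE" "$f"
done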

The resulting touch command is applied to each file, with a date string that represents the same date that was contained in the <created> tag in the xml.

touch -t 201306191024.58
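
For comparison, the same reformatting can be sketched as a single sed substitution on the raw timestamp. This is an alternative illustration, not the script used above, and enex_to_touch is a hypothetical name. The trailing Z marks the stamp as UTC, while touch -t reads its argument as local time, so the usage comment sets TZ=UTC for that one command.

```shell
#!/bin/bash
# Hypothetical helper: turn an ENEX timestamp such as 20130617T175528Z into
# the CCYYMMDDhhmm.SS form that touch -t expects.
enex_to_touch() {
    # Capture the date (8 digits), hours and minutes (4 digits), and seconds
    # (2 digits), then rejoin them with a "." before the seconds. Input that
    # does not match the pattern prints nothing.
    printf '%s\n' "$1" |
        sed -n 's/^\([0-9]\{8\}\)T\([0-9]\{4\}\)\([0-9]\{2\}\)Z$/\1\2.\3/p'
}

# Example usage (file0001 is one of the split files):
#   enex_to_touch 20130617T175528Z        # prints 201306171755.28
#   TZ=UTC touch -t "$(enex_to_touch 20130617T175528Z)" file0001
```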

At this point, I’ve got 241 files with the correct create time stamp on each one. There are a few items I’m still working on. Namely, I want to extract the title of each note and use that as the filename and I want to remove all the xml code from each file. Once I have that accomplished, I’ll have exported my Evernote notes into usable text files.
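
One possible starting point for the title step, offered as a rough sketch only: note_title is a hypothetical name, and it assumes tidy has left each <title>…</title> pair on a single line. XML entities such as &amp; are left encoded in the result.

```shell
#!/bin/bash
# Hypothetical helper: print a filename-safe version of the first <title>
# found in a split note file.
note_title() {
    # Keep only the text between <title> and </title>, take the first match,
    # and replace "/" and ":" (awkward in file names) with underscores.
    sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$1" |
        head -n 1 |
        tr '/:' '__'
}

# Example usage: copy each note to a file named after its title. cp -p keeps
# the timestamp set by touch; file0000 holds only the export header, so it
# has no title and is skipped.
#   for f in file*; do
#       t=$(note_title "$f")
#       [ -n "$t" ] && cp -p "$f" "$t.xml"
#   done
```

Two notes with the same title would overwrite each other in that loop, so a real version would need to handle duplicates.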

I have a feeling I’m going to be spending some time working with either perl or python to accomplish the last bits of this project. If you’ve got any good ideas on how to do this, I’d love to hear them in the comments.

8 thoughts on “Exporting Evernote notes to plain text files – Part 1”

  1. This is pretty amazing! I’m surprised I’m the first to comment. If you could turn this into an app, I know there would be a huge market for this… so many people are (rightfully) suspicious of EN because the files are locked in, closed system etc.

    I wish I could help but your technical wizardry dwarfs mine, so all I can say is KEEP UP THE GOOD WORK, you will be helping hundreds of thousands of people!

    1. Thanks for the comment and encouragement. If I get it together and figure out how to make this all work in a single app it will either be a perl script or a python program. In either case, I would release it to the public domain free of charge.

  2. I don’t use the Evernote app; instead, I use the everpad provider for the Linux desktop, which saves notes in sqlite format. I’ve made a script to export my notes using csvtools and pandoc, but I didn’t account for different notebooks or tags in the notes, since I wasn’t using them.

    Good luck doing it your way. Anyway, if you’re in a hurry and can get everpad installed on a Linux desktop, you can use mine: http://fompi.net/extract-your-evernote-contents-to-text-files-using-everpad-sqlite-database/

    1. Keep up the good work guys!! I wish I had even the slightest bit of technical knowledge to contribute to this, but again – I think there is a huge market for someone who can successfully break the sandboxing of Evernote at least a bit – I for one would be willing to pay for such a solution and I know I’m not the only one!!

  3. Hi,

    there is one great perl tool for splitting up xml files: xml_split.
    On mac you have to do the following:

    • XCode and Developer Tools must be installed

    sudo perl -MCPAN -e shell

    Initialize and commit the changes (url for cpan etc.)
    o conf init
    o conf commit

    Upgrade CPAN
    sudo perl -MCPAN -e 'install Bundle::CPAN'

    Now you can install several packages with

    sudo perl -MCPAN -e 'install Bundle::Name'

    so do the following:

    sudo perl -MCPAN -e 'install XML::Twig'

    XML::Twig holds the tool xml_split

    Now you can split the files with:

    xml_split 1.xml

    which will create something like:

    1-01.xml
    1-02.xml

    Regards,

    Stefan

  4. I found that I could convert evernote files to text simply by sending them to my email client (I use Thunderbird) and then doing the “Save As” in text format. I also found that I could email multiple notes en masse in one email. This method will probably work for just about any email client that will allow you to save emails in text format. Note that any font formatting, pictures and any attached files will be lost when saving as text, of course.

  5. I thought it was free and universal, but I was mistaken. Great blog. Happy to find people like this, as I have written a Python script to export notes out of this enex building, with some pretty basic formatting.
