Monday, May 16, 2011

Porting to Python3

I put off converting my Python code to Python3 for the usual reasons, including lack of 3rd-party library support and perceived issues with Unicode. But not all my work is applications; a significant portion is crypto libraries. I should at least convert them to Python3 so I do not become part of the problem.

When hashing data for digital signatures the data must always have a consistent representation, be of consistent length, and have zero endian issues. Change anything, even a single bit, and the hashes will not agree. While hashing Unicode is technically possible, the community is not sufficiently well versed in how to [consistently] do this correctly. (If in doubt, serious expertise should be consulted.) Likewise, software tools are not standardized and can produce varying results, yet interoperability is a major requirement.

In acknowledgement of this state of affairs, today's conventional wisdom says we should be hashing text in the lowest common denominator, ASCII text or perhaps Latin-1 single-byte encodings. KISS. Since hashes are seldom used in isolation (cryptographically), all my other crypto routines need to have consistent data passing protocols, the simpler the better. Unicode does not [easily] meet this requirement.
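
To make the point concrete, here is a minimal Python3 sketch using only the standard hashlib module. The sample string and the choice of SHA-256 are illustrative assumptions, not something taken from my libraries. It shows that the same text yields different byte strings, and therefore different hashes, depending on the encoding.

```python
import hashlib

# Illustrative sample only: 'cafe' with an accented e (one non-ASCII character).
text = 'caf\u00e9'

# Encode the same text three ways and hash each byte string.
encoded = {name: text.encode(name) for name in ('latin-1', 'utf-8', 'utf-16-le')}
digests = {name: hashlib.sha256(data).hexdigest() for name, data in encoded.items()}

for name in ('latin-1', 'utf-8', 'utf-16-le'):
    print(name, len(encoded[name]), digests[name])
```

The three encodings produce 4, 5, and 8 bytes respectively, so none of the digests agree. Both sides of a signature must settle on one encoding before hashing.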

After a couple of half-hearted false starts I decided it best to start over, beginning with some serious homework. I chose for my first "for-real" conversion a relatively simple Blowfish crypto library, blowfish.py, and its test procedures. Here are a few lessons learned from that exercise.

Unicode may be a boring topic but do read these first.
    http://diveintopython3.org/strings.html
    http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF
    http://diveintopython3.org/porting...with-2to3.html
And as usual, there's more via Google.

Converting blowfish.py took a while. Some of that was the learning curve, but in truth, getting it to work didn't take that much time. Defining simple, clean, efficient idioms, patterns, and guidelines did. Time spent here should make subsequent conversions much easier.

Perhaps not new to the community, but here is a short summary in terms that work for me.
  • If textual data (perhaps reproduced from printed materials) appears in source code, is likely to be displayed or printed, or is otherwise associated with human consumption, be sure it is str (Unicode).

  • If data is associated with processing, is to be passed as an argument or as a return value, or is to be communicated to other programs or stored in files for later processing, be sure they are bytes (bytestring).

  • If test values are defined in hex, leave them as str for ease of importing into the code and for general readability, but convert them to bytes with .encode('latin-1') and unhexlify() before processing.

  • If bytes need to be converted to str, use .decode().

  • If bytes need to be converted to hex, hexlify() and convert to str with .decode().

  • If doing crypto and 8-bit byte bytestrings are important, consider the 'latin-1' encoding (AKA iso-8859-1). The high-order Latin characters may not always print the same across all operating systems, but latin-1 will always provide an 8-bit byte representation for all values 0..255. (Both 'ascii' and 'utf-8' use a single byte for 0..127, but 'ascii' cannot encode the values 128..255 at all and 'utf-8' converts them into a two-byte representation. Not good. Ignore 'utf-16' and 'utf-32'.)

  • Be on the lookout for Unicode as the result of some default action somewhere. Databases are one common source. I worked with one DB that, unbeknownst to me, accepted byte strings (seen as ASCII characters), converted and stored them as Unicode, and returned UTF-16 when selected. The before and after hash values were very different. Consider using raw hex dumps/views when things don't make sense, because ASCII text and Unicode will often print the same. The DB I was using had parameters I could set, but use extreme care if your DB is already populated with data.
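
The conversions in the list above can be sketched as follows under Python3. This is a minimal illustration, not code from blowfish.py; the name KEY_HEX and its value are made up for the example.

```python
from binascii import hexlify, unhexlify

# Hypothetical test value, kept as a hex str for readability.
KEY_HEX = '000102030405060708090a0b0c0d0e0f'

# str -> bytes of hex digits -> raw bytes
key = unhexlify(KEY_HEX.encode('latin-1'))

# raw bytes -> bytes of hex digits -> str
key_hex = hexlify(key).decode('latin-1')

# latin-1 maps every value 0..255 to exactly one byte, in both directions.
all_bytes = bytes(range(256))
text = all_bytes.decode('latin-1')
round_trip = text.encode('latin-1')

# utf-8 needs two bytes for each of the values 128..255.
utf8_length = len(text.encode('utf-8'))

# ascii cannot represent 128..255 at all.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Here the 256 characters encode to 256 bytes under latin-1 but to 384 bytes under utf-8, which is why latin-1 is the safer choice when one character must equal one byte.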

Additional code changes will be necessary, some supported by 2to3 and some very manual. Thus far the most awkward one was zip(). More on this later. After you get things working under Python3, go back and try to get backward compatibility with Python 2.6 and 2.7 by adding
    from __future__ import print_function
Mod and morph as appropriate. There will be a few exceptions but do try to get the same code to work under Python2 and Python3. (Don't forget to re-test when finished.)
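
As one hedged illustration of code that runs unchanged on Python 2.6+ and Python3, consider the sketch below. I have not yet described what the zip() awkwardness was, so this only shows two well-known differences: zip() returns a list on Python2 but a one-shot iterator on Python3, and iterating a Python2 str yields characters while Python3 bytes yield ints. The helper xor_bytes is hypothetical and is not taken from blowfish.py.

```python
def xor_bytes(a, b):
    """XOR two equal-length bytestrings (hypothetical helper)."""
    # bytearray yields ints when iterated on both Python 2.6+ and Python 3,
    # which avoids ord()/chr() juggling that differs between the two.
    mixed = bytearray(x ^ y for x, y in zip(bytearray(a), bytearray(b)))
    return bytes(mixed)

# Wrap zip() in list() when the pairs must be indexed or reused;
# on Python 3 a bare zip() object can be consumed only once.
pairs = list(zip([1, 2, 3], ['a', 'b', 'c']))
first_pair = pairs[0]

masked = xor_bytes(b'\x0f\xf0', b'\xff\xff')
```

On Python2 the return value of xor_bytes is a str and on Python3 it is bytes, which in both cases is the native bytestring type.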
 
Later, I ran blowfish.py with the time command. I was somewhat surprised by the results. (Subsequent re-runs provided very similar times.)
          python 2.5     4.276s     (Python2 code, 32-bit w/o psyco)
          python 2.5     1.800s     (Python2 code, 32-bit w/psyco)
          python 2.6     2.655s     (64-bit)
          python 2.7     2.705s     (64-bit)
          pypy (2.7)     2.192s     (64-bit)
          python 3.2     1.783s     (64-bit)
I expect the results to be skewed even further when I encrypt larger data.
 


Thursday, March 10, 2011

Python versions posted

I have posted pure Python versions of Salsa20, ChaCha, BLAKE, and SHA-512 (which also supports SHA-384 and the new NIST algorithms SHA-512/256, and SHA-512/224).

Update (May   9): added Blowfish
Update (May 17): started a folder for Python3 conversions
Update (May 30): added a Python wrapper for C version of BLAKE
 


Saturday, December 18, 2010

Erase keys and credit card numbers in Python

Not to start a long discussion on system security and the advisability of doing crypto in potentially compromised environments, suffice it to say it is still good practice to erase keys, credit card numbers, and other sensitive information when no longer needed. Overwriting the sensitive content with garbage will not prevent leakage but it can reduce the likelihood.

Unfortunately Python's assignment statement
  key = "Kilroy was here!"
    ...
  key = "qwerty" 
does not overwrite previous string values. True, key now points to "qwerty" but this is a new string. The old string is not overwritten and still resides in memory flagged for garbage collection. When it will be collected and reused is indeterminate, and even then, there is no assurance it will be overwritten. This problem is common to many scripting languages, not just Python.

Python's ctypes' c_buffer() has an interesting property. Operations on the buffer's contents occur in a fixed memory location making it attractive for the later clearing of sensitive content. Consider this:
  from ctypes import c_buffer, addressof

  TEMPLATE = '  %s: key location: 0x%X, value: %s'

  # instantiate key
  key = c_buffer(16)
  print '  key: %s' % key

  # set the key value
  key.value = 'Kilroy was here!'
  print TEMPLATE % ('set', addressof(key), key.value)

  # use the key in some way
  pass

  # overwrite the key
  key.value = '-'*16
  print TEMPLATE % ('clr', addressof(key), key.value)
If you are using ctypes to access an AES routine written in C, simply pass the variable key which, as it turns out, is essentially a pointer to the character array. The above approach also works for other data types such as int and ulong.
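
For readers on Python3, the same idea can be sketched roughly as below. This is an adaptation under a few assumptions: it uses create_string_buffer() (c_buffer is an older alias for it), bytes literals instead of str, and ctypes.memset() to zero the buffer rather than assigning dashes. The variable pin is a made-up example of a non-string type.

```python
from ctypes import addressof, c_ulong, create_string_buffer, memset, sizeof

# instantiate key: 16 zero bytes at a fixed memory location
key = create_string_buffer(16)
addr_set = addressof(key)

# set the key value in place (exactly 16 bytes, so no NUL terminator is added)
key.raw = b'Kilroy was here!'
value_set = key.raw
print('  set: key location: 0x%X, value: %r' % (addr_set, value_set))

# use the key in some way
pass

# overwrite the same 16 bytes with zeros
memset(key, 0, sizeof(key))
addr_clr = addressof(key)
print('  clr: key location: 0x%X, value: %r' % (addr_clr, key.raw))

# the same pattern with an integer type (hypothetical value)
pin = c_ulong(1234)
pin_addr = addressof(pin)
pin.value = 0
```

Note that the bytes literal in this sketch is itself a constant that stays in memory. In real code the key would be read or derived directly into the buffer so that no second copy exists.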

Enjoy.


Sunday, September 27, 2009

A Stick Figure Guide to AES

Jeff Moser wrote/tooned an excellent, gentle introduction to AES.
  http://www.moserware.com/2009/09/stick-figure-guide-to-advanced.html
 


Monday, May 26, 2008

Salsa20/12

I've been off the grid for the last couple of months, heads-down on a project. ...a fun and challenging project. Now that I've come up for air, I notice Salsa20/12 (Salsa20 with 12 rounds) has been selected as one of the eSTREAM finalists. Congratulations to Daniel Bernstein.

I conclude from my re-read of the final report and other materials that Salsa20/12 is indeed the preferred software algorithm for Profile 1. Depending on where you look, Rabbit and Sosemanuk also received high marks, but it was Salsa20/12 that was consistently on top. ...and because Rabbit has commercial-use IP restrictions, Sosemanuk easily takes second place. (eSTREAM was careful not to get involved in the politics of being so explicit.)

While I've played some with Sosemanuk, I've actually been using Salsa20 for a while now. I'm pleased. One question though, ought not it be named "Salsa12" to help avoid confusion?

More on Salsa20 and other eSTREAM ciphers....
    Fast stream ciphers from eSTREAM
    Python access to CryptMT, Dragon, HC, LEX, NLS, Rabbit


Updated: 6Jun08
 


Wednesday, March 05, 2008

Make OpenSSL CA, SSL Server and Client Certs

Even at this late date there is still confusion on how to get OpenSSL to generate a CA and SSL certificates. So, here is a script that I hope will answer some questions. ...there will undoubtedly be more.

I also have an ECC version I will upload in a few days. It was written some time ago and needs to be reviewed.

Update 3/6: Here is an ECC script. It points out a couple of likely bugs with OpenSSL. First, an OpenSSL ECC CA always signs its certs with SHA1 regardless of curve, what you specify in the command, or what you define as the default_md in openssl.cnf. This is not the case if the CA uses an RSA key. Second, if you use openssl ecparam -genkey to create a key pair, you cannot secure the PEM file output. You have to follow with a second command openssl ec to encrypt the private key with AES. ...but you have already written the key to disk. Oops!

Update 3/10: OpenSSL 0.9.9 indeed has a fix for the SHA1-only self-signed certs. The catch is 0.9.9 is still in development (the making of .dylib files fails and make test fails on one of the new TSA tests), but prognosis is good.

Update 3/20: The OpenSSL 0.9.9-dev daily snapshot is meeting my needs very nicely now. openssl (the executable) can now sign certs using ECC and the SHA2 family. I can create the OSX .dylib files if I disable the x86 asm accelerations using the -no-asm switch, and Python with M2Crypto has so far not shown any problems. The linker problem necessitating no x86 asm acceleration is my only outstanding issue. Sweet!

Update: Using config's -shared switch seems to also cure the asm problems.
 


Sunday, November 04, 2007

Python access to CryptMT, Dragon, HC, LEX, NLS, Rabbit (eSTREAM ciphers)

The last couple of months have been heads down and very long hours with my day job. I'm back now to personal projects and put together another Python wrapper for these ciphers using the eSTREAM APIs. See http://www.seanet.com/~bugbee/crypto
 


Monday, June 11, 2007

Fast stream ciphers from eSTREAM

Several years ago the EU (European Union), through a project called NESSIE, analyzed and recommended a number of cryptographic primitives. You can learn more about NESSIE from the links below. One category in which they were not satisfied and made no recommendation is stream ciphers, so they launched a new project, eSTREAM, expressly for the evaluation of stream ciphers. eSTREAM is now in Phase 3, the final evaluation phase, and there are a number of promising candidates.
    NESSIE
        http://en.wikipedia.org/wiki/NESSIE
        https://www.cosic.esat.kuleuven.be/nessie/
    eSTREAM
        http://en.wikipedia.org/wiki/ESTREAM
        http://www.ecrypt.eu.org/stream/

Personally, I like Salsa20. I benchmarked Salsa20 as encrypting over 100 MB per second on a lowly 1.5 GHz G4 PPC mini Mac. ...128-bit encryption strength, memory resident data (no I/O), and before Python's GC kicked in. This is more than 3x faster than AES-128 on the same machine. Others I like are Sosemanuk and Phelix, but Phelix didn't advance to Phase 3. Ugh. Well, two out of three isn't bad.

The core routines are written in C so I wrote ctypes wrappers to access Salsa20, Sosemanuk, Phelix and others. I'm making that code available, free for any use.
        http://www.seanet.com/~bugbee/crypto

Enjoy.
 


Sunday, June 10, 2007

A Python ctypes wrapper for LibTomCrypt

Recently I wrote a Python ctypes wrapper for LibTomCrypt. I'm making that code available, free for any use.

pyTomCrypt v0.20 implements most of Tom's crypto library:
    - public key algorithms: RSA, DSA, ECDSA, ECDH
    - hash algorithms:
      md2, md4, md5, rmd128, rmd160, rmd256, rmd320,
      sha1, sha224, sha256, sha384, sha512, tiger, whirlpool
    - symmetric ciphers:
      aes, rijndael, twofish, blowfish, des, des3, cast5,
      kasumi, anubis, kseed, khazad, noekeon, rc2, rc5, rc6,
      xtea, skipjack
    - modes: ecb, cbc, ctr, cfb, ofb
    - MACs: HMAC, OMAC, PMAC, Pelican, XCBC, F9
    - PRNGs: fortuna, rc4, sprng, yarrow, sober128
and is based on:
    - libtomcrypt 1.17
    - libtommath 0.41 (default)
    - tomsfastmath 0.12 (optional)

See...
    http://libtom.org
    http://www.seanet.com/~bugbee/crypto

Enjoy.
 
