Even though CR2 and FITS files both seem to be very common, unfortunately you just can’t simply google and find out how to convert between each other. So after a lot of googling, here’s my solution to the problem using Python with Numpy, PyFITS, Netpbmfile.py, and dcraw.
Actually if you take a look at dcraw’s homepage, you’ll see that it says you can use the following code to convert cr2 files to fits :
$ dcraw -c crw_0001.crw | pnmtofits > crw_0001.fits
But this wasn’t the case for me, since I insisted on going for 16 bits, the pnmtofits tool gave me bufferoverflows and other crap.
Here’s my solution.
$ dcraw -6 -c RAWDATA.cr2 > ThePPM.ppm
Ok this was easy. The -6 option says that we insist on our output to be 16 bits. The -c tells the program to write the output to stdout. Well, since the default output of dcraw is ppm, we redirect it to a ppm file.
After this, I needed a way to handle 16 bit images with Python. Unfortunately the Python Imaging Library doesn’t support 16 bit files. There’s this PythonMagick library which is a wrapper of the C++ bindings of ImageMagick named Magick++ but unfortunately it is a pain in the ass to get documentation for the library. So it seems both PIL and PythonMagick are out of the way here.
Other than that, I’ve found a library called GDAL which they say also handles 16 bit images (but in TIFF format, which is not an issue since dcraw can create 16 bit TIFF outputs with the -6 -T options) but using GDAL didn’t seem to be that clever since it comes with a lot of side effects. (GDAL stands for geospatial data abstraction library)
So, I’ve started looking for ways reading 16bit ppm data with Python, and luckily Christoph Gohlke has written a script for that, netpbmfile.py
So here’s a little snippet for you :
from netpbmfile import *
im = NetpbmFile("ThePPM.ppm").asarray()
Now we have the ppm file as a numpy array. The rest is easy to handle with numpy. Let’s say we only want one channel. (for me, that would be the Green channel, which is the second value in the pixel values)
green = numpy.zeros((im.shape,im.shape),dtype=numpy.uint16)
for row in xrange(0,im.shape) :
for col in xrange(0,im.shape) :
green[row,col] = im[row,col]
Cool. Now we have the 16 bit data of the Green channel in the numpy array called green. Using the PyFITS library we can easily write the data to a new fits file.
hdu = pyfits.PrimaryHDU(green)
Well, ofcourse the header information is not copied from the cr2 to fits here, but one can easily get the basic exifdata out of cr2 with dcraw like this :
marvin@marvin:/media/galileo/dcraw$ dcraw -i -v gor.cr2
Timestamp: Thu Dec 1 17:42:51 2011
Camera: Canon EOS 550D
ISO speed: 800
Shutter: 24.7 sec
Focal length: 37.0 mm
Embedded ICC profile: no
Number of raw images: 1
Thumb size: 5184 x 3456
Full size: 5344 x 3516
Image size: 5202 x 3465
Output size: 5202 x 3465
Raw colors: 3
Filter pattern: RGGBRGGBRGGBRGGB
Daylight multipliers: 2.222196 0.932800 1.295405
Camera multipliers: 2194.000000 1024.000000 1702.000000 1024.000000
So you can easily call this with a pipe in your Python script and after catching the necessary information (like using the re library) you can easily add the info to the header of the FITS image with PyFITS.
Soon I’ll add a sample Python script that does all it.
Here’s a sample output, don’t forget that the FITS is only the green channel and it is shown in grayscale. Also, ds9 normally uses the lower left pixel as origin, so the FITS is displayed in an inverted Y axis according to ds9’s painting settings.