PDL, The Perl Data Language

by levien on Wed 26 November 2008 // Posted in misc // under science

PDL is an extension of Perl for numerical and scientific data processing. It was originally developed by astrophysicists as a free alternative to packages like IDL and Matlab. It's fast, memory-efficient and very powerful. I've found it most useful when data processing has to be combined with the strengths of Perl: list and hash operations, regular expressions, and text parsing or output. Its main drawback, however, is a rather steep learning curve, because the documentation is fragmented and not always clear. Therefore I've collected some useful links and examples.
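To illustrate that kind of mix, here's a minimal sketch: Perl's regular expressions and hashes do the text parsing and grouping, and PDL does the number crunching. The log lines and their format are made up for the example; `pdl`, `nelem`, `avg` and `max` are standard PDL functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use PDL;

# Hypothetical log lines: a host name followed by a response time in ms
my @lines = (
    "alpha 12.5 ms",
    "beta 7.0 ms",
    "alpha 30.0 ms",
    "gamma 2.5 ms",
);

# Perl side: parse each line with a regex and group the numbers per host
my %times_by_host;
for my $line (@lines) {
    next unless $line =~ /^(\w+)\s+([\d.]+)\s+ms$/;
    push @{ $times_by_host{$1} }, $2 + 0;    # +0 turns the string into a number
}

# PDL side: turn each list into a piddle and compute summary statistics
for my $host (sort keys %times_by_host) {
    my $times = pdl($times_by_host{$host});
    print "$host n=", $times->nelem,
          " mean=", $times->avg,
          " max=", $times->max, "\n";
}
```

With real data the only thing that changes is the regex; once the values sit in a piddle, all of PDL's vectorised functions are available.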

Introductory material

The PDL Book, bundled with the latest release of PDL
Book "Beginning PDL" (PDF)
The Perl Monks PDL Quick Reference Guide maintains an extensive list of resources, examples and frequently asked questions

Reference material

A list of PDL modules and functions in the online book "PDL -- Scientific Programming in Perl"
PDL cheat sheet at Lino Ramirez' blog on Intelligent Data Analysis
PDL "cribsheet" at Art Davis' excellent site. (While you're at it, check out his cribsheets on mathematics and essential Windows software as well!)

Official site

Add-ons

PDL-Stats, a set of statistical tools for PDL. Includes functions for basic statistics, probability distributions, linear regression, GLM, K-means clustering and time-series analysis.
PDL-NetCDF, an interface that allows data in NetCDF files to be accessed as piddles. Very useful for reading and writing large vectors and multi-dimensional grids.

Installing PDL

On Ubuntu, you can simply install the pdl package using Synaptic.
On RPM/yum-based systems, install the package perl-PDL.

If you don't have root/sudo access, you can get PDL from CPAN and install it in a local directory. From the documentation:

PDL depends on a number of other Perl modules for feature complete operation.
These modules are generally available at the CPAN. The easiest way to 
resolve these dependencies is to use the CPAN module to install PDL.
Installation should be as simple as

cpan install PDL # if the cpan script is in your path

or if you don't have the cpan script try

perl -MCPAN -e shell
cpan> install PDL

NOTE: if this is your first time running the cpan shell, you'll be prompted 
to configure the running environment.
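Once PDL is installed, a short script like the following is a quick sanity check. This is a minimal sketch, not something from the PDL documentation: it loads PDL, prints `$PDL::VERSION` and does a trivial computation to confirm the compiled parts work.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use PDL;

# $PDL::VERSION is set as soon as PDL is loaded
print "PDL version $PDL::VERSION\n";

# A tiny computation to confirm that PDL actually works
my $x = sequence(5);            # [0 1 2 3 4]
print "sequence: $x\n";
print "sum: ", $x->sum, "\n";   # 0+1+2+3+4 = 10
```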

Some examples of using PDL

#!/usr/bin/perl -w

use PDL;        # To include basic PDL functionality
use PDL::NiceSlice; # For a shorter "slicing" syntax
use strict;     # Always a good idea

# Create a 256-element vector of double-precision floats
my $vector = zeroes(256);

# Create a 100 x 100 matrix of bytes
# (see also http://pdl.sourceforge.net/PDLdocs/Core.html#datatype_conversions )
my $matrix = zeroes(byte, 100, 100);

# Set values with index 64-128 to a numerical sequence:
$vector->slice("64:128") .= sequence(65);


# Same thing, but with the shorter NiceSlice syntax:
$vector(64:128) .= sequence(65);

# Get length of first dimension
# (for alternative methods see http://pdl.sourceforge.net/PDLdocs/Core.html#nelem )
my $elements = $vector->getdim(0);
print "Vector has $elements elements\n";

# Get and print value at index 100
my $value = $vector->at(100);
print "Value at index 100 is: $value\n";

# Get all values > 32
my $largevalues = $vector->where($vector > 32);
print "There were " . nelem($largevalues) . " values > 32, namely: ";
print $largevalues . "\n";

# Replace all 0's with 42's
my $indices = which($vector == 0);
$vector->dice($indices) .= 42;

# Make a reversed copy (-1 = last element); without ->copy the slice
# would stay linked to $vector and change along with it
my $reversed_vector = $vector(-1:0)->copy;

# Write both vectors to a file
# (you could use *FILEHANDLE instead of a filename)
my $file = "/tmp/bla.dat";
wcols($vector, $reversed_vector, $file);

# Read them back
my ($column1, $column2) = rcols($file);

# Put some values in our matrix:
$matrix(:,:) .= sequence(100) * 2;

# Create a (rather boring) PNG picture
my $red = $matrix;
my $green = transpose($matrix);
my $blue = $matrix->xchg(0,1)->slice("-1:0,:");  # xchg swaps dimensions

my $picture = zeroes(byte, 3, 100, 100);
# ((n)) selects plane n and drops that dimension, so it matches the 100 x 100 planes
$picture((0)) .= $red;
$picture((1)) .= $green;
$picture((2)) .= $blue;

wpic($picture, "/tmp/foo.png");   # PNG output relies on the NetPBM converters being installed

# OK, we're done
exit(0);

# If you're new to PDL, the best way to start is by reading this:
# http://www.johnlapeyre.com/pdl/pdldoc/newbook/node4.html
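Two details of the slicing syntax are easy to trip over, which is why the reversed vector needs `->copy` and the colour planes need double parentheses. A slice is a view that shares its data with the parent piddle, and a single index like `(0)` keeps that dimension with size 1, while `((0))` drops it. The following is a minimal sketch; the variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use PDL;
use PDL::NiceSlice;

my $x = sequence(5);               # [0 1 2 3 4]

# A slice is a view that shares data with its parent...
my $view = $x(-1:0);
# ...while ->copy makes an independent piddle
my $copy = $x(-1:0)->copy;

$x .= 0;                           # overwrite the parent in place
print "view: $view\n";             # follows $x, so all zeros
print "copy: $copy\n";             # unchanged reversed sequence

# A single index keeps the dimension with size 1; double parentheses drop it
my $img = zeroes(byte, 3, 4, 2);
print join(",", $img(0)->dims), "\n";    # 1,4,2
print join(",", $img((0))->dims), "\n";  # 4,2
```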