How to generate professional-quality PDF files from PHP

Thursday August 21stUncategorized Category

There are a few ways to go about making PDF files from your PHP website. Your options are basically-

  1. Put all of your text in a 210mm column and get the user to save it as PDF.
  2. Learn a purpose-built libraryFPDF (free) or pdflib (proprietary), or
  3. Use PHP for generating markup which can be saved to PDF. This is of course

This article assumes an intermediate knowledge of both PHP and LaTeX, and that your server is not running Windows.

The software mix

PHP is an open-source server package which generates HTML pages, usually based on some sort of dynamic data. It is equally good at (but less well known for) generating other types of markup.

LaTeX is an open source document typesetting system, which will take a markup file in .tex format, and output a printable document, such as a PDF. The engine I will use here is XeLaTeX, because it supports modern trimmings such as Unicode and OpenType fonts.

Naturally, this post will use PHP to populate a .tex file, and then xelatex to create a PDF for the user.

This sounds straightforward enough, but it may not work with all shared hosts. Check your setup before you read on:

  1. Your server needs PHP, with safe mode disabled, so that it can run commands.
  2. This server needs xelatex, or an old-style suitable substitute such as pdflatex.

A bit about markup

We will be working with .tex templates, which will be valid LaTeX files. The basic rules are:

  1. Define a \newcommand for every variabe, so that you can compile the document without PHP.
  2. Drop PHP code in comments, which will print out code to override those variables.

So you will end up with code like this:

% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}

% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}

% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>

Look messy? A multi-line block of PHP is a little easier to follow. This example is from the body of a table, see if you can figure out the syntax:

%<?php                                                                      /*
% */ foreach($data['invoiceItem'] as $invoiceItem) {                        /*
% */    echo "\n" . LatexTemplate::escape($invoiceItem['item']) . " & " .   /*
% */        LatexTemplate::escape($invoiceItem['qty']) . " & " .            /*
% */        LatexTemplate::escape($invoiceItem['price']) . " & " .          /*
% */        LatexTemplate::escape($invoiceItem['total']) . "\\\\\n";        /*
% */ } ?>

So what about this LatexTemplate::escape() business? In LaTeX, just about every symbol seems to be part of the syntax, so it is sadly not very simple to escape.

I have settled on the following series of str_replace() calls to sanitise information to be displayed in the script. It is crude but effective. Generating LaTex is much like generating SQL, HTML or LDIF from your website: it is quite important to make a habit of wrapping every piece of data in a function to prevent users from writing (‘injecting’) arbitrary code into your document:

/**
 * Series of substitutions to sanitise text for use in LaTeX.
 *
 * http://stackoverflow.com/questions/2627135/how-do-i-sanitize-latex-input
 * Target document should \usepackage{textcomp}
 */
public static function escape($text) {
	// Prepare backslash/newline handling
	$text = str_replace("\n", "\\\\", $text); // Rescue newlines
	$text = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $text); // Strip all non-printables
	$text = str_replace("\\\\", "\n", $text); // Re-insert newlines and clear \\
	$text = str_replace("\\", "\\\\", $text); // Use double-backslash to signal a backslash in the input (escaped in the final step).

	// Symbols which are used in LaTeX syntax
	$text = str_replace("{", "\\{", $text);
	$text = str_replace("}", "\\}", $text);
	$text = str_replace("$", "\\$", $text);
	$text = str_replace("&", "\\&", $text);
	$text = str_replace("#", "\\#", $text);
	$text = str_replace("^", "\\textasciicircum{}", $text);
	$text = str_replace("_", "\\_", $text);
	$text = str_replace("~", "\\textasciitilde{}", $text);
	$text = str_replace("%", "\\%", $text);

	// Brackets & pipes
	$text = str_replace("<", "\\textless{}", $text);
	$text = str_replace(">", "\\textgreater{}", $text);
	$text = str_replace("|", "\\textbar{}", $text);

	// Quotes
	$text = str_replace("\"", "\\textquotedbl{}", $text);
	$text = str_replace("'", "\\textquotesingle{}", $text);
	$text = str_replace("`", "\\textasciigrave{}", $text);

	// Clean up backslashes from before
	$text = str_replace("\\\\", "\\textbackslash{}", $text); // Substitute backslashes from first step.
	$text = str_replace("\n", "\\\\", trim($text)); // Replace newlines (trim is in case of leading \\)
	return $text;
}

We then have a template which we can include() from PHP, or run xelatex over. Below is minimal.tex, a minimal example of a PHP-latex template in this form:

% This file is a valid PHP file and also a valid LaTeX file
% When processed with LaTeX, it will generate a blank template
% Loading with PHP will fill it with details

\documentclass{article}
% Required for proper escaping
\usepackage{textcomp} % Symbols
\usepackage[T1]{fontenc} % Input format

% Because Unicode etc.
\usepackage{fontspec} % For loading fonts
\setmainfont{Liberation Serif} % Has a lot more symbols than Computer Modern

% Make placeholders visible
\newcommand{\placeholder}[1]{\textbf{$<$ #1 $>$}}

% Defaults for each variable
\newcommand{\test}{\placeholder{Data here}}

% Fill in
% <?php echo "\n" . "\\renewcommand{\\test}{" . LatexTemplate::escape($data['test']) . "}\n"; ?>

\begin{document}
	\section{Data From PHP}
	\test{}
\end{document}

Generate a PDF on the server

Here is where the fun begins. There is no plugin for compiling a LaTeX document, so we need to directly execute the command on a file.

Looks like we need to save the output somewhere then. You would generate your filled-in LaTeX code in a temporary file by doing something like this:

/**
 * Generate a PDF file using xelatex and pass it to the user
 */
public static function download($data, $template_file, $outp_file) {
	// Pre-flight checks
	if(!file_exists($template_file)) {
		throw new Exception("Could not open template");
	}
	if(($f = tempnam(sys_get_temp_dir(), 'tex-')) === false) {
		throw new Exception("Failed to create temporary file");
	}

	$tex_f = $f . ".tex";
	$aux_f = $f . ".aux";
	$log_f = $f . ".log";
	$pdf_f = $f . ".pdf";

	// Perform substitution of variables
	ob_start();
	include($template_file);
	file_put_contents($tex_f, ob_get_clean());

The next step is to execute your engine of choice on the output files:

	// Run xelatex (Used because of native unicode and TTF font support)
	$cmd = sprintf("xelatex -interaction nonstopmode -halt-on-error %s",
			escapeshellarg($tex_f));
	chdir(sys_get_temp_dir());
	exec($cmd, $foo, $ret);

Once this is done, you can delete a lot of the extra LaTeX files, and check if a .pdf appeared as expected:

	// No need for these files anymore
	@unlink($tex_f);
	@unlink($aux_f);
	@unlink($log_f);

	// Test here
	if(!file_exists($pdf_f)) {
		@unlink($f);
		throw new Exception("Output was not generated and latex returned: $ret.");
	}

And of course, send the completed file back via HTTP:

	// Send through output
	$fp = fopen($pdf_f, 'rb');
	header('Content-Type: application/pdf');
	header('Content-Disposition: attachment; filename="' . $outp_file . '"' );
	header('Content-Length: ' . filesize($pdf_f));
	fpassthru($fp);

	// Final cleanup
	@unlink($pdf_f);
	@unlink($f);
}

The static functions escape($text) and download($data, $template_file, $outp_file) are together placed into a class called LatexTemplate for the remainder of the example (complete file on GitHub).

Gluing it all together

With the library and template, it is quite easy to wrap this into a PHP script which triggers the above code:

<?php
require_once('../LatexTemplate.php');

$test = "";
if(isset($_GET['t'])) {
	// Make the LaTeX file and send it through
	$test = $_GET['t'];
	if($test =="") {
		// Test pattern to show symbol handling
		for($i = 0; $i < 256; $i++) {
			$test .= chr($i) . " . ";
		}
	}

	try {
		LatexTemplate::download(array('test' => $test), 'minimal.tex', 'foobar.pdf');
	} catch(Exception $e) {
		echo $e -> getMessage();
	}

}
?>
<html>
<head>
<title>LaTeX test (minimal)</title>
</head>
</html>
<body>
	<p>Enter some text to be placed on the output:</p>
	<form>
		<input type="text" name="t" /><input type="submit" value="Generate" />
	</form>
</body>
</html>

The above code will show a form, which asks for input. When it gets some text, it will generate a PDF containing the text. If no text is given, it will output an ASCII table, simply to show that it can handle the symbols.

It is that simple once all the templating code is hidden away.

Results

In any real example, your template would be more extensive than this minimal example.

Compiling the template directly creates this PDF:

From the web, a form is presented to fill this single field:

Which results in a PDF containing the user data:

Tips

  1. The text after \end{document} is not even parsed in latex. Use this area to write <?php ?> with fewer constraints.
  2. Consult the github repository for this code to see the complete example.
  3. Comment out the line @unlink($tex_f); of you want to take a look at (debug, etc) the generated markup.
    Comments Off

    How to query Microsoft SQL Server from PHP

    Tuesday June 10thUncategorized Category

    This post is for anybody who runs a GNU/Linux server and needs to query a MSSQL database. This setup will work on Debian and its relatives. As it’s a dense mix of technologies, so I’ve included all of the details which worked for me.

    An obvious note: Microsoft SQL is not an ideal choice of database to pair with a GNU/Linux server, but may be acceptable if you are writing something which needs to import some data from external application which has a better reason to be using it.

    A command-line alternative to this setup would be sqsh, which will let you running scheduled queries without PHP, if that’s what you’re after.

    Prerequisites

    Once you have PHP, the required libraries can be fetched with:

    sudo apt-get install unixodbc php5-odbc tdsodbc

    MSSQL is accessed with the FreeTDS driver. Once the above packages are installed, you need to tell ODBC where to find this driver, by adding the following block to /etc/odbcinst.ini:

    [FreeTDS]
    Description=MSSQL DB
    Driver=/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
    UsageCount=1

    The path is different on platforms other than amd64. Check the file list for the tdsodbc package on your architecture if you lose track of the path.

    The next step requires that you know the database server address, version, and database name. Add a block for your database to the end of /etc/odbc.ini:

    [foodb]
    Driver = FreeTDS
    Description = Foo Database
    Trace = Yes
    TraceFile = /tmp/sql.log
    ForceTrace = yes
    Server = 10.x.x.x
    Port = 1433
    Database = FooDB
    TDS_Version = 8.0

    Experiment with TDS_Version values if you have issues connecting. Different versions of MSSQL require different values. The name of the data source (‘foodb’), the Database, Description and Server are all bogus values which you will need to fill.

    An example

    For new PHP scripts, database grunt-work is invariably done via PHP Data Objects (PDO). The good news is, it is easy to use it with MSSQL from here.

    The below file takes a query on standard input, throws it at the database, and returns the result as comma-separated values.

    Save this as query.php and fill in your data source (‘odbc:foodb’ here), username, and password.

    #!/usr/bin/env php
    <?php
    $query = file_get_contents("php://stdin");
    $user = 'baz;
    $pass = 'super secret password here';
    
    $dbh = new PDO('odbc:foodb', $user, $pass);
    $dbh -> setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $sth = $dbh -> prepare($query);
    $sth -> execute();
    $results = $sth -> fetchAll(PDO::FETCH_ASSOC);
    
    /* Quick exit if there are no rows */
    if(count($results) == 0) {
    	return 0;
    }
    $f = fopen("php://stdout", "w");
    
    /* Output header */
    $a = $results[0];
    $header = array();
    foreach($a as $key => $val) {
    	$header[] = $key;
    }
    fputcsv($f, $header);
    
    /* Output rows */
    foreach($results as $result) {
    	fputcsv($f, $result);
    }
    
    fclose($f);

    To test the new script, first make it executable:

    chmod +x query.php
    To run a simple test query:
    echo "SELECT Name from sys.tables ORDER BY Name;" | ./query.php

    Refining the setup

    The above script has some cool features: It’s short, actually useful, and it sets PDO::ERRMODE_EXCEPTION. This means that if something breaks, it will fail loudly and tell you why.

    Hopefully, if your setup has issues, you can track down the cause with the error, and solve it by scrolling through this how-to again.

    If you encounter a MSSQL datablase with an unknown schema, then you may want to list al rows and columns. This is achieved with:

    SELECT tables.name AS tbl, columns.name AS col FROM sys.columns JOIN sys.tables ON columns.object_id = tables.object_id ORDER BY tbl, col;

    The catch

    I’ve run into some bizarre limitations using this. Be sure to run it on a server which you can update at the drop of a hat.

    A mini-list of issues I’ve seen with this combination of software (no sources as I never tracked down the causes):

    • An old version of the driver would segfault PHP, apparently when non-ASCII content appeared in a text field.
    • Substituting non-text values fails in the version I am using, although Google suggests that updating the ODBC driver fixes this.
    Comments Off

    A Yes/No filtering tool for images

    Monday June 9thUncategorized Category

    While collecting very large numbers of screen captures for writing documentation, I noticed that it takes far too long to filter out the junk.

    To fix this, I coded up yn (source code via github). Given a list of images, it will display each one to the user, which they then include by pressing Y (yes) or exclude by pressing N (no) (hence the name yn).

    The images which are selected are saved as a list, so that a script can continue by processing them in some way. This could be by copying them elsewhere, or generating a document with spaces to caption them. The example in the README.md does both of these.

    The code to this early version is very simple, which makes it a good example of a simple OpenCV C++ app. I’ve stepped through it below.

    The code

    C++ does console input and output via the iostream library:

    #include <iostream>

    To use OpenCV with a GUI, you need these headers. You then need to add them to your include path, and link to OpenCV for the program to compile:

    #include <cv.h>
    #include <highgui.h>

    This line simly tells the compiler that when we say cout (console out), we mean std::cout.

    using namespace std;

    The only OpenCV code is in the below function. The steps are:

    1. load an image from a file via imread(),
    2. display it in a window with imshow(),
    3. capture the next keypress with waitKey(), then
    4. delete the window with destroyWindow()

    /**
     * Load a file, and wait for the user to press a key.
     *
     * If the pressed key is 'y', print the filename.
     */
    bool yn(string fn) {
        cv::Mat img;
        char key = 0;
    
        img = cv::imread(fn);
        if(!img.data) {
            cerr << "Failed to load " << fn << endl;
        } else {
            cv::namedWindow(fn);
            cv::imshow(fn, img);
            key = cv::waitKey(0);
            cv::destroyWindow(fn);
            if(key == 'y') {
                cout << fn << endl; // 'y' pressed
            } else if(key == 0x1B) {
                return false; // ESC pressed
            }
        }
        return true;
    }

    So with that library usage out of the way, all we need to do is get the list of files to check, and stop popping up windows when the user has pressed the escape key.

    /**
     * Get list of files from command-line arguments and display them in turn
     */
    int main(int argc, char** argv) {
        int i;
    
        /* Command-line arguments given */
        for(i = 1; i < argc; i++) {
            if(yn(string(argv[i])) == false) {
                cerr << "Quitting.\n";
                break;
            }
        }
    
        return 0;
    }

    Next

    The delay between images appearing can be removed by loading them in a separate thread, which I may do in a future version.

    Comments Off

    Crash course: Run Windows on desktop Linux

    Monday June 9thUncategorized Category

    Sometimes, you need to use a tricky windows-only proprietary program on a GNU/Linux desktop. If you have a Windows install disk and licence at your disposal, then this post will show you how to get a Windows environment running without dual-booting.

    The host here is a Debian box, and the guest is running Windows 7. The instructions will work with slight modifications for any mix of GNU/Linux and Windows

    On the desktop, some things are not as important as the server world. Some things are excluded for simplicity: network bridging, para-virtualised disks, migration between hosts, and disk replication.

    Software setup

    Everything required from the host machine can be pulled in via Debian’s qemu-kvm package.

    sudo apt-get install qemu-kvm

    Install

    Prepare a disk image for Windows. The qcow2 format is suggested for the desktop as it will not expand the file to the full size until the guest uses the space:

    qemu-img create -f qcow2 windows.img 30G

    Launch the Windows installer in KVM with a command that looks something like this:

    kvm -hda windows.img --cdrom windows-install-disc.iso -vga std -localtime -net nic,model=ne2k_pci -m 2048

    Note the -m option is the number of megabytes of RAM to allocate. You can set it a little lower if you don’t have much to spare, but if it’s too low you’ll get this screen:

    2014-06-04-capture-noboot

    If you have a physical disk but no .iso of it, then using the disk drive via --cdrom /dev/cdrom will work.

    Install

    If you have GNU/Linux, chances are you have installed an OS before. In case you haven’t seen the Windows 7 installer, the steps are below:

    Select language, accept the licence agreement, choose the target disk, and let the files copy:

    2014-06-04-capture-005 2014-06-04-capture-023 2014-06-04-capture-024 2014-06-04-capture-028 2014-06-04-capture-033 2014-06-04-capture-035 2014-06-04-capture-081 2014-06-04-capture-149

    After reboot, enter the user details, licence key, update settings and timezone:

    2014-06-04-capture-160 2014-06-04-capture-178 2014-06-04-capture-190 2014-06-04-capture-191 2014-06-04-capture-199 2014-06-04-capture-205 2014-06-04-capture-214 2014-06-04-capture-250 2014-06-04-capture-254 2014-06-04-capture-261

    After another reboot, Windows is installed in the virtual machine:

    2014-06-04-capture-263 2014-06-04-capture-265 2014-06-04-capture-277

    Post-install tasks

    The guest you have now will only run at standard VGA resolutions, and will probably not be networked. This section will show you how to fix that.

    Network drivers

    You will notice that we are launching the guest with -net nic,model=virtio. This means that we are using a virtual network card, rather than simulating a real one. You need to fetch a disk image with the latest binary drivers, which are best tracked down on linux-kvm.org via google.

    Once you have the disk image in the same folder as your virtual machine, shut down and launch it with a CD:

    kvm -hda windows.img --cdrom virtio-win-0.1-74.iso -vga std -localtime -net nic,model=ne2k_pci -m 2048

    Under "My Computer" track down the "Device Manager", find your network card, and tell Windows to update the drivers. You can then point it to the CDROM’s "Win7" directory (or other one, if you are installing a different guest). After the network adapter is recognised, you will be connected automatically.

    Note that you are using "user-mode" networking, which means you are on a simulated subnet, and can only communicate via TCP and UDP (ie, ping will not work). This can be a little slow, but will work on a laptop host whether plugged in or running on WiFi.

    Remote desktop

    You may also be annoyed by the screen resolution and mouse sensitivity having strange settings. The best way around this is not to fiddle with settings and drivers, but to enable remote desktop and log in via the network. This lets you use an arbitrary screen size, and match mouse speed to the host.

    This is set up to run locally, so it is neither laggy nor a security issue, and makes it possible to leverage all RDP features.

    First, in the guest, enable remote desktop using these Microsoft instructions.

    Then shut down and boot up with the extra -redir tcp:3389::3389 option:

    kvm -hda windows.img -vga std -localtime -net nic,model=ne2k_pci -m 2048 -redir tcp:3389::3389

    On the host, wait for the guest to boot, then use rdesktop to log in:

    rdestkop localhost

    One this works, you can shut down and boot with the extra -nographic option to leave remote desktop as the only way to interact with the guest:

    kvm -hda windows.img -vga std -localtime -net nic,model=ne2k_pci -m 2048 -nographic -redir tcp:3389::3389

    The rdesktop tool supports sound, file and printer redirection. It can also run fullscreen when launched with -f. All the details are in man rdesktop

    If you end up using the guest operating system more, it is worth investigating USB-redirection for any peripherals (printers or mobile phones), running a virtual sound card, or running SAMBA on the host to share files.

    Comments Off

    Including git commit history in a LaTeX document

    Sunday April 27thUncategorized Category

    LaTeX is a document typesetting system, which lends itself well to use within a source-code management program such as git. I found a need to include a changelog in a document, and found this brilliant blog article by Jerel Unruh which creates a HTML log using git log.

    For inclusion within a Makefile target, I re-worked this into a shell script, changes.sh, which will also auto-detect URL’s for any Github-hosted repository:

    #!/bin/bash
    # Find remote URL for hashes (designed for GitHub-hosted projects)
    origin=`git config remote.origin.url`
    base=`dirname "$origin"`/`basename "$origin" .git`
    
    # Output LaTeX table in chronological order
    echo "\\begin{tabular}{l l l}\\textbf{Detail} & \\textbf{Author} & \\textbf{Description}\\\\\\hline"
    git log --pretty=format:"\\href{$base/commit/%H}{%h} & %an & %s\\\\\\hline" --reverse
    echo "\end{tabular}"

    The above script outputs a LaTeX table, including a hyperlink to each commit. In the pre-amble, you need:

    \usepackage{hyperref}

    In your Makefile, you might add something like:

    changelog:
    	./changes.sh > changelog.tex

    The resulting changelog can then be included in the document via:

    \input{changelog.tex}
    Comments Off

    QJoyPad update

    Sunday April 27thUncategorized Category

    As of April, there is now a qjoypad package available in the official Debian repository, see https://packages.debian.org/unstable/main/qjoypad. This program allows you to map joystick events to keyboard and mouse actions. This means that you could use, for example, a USB gamepad to browse the web.

    I have previously blogged about the lack of usable, packaged programs to do this, and highly recommend getting your hands on a USB gamepad and giving this a shot. Depending on your distribution, you will soon be able to just run:

    sudo apt-get install qjoypad

    Then connect the gamepad and type:

    qjoypad

    You then get a window which shows you what buttons you can map:

    Clicking a button prompts you to press a key on your keyboard or select a mouse button, and each profile can be saved and named for later use.

    As far as small utility programs go, I now rely on this about as much as the Gnome calculator.

    An alternative: AntiMicro

    I’ll also give a mention to AntiMicro, which has similar goals, and looks very promising.

    As a bit of background, there is a limitation in QJoyPad around adding modifier keys: For example, you simply can’t map a joystick button to <Ctrl>+Q, making it inefficient to use with most applications. The author of AntiMicro, Travis, sent me an email back in September to point out that his program both solves this issue, and has a Debian package available for download.

    Whilst I was quite impressed with the functionality in AntiMicro, I found it a whole lot more complex to set up. I think this would be more useful for gamers, as mouse acceleration and key repeats are overkill for my use case (browsing the web and controlling VLC, MythTV, etc).

    So, if you are a power user, then I suggest you give it a shot. Power users with apt-get may also consider joy2key, which is the best non-GUI tool for this, and is available in the Debian repositories.

    Comments Off

    Winning 2048 game with key-mashing?

    Monday March 24thUncategorized Category

    This new, simple, addictive game is out, called 2048. You need to slide two numbers together, resulting in a bigger number, in ever-increasing powers of two. You get 2048, and you win.

    2014-03-2048-1

    I noticed that somebody already wrote neat AI for it, although it does run quite slowly. But then I also noticed a friend mashing keys in a simple pattern, and thought I should test whether this was more effective. The answer: it kinda is.

    2014-03-2048-2

    At least in the first half of the game, a simple key-mashing pattern is a much faster way to build high numbers. The PHP script below will usually get to 512 without much trouble, but rarely to 1024. I would suggest running it for a while, and then taking over with some strategy.

    The script

    This script spits out commands which can be piped to xte for some automatic key-mashing on GNU / Linux. Save as 2048.php

    #!/usr/bin/env php
    mouseclick 1
    <?php
    for($i = 0; $i < 10; $i++) {
    	move("Left", 1);
    	move("Right", 1);
    }
    
    while(1) {
    	move("Down", 1);
    	move("Left", 1);
    	move("Down", 1);
    	move("Right", 1);
    }
    
    function move($dir, $count) {
    	for($i = 0; $i < $count; $i++) {
    		echo "key $dir\nsleep 0.025\n";
    		usleep(25000);
    	}
    }

    And then in a terminal, run this then click over to a 2048 game:

    sleep 3; php 2048.php | xte

    Good luck!

    Comments Off

    Importing myki data into GnuCash

    Wednesday March 5thUncategorized Category

    GnuCash, despite all its bugs, is one of the best open source accounting programs going around.

    Since it is not hard to export the history from a myki (public transport) card, I figured that it would be nice to track it as an account in Gnucash.

    myki logo Gnucash logo

    The data on the statements is not quite suitable for an accounting program. Some changes that need to be done are:

    • Conversion of date format to YYYY-MM-DD.
    • Single-field descriptions for each transaction (“Top up myki money”, “Travel zone 2″), rather than multiple fields.
    • Entries which have a 0.00 cost need to be removed.

    Once you have a CSV of your data, the script below will filter it to be ready for import:

    #!/usr/bin/env php
    <?php
    /* Dump Myki CSV file to a file suitable for gnucash
    	(c) Michael Billington < michael.billington@gmail.com >
    	MIT Licence */
    $in = fopen("php://stdin", "r");
    $out = fopen("php://stdout", "w");
    $err = fopen("php://stdout", "w");
    $lc = 0;
    
    while($line = fgets($in)) {
    	$lc++;
    	$a = str_getcsv($line, ',', '"');
    	if(count($a) == 8) {
    		$date = implode("-", array_reverse(explode("/", substr($a[0], 0, strpos($a[0], " ")))));
    		$credit = $a[5] == "-" ? "" : $a[5];
    		if($credit != "") { // Probably a top-up or reimbursement
    			$description = $a[1];
    		} else if($a[3] == "-") { // Probably buying myki pass
    			$description = trim($a[1], "*");
    		} else { // Probably travel charges
    			$description = "Travel: " . $a[2] . ", Zone " . $a[3];
    		}
    		$debit = $a[6] == "-" ? "" : $a[6];
    		$balance = $a[7] == "-" ? "" : $a[7];
    		if($balance != "") { // Ignore non-charge entries
    			fputcsv($out, array($date, $description, $credit, $debit, $balance));
    		}
    	}
    }
    fclose($in);
    fclose($out);
    fclose($err);

    Why would you track it as an account?

    There are lots of reasons why a public transport card is account-like enough to put into GnuCash:

    • Money is not spent until you touch on and off — note that no GST is payable until the card balance is used. This means that if you record a top-up under Expenses, it’s not quite correct.
    • You can cancel a myki and have its balance moved to another card.
    • A card can be handed in, and the balance paid back to you as cash.

    Update 2014-05-04: All of these myki-related scripts are now available on github.

    Comments Off

    Successful migration to WordPress in 3 easy steps

    Monday March 3rdUncategorized Category

    We made the decision in January to migrate our website to a WordPress installation, which is the CMS we recommend for our customers.

    This posed a big challenge, mainly because we had been using our in-house BitBlog CMS to publish BitRevision content for the past 2 years, meaning our content was tied up in a difficult-to-export format.

    Still, it allowed me to dig into the nifty and well-developed world of WordPress. My magic formula for a WordPress migration, in a nutshell:

    1. Export your old blog as something WordPress can understand.
    2. Hack at the theme until your site is beautiful.
    3. Don’t break your URL’s.

    1. Export your blog

    Depending on how you are blogging already, you may be able to save an export which can be loaded into WordPress with a plugin (see Importing Content on the WordPress wiki).

    We were not in this lucky category, so I delved into the WXR (WordPress eXtended RSS) format. This example file was a big help, and I wrote up a short PHP script to create a similar-looking file from my blog.

    2. Hack at the theme

    We adapted our site from the ‘Skittlish’ theme, which has also been ported to WordPress. Every theme carries some baggage, so I highly suggest rolling up your sleeves and opening wp-content/themes on your blog.

    All non-feature modifications are done in WordPress via themes, so keep tweaking it until you’re happy, or get a designer to put together a theme that suits your needs.

    3. Don’t break your URL’s

    I link between blog posts a lot, and breaking these links would be mind-numbing to clean up after (and a SEO sin). Using the import method above, I used article titles which matched the old permalinks.

    WordPress then lets you configure permalinks to use this field, replicating the old behavior and keeping everybody happy.

    Wrap-up

    If you run WordPress on your site, then it makes sense to have somebody on your team who really knows how it works.

    If your initial setup is not handled with care, then you could end up wasting several days of work checking old content for errors.

    Good luck!

    Comments Off

    Rolling your own cloud storage with Unison

    Sunday February 23rdUncategorized Category

    Cloud storage is a very cool way to back up files, but it has two major drawbacks:

    1. If you back up everything, it will cost you a bit for the space.
    2. You need to trust that the company wont lose, tamper with, or leak your files.

    I’ve recently discovered a little program called Unison, which provides a bandwidth-efficient way to synchronise folders on two computers, solving both of those problems. It has clients for just about every platform, it’s open source, and it’s been around for over a decade.

    I’ve set up a copy of my laptop’s home directory on my desktop computer, so if it ever explodes, I can just sync it back:

    With most ISP’s offering static addresses, you could even put a NAS on the internet to sync with, which would basically be a home DropBox.

    Laptop and desktop example

    Unison is in most Linux distributions. On Debian or Ubuntu, you can install it with:

    apt-get install unison

    mikebook has a profile called mikebox.prf, located in /home/mike/.unison/, which tells it to keep a copy of its home directory on mikebox.

    root = /home/mike
    root = ssh://mike@mikebox//home/mike/Remote/mikebook/
    ignore = Name .*
    ignore = Name *.iso
    ignore = Name *.img
    ignore = Name unison.log
    ignore = BelowPath Downloads
    ignore = BelowPath workspace

    Everything can be brought up to speed with:

    unison -batch mikebox

    If mikeboox is running out of space, then I can drag files out of Remote/mikebook to somewhere else, and they simply vanish from the laptop next time it is synchronised.

    Notes

    • Remote paths require that extra / in the filename.
    • Hidden folders seem to be skipped, so keep a copy of your .prf file in case you blow up your computer.
    Comments Off