The Sean Code: February 2008

Tuesday, February 26, 2008

The Freedom Framework Bends to Your Will

On Sunday, I came up with the concept for the Freedom Framework, which consists of PHP Object Generator, EpiCode, and Smarty. I'd like to take some time and go into what I feel are some of its actual advantages, over other frameworks and over not using a framework at all.

First, you define your data model from the perspective of the database, rather than from the code. While you don't actually have to write any SQL, you're defining the fields and their types as if you're just creating a table. It adds to this with the Parent/Child/Sibling concept, but doesn't go as far as other frameworks which give you the impression that you're just creating and working with objects, and then try to generate the database from that (or make you write both the database and the models).

When you've done this, you edit the config file with your database's login information and run setup. It tests your database connectivity, creates the tables, and then runs some unit tests to make sure everything's working. At this point you have a set of classes that give you CRUD functionality for the tables you've defined.

And now, EpiCode comes in to define your controllers. This is the best part of the "framework." Since you define the controller classes yourself (rather than having them generated), you have a little extra freedom there. And since you actually define the routes table (rather than generating it or being stuck with a convention), you gain more control over the URLs that'll be used by your application. If it's a small application, you can use just one controller class, even if you have a bunch of differently named URLs pointing to its methods.

That freedom allows for excellent refactoring support. As I was feeling my way around the new "framework," I had only one controller class. After my methods started growing, I decided it would be best to separate them into multiple classes. This was the work of a few seconds (made easier since these are static methods); then I just had to update the routes table to point the URLs to the new classes and methods ... and voila! The project graduates from being a simple proof of concept to being a maintainable MVC application.

Even Smarty allows this sort of flexibility for the view. When I was putting together my proof of concept, I just threw the .html files into the root templates/ directory. As it grew, and I decided that this was more than just a tiny project, I created subdirectories underneath the templates/ directory and moved the .html files into them. (It's probably simplest to name these subdirectories in such a way that they match the URLs, but you don't HAVE to.)

The goal of the Freedom Framework is to minimize the things you're forced to do to work with it. There's no generated filesystem structure or controller classes; you can name them and structure them however you please. The framework grows with each project, starting out as just a little bit of glue and expanding into a well defined structure for your code (and since it's YOUR code, you get to define that structure). You don't have to know (or be told) the final structure before you start working on it.

The more I think about this "framework," the more I like it. The more I use it, the more I like it. It's the first I've found that doesn't inherently limit the way I want to work. The tool bends to my will. That's what makes it a good tool.

Sunday, February 24, 2008

POG + EpiCode + Smarty == The Freedom Framework

I generally have a problem with web frameworks, like Rails, CakePHP, Symfony, and the millions of others that have sprung up lately (more so in PHP than in Ruby, as Ruby hadn't really been used for web development until Rails came along). As a rule, they force you to do things a certain way, and -- more importantly -- to think in a certain way. In most cases, this "way" is mimicry of Rails, but in all cases, this "correct way of thinking" is the way the framework developers think. That's the aspect of frameworks that I take issue with, while at the same time acknowledging the productivity gains that are possible with these frameworks.

Since everybody likes productivity gains ... I decided to go out and see if there was some way to get the same increase in productivity without using a framework. I want to use tools in such a way that makes sense to me, not to the tool.

The place I started was with the model; I needed to find a good ORM tool for PHP. I hadn't found one that I like yet, so this search took me a while. I still haven't found a real ORM library that I like, but I have come across an interesting tool. The PHP Object Generator allows you to define a database table (with an interface similar to phpMyAdmin, which I don't care for, but at least it's a familiar concept), and from that it generates not only the SQL to create the table, but also a PHP class with the standard CRUD methods. It handles relationships between tables with a parent/child concept (a parent can have multiple children, but a child may only have one parent). At first, it didn't support has_and_belongs_to_many relationships, but added that in with the concept of "siblings." I think that terminology is a bit of a stretch, and the documentation on it isn't the best, but it works well enough.

For the controller, I wanted something really light. I considered just making my own, and that probably would have worked. But by chance, I discovered EpiCode, which is just about exactly what I would have wanted to create on my own. It uses a .htaccess file to direct all HTTP access to the index.php file, which contains an array called $routes, which in turn wires paths to functions (or static methods, which works even better). It doesn't touch the query string parameters, so you get to process those however you like.
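
To make the routes-table idea concrete, here's a rough sketch of the same shape in Java. EpiCode itself is PHP and its real API isn't shown in this post, so everything here (RouteTable, MovieController, the paths) is made up for illustration; the only point is that the path-to-handler wiring is written by hand and the handlers are plain static methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical routes table: the URL-to-handler wiring is written by hand.
class RouteTable {
    private final Map<String, Supplier<String>> routes = new LinkedHashMap<>();

    // Registers a handler for a path and returns this table for chaining.
    RouteTable add(String path, Supplier<String> handler) {
        routes.put(path, handler);
        return this;
    }

    // Looks up the handler for a path; unknown paths get a fixed "404" marker.
    String dispatch(String path) {
        Supplier<String> handler = routes.get(path);
        return (handler == null) ? "404" : handler.get();
    }

    public static void main(String[] args) {
        RouteTable routes = new RouteTable()
                .add("/movies", MovieController::list)
                .add("/movies/show", MovieController::show);
        System.out.println(routes.dispatch("/movies"));
    }
}

// Hypothetical controller: handlers are plain static methods.
class MovieController {
    static String list() { return "movie list"; }
    static String show() { return "movie details"; }
}
```

Moving show() into a different class later only means changing the method reference in the one add() call that points at it.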

For the view, I stuck with the old Smarty standby, and it works like a charm for this. You have to define which variables to open up to the template, as well as which template file to use; despite having to type a couple of extra lines of code, I prefer this to being forced into a particular directory structure and naming convention that is common among the true frameworks.

In my first foray into the POG+EpiCode+Smarty "framework," I made an IMDB clone in about an hour and a half, and I didn't have to write a single line of SQL (table creation or queries). In fact, I didn't even have to log into the database (as the POG setup process creates the tables and verifies that everything's working). I'd say that's a fairly significant increase in productivity over my previous methods, and I still got to define my workflow for myself, rather than kneeling before some anonymous framework creator.

I know a lot of people like frameworks, and a lot of people use them. Because of this, my opinion on the matter probably isn't very popular. But I think this freedom is really valuable. And I find it difficult to believe that my opinion is unique.

Saturday, February 23, 2008

PHP: Reading Key/Value Pairs from a File

I'm working on an application in which I have to parse a file with key-value pairs, looking for a single value. I decided to make the datafile look like this:

key0|val0
key1|val1
key2|val2
...
key2000|val2000

I don't expect to have more than 2000 values in the file, but just in case I set up a second sample file with 100000 values just to see how that would affect things. (That kind of growth in usage would be awesome.)

My initial thoughts on this are to keep it simple. Performance is important, but I don't think spending the time on a binary search that seeks through the file would be worth it; also, that would be premature optimization. So I want to check out the performance of a couple of simpler options first.

My first option was to loop through each line, explode on the | character, and compare the keys. Like this:

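function searchLinearly($filename, $key) {
    // Read the whole file into an array, one element per line, without
    // the trailing newlines (otherwise they end up in the returned value).
    $lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    foreach ($lines as $line) {
        // A limit of 2 keeps any '|' inside the value intact.
        list($k, $v) = explode('|', $line, 2);

        // Strict comparison avoids PHP's numeric-string coercion.
        if ($k === (string) $key) {
            return $v;
        }
    }

    return false;
}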

That's just about as simple as you can get. But how does it perform? I searched for a few keys throughout the file, and measured how long it took to find them.

In a 2000 line datafile:

Key 10: 0.00215s
Key 150: 0.00233s
Key 900: 0.00555s
Key 1700: 0.00866s

In a 100000 line datafile:

Key 10: 0.11047s
Key 1500: 0.04429s
Key 50000: 0.11832s
Key 99000: 0.18710s

What's really strange about this is that key 1500 is consistently faster than key 10, whether I search for key 10 first or second, or just search once per run (in an attempt to separate caching issues). I can't explain that.

Another option is to use fscanf() instead of just using file() to get all the lines. The idea is to define a scanf-style format string (not a regex) that matches the format of the lines in your file, and "scan" through the file one line at a time, extracting the pieces of data along the way. That version looks like this:

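function searchFscanf($filename, $key) {
    $f = fopen($filename, 'r');
    // Each fscanf() call consumes one line. The format splits it at the
    // first '|' into the key and the rest of the line, minus the newline.
    while ($line = fscanf($f, "%[^|]|%[^\n]")) {
        list($k, $v) = $line;

        // Strict comparison avoids PHP's numeric-string coercion.
        if ($k === (string) $key) {
            fclose($f);
            return $v;
        }
    }

    fclose($f);
    return false;
}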

It's only slightly more complicated; does it perform faster?

In a 2000 line datafile:

Key 10: 0.00052s
Key 150: 0.00101s
Key 900: 0.00572s
Key 1700: 0.01033s

In a 100000 line datafile:

Key 10: 0.00052s
Key 1500: 0.00947s
Key 50000: 0.12980s
Key 99000: 0.20901s

I think these results are pretty interesting. Using fscanf() can actually be considerably faster, and it isn't affected by the size of the file. However, it is drastically affected by the key you're searching for; if you have to scan through a lot of lines, it slows down significantly and steadily.

In contrast, the file() method is more affected by the filesize than by which key you're searching for, because file() reads the entire file into an array before the first comparison is made (searching for later keys does take longer, but it's not nearly as significant).

So which option is better? As the number of records increases, it appears the fscanf() method becomes more and more advantageous (assuming that early records are hit as often as later records, considerably less time is spent searching than with the file() method). The same consideration remains true even if the number of records remains constant; it just isn't as noticeable or significant at that level.

Unsurprisingly, fscanf() beats file(). But not by as much as I'd like. In order to maximize performance, another method is going to have to be devised; either a binary search over sorted keys that minimizes the number of comparisons made (O(log n) versus O(n)), or perhaps just using something like memcached to keep the datafile in memory at all times. Maybe both.
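
The "keep it in memory" idea can be sketched roughly like this. It's in Java rather than PHP, and it uses a plain HashMap as a stand-in for memcached, so take it as an illustration of the approach and not as a drop-in. KeyValueIndex and its methods are hypothetical names. The file is parsed once, and after that every lookup is a hash probe instead of a scan.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: parses "key|value" lines once, then answers lookups from memory.
class KeyValueIndex {
    private final Map<String, String> values = new HashMap<>();

    KeyValueIndex(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int bar = line.indexOf('|');
                if (bar < 0) {
                    continue; // skip lines that are not key|value pairs
                }
                // Everything after the first '|' is the value.
                values.put(line.substring(0, bar), line.substring(bar + 1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the value for key, or null when the key is absent.
    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        KeyValueIndex index = new KeyValueIndex(
                new StringReader("key0|val0\nkey1|val1\nkey2|val2\n"));
        System.out.println(index.get("key1")); // prints: val1
    }
}
```

The trade-off is memory for time: the one-time parse costs about as much as a single full scan, so it only pays off when the same process answers many lookups.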

Something to think about.

Monday, February 18, 2008

Using the Velocity Template Engine

Templates are really important for separating the presentation of your program from the actual code. In PHP, I prefer Smarty (mostly because it's the one I'm most familiar with). In Java, I've taken to using Velocity.

The advantages of templates are very quick to see. Imagine you're creating a web page. You could pretty easily put the <html> code right in along with your PHP or Java code. In PHP this is absurdly easy, just by using the <?php ?> tags. The problem is that this very quickly gets cumbersome when you want to change things. Changing the way the display looks requires changing the way the program works. That's not really what you want at all. So you use templates to separate the display from the logic; that way you can change the two independently.

But templates aren't just for web development. I recently found a reason to use templates in a Java program. It is intended to create content for a web-delivered program, but it doesn't have to be dynamically generated; instead, we can pre-generate all the necessary content periodically. When I first threw the content generator together, I simply used a StringBuilder to put the content together; then I had to change something, and it was horrible. I had to parse these long append() lines, then re-compile the entire program. It became especially obvious that something needed to change because my program has to support multiple languages ... and I only speak one. I'll be getting translations from other people, and I'm not that interested in pasting translations into my code (or giving commit access to the translators). In steps Velocity. I moved the content out into its own file:

my_template.vm:
------------------------------
This is my content. My name is $name.

Of course, you can have any number of these template files, and you can load any of them at runtime just by specifying the filename.

In order to do that, you have to set up your program to load the template files at runtime using a classloader.

Properties p = new Properties();
p.setProperty("resource.loader", "class");
p.setProperty("class.resource.loader.class",
       "org.apache.velocity.runtime.resource.loader.ClasspathResourceLoader");

(That's an important step. A lot of Velocity documentation leaves that out. I figured I'd put it before the more "significant" stuff that actually "does the work." Without this, the work doesn't happen.)

Speaking of which, let's get to the meaty part.

VelocityEngine ve = new VelocityEngine();
ve.init(p);

Here we set up the VelocityEngine. Note that we pass the Properties object to the init() method. That's where the classloader setup actually happens, so don't overlook that little character.

VelocityContext vc = new VelocityContext();
vc.put("name", "Sean");

The VelocityContext is where you put all the data you want to expose to the template. It's essentially a set of name-value pairs. Like most templating engines, Velocity also allows you to use conditionals and loops within the templates. These are almost always necessary in non-trivial templates, but you should start from the standpoint of avoiding them if at all possible. After all, before you're used to using templates, it's natural to attempt to put your program's logic into the template. That totally defeats the point.

StringWriter writer = new StringWriter();
Template template = ve.getTemplate("<package_name>/templates/my_template.vm");
template.merge(vc, writer);
System.out.println(writer.toString());

Here you load the template, merge it (which basically matches up all the name-value pairs in the VelocityContext with all the $tags matching the names in the template), and output it. In a web application, your output would be printing it back to the browser. In my case, it was using a FileWriter to output the text to a file. I've used PHP's templating a whole lot more than Java's, but I've already found Java to be considerably more flexible in what you do with the output after it's been generated.
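
If "merge" still feels abstract, here's a toy stand-in written with nothing but the standard library. To be clear, this is not how Velocity is implemented, and it handles nothing beyond plain $name references; ToyTemplate and its merge() method are names I made up. It just shows the matching-up of names in the context with $tags in the template text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: replaces $name references with values from a map.
class ToyTemplate {
    private static final Pattern REFERENCE =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String merge(String template, Map<String, String> context) {
        Matcher m = REFERENCE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = context.get(m.group(1));
            // References with no value in the context are left as written.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>();
        context.put("name", "Sean");
        // prints: This is my content. My name is Sean.
        System.out.println(merge("This is my content. My name is $name.", context));
    }
}
```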

I hope that example helped if you wanted to start using Velocity ... or if you hadn't considered using templates. It's pretty essential.

Saturday, February 16, 2008

Java Properties Files

Yesterday I posted about using JSch to SCP a File in Java, and the code I'd written at the time was pretty simplistic. It only sent a file to a remote server, and it even foolishly had string constants in the code that specified the location of the known_hosts file and the information about the remote server. I closed with a call to action to move that to a config file. Well, I've now done that, using the simple but effective Properties class that comes built in with Java.

I added a properties file to the package, and it opened right up for editing. The basic concept is that you fill it out with name-value pairs. Here's the properties file:

knownHostsFilename=/home/sschulte/.ssh/known_hosts
numServers=1
server0_host=172.16.40.128
server0_username=user
server0_password=user

I designed it with the idea in mind that I could send files and commands to multiple servers at once. I'd just have to increment the numServers field and add a new set of server info fields. Pretty nice.

To load the properties file, I just add these lines:

Properties configFile = new Properties();
configFile.load(this.getClass().getClassLoader().getResourceAsStream("remoteappmain/serverconfig.properties"));

Note that "remoteappmain" is the name of my package, which is why it's in the path to the properties file. To get one of the values, you just call the getProperty() method, passing it the key. They're strings, so in the case of numServers I had to parse it into an integer using Integer.parseInt(). No big deal.
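
Here's a small sketch of how the numServers counter and the serverN_ keys might be read in a loop. It loads the properties from a string instead of the classpath so it can run on its own, and ServerConfigDemo, parse(), and readHosts() are hypothetical names, not the ones from my project. The second host in main() is invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ServerConfigDemo {
    // Parses properties text; a stand-in for loading the file from the classpath.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Reads server<i>_host for each i below numServers, following the key naming above.
    static List<String> readHosts(Properties config) {
        // Property values are strings, so the count has to be parsed.
        int numServers = Integer.parseInt(config.getProperty("numServers", "0").trim());
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < numServers; i++) {
            hosts.add(config.getProperty("server" + i + "_host"));
        }
        return hosts;
    }

    public static void main(String[] args) {
        Properties config = parse(
                "numServers=2\n"
              + "server0_host=172.16.40.128\n"
              + "server1_host=10.0.0.5\n"); // second host is made up
        System.out.println(readHosts(config));
    }
}
```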

To simplify my classes and to standardize their interfaces, I created a RemoteAuth class that encapsulates the hostname, username, and password values. And there's now an abstract base class called RemoteConnector that the FileSender extends. The constructor takes a RemoteAuth object and a known_hosts filename.
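
The rough shape of that arrangement looks something like the sketch below. I'm not reproducing the real classes here: the accessor names, the target() helper, and the EchoConnector subclass are all invented for illustration, and nothing in this sketch talks to JSch or to a server.

```java
// Demo entry point for the sketch; not part of the real project.
class ConnectorDemo {
    public static void main(String[] args) {
        RemoteAuth auth = new RemoteAuth("172.16.40.128", "user", "user");
        RemoteConnector c = new EchoConnector(auth, "/home/sschulte/.ssh/known_hosts");
        System.out.println(c.target()); // prints: user@172.16.40.128
    }
}

// Bundles the three login values so they travel together.
final class RemoteAuth {
    private final String host;
    private final String username;
    private final String password;

    RemoteAuth(String host, String username, String password) {
        this.host = host;
        this.username = username;
        this.password = password;
    }

    String getHost() { return host; }
    String getUsername() { return username; }
    String getPassword() { return password; }
}

// Shared base: every connector needs the login info and a known_hosts file.
abstract class RemoteConnector {
    protected final RemoteAuth auth;
    protected final String knownHostsFilename;

    protected RemoteConnector(RemoteAuth auth, String knownHostsFilename) {
        this.auth = auth;
        this.knownHostsFilename = knownHostsFilename;
    }

    // Hypothetical helper shared by all subclasses.
    String target() {
        return auth.getUsername() + "@" + auth.getHost();
    }
}

// Stand-in subclass for the sketch; the real subclasses do the SSH work.
class EchoConnector extends RemoteConnector {
    EchoConnector(RemoteAuth auth, String knownHostsFilename) {
        super(auth, knownHostsFilename);
    }
}
```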

Creating an abstract base class allowed me to quickly create another class with useful functionality. This one is RemoteExecutor, which allows you to execute a command on a remote server. Stuff like "ls -la" or "rm my_file" or anything else you might type into the command line.

RemoteExecutor re = new RemoteExecutor(auth, knownHostsFilename);
re.execute("rm AllPhysics.wmv");
re.execute("ls -la");
System.out.println(re.getResult());

That's what you'd do if you wanted to delete a file, then call ls on the directory, and print the listing to the screen (getResult() is what hands back the command's output).

I think that pretty much covers what I wanted. The main thing I want to add now is a FileGrabber class so I can download a file from a server as well as upload one. I feel like these classes are going to be pretty useful; I guess I'll know more about what I missed here once I use them in a few projects and decide on what they're lacking.

Friday, February 15, 2008

JSch: SCP a File in Java

If you want to upload a file to another computer, SCP is an excellent way to go. And if you want to do it from within a Java program, your best bet is to use the JSch library from JCraft. They've implemented the SSH protocol purely in Java, and it works splendidly. I've written a nice little class that encapsulates the action of sending a file.

Using the class is pretty simple. It goes like this:

FileSender fs = new FileSender("172.16.40.128", "user", "user");
boolean ret = fs.sendFile("/home/sschulte/Desktop/AllPhysics.wmv", "AllPhysics.wmv");

The constructor takes a host, a username, and a password. The sendFile() method takes a source filename and a destination filename. It returns true if the upload was successful, false if it failed.

I developed it in NetBeans, and in order to use JSch, the first step is to download the JAR and add it as a library in your project. (When I first tried it, I downloaded the ZIP instead, and added the source to my project; this works, but it's not necessary to compile this library along with your project unless you plan to edit it. I didn't.)

Most of the code in this class is taken from the ScpTo example included in the ZIP file. I recommend reading it. Something to pay close attention to, however, is the SSH known_hosts file. Their example file doesn't take this into account, so if you just run their code, you get an unknown host exception and it doesn't work. So be sure to include the following:

String knownHostsFilename = "/home/sschulte/.ssh/known_hosts";
jsch.setKnownHosts(knownHostsFilename);

(This is, of course, after you instantiate your JSch object.)

I like it, and it works. But it definitely needs a few things before I'm satisfied.

Read the known_hosts filename from a config file, rather than compiling it into the code.
Have a config file for the host/username/password information, possibly for multiple hosts at once.
Throw exceptions for various types of failures, rather than simply returning false.

Pretty cool stuff. The JCraft guys did a great job with this library.

Monday, February 11, 2008

Using CURL to Download a File in PHP

Today at work I came across an interesting problem. I had to make a REST-style call to a backend server, which would do some processing and return a file. The filename and content type of that file are not known to me at the time I make the call, but the file must be downloaded properly directly to the client. And the file could easily be large, meaning that I can't load it completely into memory before sending it to the client (and even if I could, that would be unwise due to the double-time download).

I'd used CURL in PHP before, but was having problems getting the content type. If I typed the REST URL directly into the browser, it would download successfully, but display a useless byte stream instead of the file itself. After discovering the CURLOPT_HEADER option, I came up with the initial solution of making the call twice: the first time I just got the headers (and not the body, using CURLOPT_NOBODY), and then set the headers using header() before making a second call which would get the body. This worked ... except that the backend team informed me that it's possible that the processing could get pretty extensive, and they didn't want it to be doubled every time. Needless to say, that made my solution pretty stupid.

That's when I dug deeper and discovered CURLOPT_HEADERFUNCTION. With this, you register a callback function which will process the headers.

curl_setopt($c, CURLOPT_HEADERFUNCTION, array($this, 'applyHeaders'));

The third parameter here is the function that processes the header. If you're using an instance method, you pass an array containing the current object and the name of the method (PHP 5 passes objects by handle, so the old &$this reference isn't needed). In this case, my callback function looks like this:

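function applyHeaders($c, $header) {
    // Skip the blank line that ends the header block; forward the rest.
    if (trim($header) !== '') {
        header(trim($header));
    }
    // cURL expects the number of bytes handled, or it aborts the transfer.
    return strlen($header);
}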

It simply calls the header() function on the file's headers and returns the length of the header (which is necessary for the callback function to work).

This way, you can keep CURLOPT_RETURNTRANSFER false, meaning you don't have to load the file into memory; it's streamed directly to the browser. The applyHeaders() method sets the content type and filename properly, so the file gets downloaded just as it's supposed to. Best of all, you only have to make one call to the backend, and the processing is only done once.
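
The reason this never needs the whole file in memory is that the body is passed along in pieces as it arrives. That idea is sketched here in Java rather than through PHP's cURL binding, so it's an analogy and not the code from this post; StreamRelay and relay() are invented names, and the in-memory streams in main() just stand in for the backend response and the client connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class StreamRelay {
    // Copies everything from in to out through one fixed-size buffer and
    // returns the number of bytes copied. Memory use stays at bufferSize
    // no matter how large the stream is.
    static long relay(InputStream in, OutputStream out, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = relay(new ByteArrayInputStream(payload), sink, 64);
        System.out.println(copied); // prints: 10000
    }
}
```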

Excellent. And the documentation on this isn't the best, so I figured I should post it on here to help out. And to help me remember.

Sunday, February 10, 2008

An Early Update into the Life of a Consultant

I've now been on my current contract for about 2.5 months, so I figure it's a good time to come back and post an update.

The work isn't that challenging. The complicated problems are on the backend, and that team is constantly frantic and has been falling behind as the codebase becomes larger. Unfortunately, I'm not on that team, and can't help out. All I'm doing is making XML-RPC calls to the backend and displaying the data to the user. My days are currently spent waiting, either for something to be finished on the backend so I can finish it on the frontend, or for a bug to be found so I can fix it. On some days, neither happens.

But enough of the bad. I don't much care for complaining.

The people are great. The developers I've been working with and talking to are talented, and the managers I report to are very technical. From a personnel standpoint, this is just about ideal. However, I've begun to think that this might actually be a problem. Sooner or later, this contract will end and I'll be moving on, possibly never working with these guys again. I'll have to start over, left to hope that the people at the next contract are as talented and fun as they are at this one. This is the life of a consultant, I suppose.

This has led to me eating lunch by myself, eschewing the chance to get close to my coworkers and managers. This is probably unwise, but I feel it's a natural response to the situation I'm in, where I'll probably be leaving them soon. I should try to make an active effort to change my behavior and start spending more time socializing with my coworkers, starting at lunchtime.

And things are looking up, workload-wise. The current project is close enough to being complete that I'm apparently going to be placed onto another one this week. I think this is excellent, and hopefully I can slide over easily to the new project and get its wheels moving as fast as possible. That should feel pretty good.
