Terrestrial Navigation

Amtrak, B-Movies, Web Development, and other nonsense

Page 2 of 7

Measuring activity in Moodle

It’s a simple question with a complex answer: in a given academic term, what percentage of our Moodle courses are “active” (used by a faculty member in the teaching of their course). We have to start by figuring out what counts as a “course” in a term, and then come up with an inclusive measurement of activity.

Continue reading

Pick a date, any date

Moodle 3.2 introduced the concept of end dates for courses. Moodle 3.3 added a new Course Overview block which uses end dates to determinate whether a course is in progress, in the past, or in the future. This is pretty great, unless you’re in the following situation:

  • Your school has five years worth of courses
  • Those courses don’t have end dates

Congratulations—you now have five years of courses in progress. Your faculty will have five pages worth of past courses on the Course Overview block! That’s probably undesirable. To avoid it, I’m writing a plugin that lets an administrator set course start and end dates at the category level. While working on it, I ran an interesting edge case with Behat acceptance tests, reminding me that you’d best treat Behat like it’s a real user.

Continue reading

WordPress and partial content

Eighteen months ago we had an anomalous problem where video playback didn’t work on some, but not all, of our WordPress multisites. Videos wouldn’t play, or would play but wouldn’t seek. The problem was confined to local uploads embedded in a page. Videos from YouTube played fine; if you viewed the video directly playback worked as expected.

The problem turned out to be long-standing issue with how ms-files.php served up files from pre-WordPress 3.5 multisites. Solutions had floated around for years. Our problem was describing the problem with enough specificity to actually find the right solution.

Continue reading

Rolling rocks downhill

I’ve written about how we used Composer to overlay a dependency management system on our existing WordPress ecosystem. The final step was to actually deploy the site somewhere. For that we turned to Capistrano.

Continue reading

Overlaying dependency management

I’ve described how Lafayette’s deployment strategy involved pushing rocks uphill. A key change in our thinking came when we started treating each of our WordPress multisite installations as its own software project, with its own dependencies and lifecycle. Enabling this pivot was a technology which wasn’t mature in 2013: Composer.

What is Composer?

Composer is a package manager for PHP. It fills a role similar to npm for Node.js and bundler for Ruby. It uses a JSON file to capture metadata about the project and the project’s dependencies. If you have a custom application built on Symfony and Silex, your composer.json file might look like this:


{
    "name": "outlandish/wpackagist",
    "description": "Install and manage WordPress plugins with Composer",
    "require": {
        "php": ">=5.3.2",
        "composer/composer": "1.3.*",
        "silex/silex": "~1.1",
        "twig/twig": ">=1.8,<2.0-dev",
        "symfony/console": "*",
        "symfony/filesystem":"*",
        "symfony/twig-bridge": "~2.3",
        "symfony/form": "~2.3",
        "symfony/security-csrf": "~2.3",
        "symfony/locale": "~2.3",
        "symfony/config": "~2.3",
        "symfony/translation": "~2.3",
        "pagerfanta/pagerfanta": "dev-master",
        "franmomu/silex-pagerfanta-provider": "dev-master",
        "doctrine/dbal": "2.5.*",
        "knplabs/console-service-provider": "1.0",
        "rarst/wporg-client": "dev-master",
        "guzzlehttp/guzzle-services": "1.0.*"
    },
    "bin-dir": "bin",
    "autoload": {
        "psr-4": {
            "Outlandish\\Wpackagist\\": "src/"
        }
    }
}

The real action here is in the require block, where we spell out all the different packages that compose our application. Each key/value pair is an individual package and its version constraint. Composer users semantic versioning, and supports a wide range of expressions. Here are some quick examples:

  • “*”: wildcard; highest version possible (probably don’t ever do this)
  • “2.*”: highest version within the 2.x major version
  • “~1.2”: highest version between 1.2 and 2.0

The key name is the name of the package, in the format vendor/project name. By default Composer assumes that you’re installing these packages from Packagist, but as we’ll see that’s just the beginning of the Composer ecosystem.

What is Packagist?

Packagist is a centralized Composer repository which anyone can register packages on. It’s full of packages like the ones listed in the example above. Given a specific package and version constraint, it returns the matching files. Packagist is special inasmuch as it’s available by default for every Composer project, but it’s possible to define additional repositories if you have packages which aren’t in Packagist.

What Repositories, where?

Let’s say you’ve got a private project that you can’t publish to Packagist, but you’d like to make it available to an internal Composer project. Composer has a number of options for doing this. The simplest is adding your version control repository (VCS) as a repository to your project:


{
  "name": "yourcompany/yourproject",
  "description": "Your sample project",
  "repositories": [
    {
      "type": "vcs",
      "url": "https://github.com/yourcompany/someotherproject"
    }
  ]
}

This means that Composer will scan that repository, in addition to Packagist, when searching for packages. All you need to do to make this work is to add a reasonable composer.json file to that private project.

This is fine for one or two packages, but becomes unwieldy with a dozen or more. This is where a project like Satis becomes useful. Satis transforms a repository block into a lightweight Composer repository. This way, your internal projects need to include the Satis repository only—as you add new VCS repositories to Satis they become available to your downstream projects:


{
  "name": "yourcompany/yourproject",
  "description": "Your sample project",
  "repositories": [
    {
      "type": "composer",
      "url": "https://satis.yourcompany.com"
    }
  ]
}

What’s this got to do with WordPress?

Composer’s structure lets you overlay a dependency management system on existing code without too much pain. With your own private packages adding composer.json files is straightforward. Obviously you’re not going to get composer.json files added to every package on the WordPress theme and plugin repositories.

Fortunately, this is a solved problem. The fine folks at Outlandish maintain the WordPress Packagist composer repository. WPackagist scans the WordPress theme and plugin subversion repositories, mapping projects and versions in a way that composer can understand. The themes and plugins themselves are downloaded as zip files and extracted. At a stroke, all themes and plugins from WordPress.org are available to your project along with your private projects:


{
  "name": "yourcompany/yourproject",
  "description": "Your sample project",
  "repositories": [
    {
      "type": "composer",
      "url": "https://satis.yourcompany.com"
    },
    {
      "type":"composer",
      "url":"https://wpackagist.org"
    }
  ]
}

This is far more efficient than converting repositories to git, and you arrive at the same end result: specifying a known version constraint for a given project. Next up: how you actually deploy a WordPress site with this stuff.

Don’t push rocks uphill

Pictured: developer pushing rocks uphill

For three years Lafayette’s official WordPress deployment strategy was to push rocks uphill. This was a doubtful plan, but it represented an improvement over its predecessor, which was to stare at the rocks doubtfully, then roll them around a field at random. Here follows a warning to others.

Git all the things!

In 2013 we had embraced git with the fervor of the converted. Applying this to WordPress was difficult. WordPress.org gave us two options for getting themes and plugins:

  • Download them as a ZIP file manually
  • Clone them from subversion

These weren’t great options. We weren’t going to adopt a strategy which incorporated subversion as part of the deployment itself, and we didn’t want to lose revision information with a manual download process.

We hit upon the strategy of pushing rocks uphill. We setup a platform to clone the WordPress themes and plugins we need and then convert them from SVN to git, using the svn2git ruby gem. We pushed the result into a private git repository. This git repository was then added as a submodule to our WordPress deployment repository.

This was cumbersome and time-consuming. The WordPress theme and plugin SVN repositories are massive. The initial conversion of a module could take hours, or just fail. Repository structures varied according to the whims of the plugin maintainers. Tagging was inconsistent. WordPress doesn’t encourage atomic committing to SVN, which undercut the value of having commit messages. Maintaining a private repository for each theme and plugin added significant overhead.

Submodules: threat or menace?

Deployment in progress. Taken by Peter Standing [CC BY-SA 2.0], via Wikimedia Commons

We haven’t even talked about submodules. With a submodule you nest a git repository inside a git repository. The top-level repository has a .gitmodules file which tracks the location of the submodule remote; the revision history tracks which commit should be checked out in the submodule.

In a sample WordPress deployment, you would have your WordPress git repository, and you would attach your theme and plugin submodules. You then clone this repository on to your web server, and update from git as needed. This works, but it’s not as slick as it sounds.

Time was I wouldn’t hear a word said against them. That time is past. They’re a kludge and should be used sparingly. Most of “Submodules considered harmful” think pieces focus on their use in development. Here’s a couple: Why your company shouldn’t use Git submodules and Git: why submodules are evil. Their use is more defensible in a deployment context, but there are still problems:

  1. When you clone a git repository which has submodules, the submodules have to be initialized and updated separately.
  2. When you update a repository, the submodules have to updated as well. A sample command would be git submodule foreach git fetch —all.
  3. With deployments, you’re now worried about the state of each git repository, submodule or no.

With a collection of shell scripts this is manageable, but again it’s a lot of overhead. It also doesn’t self-document very well. Looking at my project repo, I can run git submodule status and get a mixture of useful and not useful information:


 20e6e064792e9735157d88f97eece9c5aef826a8 wp-content/plugins/conditional-widgets (2.2)
 90c74decfb020fdaa255bba68acb142550dfac35 wp-content/plugins/contact-form-7 (4.4.1)
 13cfa86ceb8001438b0fec9ea3a5a094d42e2397 wp-content/plugins/custom-field-template (2.3.4)
 292e34607378de0b4352ba164ccf7e1ecdaa44e9 wp-content/plugins/mathjax-latex (v1.1-59-g292e346)

This is okay as far as it goes, but I’m at the mercy of what’s in the submodules. I can’t rely on it to be human-readable, and I can’t use them for anything. If I want to update a module, I’ve got to push the new code through the pipeline, then update the submodule on my local machine, commit, and then send it out to the web server.

Also, you can’t easily remove a submodule, which you might want to do if you’re deleting a plugin. The last time I did this, I had to follow this process:

  1. Using a text editor, delete its entry from .gitmodules
  2. Stage that change
  3. Using a text editor, delete its entry from .git/config
  4. Run git rm --cached path/to/submodule
  5. Run rm -rf path/to/submodule
  6. Commit everything
  7. Push to your repository
  8. Repeat steps 3-5 when updating on the remote

A developer removes a submodule

Rocks. Uphill. Pushing. Allegedly this has gotten better in the last few years, but (a) we’ve moved on and (b) the version of git is low enough on our servers that it wouldn’t matter anyway.

Do you have a better idea?

The mistake here was forcing the WordPress modules into an unnatural path. They have versions on the WordPress.org repository; we should have adopted a method that leveraged that, rather than re-create that method in our own private repositories. Next up: overlaying dependency management.

Apparating a source

This is a story about adding a little knowledge to the public internet.

Three M-K TE70-4S locomotives on their initial run in 1978. Photo by Roger Puta.

A few weeks ago I went looking for information on the Morrison-Knudsen TE70-4S diesel locomotive. It’s an oddball; a rebuild of a General Electric U25B with a Sulzer engine. Only four conversions were done, all for the Southern Pacific Railroad, and they were unsuccessful. There isn’t much information on the internet about them.

Googling turned up a curious reference: Application of Sulzer 12ASV 25/30 Diesel Engines to M-K TE70-4S Locomotives. 12 pages by a J. G. Fearon in 1979 devoted to the subject, published at the time of the conversion. Sounds great. Except…

  • This is the only search result for this title
  • No one offers this “book” for sale
  • No library counts it among its holdings, although it’s in WorldCat

What the hell is this thing?

In the course of researching the Turboliner and the General Electric E60 electric locomotive, I spent some time with industry journals such as Mechanical Engineering, Transportation Research Record, and the various conference proceedings that are published by IEEE. The functional title certainly sounded like a conference paper.

I drew a blank on IEEExplore, but full text has reached 1988 only, with exceptions. Next up was HathiTrust. HathiTrust can be irritating to work with: it doesn’t offer full-text for many things, and when searching journals the results aren’t more specific than the volume, which makes it hard to identify items for inter-library loan.

The first interesting result was for Conference papers index, which reported a hit for 1979. That confirmed my guess (without details) that Fearon, who worked for Morrison-Knudsen, presented on the topic.

The second was a hit for both the author name (common enough) and the locomotive name (very uncommon) in the same volume of Mechanical Engineering. A reel of scrolled microfilm later, and I had an entry in the “Technical Digest” (meeting abstracts) from ASME, and there it was: J. G. Fearon, “Application of Sulzer 12ASV 25/30 Diesel Engines to M-K TE70-4S Locomotives.” Presented at the ASME Energy Technology Conference, November 5-9, 1978. Either it was also issued in pamphlet form, or someone at Google erred. Either way, I was grateful for the pointer.

The next step was requesting the conference proceedings. Time passed. The proceedings arrived and…no joy. Turns out there’s an important difference between the “Petroleum Division” and the “Diesel & Gas Engine Power Division” of the ASME, although both were present at this conference. I’m sure there’s a story. I put in a request for those proceedings and crossed my fingers.

No dice; only one library has those proceedings in that format, and they weren’t giving them up. After various blind alleys I discovered, somehow, that “Paper”, published by the ASME, is a recognized serial by a couple dozen libraries in the United States (ISSN 0402-1215). This serial encompasses all the papers given at ASME conferences throughout the year. You can see this record from Cornell for example of what a nightmare the various ASME codes and numbers are.

The next step was identifying a library which had holdings for DGP from 1978, including paper No. 15. At this point I cheated: I knew someone at a library which held it and was able to bypass the ILL process. I’m pretty confident, however, that an ILL request for that paper and author, specifying the ISSN, year, and paper code, would ultimately have been successful.

The moral of the story is that there’s a wealth of information which, even now, is effectively buried unless you really know where to look. For that unusual Google Books entry, I’m not sure I’d have thought to throw the query at HathiTrust, though I will from now on. Even then, I caught a break that someone had digitized the conference paper index. There are plenty of journals and books which have never been digitized and are undiscoverable via the public internet.

Character recognition with PDFs in OS X

I’ve started using Tesseract to add an optical character recognition (OCR) layer to PDFs. What follows are my notes on getting this to a reasonable state, and a word of warning about Preview on Sierra.

Background

I’ve written about my collection of articles before. They’re all PDFs and indexed in Zotero. Their source various: some are distillations of digital documents, some are scans from print or microfilm. Some, but especially the latter, haven’t been through OCR so they aren’t searchable. That’s not a big deal in a 1-2 page article, but in longer works it’s obnoxious. Adding OCR also exposes the text to Spotlight, OS X’s internal search.

Tesseract

Tesseract is the de facto standard for open source OCR solutions. It’s installable via homebrew. That’s easy. I discovered pretty quickly that Tesseract doesn’t work with PDFs out of the box. I wasn’t averse to building a script that pulled apart the PDF, converted each page to a TIFF, did the OCR, and then put it all back together, but I figured someone had crafted a more elegant solution.

pdfsandwich

Per Cornelius’ excellent tutorial, the more elegant solution is Tobias Elze’s pdfsandwich, which wraps all that, plus plenty of additional functionality which I don’t fully grok yet. The code is available via SVN: svn checkout svn://svn.code.sf.net/p/pdfsandwich/code/trunk/src pdfsandwich

You’re going to need a few more dependencies from homebrew, in addition to tesseract:

Those are all the dependencies I needed to install; a bare system probably needs more.

Examples

You invoke pdfsandwich from the command line. pdfsandwich —help gives a rundown of the many options. This is the most basic invocation:

pdfsandwich -lang eng <filename>

This will process the given file and output a new file with the OCR layer included. By default the new file is named . If the file has images you’ll want an additional argument to ensure they’re overlaid correctly. I used the -gray flag for grayscale documents and -rgb for color. Note that while pdfsandwich use multiple threads for processing a large document (50+ pages) will take at least several minutes on a pretty fast MacBook.

By default the command spits out quite a bit of information which you’re free to ignore. Once it’s done you’ll have a new PDF and the text is searchable.

Beware Preview

There’s a pretty nasty bug on OS X which, depending on who you talk to, was introduced in Sierra or has been around for years. In a nutshell, saving a PDF in Preview can corrupt the OCR layer. The text will no longer be searchable nor copyable. Despite reports to the contrary it’s not fixed (at least not for me) in 10.12.3. I’ve taken the precaution of locking the PDFs for now. I’m sure I’ll forget in two months time, but it’ll solve the immediate problem.

How I file things

At the end of my blog post about Hammerspoon and case conversion I had this aside: ” …it works well for storing electronic articles and books.” I’m going to expand a little on that and how it fits into my scheme for organizing information. Hat-tip to nateofnine for prodding me into sharing this.

The Library of Zotero

Spreadsheet of citations

A typical Zotero collection. This one contains everything tagged for the GE E60 electric locomotive. Exciting, right?

I use Zotero to keep track of information, mostly related to railroading history. I wrote a post last year about developing a custom translator for importing articles. As of writing I have 1193 1200 items indexed in Zotero: books, chapters, journal and newspaper articles, blog posts, doctoral theses, maps, etc.

Zotero lets you attach things to an entry such as notes or tags. You can also link an external resource. This might be a website, or a link to the a local copy of the document if one exists. Zotero lets you sync content to its cloud, but the free tier is limited to 300 MB. If you’re storing article PDFs you’ll exceed that pretty quickly. Leaving aside HTML snapshots, I have 249 articles consuming ~2.5 GB of space.

Let the data flow

Electronic documents come from all over the place—interlibrary loan, online databases, websites, scans I’ve done myself of physical media that I possess but need to store. The only common factor that is that they become a PDF and I need to organize them.

I start by having a folder structure with a top-level folder named Articles. Beneath that I organize by publication, with definite articles removed to permit natural sorting (thus The New York Times becomes New York Times). Within each publication I adopt a naming convention of DATE-NAME. Date is of the format YYYY, YYYYMM, or YYYYMMDD, depending on the granularity of the publication. A monthly journal, for example, will go no further than YYYYMM. The purpose of this is to ensure chronological sorting when looking at the articles outside of Zotero.

For the NAME, I fall back on my Hammerspoon case conversion module. I’ve already created an entry for the item in Zotero, with its full title. I throw that title into the case conversion, get the slug, and append it to the date. This gives me a filename that sorts by date, is human-readable if need be, and is easy to manipulate from the command line. For example, J. David Ingles’ article in Trains magazine from the May 1979 issue entitled “How super are the Superliners?” becomes 197905-how-super-are-the-superliners.

Putting it all together

The file created, I drop it into the appropriate publication folder, then use the “Add Attachment > Attach Link to File” option in Zotero to associate it with the index. Now, when I double-click on the item in Zotero, it’ll open the file for me to read. The Articles directory tree lives on my Nextcloud which means that (a) there’s redundancy in case something happens to my laptop and (b) I can read the articles on my phone, without needing to have Zotero installed.

 

Featured image by Sue Peacock (card catalogs) [CC BY-SA 2.0], via Wikimedia Commons

Writing LDAP unit tests for a Moodle plugin

In 2016 Lafayette College began maintaining the LDAP Syncing Scripts (local_ldap) plugin after the tragic death of the previous maintainer, Patrick Pollet.

I didn’t know Patrick but he had a strong reputation in the Moodle community. I’m pleased to say that we made few substantive changes to his code. Most of the changes were simple updates, such as migrating the command-line/cron scripts to Moodle’s task infrastructure, and various nit-picky code standards issues which didn’t affect functionality.

PHPUnit

The biggest lift was implementing PHPUnit test coverage for the plugin. I started out with the following requirements:

  • Fully-scripted setup for OpenLDAP, so that the tests can run inside a continuous integration environment
  • Test coverage for group synchronization
  • Test coverage for attribute synchronization

I started this project by building an OpenLDAP environment inside Moodle Hat, the Vagrant development profile I maintain. Implementing a configuration in Puppet is good practice for wrestling with Travis.

Starting from scratch with OpenLDAP (every time!) presents certain challenges that you don’t encounter in a mature environment. A few I encountered:

  1. When you bootstrap OpenLDAP it has a completely empty schema. PHP’s ldap libraries can’t talk to it in that state. You have to populate it with some data, even if it’s completely arbitrary.
  2. Selection of backend databases matter. LDIF is the quick and easy path, but it doesn’t support pagination and Moodle will break in obscure ways. I chose bdb because it’s available in most repositories and it worked.
  3. When you’re setting a generic testing password in your slapd.conf you can just dump in rootpw SomeArbitraryPlaintextPassword and it’ll work. Don’t run in production! Or, really, anywhere that has state.

Once I’d worked through those issues Christian Weiske’s invaluable blog post provided everything I needed for implementing on Travis.

Travis

LDAP Syncing Scripts leverages Moodlerooms’ excellent moodle-plugin-ci plugin for travis-ci integration, with a few tweaks. The full travis-ci.yml file is visible on the GitHub repository; let me walk through a few things.

We need the slapd and ldap-utils packages installed. To use Moodle’s built-in LDAP PHPUnit testing we need to define the location of the test server in the config file:

define("TEST_AUTH_LDAP_HOST_URL", "ldap://localhost:3389");
define("TEST_AUTH_LDAP_BIND_DN", "cn=admin,dc=example,dc=com");
define("TEST_AUTH_LDAP_BIND_PW", "password");
define("TEST_AUTH_LDAP_DOMAIN", "dc=example,dc=com");

We need to create an INI file to force PHP (in travis) to load the ldap extension, and a slapd.conf file to define how our OpenLDAP enviroment will function. The schema settings need to match what you added to Moodle’s config.php. We start slapd and then, as the final step, import our default data. This data isn’t used, but it gets around the problem of an empty schema. Note that while this data is stored as an ldif file for readability purpose, the backend is bdb.

Tests

The actual tests I derived from the tests for Moodle’s auth_ldap plugin. The code is long but self-documenting. There are no particular gotchas, though I found it helpful to extend auth_ldap_plugin_testcase instead of starting fresh.

« Older posts Newer posts »