🔗 How to write Lua modules in a post-module() world

Our beloved function module() is really going away. As of Lua 5.2 it’s only available with compatibility flags on, and the writing’s on the wall: it is going away for good in Lua 5.3. So, in a new Lua project I wrote this past semester, I decided to write it without using module(), while making sure my code runs on both Lua 5.1 and 5.2 (as a side result, I started the compat52 project, which allows you to write code in a well-behaved Lua 5.2 style and make it run on both 5.1 and 5.2).

So, why I liked module() in the first place? While vilified by some, I think the pros of module() largely trumped its cons. It had indeed some nice properties:

It provided much-needed policy for interoperability between modules - for the first time people were mostly writing Lua modules the same way and they worked with each other
It encouraged documenting the module name in its argument, which is especially useful in a world without clearly defined path policies. One would often find out where to put the module by looking at its name. Nevermind the ill-advised suggestion of writing module(...) “so that you can name the module whatever you want” — users of a module must agree on a name so that all other modules that need it can call require() properly!
It pushed modules that return a table through require() — while Lua’s mechanisms for modules were too lax and resulted in spilling globals too often, consistent use of module() meant that you could rely on writing local foo = require("foo"), which is “environmentally” clean idiom, albeit a bit repetitive.
You could nicely tell visibility through syntax: private functions declared with local function, public functions with function.
Apart from the awkward package.seeall argument, use of module() was pretty-much boilerplate-free (I hate repetition in code, from the ugly local print = print idioms in Lua to the redundancy of .h and .c files in C).

So, how to try to retain some of these properties without module()? The solution I found was to adopt some bits of policy, which I list below.

Yes, bits of policy. I know many in the Lua world hate policies, but of course I’m not putting a gun against anyone’s head to follow them. I’m only sharing what works for me and hopefully you may find some use. And don’t worry it’s nothing too esoteric, and it’s mostly cherry-picking some established practices.

Starting from the outside in

Keeping in mind that the goal of a module is to be required by client code, this is how a module foo.bar will be used:

local bar = require("foo.bar") -- requiring the module

bar.say("hello") -- using the module

An interesting observation here is that although we have a hierarchical structure of modules, the practice of loading them into locals means that in use they have to be accomodated in a flat namespace. So here’s Policy Bit #1:

Policy Bit #1: always require a module into a local named after the last component of the module’s full name.

Don’t do stuff such as local skt = require("socket") — code is much harder to read if we have to keep going back to the top to check how you chose to call a module.

Naming modules

Now that you know that your module will end up in people’s locals, please take that into consideration when naming your module. (I wish we had a capitalization policy to separate that nicely, but naming things LikeThis in Lua tends to be used only for object-oriented code.)

The idea is to choose a name that strikes a balance between convenience and uniqueness, and that is usable. And what better way to achieve this other than using this name. So, here’s Policy Bit #2, let’s use the module name in its declaration!

Policy Bit #2: start a module by declaring its table using the same all-lowercase local name that will be used to require it.

So, in the beginning of module foo.bar (which will live in foo/bar.lua), we begin with:

local bar = {}

It’s not a nice self-documenting header as we used to have with module("foo.bar", package.seall), but it’s something. We can improve that with LDoc comments:

--- @module foo.bar
local bar = {}

Don’t name your module something like “size”.

Declaring functions

When I’m scrolling through source code, I like to be able to tell what’s the sphere of influence of the piece of code I’m looking at. Is this a tiny helper function that’s only used below this line in this file? Is it an important function that’s used by other clients, so that an added or removed argument would mean API breakage? Ideally I like to be able to tell that without running back and forth in the code, so I really like visibility to be explicit in the syntax.

We must not declare global functions (or globals of any type, really!) in our modules, so using “globals vs. locals” to tell the difference won’t cut it. We have some alternatives, though. But first, let’s assert one thing:

Policy Bit #3: Use local function to declare local functions only: that is, functions that won’t be accessible from outside the module.

That is, local function helper_foo() means that helper_foo is really local.

This sounds obvious, but there are advocates of declaring all functions, public and private, as local functions and then writing an “export list” at the bottom of the module. Reading code written like this feels to me like a thriller with a twist ending: “haha, I was a public function all along!”

How to write public functions then? We must not declare global functions, but there are alternatives. Say we’re writing a function that will be used in client code as bar.say("hello"). It’s nice that we can declare it just like that:

function bar.say(greeting)
   print(greeting)
end

Policy Bit #4: public functions are declared in the module table, with dot syntax.

Visibility is made explicit through syntax. This is the same idea advocated by those who tell you to name your module tables “M”, except that you’re eating your own dogfood and using the name you expect your users to use. It’s also more consistent, since calls of say() are written bar.say() everywhere, instead of say(), M.say(), etc. (Also, “M.” looks really really ugly and people can’t decide if they want to use “M” or “_M”.)

In case you have speed concerns about having your calls go through the module table: first, this is what your users will go through; second, this is no different than using colon-syntax and dispatching through self and nobody complains about that; third, if you really need it (and have a benchmarked case for it), sure go ahead and make locals for optimization clearly marked as such; fourth, if you’re really after speed you’re probably using LuaJIT and last I heard the value of caching functions into locals is put into question there.

Classes and objects

When talking about classes and objects, it’s then time to talk about things named LikeThis. (If you don’t do OOP, feel free to skip this section!)

As we did above, let’s start look from the outside in: how to instantiate an object. There are two common practices (oh why I am not surprised :( )… you either make a class table with a “new” method, or make the “class object” callable (as a function or a table with a __call metamethod — wait, that makes it three practices…)

local myset1 = Set.new() -- style 1
local myset2 = Set() -- style 2.1 (set is a function)
local myset3 = Set() -- style 2.2 (set is a table)

If your module represents a class, I tend to like style 1 better because:

it keeps the invariant that modules are tables
it’s easy to store “static” methods as the other functions of the table
it’s less magic — I often run modules through for k,v in pairs(bar) do print(k,v) end in the interactive prompt to get a quick look of what they export.
it just screams “I’m creating an object”

If all your module does is define a class, I guess it makes sense to name the module file MyClass.lua and have the class table be the module table. But I prefer not to do that, because often what we store as “static” class methods in purely OOP languages are really module functions. I still use the uppercase table when implementing the class, like this:

--- @module myproject.myclass
local myclass = {}

-- class table
local MyClass = {}

function MyClass:some_method()
   -- code
end

function MyClass:another_one()
   self:some_method()
   -- more code
end

function myclass.new()
   local self = {}
   setmetatable(self, { __index = MyClass })
   return self
end

return myclass

It’s easy to see in the code above that the functions with MyClass in their signature are methods. Sometimes it’s nice to declare the functions as fields inside the table declaration, but declaring methods separately as in the example above allows you to keep local helper functions closer to where they’re used.

If all the module does is declare the class, the class and module table may be one and the same. If you want to use style 2, we get something like this:

--- @module myproject.MyClass
local MyClass = {}

function MyClass:some_method()
   -- code
end

function MyClass:another_one()
   self:some_method()
   -- more code
end

local metatable = {
   __call = function()
      local self = {}
      setmetatable(self, { __index = MyClass })
      return self
   end
}
setmetatable(MyClass, metatable)

return MyClass

Both methods are acceptable, as long as it’s easy and obvious to tell you’re doing OOP:

Policy Bit #5: construct a table for your class and name it LikeThis so we know your table is a class.

Policy Bit #6: functions that are supposed to be used as object methods should be clearly marked as such, and the colon syntax is a great way to do it.

Don’t make people reading your function have to guess (or look up) if it is a method, a public module function or a local function.

Wrapping up

Return the module table. It’s a bit of boilerplate, but it’s what we have to deal with in a module()less world:

return bar

Policy Bit #7: do not set any globals in your module and always return a table in the end.

To sum it all up, a complete module foo.bar would look like this:

--- @module foo.bar
local bar = {}

local function happy_greet(greeting)
   print(greeting.."!!!! :-D")
end

function bar.say(greeting)
   happy_greet(greeting)
end

return bar

The result is that we type a bit more than we did with module(), and we risk polluting the global namespace if we’re not careful, but with this set of policies, we have:

fairly self-documented code
visibility rules readable through syntax
modules that predictably return tables
as much consistency and as little boilerplate as possible

…which mostly matches what I liked about module(), to the extent that can be done without _ENV tricks.

I’ve been using these policies successfully in a university project, and my plan is to follow them when I update the LuaRocks codebase to drop the use of module(). Consider your self encouraged to adopt some or hopefully all of them, but most importantly, whatever you do, be consistent! Good luck in this brave post-module() world!

Posted by hisham on Thursday, January 2, 2014 16:18:28 in en_US, Coding, Computing, Language

🔗 Mounting the SD card from your Android device on Linux over WiFi

Quick-and-dirty notes so I remember how to do it later. This is may all be mostly automatic in some distros (Ubuntu?) but here’s how to do it “by hand”. I may streamline the process myself for my machine in the future, but this is a minimal process that works.

My setup:

Samsung Galaxy S3 SGH-T999, not rooted
my own hacked up version of GoboLinux, so this should work on any distro

The steps:

Install Droid NAS on your phone. Their description say it “most probably […] won’t work” on Linux, but it is based on Bonjour (aka mDNS protocol) and SMB, and there are Linux clients for that.
Create a directory for your local mount point (mine is /Mount/Phone) and then mount it like this:
```
mount -t cifs -o port=7777 //192.168.0.106/"SD Card" /Mount/Phone
```
(You can find your phone’s IP with avahi-browse -altr, if avahi-daemon is running).

TODO: convince mount.cifs to use a logical name instead of IP there. Might need to use avahi and nss-dns for that; by the way, if you get an error such as “Connection “:1.10” is not allowed to own the service “org.freedesktop.Avahi” due to security policies in the configuration file” when using avahi-daemon, follow the instructions found here).

Posted by hisham on Sunday, May 5, 2013 20:49:27 in en_US, Computing

🔗 Git Cheat Sheet

I have a hard time memorizing certain unfrequent tasks on Git. I’ll write them down here as I learn them so I can look it up later.

Undo a commit

If you just made a commit (perhaps typing git commit -a too eagerly) and realize you made a mistake, use this to uncommit the modified code, so it shows as “modified:” in git status again and appears again in git diff:

git reset HEAD~

I even added an alias git undo-commit, like this:

git config --global alias.undo-commit 'reset HEAD~'

Getting the last commit from another branch

When working in an alternative branch, you can retrieve the latest commit from another branch with this:

git cherry-pick other-branch

Example:

$ git checkout anotherBranch
$ git branch
* anotherBranch
  master
$ edit some files...
$ git commit -a
$ git checkout master
$ git branch
  anotherBranch
* master
$ git cherry-pick anotherBranch

Interactively select which edits should be committed

After making several edits across files, sometimes you want to add them as separate commits. Committing files separately is easy, but sometimes various edits in the same file should be different commits (or some changes to a file are ready to be committed but others are not). To commit part of the changes to a file, use:

git add -p

From the manual:

“Interactively choose hunks of patch between the index and the work tree and add them to the index. This gives the user a chance to review the difference before adding modified contents to the index. This effectively runs add –interactive, but bypasses the initial command menu and directly jumps to the patch subcommand.”

Posted by hisham on Monday, March 26, 2012 19:35:00 in en_US, Coding, Computing

🔗 Advice for a beginning Linux programmer

A piece from a private message I posted in a forum, which I thought it could be useful to share.

As advice for somebody just starting out writing system utils, what resources have you found the most useful/valuable over the years? What is worth spending lots of time to understand? Best books? Advice that you wish you knew 8 years ago?

Oh, good questions. I can only think of general pieces of advice; you may already know some, most or all of them, but let’s see:

If you want to write system utils, do it in C. Yes, sometimes it’s annoying, pointers, cumbersome string handling, etc. I like other languages better too. But it’s the most portable option and the only thing that’s guaranteed to stay around in Unix in the long run. You’re writing code today but you may end up using it 15 years from now. Today’s Python may be the 90’s Tcl. I do write stuff in other languages (actually, my research work was in the Programming Languages field, so I’m all for language diversity) but if you’re doing system stuff which you want to be as widely available as possible, do it in C.
- The so-called “pitfalls” of C are more a matter of discipline than anything else. Whenever you allocate a structure to start passing pointers of it around, define which one pointer will be the “owner” of that structure; the one responsible for deleting it. Make sure other pointers don’t outlive the owner and make sure the owner frees the memory. It’s no black magic.
- One of my motivations for writing htop was actually improving my C skills, which were weak (and some of the early parts of htop still show it, look at the amount of asserts in Vector.c), but nowadays I’m work as a C coder.
- If you’re writing for Unix only, C99 is fine (and much nicer to write than ANSI C). If you’re planning to support Visual Studio, stick to ANSI C. Depending on what you’re doing you can get away with C99 and using GNU compilers on Windows, too.
Learn the GNU Autotools. Same thing: it’s even more annoying than C, but in the end it’s what provides the best experience to end users. You do get a lot of features for free, such as support for cross-compiling, etc. Things you may not need now, but that will pop up as feature requests once you have a user base. And to grow a user base, you want your installation to be just “type in ./configure; make; sudo make install” and not “install this-and-that build tool”. Distro packagers will thank you, and cooperation with them is fundamental to get your app out there.
- A neat side effect of learning Autotools as a developer is that you develop skills in troubleshooting problems when installing other packages. Autotools is ubiquitous, so being proficient at it is another professionally useful skill.
Similarly, if you’re writing libraries, learn Pkgconfig. If you’re using libraries, learn how to use Pkgconfig from within Autotools. Pkgconfig is nice and will make your life much simpler, the more you use it.
I haven’t used many physical books, but there are tons of online resources that will help you with the above. The “Autobook” is something you’ll quickly end up in right after your first Google searches for Autotools stuff.
- man-pages are your friends. You’ll code more efficiently by refering to them (when you need to check the order of the arguments for strncmp for the 1000th time) instead of relying on online search for everything (which is more distracting).
- I use pinfo as my man reader (with “alias man=pinfo -m” in my shell) because it makes hyperlinks between man-pages. Very efficient.
You don’t need to use an IDE, but that doesn’t mean you shouldn’t rely on debugging tools.
- Valgrind is your best friend. In comparison, C programming was miserable before it.
- Build with “-g”, run with “ulimit -c unlimited” in your shell at all times. Learn to load core files with gdb and examine backtraces for post-mortem analysis when your program
- If you decide to open the can of worms of multithreading, you can set custom thread names with the prctl command (and then examine them with nice names in htop)
- Other tools I use often: strace, lsof.

That’s all I can think of for the moment, I hope that helps!

Happy hacking!

Posted by hisham on Friday, November 25, 2011 19:41:41 in en_US, Coding, Computing

🔗 How to make the Print dialog use PDF by default in Firefox

Go to about:config
Type “print_to_filename” in the search box
For all entries that appear, change their values to .pdf

Posted by hisham on Tuesday, September 20, 2011 19:44:15 in en_US, Computing

🐘 Mastodon ▪ RSS (English), RSS (português), RSS (todos / all)