Parsing Quoted Strings in Ruby

  • Posted by Mike Naberezny in Python, Ruby

    Python has a nice module in the standard library called shlex that parses strings as a Unix shell would. Here’s a Python interpreter session demonstrating its usage:

    >>> import shlex
    >>> shlex.split('foo "bar baz" qux')
    ['foo', 'bar baz', 'qux']

    It’s useful for creating your own mini-languages or external DSLs that need to parse quoted strings like the one shown above.

    We recently built an inventory tracking application in Ruby that has a user interface for selecting search filters. To expose the same search capabilities as a web service, we created a simple query language. I was looking for an shlex equivalent in Ruby.

    It turns out that the Ruby Standard Library has a module called Shellwords:

    >> require 'shellwords'
    => true
    >> Shellwords::shellwords('foo "bar baz" qux')
    => ["foo", "bar baz", "qux"]

    Shellwords is a little less capable than shlex but handles the most common use case just fine. It’s a convenient solution for a problem that comes up too often.
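
    As a rough sketch of how Shellwords could sit underneath a small query language like the one described above: the field:value syntax and the parse_query helper here are assumptions made for illustration, not the syntax our application actually uses.

    ```ruby
    require 'shellwords'

    # Hypothetical helper: turn a query such as
    #   status:active name:"bar baz"
    # into a hash of filters. The field:value syntax is an assumption.
    def parse_query(query)
      filters = {}
      Shellwords.shellwords(query).each do |token|
        # Split on the first colon only, so values may contain colons.
        field, value = token.split(':', 2)
        raise ArgumentError, "expected field:value, got #{token.inspect}" if value.nil?
        filters[field] = value
      end
      filters
    end

    p parse_query('status:active name:"bar baz"')
    ```

    Shellwords does the quote handling, so the helper only has to split each token at its first colon. A quoted value such as "bar baz" arrives as a single token with the quotes already removed.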

    Update: Ruby 1.8.7 has added the shortcut methods Shellwords.split and String#shellsplit. Very nice.
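
    A short example of the two shortcuts, reusing the input string from above (the variable names are only for illustration):

    ```ruby
    require 'shellwords'

    line = 'foo "bar baz" qux'

    # Module-level shortcut, equivalent to Shellwords.shellwords.
    words_from_module = Shellwords.split(line)

    # String#shellsplit becomes available once shellwords is required.
    words_from_string = line.shellsplit

    p words_from_module  # prints ["foo", "bar baz", "qux"]
    p words_from_string  # prints ["foo", "bar baz", "qux"]
    ```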


  • comment by Henrik N 5 May 08

    Note that

    >> Shellwords.shellwords %{"a""b"}
    => ["ab"]

    though. That makes sense in some contexts (shells) but not in others (Flickr-style tag parsing) where you might be tempted to use this lib.

    See my post and the comments for more.

  • comment by Mike Naberezny 5 May 08

    Good catch. Interestingly, tagging a photo on Flickr itself with your string of "a""b" also results in one tag, ab.

  • comment by Henrik N 6 May 08

    Mike: Ooh, full circle. Cool, didn’t try that.

  • comment by Taylor Carpenter 20 Apr 09

    >> s = 'a b "c d" e'
    >> s += " f 'g h' i"
    >> s
    => "a b \"c d\" e f 'g h' i"

    >> a = s.split(%r{["']([^"]*)["']|\s})
    >> a.delete("")
    >> a
    => ["a", "b", "c d", "e", "f", "g h", "i"]
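
For tag-style input where adjacent quoted strings such as "a""b" should stay separate, as raised in the comments above, String#scan is one possible alternative to Shellwords. This is only a minimal sketch: the split_tags name is hypothetical, and it makes no attempt to handle backslash escapes or unbalanced quotes.

```ruby
# Hypothetical helper: split on whitespace, treating each double- or
# single-quoted run as its own token, even when two runs are adjacent.
def split_tags(input)
  pattern = /"([^"]*)"|'([^']*)'|([^\s"']+)/
  # scan yields one array of capture groups per match; exactly one
  # group in each array is non-nil, so compact.first picks the token.
  input.scan(pattern).map { |groups| groups.compact.first }
end

p split_tags('"a""b"')                      # prints ["a", "b"]
p split_tags(%q{a b "c d" e f 'g h' i})     # prints ["a", "b", "c d", "e", "f", "g h", "i"]
```

Unlike Shellwords, this keeps "a""b" as two tokens, but it silently drops a stray unmatched quote character, which may or may not be what a tag parser wants.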
