R’s Scoping

[Update, 10 September 2010: I didn’t study Radford Neal’s example closely enough before making an even bigger mess of things. I’d like to blame it on HTML formatting, which garbled Radford’s formatting and destroyed everyone else’s examples, but I was actually just really confused about what was going on in R. So I’m scratching most of the blog entry and my comments, and replacing them with Radford’s example and a pointer to the manual.]

A Better Mousetrap

There’s been an ongoing discussion among computational statisticians about writing something better than R, in terms of both speed and comprehensibility.

Radford Neal’s Example

Radford’s example had us define two functions,

> f = function () { 
+     g = function () a+b
+     a = 10
+     g()
+ }

> h = function () { 
+     a = 100
+     b = 200
+     f()
+ }

> b=3

> h()
[1] 13

This illustrates what’s going on, assuming you can parse R. I see it, I believe it. The thing to figure out is why a=10 was picked up in the call to g() in f, but b=200 was not picked up in the call to f() in h. Instead, the global assignment b=3 was picked up.
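One way to see where each name gets resolved is to ask R for the environments involved. The following is a minimal sketch rather than part of Radford’s example: the stopifnot() check inside f is an addition for illustration, using base R’s environment() to confirm that g’s enclosing environment is the frame of the call to f, which is where a = 10 lives. The frame of h, where b = 200 lives, is not on that chain, so b gets resolved one level further out, in the environment where f itself was defined.

```r
f <- function() {
    g <- function() a + b
    a <- 10
    # environment() with no argument is the frame of this call to f;
    # environment(g) is where g was defined. They are the same object.
    stopifnot(identical(environment(g), environment()))
    g()
}

h <- function() {
    a <- 100
    b <- 200   # lives only in h's frame, which g never searches
    f()
}

b <- 3
result <- h()
print(result)  # [1] 13
```

The a comes from f’s frame (10) and the b from the level where f and h were defined (3), matching the 13 in the transcript above.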

RTFM

Even after I RTFM-ed, I was still confused.

The manual, An Introduction to R, has a section 10.7 titled “Scope”, but I found their example

cube <- function(n) {
    sq <- function() n*n
    n*sq()
}

and the following explanation confusing,

The variable n in the function sq is not an argument to that function. Therefore it is a free variable and the scoping rules must be used to ascertain the value that is to be associated with it. Under static scope (S-Plus) the value is that associated with a global variable named n. Under lexical scope (R) it is the parameter to the function cube since that is the active binding for the variable n at the time the function sq was defined. The difference between evaluation in R and evaluation in S-Plus is that S-Plus looks for a global variable called n while R first looks for a variable called n in the environment created when cube was invoked.

I was particularly confused by the “environment created when cube was invoked” part, because I couldn’t reconcile it with Radford’s example.
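A quick way to check the manual’s claim is to define a global n alongside cube and see which one sq picks up. This is only a small sketch; the global value 100 is an arbitrary choice for illustration.

```r
n <- 100   # a global n, chosen arbitrarily

cube <- function(n) {
    sq <- function() n * n   # n is a free variable in sq
    n * sq()
}

result <- cube(2)
print(result)  # [1] 8, so sq used cube's argument rather than the global n
```

Had sq looked up the global n, the result would have been 2 * 100 * 100 instead.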

Let’s consider a slightly simpler example without nested function calls.

> j =10
> f = function(x) j*x
> f(3)
[1] 30
> j =12
> f(3)
[1] 36

This shows it can’t be the value of j at the time f is defined, because it changes when I change j later. I think it’s actually determining how it’s going to find j when it’s defined. If there’s no local j inside the function, it looks in the environment where the function was defined, then in that environment’s enclosing environment, and so on out to the global environment. It never falls back to the environment of the caller, which is why, as Radford’s example illustrates, variables assigned in a calling function don’t count.
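To test whether the caller’s environment ever comes into play, here is a small sketch that calls f from a function with its own local j. The wrapper name caller and the value 1000 are made up for illustration.

```r
j <- 10
f <- function(x) j * x

caller <- function() {
    j <- 1000   # local to caller; f does not look here
    f(3)
}

first <- caller()
print(first)    # [1] 30

j <- 12         # rebinding j where f was defined is visible to f
second <- caller()
print(second)   # [1] 36
```

Both calls ignore the j = 1000 in caller; f finds j in the environment where f was defined, and sees whatever value j has there at the time of the call.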

Am I the only one who finds this confusing? At least with all your help, I think I finally understand what R’s doing.

13 Responses to “R’s Scoping”

  1. Andrew Gelman Says:

    Hey, Bob–you should be posting this stuff on our main blog now!

    • lingpipe Says:

      HQ’s still working out brand management issues. I think a post like this one would’ve made sense on your blog. I’ll start posting there soon.

      I’m both excited and intimidated by the size of your audience.

      Luckily, I don’t mind being wrong in public (once per topic). Especially when I can get tutelage from the likes of Radford Neal!

  2. Ken Williams Says:

    I believe you’re incorrect about scoping in R, as the following example shows:

    > f <- function(x) { y g f(4)
    Error in g(x) : object ‘y’ not found

    As in most languages, it’s possible to create global variables in R, which is what your example shows. However, functions effectively use lexical scope, if you define that as ‘called functions won’t accidentally see my variables’.

    Personally I *love* the R language. I know there’s a lot of talk about redesigning it or replacing it somehow, but I’m skeptical that it’s a good idea.

    • lingpipe Says:

      Thanks. I updated the body of the blog post to point to the comments.

      I think the function definition got garbled somehow (or maybe it’s just an unfamiliar R syntax convention).

  3. Radford Neal Says:

    You’re wrong about R’s scoping rules. It uses lexical scoping.

    Here’s an example demonstrating this:

    > f = function ()
    + { g = function () a+b
    + a = 10
    + g()
    + }
    >
    > h = function ()
    + { a = 100
    + b = 200
    + f()
    + }
    >
    > b = 3
    > print(h())
    [1] 13

    The expression a+b is evaluated with b from the global environment, and a from the lexically enclosing environment of g. The b inside h is not seen even though with dynamic scoping it would take precedence over the global b.
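For contrast, dynamic scoping can be emulated by walking the call stack by hand. This is only a sketch: dyn_get is a hypothetical helper written for this illustration (not a base R function), built on base R’s sys.nframe() and sys.frame() to search the callers’ frames from the most recent outward. The names g_lexical and g_dynamic are likewise made up.

```r
# Hypothetical helper: look a name up in the callers' frames,
# most recent first, the way a dynamically scoped language would.
dyn_get <- function(name) {
    for (i in rev(seq_len(sys.nframe() - 1L))) {
        frame <- sys.frame(i)
        if (exists(name, envir = frame, inherits = FALSE)) {
            return(get(name, envir = frame, inherits = FALSE))
        }
    }
    stop("no binding for '", name, "' on the call stack")
}

f <- function() {
    g_lexical <- function() a + b                        # R's normal lookup
    g_dynamic <- function() dyn_get("a") + dyn_get("b")  # emulated dynamic lookup
    a <- 10
    c(lexical = g_lexical(), dynamic = g_dynamic())
}

h <- function() {
    a <- 100
    b <- 200
    f()
}

b <- 3
result <- h()
print(result)   # lexical 13, dynamic 210
```

Under the emulated dynamic lookup, b = 200 from h wins over the outer b = 3, which is exactly what does not happen in R.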

  4. Rob V. Says:

    Looks like you’ve tripped over lambda calculus and closures, things that are extremely common in many languages (particularly functional languages) but NOT in the world of Java and C derivatives. This is one of the best features of Javascript, in my opinion far more useful than the prototyping that gets more attention. And one of the most obvious shortcomings in Java (although generics was a nice alternative that reduced the need for closures in some cases). Even Java’s granddaddy, Smalltalk, has these features. Perhaps the confusion (between your interpretation of the problem and Radford’s) stems from something akin to Javascript’s slightly flawed implementation of closures whereby variables in the topmost scope are actually global but all other variables are properly scoped.

    • lingpipe Says:

      Ironic, given that I used to teach programming language theory and write about denotational semantics! And I got my feet wet in professional programming by integrating the C implementation of Javascript (ECMAScript, technically) into SpeechWorks’s semantic interpreter!!!

      As you say, there’s really nothing like a closure in C or Java. About as close as I get is writing search algorithms with a continuation-passing style.

  5. Ken Williams Says:

    Here’s an even simpler example:

    > f <- function(x) { y g f(4)
    Error in g(x) : object ‘y’ not found

  6. Nick Says:

    Super-simple example of lexical scoping in R:

    > x <- "A"
    > g <- function() x
    > f <- function() { x <- "B"; g() }
    > f()
    [1] "A"

    If R were dynamically scoped, the ‘x’ in g() would take its value from the calling environment, where it is ‘B’. However, because R is lexically scoped, it comes from the environment where g() is defined, where it is ‘A’.

  7. Nick Says:

    > This is also why I’m still unclear about Radford’s example, because the a=10
    > was part of the environment when g() was called in h, but b=200 was not part
    > of the environment when f() was called in h.

    The difference is that a=10 is part of the environment where g() was DEFINED in f. But the b=200 is not part of the environment where f() is DEFINED. That unbound variables take their values from the defining, rather than calling, environment is what makes R (and most other languages) lexically scoped.

  8. Nick Says:

    > This shows it can’t be the value of j at the time f is defined, because
    > it changes when I change j later. I think it’s actually determining how
    > it’s going to find j when it’s defined.

    Right. This example is no more mysterious than referencing an instance variable in Java. If the variable’s value is changed, then subsequent references will see this change. In your example, f() and j are defined in the same environment. This is where the free variable j in f() is bound. When you change j’s value in that environment, f() picks it up.

    • lingpipe Says:

      Thanks for the explanation in the previous comment.

      Java’s a bit more restrictive. For instance, you can’t copy the R style and write:

          interface Foo { public int foo(); }
      
          public static void main(String[] args) {
              Foo f = new Foo() {
                      public int foo() {
                          return a;
                      };
                  };
              int a = 10;
              System.out.println(f.foo());
          }
      

      You have to declare the variable a to be a static class variable, or you have to define a local variable before the anonymous inner class and declare it final.

      And there’s no way to do the equivalent of R’s attaching a list, which promotes a data structure to local variable. Turns out that doesn’t quite work the way I was thinking it did in R, either. For instance,

      > f = function() { a }
      > f()
      Error in f() : object 'a' not found
      > a = 12
      > f()
      [1] 12
      > b = list(a = 5)
      > attach(b)
      
              The following object(s) are masked _by_ .GlobalEnv :
      
               a 
      
      > f()
      [1] 12
      

      but it works if there’s not already a value.

      > k = function() { m }
      > k()
      Error in k() : object 'm' not found
      > j = list(m = 5)
      > attach(j)
      > k()
      [1] 5
      > m = 10
      > k()
      [1] 10
      > attach(j)
      
              The following object(s) are masked _by_ .GlobalEnv :
      
               m 
      
      
              The following object(s) are masked from j ( position 3 ) :
      
               m 
      
      > k()
      [1] 10
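What attach() does to name lookup can be read off the search path. Here is a minimal sketch, assuming a fresh session in which m is not yet defined and nothing named j is attached; search(), match() and detach() are base R, and the variable names pos_j, before and after are made up for illustration.

```r
k <- function() m
j <- list(m = 5)

attach(j)
pos_j <- match("j", search())
print(search()[1])  # ".GlobalEnv" always comes first, ahead of anything attached
print(pos_j)        # 2: attach() puts j right after the global environment

before <- k()       # 5, found in the attached copy of j
m <- 10             # this m now sits in front of the attached one
after <- k()        # 10

detach(j)           # clean up the search path
```

Because attached lists are searched only after the global environment, attaching can supply a missing variable but never override one that is already defined there, which is what the “masked by .GlobalEnv” messages are warning about.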
      
  9. lingpipe Says:

    From Christian Robert’s latest blog post on R, Simply Start Over and Build Something Better, I found this amazing snippet:

    One of the worst problems is scoping. Consider the following little gem.

        f = function() {
          if (runif(1) > .5)
            x = 10
          x
        }
    

    The x being returned by this function is randomly local or global.
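A small sketch of how this plays out, assuming an outer x defined with a recognisable value. The string "global" is an arbitrary marker, and the seed and the 20 repetitions are likewise arbitrary choices for illustration.

```r
x <- "global"   # arbitrary marker so the two cases are easy to tell apart

f <- function() {
    if (runif(1) > .5)
        x <- 10   # creates a local x only on this branch
    x             # the local x if it exists, otherwise the enclosing x
}

set.seed(42)    # any seed; fixed only to make reruns repeatable
results <- vapply(1:20, function(i) as.character(f()), character(1))
print(table(results))
```

Each call returns either the local 10 or the outer "global", depending on which branch the random draw took.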

    Cool!
