I have been a R user for a long time (>20 years) and I have long been confused by the behavior of linear model objects generated by lm. The standard help file in R is not very helpful, so this week I put together an example code that hopefully explains this better. I am posting it here in case others find it useful:
# Part 1: Generate random data containing a linear relationship with noise x = 1:100 + rnorm(100,sd=10) # x positions, scattered m = 2 # input slope b = 30 # input intercept y = m*x + b + rnorm(100,sd=30) # y positions, with added scatter # plot the data points we just generated plot(x,y) # regress y against x by generating a linear model. # Note that the variable names and order matter here. fit = lm(y~x) # Let's see how our prediction match the inputs print(paste("slope: ",fit$coefficients[2])) print(paste("intercept: ",fit$coefficients[1])) # generate new set of x data to predict newx = -20:120 # predict y values. # Note that we are assigning newx to x inside a data frame. # This is important! If we just passed 'newx' to predict it # doesn't work. newy = predict(fit,data.frame(x=newx)) # add best-fit line to plot. lines(newx,newy) # Let's do this again with confidence interval for the best-fit line # predict returns a matrix with named columns. # I wrap this in data.frame() to coerce the output to a data frame. newy = data.frame(predict(fit,data.frame(x=newx),interval="confidence")) # Note that the default level of confidence is 95%. # For 99% confidence, as an example, add level=0.99 to the predict input. # plot all three lines (confidence as dashed, all in blue) lines(newx,newy$fit,col=4) lines(newx,newy$lwr,col=4,lty=2) lines(newx,newy$upr,col=4,lty=2) # We can also plot the prediction interval # (the predicted scatter of the data around our line): newy = data.frame(predict(fit,data.frame(x=newx),interval="prediction")) # Plot these as short dashed lines lines(newx,newy$lwr,col=4,lty=3) lines(newx,newy$upr,col=4,lty=3)