I was looking at this example: https://onlinecourses.science.psu.edu/stat500/node/59
and I got to the point where they use $\beta_0=-222$. My question is: where did they get those parameters? They state that $\beta_0$ is the value of $Y$ when $X=0$, but since that is not in the scope of the scatter plot they just mention that "it is not much of interest." Does that mean that we can pick another value for our constant term?
No, that is the intercept term (the value of Y when X=0). What they mean is that it is the prediction for Y when X=0, which never happens in these data; thus it has no interpretation by itself and is of little interest, because X can never be 0.0.
You cannot pick another value, because the value of B1 is conditional on B0: the two were estimated simultaneously, so changing B0 changes the B1 that best fits the data.
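To make the "simultaneously estimated" point concrete, here is a minimal sketch in Python with NumPy. The x and y values are made up for illustration (they are not the data from the linked course page). It fits B0 and B1 jointly by least squares, then shows that forcing a different intercept (here 0) gives a different best slope.

```python
import numpy as np

# Hypothetical illustrative data, NOT the data from the linked example.
x = np.array([60.0, 62.0, 65.0, 68.0, 70.0, 72.0])
y = np.array([110.0, 120.0, 140.0, 155.0, 170.0, 180.0])

# Joint least-squares estimate: polyfit returns [slope, intercept] for degree 1.
b1, b0 = np.polyfit(x, y, 1)

# Best slope when the intercept is forced to 0 instead of being estimated.
b1_forced = float(np.dot(x, y) / np.dot(x, x))

# Sum of squared errors for each fit.
sse_joint = float(np.sum((y - (b0 + b1 * x)) ** 2))
sse_forced = float(np.sum((y - b1_forced * x) ** 2))

print("joint fit:  B0 =", b0, " B1 =", b1, " SSE =", sse_joint)
print("forced B0=0: B1 =", b1_forced, " SSE =", sse_forced)
```

The jointly estimated pair gives the smaller sum of squared errors; any other choice of B0 forces a different B1 and a worse fit over the observed range.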
A prediction at X=0 is outside the range of experimentation and thus can be suspect: http://autobox.com/dave/OUTSIDE.png
EDITED AFTER QUESTION...
If B0 is set to 10.0, for example, then minimize the sum over i of (Y(i) - B1*X(i) - 10.0)**2. This can be solved directly by subtracting 10.0 from each value of Y to get YY and then using a regression package to estimate the model YY = B1*X, essentially requiring the regression equation to go through the origin. Good software like AUTOBOX, a piece of software that I helped to develop, allows that feature/constraint while (nearly!) everybody else does not.
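The subtract-then-fit-through-the-origin recipe can be sketched in a few lines of Python with NumPy. The function name `fit_slope_with_fixed_intercept` and the data are hypothetical, chosen only for illustration; the slope formula is the standard closed form for a no-intercept least-squares fit.

```python
import numpy as np

def fit_slope_with_fixed_intercept(x, y, b0):
    """Least-squares slope B1 for Y = b0 + B1*X with b0 held fixed."""
    x = np.asarray(x, dtype=float)
    # Shift Y by the fixed intercept to get YY = Y - b0.
    yy = np.asarray(y, dtype=float) - b0
    # Regression through the origin: B1 = sum(X*YY) / sum(X*X).
    return float(np.dot(x, yy) / np.dot(x, x))

# Hypothetical data lying exactly on Y = 10 + 2*X.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 10.0 + 2.0 * x

b1 = fit_slope_with_fixed_intercept(x, y, 10.0)
print(b1)
```

With noisy data the same call returns the slope that minimizes the squared error given the chosen intercept, which will generally differ from the slope of the unconstrained fit.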