Residual Containment Approach (RCA)

A Variation on the Traditional Sales Comparison Approach for Appraisal

Introduction and Overview

by Bert Craytor, Pacific Vista Net, August 6, 2022
Pacifica, CA

In 2020 and 2021 I introduced a new method for the sales comparison approach that promises far higher accuracy, objectivity, and reliability than the methods appraisers commonly employ. I have used the method under several different names, such as the Subjective Value Containment Approach and the Intangible Value Containment Approach, to appraise properties in the San Francisco Bay Area for IRS and lending purposes. The new designation is intended to be more meaningful. Instead of calling attributes such as condition, quality, functional utility, and view “intangible” or “subjective”, it is more precise to call them “unmeasured” attributes, because this technique captures the value of all unmeasured attributes through regression residuals. I therefore suggest that “Residual Containment Approach” is the most concise and correct term.

This paper is intended to be a short, non-mathematical introduction to the Residual Containment Approach.  

The method can be described as a workflow that includes:

  1. Focusing on the value contribution of every feature that adds to the value of a property, rather than the classical direct determination of adjustment values through matched pairs or ordinary regression techniques. An adjustment value is simply the difference between the contribution value of the subject and that of the comparable. Adjustment values estimated directly are not bound by any constraint, whereas contribution values must add up to the sale price. It is therefore safer to calculate the contribution values first and derive the adjustments from them.
  2. Gathering many sales comparables from the neighborhood or market area, which may be a town or a distinct area of a larger city. Try to stay within a 3-mile radius, although that depends on the property type. Try to get at least 80 comparables whose features are likely to overlap, at least partially, with each other and with the subject. My average number of comparables is 125-180 but may go up to 600. You may also have to go back 5 to 15 or more years to get a useful number of sales and a good sampling of the features found in the subject property; a subject with a rare combination of feature values may require going back 10 or more years. There are also retrospective appraisals, which likewise require older sales. The date of sale for a comparable or the subject property can always be adjusted for. However, the further back in time you go, the more factors contribute to price changes. If you go back more than 10 years, you are likely to discover that buyer tastes have changed over that period, which creates an interaction between the date of sale and other features. You will then need two-way interactions, which increase the complexity of the final regression model.
  3. Separation of measured and unmeasured attributes.   Regression is run only on measured attribute values.  The valuation of unmeasured attributes is done through residuals.
  4. Construction of a regression model based on measured attribute values such as GLA, lot size, date of sale, stories, room counts, GIS coordinates, and age (actual or effective), using a highly accurate regression technique such as multivariate adaptive regression splines (MARS). For this, the open-source “earth” package for R is recommended, along with the “caret” package. Both can be run in the open-source RStudio with parallel processing. I currently run this on an AMD Ryzen 9 5950 desktop, typically using about 28 of its 32 cores, where caret spins off roughly 28 separate R sessions to run earth in parallel.
  5. Creation of a residual model based on the difference between the estimates provided by the regression model and the actual sale prices of the sales comparables. This model has two parts:

    1. A function that maps a residual score of say 0.00 to 10.00 to a residual value. You get this function by ranking the sales comparables by residual value from greatest to smallest, then creating a score based on the percentage of sales less than a given value.  The function can be created by running a regression on these two sets of values or simply creating a lookup array that maps a particular score to a value.  You need to write a program in R to do this.  Using this function, you can then enter the subject’s estimated residual score to get an estimate of its residual value.  That is the key to estimating the sale price of the subject.   Note:  This step can be fully automated in R.

    2. Allocation of the residual value for each sales comparable and the subject property to the unmeasured attributes. The residual gives the total value of all unmeasured attributes, but the user of the report will typically want something more meaningful. The residual therefore needs to be split among the various unmeasured attributes, such as Quality, Condition, Functional Utility, View, and whatever other unmeasured attributes you think are valuable. The appraiser only needs to do this for the comparables going into the Sales Grid, which is fortunate because it takes some work and time. Note that this is a manual process that requires knowledge of appraisal. Keep in mind that, as far as the final value conclusion is concerned, only the total value has any impact. How you split the total residual value affects only your communication to the report reader as to why a sales comparable sold for more or less than the subject and other comparables. This step is where the “constraint” of the method enters the picture: you assign value to the unmeasured attributes under the constraint that all residual component values for a particular comparable must add up to its residual value. If you want to add extra value for Condition, then you must take that value away from one or more of the other attributes.

      Note: This step cannot be fully automated, at least not anytime soon, such as in the next decade. However, how good a job you do in this step will not affect the value conclusion, only the quality of your explanation of why one property sold for more or less than another. If you lack time, you can shortcut this process by estimating unmeasured values based on percentages. For example, you might automatically assign 30% of the residual to Quality, 30% to Condition, 20% to Functional Utility, and 20% to View, and then tweak individual sales (or the subject) by exception. More likely, however, you will find the task more complicated, because the residual can be composed of both negative and positive values. For example, if the residual for a comparable is $50K, then perhaps $75K goes to Condition and -$25K goes to View. Again, you could automate the initial allocation based on percentages and then tweak the allocation based on deviations from the norm.

  6. Recognition that a regression model with a sufficiently high R2 value will provide residual values that correlate with the value of the unmeasured attributes. That is to say, a property that sells for more than the model indicates based on its measured attributes most likely has unmeasured attributes that are more valuable than those of the average property.

  7. Ranking comparables by residual value from large to small. This ranking then corresponds to the value of the unmeasured attributes. The ranking can provide percentage scores, for example 0.00 to 10.00, where a score of 6.25 would indicate that 62.5% of the sales comparables had a lower residual. You could use scores that go from 0.0 to 100.0, but I find that 0.00 to 10.00 is more convenient.

  8. Ranking the subject property against the ranked list of comparable residuals to find the two properties it best fits between, based on unmeasured attributes. This indicates a score somewhere between those of the better and lesser (in terms of residual) properties. A rough estimate can be used. This score can then be mapped back to a residual value, which is used as the subject’s residual value.

  9. The estimated residual value for the subject property is then added to the estimate from the regression model based on its measured attributes to arrive at an indicated value.
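
Steps 5.1 and 7 describe scoring each comparable by its residual rank and mapping scores back to residual values. The author recommends writing this in R; the following is only a minimal Python sketch of the idea, using five made-up sales and made-up regression estimates. The function names and the choice of linear interpolation between scored comparables are assumptions for illustration, not part of the published method.

```python
from bisect import bisect_left


def residuals(sale_prices, estimates):
    """Residual = actual sale price minus the regression estimate."""
    return [price - est for price, est in zip(sale_prices, estimates)]


def residual_scores(res):
    """Score each comparable from 0.00 to 10.00 by the share of sales
    that have a strictly lower residual."""
    n = len(res)
    return [round(10.0 * sum(1 for other in res if other < r) / n, 2) for r in res]


def build_residual_model(res):
    """Return a function that maps a score to a residual value.

    Assumption: linear interpolation between the scored comparables,
    clamped at the lowest and highest observed scores.
    """
    pairs = sorted(set(zip(residual_scores(res), res)))
    xs = [s for s, _ in pairs]
    ys = [v for _, v in pairs]

    def model(score):
        if score <= xs[0]:
            return float(ys[0])
        if score >= xs[-1]:
            return float(ys[-1])
        i = bisect_left(xs, score)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (score - x0) / (x1 - x0)

    return model


# Hypothetical data: five sales and their regression estimates.
sale_prices = [1_000_000, 950_000, 1_200_000, 880_000, 1_050_000]
estimates = [980_000, 990_000, 1_100_000, 900_000, 1_050_000]

res = residuals(sale_prices, estimates)
scores = residual_scores(res)
model = build_residual_model(res)

print(res)         # [20000, -40000, 100000, -20000, 0]
print(scores)      # [6.0, 0.0, 8.0, 2.0, 4.0]
print(model(7.0))  # 60000.0
```

With only five sales the highest score is 8.0 rather than 10.0, because the score counts sales with a strictly lower residual. With the 80 or more comparables recommended above, the top of the scale comes much closer to 10.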

For example, the subject may fit between a comparable with a score of 6.2 and one with a score of 6.3, so you assign a score of 6.25 to the subject.  From this score, you find the corresponding residual value for the subject is $120K.  Comparable 1 has a residual value of $80K, so the difference is +$40K which is your total adjustment for all unmeasured features in Comparable 1.   Of this, you allocate $20K to Condition, $10K to View, and $10K to Functional Utility.  And note that in this case Comparable 1 is inferior to the Subject, with respect to unmeasured attributes.
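
The arithmetic of this example can be written out as a short Python sketch. The scores, residuals, and allocation are the ones from the paragraph above; the subject's regression estimate of $1,000,000 is a made-up figure added only to show the final step, and the function names are hypothetical.

```python
def midpoint_score(lower_score, upper_score):
    """Score for a subject placed between two ranked comparables."""
    return round((lower_score + upper_score) / 2, 2)


def check_allocation(total_adjustment, allocation):
    """Enforce the constraint: attribute adjustments must sum to the total."""
    if sum(allocation.values()) != total_adjustment:
        raise ValueError("allocation does not add up to the residual adjustment")
    return allocation


subject_score = midpoint_score(6.2, 6.3)   # 6.25
subject_residual = 120_000                 # residual value found for that score
comp1_residual = 80_000

# Total adjustment to Comparable 1 for all unmeasured attributes.
total_adjustment = subject_residual - comp1_residual   # +40,000

allocation = check_allocation(
    total_adjustment,
    {"Condition": 20_000, "View": 10_000, "Functional Utility": 10_000},
)

# Assumption: a made-up regression estimate for the subject's measured attributes.
subject_estimate = 1_000_000
indicated_value = subject_estimate + subject_residual

print(subject_score, total_adjustment, indicated_value)
```

The check raises an error if the appraiser's split does not add up, which is the constraint described in step 5.2.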

If we take the output of the above workflow and place it in a sales grid,  calculating each attribute adjustment as   

              Attribute Adjustment = Subject Attribute Value Contribution – Sales Comparable Value Contribution,

we will obtain all adjustments, except that the total adjustment for all unmeasured attributes will appear as a single residual adjustment. However, we can back up and first split the residual contribution values among the various unmeasured attributes before calculating the adjustments. How we split the residual will not affect the final value outcome (in other words, the average of all Adjusted Sale Prices), as long as the individual values still add up to the residual total for each sales comparable. As a shortcut, we could instead split the residual adjustment itself among the unmeasured attributes of each sales comparable. That risks individual adjustments that are somewhat off-base, but such errors would not affect the adjusted sale prices.
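
The percentage shortcut from the note under step 5.2 can be sketched in Python as follows. The 30/30/20/20 shares and the $50K example come from the text; the function name, the `overrides` parameter, and the rule for handling rounding are assumptions made for this illustration. Whatever the split, the parts always add up to the residual.

```python
# Default shares taken from the text's example: 30/30/20/20 percent.
DEFAULT_SHARES = {"Quality": 30, "Condition": 30, "Functional Utility": 20, "View": 20}


def allocate_residual(residual, overrides=None, shares=DEFAULT_SHARES):
    """Split a residual (whole dollars) among unmeasured attributes.

    Attributes named in `overrides` get the appraiser's manual value; the
    rest of the residual is spread over the remaining attributes in
    proportion to their shares, so the parts always sum to the residual.
    """
    overrides = dict(overrides or {})
    free = {name: share for name, share in shares.items() if name not in overrides}
    remaining = residual - sum(overrides.values())
    if not free:
        if remaining != 0:
            raise ValueError("overrides do not add up to the residual")
        return overrides
    total_share = sum(free.values())
    allocation = {name: round(remaining * share / total_share)
                  for name, share in free.items()}
    # Put any rounding drift on the first free attribute to keep the sum exact.
    first = next(iter(allocation))
    allocation[first] += remaining - sum(allocation.values())
    allocation.update(overrides)
    return allocation


# Automatic split of a $50K residual.
print(allocate_residual(50_000))

# Manual exception from the text: $75K to Condition and -$25K to View.
print(allocate_residual(50_000, {"Condition": 75_000, "View": -25_000}))
```

In the second call the two manual values already account for the whole residual, so Quality and Functional Utility receive zero.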

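The end result will be that all sales comparables have exactly the same adjusted sale price. This follows because each comparable’s contribution values add up to its sale price, so applying the adjustments replaces them with the subject’s contribution values. If we average the adjusted sale prices, we will obtain the same value as in step 9 above.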
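
A small Python sketch can make this concrete. All contribution values below are made up, and the function name is hypothetical; the only thing taken from the text is the rule that each adjustment is the subject's contribution minus the comparable's contribution, and that a comparable's contributions (including its residual) add up to its sale price.

```python
def adjust_comparable(subject_contrib, comp_contrib):
    """Return per-attribute adjustments and the adjusted sale price."""
    # Contribution values, including the residual, add up to the sale price.
    sale_price = sum(comp_contrib.values())
    adjustments = {name: subject_contrib[name] - comp_contrib[name]
                   for name in subject_contrib}
    adjusted_price = sale_price + sum(adjustments.values())
    return adjustments, adjusted_price


# Hypothetical contribution values in dollars.
subject = {"GLA": 600_000, "Lot": 300_000, "Age": -20_000, "Residual": 120_000}
comp1 = {"GLA": 550_000, "Lot": 320_000, "Age": -10_000, "Residual": 80_000}
comp2 = {"GLA": 640_000, "Lot": 280_000, "Age": -30_000, "Residual": 150_000}

for comp in (comp1, comp2):
    adjustments, adjusted_price = adjust_comparable(subject, comp)
    print(sum(comp.values()), adjustments, adjusted_price)
```

Comparable 1 sold for $940,000 and Comparable 2 for $1,040,000, yet both adjust to $1,000,000, which is the sum of the subject's own contribution values.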

One might ask: why not just throw away the sales grid and use the value obtained from step 9 above? The best reason is that the sales grid provides grounded support for the value conclusion that the user can review. It provides reasoning for why the individual sales comparables sold for less or more than the value conclusion for the subject property.


  1. Question: How do we know that the ranking of the sales comparables by residual value will correspond to the value contribution of the various unmeasured attributes?

    Answer: This depends on how good the regression model for the measured attributes is. A high R2 value of 70-80% or more indicates that 70-80% of the variance in sale price among the comparables is accounted for by the regression model of measured attributes. The remaining 20-30% must by necessity be due to the unmeasured attributes and errors. The errors are assumed to be random across the comparables and their features. Since regression averages the impact of these features, the impact of the errors on the regression model is likely to be unbiased with respect to the measured attributes.

    It is therefore to be understood that the effectiveness of this method depends on using a highly accurate and refined regression technique such as multivariate adaptive regression splines (MARS).   R/earth is recommended as a free open-source software package for this purpose.

    Note also, that just using an advanced and accurate regression method does not guarantee the quality of the result.   Using MARS regression requires a good deal of experience, knowledge, and skill.   Even a high R2 is not a guarantee of quality, as it could be the result of overfitting.   Each model must be reviewed for overfitting and whether it makes sense in real-world terms.
  2. Question: Does the above Q-A mean that this method will be difficult to automate?

    Answer:  Automation is absolutely needed because the above workflow involves a number of smaller steps that have to be done in a consistent and highly accurate fashion. 

    However, automation only goes so far. The construction of a regression model is an art that requires not only knowledge of the regression tool being used, but also good knowledge of and experience in appraisal. Every model or set of value equations kicked out by MARS must be reviewed by an analyst/appraiser who knows what to look for. Many models will likely be generated and reviewed, altering the parameters to MARS and correcting or otherwise modifying the data, in order to find a suitable model that provides accurate valuation results.

    While it should be possible to design a fully automated system in the future, that is likely some way off. As a result, this method, while promising more accuracy and objectivity in valuation, is also more time-consuming and requires more skill.

  3. Question: How robust is this method against appraiser bias?

    Answer:  This method really has only one spot where bias can be introduced and that is the placement of the subject property in the residual ranking of the sales comparables.  If the model is not overfitted, the data is reasonably accurate, and the R2 value is about 70% or higher, then bias in placement will likely be obvious.  Placement is more difficult in the broad ranking area around the average, between 10% and 90% – where the residual value vs score curve is typically not very steep.  Placement errors in this part of the residual ranking do not typically impact value that much.   However, the residual curve tends to be steep at both the low and high ends, the lower and higher residual value properties.  In this part of the spectrum, the differences in quality and other unmeasured attributes are so great that placing the subject too low or too high in the ranked list should be obvious.  In any case, the reviewer knows where to look for signs of bias.  It is not going to be a question of trying to find a needle in a haystack.  If questions arise, the next step would be to review the regression model graphs and equations.

  4. Question: “How do subjective parameters, such as quality, condition, and location/view get input into the system? And if they are not manually input, does the data exist in today’s MLS systems with enough granularity/depth to automate the input of these factors? (Like granite vs Corian vs laminate countertops, or 10′ vs 8′ ceilings).”  (From

    Answer: These features you mentioned are just “unmeasured” features. We do not have measures for them. At least not good enough to be useful. If we did, we would simply input them into the regression along with GLA, Lot Size, Room Count, and so on.

    So, what we do is input all measured features into the regression and it kicks out a model that estimates the sale price based on these inputs. Of course, the model is not perfect. At best, it may account for about 80% of the price variance. That leaves about 20%, called the “residual”. (The “residual” in this context is the comparable’s sale price minus the regression-estimated sale price.) That 20% is the value of all the unmeasured attributes lumped together, plus errors from incorrect data. Most of that residual, however, is going to be for the attributes that didn’t go into the regression, mostly condition, quality, and view.

    Now, this is the critical point: We really don’t need to value each of those unmeasured features separately because, as it turns out, we only need their total value to get a value conclusion for the subject. You can, however, split the residual among the different unmeasured attributes to complete your explanation of why the sale prices differ between a given sales comparable and the subject and other comparables. How you do the split has no impact on the adjusted sale price or, by extension (through averaging), on the indicated value for the subject. Since all of your adjusted sale prices will be exactly the same, weighting isn’t going to serve any purpose. The whole issue of dealing with unmeasured attributes using residuals is a subject in itself. You don’t need to be overly concerned with the split, as it has no impact on value, only on your support for the given value conclusion, that is to say, your ability to explain exactly why some property sold for more or less than another.
    There is a second critical point: Say you do the regression on 120 sales comparables. Then you turn around and create estimates based on the measured attributes and, from the comparables’ sale prices and those estimates, the residuals. Then you rank the comparables by their residuals, largest to smallest. If you have a decent regression with an R2 of 70% or higher and have avoided overfitting, then you will almost certainly find that the ranking puts the comparables with better condition, quality, and view at the top of the list, because they sold for more than expected. So you can use that ranking to score the comparables in terms of those unmeasured attributes. You can map the scores to the residuals to get a “residual model”. With a residual model, you can enter any such score and it will give you the estimated residual. So, you take the subject property and find where it best fits in the ranking. Its score is the average of the scores of the two properties it best fits between. From the score, get the residual. Add the subject’s estimated residual to the regression estimate to get an estimated sale price. Or go to the trouble of subtracting each comparable’s residual from the subject’s residual to get a residual adjustment and calculating all the adjusted sale prices: they will all be the same, and they will be exactly equal to the subject’s aforementioned estimated sale price.

    Note: If the comparables are ranked by residual but you don’t see that the larger residuals relate very well to the unmeasured attributes — then you can be pretty sure your regression has a low R2 or is overfitted. This makes it imperative to use a high-quality regression technique such as MARS – and to know what you are doing.

    Finally:  A short answer to your question is that this method provides a fairly exact measure for your subjective variables — only it is a value for ALL of them.  – But that is good enough to arrive at a value conclusion.
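
Several of the answers above lean on the R2 value of the regression. As a reminder of what that number measures, here is a minimal Python sketch that computes it from actual sale prices and regression estimates. The five sales are made up, and the function name is hypothetical. R2 is one minus the ratio of the squared residuals to the squared deviations of the sale prices from their mean.

```python
def r_squared(actual, predicted):
    """Share of the variance in sale price explained by the regression."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical sale prices and regression estimates.
sale_prices = [1_000_000, 950_000, 1_200_000, 880_000, 1_050_000]
estimates = [980_000, 990_000, 1_100_000, 900_000, 1_050_000]

print(round(r_squared(sale_prices, estimates), 2))  # 0.79
```

An R2 computed on the same sales used to fit the model cannot by itself reveal overfitting, which is why each model still has to be reviewed as described above.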

