by Simon
Last Updated November 17, 2017 12:19 PM

I have two data sets of variables where one of them (the new observations) has no dependent variable. The data set *without* a dependent variable has around 20 times as many records. Training a model on the first data set is fairly simple and gives reasonable results for scoring the new observations. I receive new instances of the two data sets periodically. In previous iterations, both data sets had very similar variable means, standard deviations, medians, and 25%/75% quantiles. But in the most recent instance they are very different.
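To make "very different" concrete, the comparison described above could be sketched roughly as follows for a single variable. The values, the function names `summarize` and `standardized_mean_difference`, and the choice of a standardized mean difference as the yardstick are all assumptions for illustration, not part of my actual workflow.

```python
import statistics

def summarize(values):
    """Return the mean, standard deviation, and quartiles of one variable."""
    q25, median, q75 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "q25": q25,
        "median": median,
        "q75": q75,
    }

def standardized_mean_difference(labeled, unlabeled):
    """Difference in means, scaled by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(labeled)
                  + statistics.variance(unlabeled)) / 2) ** 0.5
    return (statistics.fmean(unlabeled) - statistics.fmean(labeled)) / pooled_sd

# Hypothetical values for one variable in each data set.
labeled = [1.0, 2.0, 3.0, 4.0, 5.0]
unlabeled = [3.0, 4.0, 5.0, 6.0, 7.0]

labeled_summary = summarize(labeled)
unlabeled_summary = summarize(unlabeled)
smd = standardized_mean_difference(labeled, unlabeled)
```

Running this per variable on each new delivery would flag which variables have drifted and by roughly how much, relative to their spread.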

Are there any methodologies for scoring/predicting new observations in situations like the one above? Currently I'm modeling with a generalized linear model.

My initial thought is to weight the new observations in some fashion (I haven't thought long enough about the best way, or even whether it could be cheating in some way). Another idea is to sample from the new observations based on distributions determined from the data set with the dependent variable. Anyway, I'm curious whether others know of any literature or keywords to look for when Googling.
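A minimal sketch of what such weighting might look like for one variable, assuming a simple histogram-based ratio: each record in a "source" set is weighted by how much more (or less) common its bin is in a "target" set. The function names, bin edges, and data are hypothetical, and which data set plays source and which plays target is itself an assumption (here the labeled set is reweighted to resemble the new observations).

```python
from bisect import bisect_right
from collections import Counter

def bin_index(x, edges):
    """Index of the bin containing x; values outside the edges go to the end bins."""
    return min(max(bisect_right(edges, x) - 1, 0), len(edges) - 2)

def density_ratio_weights(source, target, edges):
    """Weight each source record by (target share) / (source share) of its bin."""
    source_counts = Counter(bin_index(x, edges) for x in source)
    target_counts = Counter(bin_index(x, edges) for x in target)
    weights = []
    for x in source:
        b = bin_index(x, edges)
        source_share = source_counts[b] / len(source)
        target_share = target_counts[b] / len(target)  # 0 if the bin is empty in target
        weights.append(target_share / source_share)
    return weights

# Hypothetical one-variable example.
labeled = [0.5, 0.6, 1.5, 2.5]
new_obs = [0.5, 1.5, 1.6, 2.5, 2.6, 2.7, 2.8, 2.9]
edges = [0.0, 1.0, 2.0, 3.0]

weights = density_ratio_weights(labeled, new_obs, edges)
```

Weights like these could then be supplied as case weights when fitting the model, if the fitting routine accepts them. With many variables a per-variable histogram would not capture joint shifts, so this is only meant to illustrate the idea.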
