by mdo
Last Updated September 27, 2018 21:19 PM

Imagine a cube filled with air and ~5000 levitating potatoes. The distance between two potatoes is usually at least an order of magnitude larger than their size, and their distribution is fairly random but not normal. I don't know how many potatoes are in the cube or their sizes, but I have two methods for measuring them: the good but expensive reference method A, and method B, which I'm developing. Both methods give me the x-, y- and z-position of each potato they found, along with its size. Now I'd like to correlate the two methods and answer the following questions:

- How well does method B localize and estimate the size of each potato?
- Which method would be appropriate to determine the overall "goodness" of method B? (There must be standard statistical measures for these first two questions; I'm simply not very experienced in statistics.)
- Given a result from method B, how do I calculate what method A would have given? I'm looking for some kind of conversion formula, primarily for averaged properties like density, but preferably at the level of each individual potato. Or, if I have a histogram of potato size vs. number of finds from method B, can I translate that into what method A would yield?

Difficulties

- Methods A and B don't find the same number of potatoes. In fact, method B finds about twice as many. The reason could be that the potatoes can be very irregular in shape. Whatever the reason, the comparison must be able to match a possible group of potato finds in set B to a single find in set A.
- The resolution of method A is better. That means method B might find only one potato where method A recognizes several smaller ones (the opposite of the case above). Also, method B is more likely to overestimate the size of smaller potatoes.
- (Bonus: the sets can be translated and rotated by less than 90 degrees relative to each other. We have a crude method for sorting that out before the analysis starts, but any input on how to deal with it is appreciated. I'm thinking about handling the translation part using the center of gravity of each set.)
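A minimal sketch of the center-of-gravity idea from the bonus item, assuming NumPy and positions stored as (n, 3) arrays. The function name `centroid_align` and the optional size weights are illustrative assumptions, not part of either measuring method. It only removes a translation; rotation would need a separate step.

```python
import numpy as np

def centroid_align(points_a, points_b, weights_a=None, weights_b=None):
    """Hypothetical helper: shift set B so its (optionally weighted) centroid
    coincides with the centroid of set A.

    points_a: (n, 3) x, y, z positions from method A
    points_b: (m, 3) x, y, z positions from method B
    weights_*: optional per-potato weights (e.g. sizes) for a true
               "center of gravity"; None means a plain mean.
    Returns the shifted copy of points_b and the shift vector applied.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    center_a = np.average(a, axis=0, weights=weights_a)
    center_b = np.average(b, axis=0, weights=weights_b)
    shift = center_a - center_b
    return b + shift, shift

# Usage: A centroid is (1, 0, 0), B centroid is (2, 1, 1).
aligned, shift = centroid_align([[0, 0, 0], [2, 0, 0]],
                                [[1, 1, 1], [3, 1, 1], [2, 1, 1]])
print(shift)  # [-1. -1. -1.]
```

Because method B finds roughly twice as many potatoes, and not necessarily in the same places, the two centroids can differ even when there is no true shift, so this estimate is only as good as the assumption that both methods cover the cube evenly.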

Do I have to start with some method to determine if several close potatoes are one or several "real" potatoes, before matching the sets? Is there a "standard" way to do all of this?
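One simple way to compare a group of B finds with a single A find, sketched here as an assumption rather than a standard recipe: assign every B find to its nearest A find (equivalently, to the A find whose Voronoi cell contains it), then count the B finds per A find and sum their sizes. This assumes NumPy, (n, 3) position arrays, and that "size" is a volume, so fragments of one potato add up. The helper names are hypothetical.

```python
import numpy as np

def nearest_a_index(pos_a, pos_b, chunk=1000):
    """Hypothetical helper: index of the nearest A find for each B find.

    Nearest-neighbour assignment is the same as assigning each B point to
    the A point whose Voronoi cell contains it. B is processed in chunks so
    the (chunk, n_a, 3) difference array stays small.
    """
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    idx = np.empty(len(pos_b), dtype=int)
    for start in range(0, len(pos_b), chunk):
        block = pos_b[start:start + chunk]
        d2 = ((block[:, None, :] - pos_a[None, :, :]) ** 2).sum(axis=2)
        idx[start:start + chunk] = d2.argmin(axis=1)
    return idx

def compare_group_sizes(pos_a, size_a, pos_b, size_b):
    """Per A find: how many B finds were assigned to it and their summed size.

    Assumes size is a volume. A count of 0 means B found nothing nearer to
    this A potato than to any other (a miss, or merged into a neighbour).
    """
    nearest = nearest_a_index(pos_a, pos_b)
    n_a = len(size_a)
    counts = np.bincount(nearest, minlength=n_a)
    summed = np.bincount(nearest, weights=np.asarray(size_b, dtype=float),
                         minlength=n_a)
    return counts, summed

counts, summed = compare_group_sizes([[0, 0, 0], [10, 0, 0]], [2.0, 1.0],
                                     [[0.5, 0, 0], [-0.5, 0, 0], [9, 0, 0]],
                                     [1.0, 1.5, 1.2])
print(counts, summed)  # [2 1] [2.5 1.2]
```

Comparing `summed` with `size_a` then gives one size pair per A potato, which is a natural input for a regression of A size on B size. Spurious B finds far from any A find are still assigned somewhere here; a distance cutoff would be needed to flag them.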

If you lack any information, please ask. I might very well not realize that it's needed to solve the problem. And do edit the tags/title if needed, because I'm not very familiar with the proper terminology.

Does method B have a measurement error (variance) that is roughly constant over the space? For a start, let's assume that. Then, does it have a roughly constant bias (translation) over the space? Let's also assume that for a start.

So, to get a starting point: write $B_{[i]}$ for the set of measured points (method B) assigned in some way to the measured point $A_i$. Let $\text{pos}(B_{[i]}) = \text{pos}(A_i) + \epsilon_{[i]}$, with $\epsilon$ having some distribution with mean $\mu$ and variance $\sigma^2$. Choose a grid of values for $\mu$ (a three-vector), and for each value, translate the B-points by $-\mu$ and assign each of them to the A-point whose Voronoi cell (https://en.wikipedia.org/wiki/Voronoi_diagram) contains it. Then, for each $\mu$, find the sum of squared distances from each A-point to its set of B-points; the $\mu$ with the smallest sum serves as an estimate of the bias.
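The grid search described above can be sketched as follows, assuming NumPy, (n, 3) position arrays, and the same grid of candidate values for each of the three components of $\mu$. Assigning a B-point to the A-point whose Voronoi cell contains it is the same as assigning it to its nearest A-point, so the total over all cells is just the sum of each B-point's squared distance to its nearest A-point. The function names are hypothetical.

```python
import itertools
import numpy as np

def nearest_sq_dist(pos_a, pos_b, chunk=1000):
    """Squared distance from each B point to its nearest A point, i.e. to the
    A point whose Voronoi cell contains it. B is processed in chunks."""
    out = np.empty(len(pos_b))
    for start in range(0, len(pos_b), chunk):
        block = pos_b[start:start + chunk]
        d2 = ((block[:, None, :] - pos_a[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk] = d2.min(axis=1)
    return out

def grid_search_bias(pos_a, pos_b, grid_1d):
    """Hypothetical helper: try every mu = (mx, my, mz) with components taken
    from grid_1d and return the mu with the smallest total squared distance."""
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    best_mu, best_score = None, np.inf
    for components in itertools.product(grid_1d, repeat=3):
        mu = np.array(components, dtype=float)
        # Model: pos(B) = pos(A) + eps with E[eps] = mu, so undo the bias first.
        score = nearest_sq_dist(pos_a, pos_b - mu).sum()
        if score < best_score:
            best_mu, best_score = mu, score
    return best_mu, best_score

# Usage: B is A shifted by (1, 0, 0), so that shift should be recovered.
pos_a = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0]], dtype=float)
pos_b = pos_a + np.array([1.0, 0.0, 0.0])
mu, score = grid_search_bias(pos_a, pos_b, [-1.0, 0.0, 1.0])
print(mu, score)  # [1. 0. 0.] 0.0
```

The brute-force search costs (grid size)^3 nearest-neighbour passes, which gets slow for ~5000 A-points and ~10000 B-points. Since the A-points never move, building one `scipy.spatial.cKDTree` on them and calling its `query` method for each shifted B set, or refining a coarse grid around its best value, would be cheaper.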

This is only meant as an idea, a point of departure.
