Probability threshold and signal/noise ratio

by Nikaidoh   Last Updated March 06, 2018 18:19 PM

I'm working on a classification problem in which the underlying signal to identify is very hard to find.

I suppose that this is because the signal/noise ratio is very low.

My questions are closely related to my previous question here:

1) Is there a way to calculate or estimate the signal/noise ratio, for example given a set of features and classes?

2) If my signal/noise ratio is very low, will increasing my training set size improve it? Or could the difficulty be inherent to the features and the type of problem, in which case increasing the training set would be useless?

3) Concerning my previous question on the threshold used for classification in a neural network, a user (elkoul) said:

Usually lowering the threshold takes place when you care a lot about the metric called recall. For instance, you develop an algorithm to detect terrorists in an airport and you want to find them all, even though sometimes you might identify normal people as terrorists. The metric that you are interested in in this case is recall. On the other hand, increasing the threshold takes place when you care a lot about precision.
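To make the quoted trade-off concrete, here is a minimal sketch in plain Python. The labels and scores are hypothetical toy values chosen so that the model ranks positives fairly well; the function name `precision_recall` is just an illustration, not part of any library.

```python
def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall

# Hypothetical toy data: true labels and the scores a classifier assigned.
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]

# Sweep the threshold from low to high and report both metrics.
for t in (0.3, 0.5, 0.65):
    p, r = precision_recall(y_true, scores, t)
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy data, raising the threshold trades recall for precision. That only happens because the scores carry information about the labels; it is not guaranteed for an arbitrary model.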

In my specific case, lowering the threshold increases the recall, but increasing the threshold does not increase the precision; the results are close to or entirely random (my false positive rate does not go down).

My AUROC (area under the Receiver Operating Characteristic curve) on this problem is between 0.5 and 0.57 across the different nets from a grid search. If my precision does not increase as I raise the threshold, does this mean that I'm not learning anything?
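As a reference point for these numbers, the sketch below simulates the chance-level case, under the assumption that the scores are pure noise unrelated to the labels. It computes AUROC as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count as one half), and compares precision at several thresholds with the base rate of positives. The data, the 30% positive rate, and the helper names `auroc` and `precision_at` are all hypothetical.

```python
import random

def auroc(y_true, scores):
    """AUROC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at(y_true, scores, threshold):
    """Precision among samples scored at or above the threshold (None if none selected)."""
    selected = [y for y, s in zip(y_true, scores) if s >= threshold]
    return sum(selected) / len(selected) if selected else None

# Hypothetical data: roughly 30% positives, scores drawn independently of the labels.
rng = random.Random(0)
y_true = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]
scores = [rng.random() for _ in y_true]

base_rate = sum(y_true) / len(y_true)
print(f"AUROC = {auroc(y_true, scores):.3f}, base rate of positives = {base_rate:.3f}")
for t in (0.2, 0.5, 0.8):
    print(f"threshold {t}: precision = {precision_at(y_true, scores, t):.3f}")
```

With noise scores the AUROC should land near 0.5 and precision should hover around the base rate at every threshold. Comparing a real model's precision with the base rate in the same way gives a rough check of whether it has picked up any signal.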
