| Title: | Fast Analysis of ROC Curves |
|---|---|
| Description: | A toolkit for analyzing classifier performance by using receiver operating characteristic (ROC) curves. Performance may be assessed on a single classifier or multiple ones simultaneously, making it suitable for comparisons. In addition, different metrics allow the evaluation of local performance when working within restricted ranges of sensitivity and specificity. For details on the different implementations, see McClish D. K. (1989) <doi:10.1177/0272989X8900900307>, Vivo J.-M., Franco M. and Vicari D. (2018) <doi:10.1007/S11634-017-0295-9>, Jiang Y., et al. (1996) <doi:10.1148/radiology.201.3.8939225>, Franco M. and Vivo J.-M. (2021) <doi:10.3390/math9212826> and Carrington, André M., et al. (2020) <doi:10.1186/s12911-019-1014-6>. |
| Authors: | Pablo Navarro [aut, cre, cph], Juana-María Vivo [aut], Manuel Franco [aut] |
| Maintainer: | Pablo Navarro <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0.9000 |
| Built: | 2026-05-18 11:35:48 UTC |
| Source: | https://github.com/pablopnc/rocngo |
Plot chance line in a ROC plot.
add_chance_line()
A ggplot layer instance object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_chance_line()
Calculate and plot lower bound defined by FpAUC sensitivity index.
add_fpauc_lower_bound() is a higher-level function that automatically
determines the curve shape and plots the lower bound that best fits it.
add_fpauc_partially_proper_lower_bound() and
add_fpauc_concave_lower_bound() are lower-level functions that force a
specific bound to be plotted.
The first plots the lower bound for a partially proper curve shape (one that presents some kind of hook). The second plots the lower bound for a curve shape that is concave in the region of interest.
add_fpauc_partially_proper_lower_bound(
  data, response = NULL, predictor = NULL, threshold,
  .condition = NULL, .label = NULL
)

add_fpauc_concave_lower_bound(
  data, response = NULL, predictor = NULL, threshold,
  .condition = NULL, .label = NULL
)

add_fpauc_lower_bound(
  data, response = NULL, predictor = NULL, threshold,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
threshold |
A number between 0 and 1, inclusive. This number represents the lower value of TPR for the region where to calculate and plot lower bound. Because of definition of |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot layer instance object.
# Add lower bound based on curve shape (Concave)
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_fpauc_lower_bound(
    data = iris, response = Species, predictor = Sepal.Width,
    threshold = 0.9
  )
Include a threshold line on a specified axis.
add_fpr_threshold_line(threshold)

add_tpr_threshold_line(threshold)

add_threshold_line(threshold, ratio = NULL)
threshold |
A number between 0 and 1, both inclusive, which represents the region bound where to calculate partial area under curve. If If |
ratio |
Ratio in which to display threshold.
|
A ggplot layer instance object.
# Add two threshold lines at TPR = 0.9 and FPR = 0.1
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_threshold_line(threshold = 0.9, ratio = "tpr") +
  add_threshold_line(threshold = 0.1, ratio = "fpr")

# Add threshold line at TPR = 0.9
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_tpr_threshold_line(threshold = 0.9)

# Add threshold line at FPR = 0.1
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_fpr_threshold_line(threshold = 0.1)
Add a specific region of a ROC curve to an existing ROC plot.
add_partial_roc_curve(
  data, response = NULL, predictor = NULL, ratio, threshold,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
ratio |
Ratio or axis where to apply calculations.
|
threshold |
A number between 0 and 1, both inclusive, which represents the region bound where to calculate partial area under curve. If If |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot layer instance object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_partial_roc_curve(
    iris, response = Species, predictor = Sepal.Length,
    ratio = "tpr", threshold = 0.9
  )
Add points in a specific ROC region to an existing ROC plot.
add_partial_roc_points(
  data, response = NULL, predictor = NULL, ratio, threshold,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
ratio |
Ratio or axis where to apply calculations.
|
threshold |
A number between 0 and 1, both inclusive, which represents the region bound where to calculate partial area under curve. If If |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot layer instance object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_partial_roc_points(
    iris, response = Species, predictor = Sepal.Length,
    ratio = "tpr", threshold = 0.9
  )
Add a ROC curve to an existing ROC plot.
add_roc_curve(
  data, response = NULL, predictor = NULL,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot layer instance object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_roc_curve(iris, response = Species, predictor = Sepal.Length)
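The collapsing of a multi-class response into the condition of interest versus its "absence" can be pictured with a small base R sketch. It is only an illustration: coding the condition as 1 and everything else as 0 is an assumption, the package's internal encoding may differ, and `condition` and `binary_response` are hypothetical names.

```r
# Collapse a three-class factor into "condition of interest" vs. the rest.
# Assumption: 1 marks the condition, 0 marks its absence.
condition <- "virginica"
binary_response <- as.integer(iris$Species == condition)

# Count observations with and without the condition.
table(binary_response)
```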
Add ROC points to an existing ROC plot.
add_roc_points(
  data, response = NULL, predictor = NULL,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot layer instance object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_roc_points(iris, response = Species, predictor = Sepal.Length)
Calculate and plot lower bound defined by TpAUC specificity index.
add_tpauc_lower_bound() is a higher-level function that automatically
determines the curve shape and plots the lower bound that best fits it.
Additionally, several lower-level functions are provided to plot specific lower bounds:
add_tpauc_concave_lower_bound(). Plots the lower bound corresponding to a ROC
curve with concave shape in the selected region.
add_tpauc_partially_proper_lower_bound(). Plots the lower bound corresponding to
a partially proper ROC curve (presence of some hook) in the
selected region.
add_tpauc_under_chance_lower_bound(). Plots the lower bound corresponding to
a ROC curve with a hook under the chance line in the selected region.
add_tpauc_concave_lower_bound(
  data, response = NULL, predictor = NULL,
  lower_threshold, upper_threshold,
  .condition = NULL, .label = NULL
)

add_tpauc_partially_proper_lower_bound(
  data, response = NULL, predictor = NULL,
  lower_threshold, upper_threshold,
  .condition = NULL, .label = NULL
)

add_tpauc_under_chance_lower_bound(
  data, response = NULL, predictor = NULL,
  lower_threshold, upper_threshold,
  .condition = NULL, .label = NULL
)

add_tpauc_lower_bound(
  data, response = NULL, predictor = NULL,
  lower_threshold, upper_threshold,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
lower_threshold, upper_threshold
|
Two numbers between 0 and 1, inclusive. These numbers represent lower and upper values of FPR region where to calculate and plot lower bound. |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot layer instance object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_tpauc_lower_bound(
    data = iris, response = Species, predictor = Sepal.Width,
    upper_threshold = 0.1, lower_threshold = 0
  )
Calculates the area under the curve (AUC) of a predictor's ROC curve.
auc(data = NULL, response, predictor, .condition = NULL)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
A numerical value representing the area under ROC curve.
# Calc AUC of Sepal.Width as a classifier of setosa species
auc(iris, Species, Sepal.Width)

# Change class to predict to virginica
auc(iris, Species, Sepal.Width, .condition = "virginica")
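For readers who want to see what an area under a ROC curve amounts to, here is a minimal base R sketch that builds empirical ROC points and integrates them with the trapezoidal rule. It is not the package's implementation: `roc_points()` and `trapezoid_auc()` are hypothetical helpers, and the sketch assumes that larger predictor values point toward the condition of interest.

```r
# Empirical ROC points for a binary response (1 = condition, 0 = absence).
# Assumption: larger predictor values indicate the condition.
roc_points <- function(response, predictor) {
  # Sweep thresholds from "nothing is positive" down to the smallest value.
  thresholds <- c(Inf, sort(unique(predictor), decreasing = TRUE))
  rate_at <- function(values) {
    vapply(thresholds, function(t) mean(values >= t), numeric(1))
  }
  data.frame(
    fpr = rate_at(predictor[response == 0]),
    tpr = rate_at(predictor[response == 1])
  )
}

# Trapezoidal rule over consecutive ROC points.
trapezoid_auc <- function(pts) {
  n <- nrow(pts)
  sum(diff(pts$fpr) * (pts$tpr[-1] + pts$tpr[-n]) / 2)
}

response <- as.integer(iris$Species == "setosa")
pts <- roc_points(response, iris$Sepal.Width)
auc_value <- trapezoid_auc(pts)
auc_value
```

With tied predictor values, this trapezoidal area coincides with the Mann-Whitney statistic that counts ties as one half, which gives a convenient cross-check.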
calc_curve_shape() calculates ROC curve shape over a specified region.
calc_curve_shape(
  data = NULL, response = NULL, predictor = NULL,
  lower_threshold, upper_threshold, ratio,
  .condition = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
lower_threshold, upper_threshold
|
Two numbers between 0 and 1, inclusive. These numbers represent lower and upper bounds of the region where to apply calculations. |
ratio |
Ratio or axis where to apply calculations.
|
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
A string indicating ROC curve shape in the specified region. Result can take any of the following values:
"Concave". ROC curve is concave over the entire specified region.
"Partially proper". ROC curve loses concavity at some point of the
specified region.
"Hook under chance". ROC curve loses concavity at some point of the
region and it lies below chance line.
# Calc ROC curve shape of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
calc_curve_shape(iris, Species, Sepal.Width, 0.9, 1, "tpr")

# Change class to virginica
calc_curve_shape(iris, Species, Sepal.Width, 0.9, 1, "tpr", .condition = "virginica")
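The distinction between a concave segment and one with a hook can be illustrated with a generic geometric check on piecewise-linear points. The sketch below is not how calc_curve_shape() decides the shape, which this manual does not describe; `is_concave()` and `below_chance()` are hypothetical helpers and the point coordinates are made-up toy values.

```r
# Concavity check for a piecewise-linear curve sorted by increasing FPR.
# A curve is concave when consecutive segments never turn upward, which is
# tested here with the sign of the cross product of adjacent segments.
is_concave <- function(fpr, tpr, tol = 1e-12) {
  dx <- diff(fpr)
  dy <- diff(tpr)
  n <- length(dx)
  if (n < 2) return(TRUE)
  cross <- dx[-n] * dy[-1] - dy[-n] * dx[-1]
  all(cross <= tol)
}

# A point lies under the chance line when its TPR is smaller than its FPR.
below_chance <- function(fpr, tpr) any(tpr < fpr)

# Toy curves (hypothetical values).
concave_fpr <- c(0, 0.1, 0.5, 1)
concave_tpr <- c(0, 0.6, 0.9, 1)
hook_fpr <- c(0, 0.2, 0.5, 1)
hook_tpr <- c(0, 0.1, 0.9, 1)

is_concave(concave_fpr, concave_tpr)
is_concave(hook_fpr, hook_tpr)
below_chance(hook_fpr, hook_tpr)
```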
Calculates a series of (FPR, TPR) pairs which correspond to ROC curve points in a specified region.
calc_partial_roc_points(
  data = NULL, response = NULL, predictor = NULL,
  lower_threshold, upper_threshold, ratio,
  .condition = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
lower_threshold, upper_threshold
|
Two numbers between 0 and 1, inclusive. These numbers represent lower and upper bounds of the region where to apply calculations. |
ratio |
Ratio or axis where to apply calculations.
|
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
A tibble with two columns:
"tpr". Containing "true positive ratio", or y, values of points within the specified region.
"fpr". Containing "false positive ratio", or x, values of points within the specified region.
# Calc ROC points of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
calc_partial_roc_points(
  iris, response = Species, predictor = Sepal.Width,
  lower_threshold = 0.9, upper_threshold = 1, ratio = "tpr"
)

# Change class to virginica
calc_partial_roc_points(
  iris, response = Species, predictor = Sepal.Width,
  lower_threshold = 0.9, upper_threshold = 1, ratio = "tpr",
  .condition = "virginica"
)
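As a rough picture of what "points within the specified region" means, the following base R sketch computes empirical ROC points and keeps those whose TPR falls in [0.9, 1]. It is an illustration under stated assumptions, not the package's code: larger predictor values are taken to indicate the condition, boundary points are not interpolated, and `roc` and `partial` are hypothetical names.

```r
# Binary response: 1 = setosa (condition), 0 = any other species.
response <- as.integer(iris$Species == "setosa")
predictor <- iris$Sepal.Width

# One ROC point per candidate threshold, from strictest to most lenient.
thresholds <- c(Inf, sort(unique(predictor), decreasing = TRUE))
roc <- data.frame(
  tpr = vapply(thresholds, function(t) mean(predictor[response == 1] >= t), numeric(1)),
  fpr = vapply(thresholds, function(t) mean(predictor[response == 0] >= t), numeric(1))
)

# Keep only the points inside the TPR region of interest.
partial <- roc[roc$tpr >= 0.9 & roc$tpr <= 1, ]
head(partial)
```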
Concordance-derived indexes allow the calculation and explanation of the area under the ROC curve in a specific region. They take a dual perspective, since they consider both the TPR and FPR ranges which enclose the region of interest.
cp_auc() applies the concordant partial area under curve (CpAUC), while
ncp_auc() applies its normalized version by dividing by the total area.
cp_auc(
  data = NULL, response, predictor,
  lower_threshold, upper_threshold, ratio,
  .condition = NULL
)

ncp_auc(
  data = NULL, response, predictor,
  lower_threshold, upper_threshold, ratio,
  .condition = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
lower_threshold, upper_threshold
|
Two numbers between 0 and 1, inclusive. These numbers represent lower and upper bounds of the region where to apply calculations. |
ratio |
Ratio or axis where to apply calculations.
|
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
A numeric value representing index score for the partial area under ROC curve.
Carrington, André M., et al. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC medical informatics and decision making 20 (2020): 1-12.
# Calculate cp_auc of Sepal.Width as a classifier of setosa species in
# FPR = (0, 0.1)
cp_auc(
  iris, response = Species, predictor = Sepal.Width,
  lower_threshold = 0, upper_threshold = 0.1, ratio = "fpr"
)

# Calculate ncp_auc of Sepal.Width as a classifier of setosa species in
# FPR = (0, 0.1)
ncp_auc(
  iris, response = Species, predictor = Sepal.Width,
  lower_threshold = 0, upper_threshold = 0.1, ratio = "fpr"
)
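The dual perspective can be sketched on a toy curve. The code below assumes the reading of Carrington et al. (2020) in which the concordant partial AUC is the average of a vertical partial area (integrated along the FPR axis) and a horizontal partial area (integrated along the TPR axis). Whether cp_auc() computes exactly this is an assumption; `concordant_pauc()` is a hypothetical helper and the curve values are made up.

```r
# Toy piecewise-linear ROC curve (hypothetical values).
fpr <- c(0, 0.1, 0.5, 1)
tpr <- c(0, 0.6, 0.9, 1)

# Concordant partial area of the curve segment described by its vertices.
concordant_pauc <- function(fpr, tpr) {
  n <- length(fpr)
  # Vertical area: under the curve, integrated along the FPR axis.
  vertical <- sum(diff(fpr) * (tpr[-1] + tpr[-n]) / 2)
  # Horizontal area: to the right of the curve, integrated along the TPR axis.
  horizontal <- sum(diff(tpr) * (1 - (fpr[-1] + fpr[-n]) / 2))
  (vertical + horizontal) / 2
}

first_part <- concordant_pauc(fpr[1:2], tpr[1:2])   # FPR in [0, 0.1]
second_part <- concordant_pauc(fpr[2:4], tpr[2:4])  # FPR in [0.1, 1]
whole_curve <- concordant_pauc(fpr, tpr)

c(first_part, second_part, whole_curve)
```

Under this definition the values for adjacent regions add up to the value for the whole curve, which in turn equals the ordinary AUC.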
Hide the legend showing the names of plotted classifiers and bounds in a ROC curve plot.
hide_legend()
A ggplot theme object.
Calculate and plot lower bound defined by NpAUC specificity index.
add_npauc_normalized_lower_bound() plots the normalized
lower bound, which is used to formally calculate NpAUC.
add_npauc_lower_bound() is a lower-level function
that plots the lower bound prior to normalization.
add_npauc_lower_bound(
  data, response = NULL, predictor = NULL, threshold,
  .condition = NULL, .label = NULL
)

add_npauc_normalized_lower_bound(
  data, response = NULL, predictor = NULL, threshold,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
threshold |
A number between 0 and 1, inclusive. This number represents the lower value of TPR for the region where to calculate and plot lower bound. Because of definition of |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot layer instance object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_npauc_lower_bound(
    iris, response = Species, predictor = Sepal.Width,
    threshold = 0.9
  )
Calculates the area under the ROC curve in a specific TPR or FPR region.
pauc(
  data = NULL, response, predictor, ratio,
  lower_threshold, upper_threshold,
  .condition = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
ratio |
Ratio or axis where to apply calculations.
|
lower_threshold, upper_threshold
|
Two numbers between 0 and 1, inclusive. These numbers represent lower and upper bounds of the region where to apply calculations. |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
A numeric value representing the area under ROC curve in the specified region.
# Calculate pauc of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
pauc(
  iris, response = Species, predictor = Sepal.Width,
  ratio = "tpr", lower_threshold = 0.9, upper_threshold = 1
)

# Calculate pauc of Sepal.Width as a classifier of setosa species
# in FPR = (0, 0.1)
pauc(
  iris, response = Species, predictor = Sepal.Width,
  ratio = "fpr", lower_threshold = 0, upper_threshold = 0.1
)
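A partial area over an FPR range can be sketched in base R by interpolating the curve at the region boundaries and applying the trapezoidal rule in between. This is a simplified illustration, not the package's algorithm: `partial_auc_fpr()` is a hypothetical helper, the toy curve values are made up, and FPR values are assumed to be strictly increasing so that linear interpolation is unambiguous. The TPR-based case is not covered.

```r
# Toy piecewise-linear ROC curve with strictly increasing FPR (hypothetical).
fpr <- c(0, 0.1, 0.5, 1)
tpr <- c(0, 0.6, 0.9, 1)

partial_auc_fpr <- function(fpr, tpr, lower, upper) {
  # Curve vertices strictly inside the region.
  inside <- fpr > lower & fpr < upper
  # Add linearly interpolated points at both region boundaries.
  x <- c(lower, fpr[inside], upper)
  y <- c(
    approx(fpr, tpr, xout = lower)$y,
    tpr[inside],
    approx(fpr, tpr, xout = upper)$y
  )
  # Trapezoidal rule over the resulting points.
  sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}

low_fpr_area <- partial_auc_fpr(fpr, tpr, 0, 0.1)
wider_area <- partial_auc_fpr(fpr, tpr, 0, 0.3)
c(low_fpr_area, wider_area)
```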
Create a curve plot using points in a specific region of the ROC curve.
plot_partial_roc_curve(
  data, response = NULL, predictor = NULL, ratio, threshold,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
ratio |
Ratio or axis where to apply calculations.
|
threshold |
A number between 0 and 1, both inclusive, which represents the region bound where to calculate partial area under curve. If If |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot object.
plot_partial_roc_curve(
  iris, response = Species, predictor = Sepal.Width,
  ratio = "tpr", threshold = 0.9
)
Create a scatter plot using points in a specific region of the ROC curve.
plot_partial_roc_points(
  data, response = NULL, predictor = NULL, ratio, threshold,
  .condition = NULL, .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories:
New combined category represents the "absence" of the condition to predict.
See |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
ratio |
Ratio or axis where to apply calculations.
|
threshold |
A number between 0 and 1, both inclusive, which represents the region bound where to calculate partial area under curve. If If |
.condition |
A value from response that represents class, category or condition of interest which wants to be predicted. If Once the class of interest is selected, rest of them will be collapsed in a common category, representing the "absence" of the condition to be predicted. See |
.label |
A string representing the name used in labels. If |
A ggplot object.
plot_partial_roc_points(
  iris,
  response = Species,
  predictor = Sepal.Width,
  ratio = "tpr",
  threshold = 0.9
)
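As a rough illustration of what restricting a ROC curve to a region means, the base R sketch below keeps only the points that fall inside the region selected by ratio and threshold. It is not the package's implementation: the helper partial_points() and the toy points are hypothetical, and the sketch does not interpolate a point exactly at the region boundary, which the package may or may not do.

```r
# Made-up ROC points, ordered from (0, 0) to (1, 1).
pts <- data.frame(
  fpr = c(0, 0.05, 0.10, 0.30, 0.60, 1),
  tpr = c(0, 0.40, 0.70, 0.92, 0.97, 1)
)

# Hypothetical helper: keep the points inside the selected region.
partial_points <- function(pts, ratio = c("tpr", "fpr"), threshold) {
  ratio <- match.arg(ratio)
  stopifnot(threshold >= 0, threshold <= 1)
  if (ratio == "tpr") {
    pts[pts$tpr >= threshold, ]   # high-sensitivity region: TPR in [threshold, 1]
  } else {
    pts[pts$fpr <= threshold, ]   # high-specificity region: FPR in [0, threshold]
  }
}

high_tpr <- partial_points(pts, ratio = "tpr", threshold = 0.9)
low_fpr  <- partial_points(pts, ratio = "fpr", threshold = 0.1)
print(high_tpr)
print(low_fpr)
```

The two calls mirror the two kinds of region used throughout this manual: a lower bound on TPR, or an upper bound on FPR.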
Create a curve plot using ROC curve points.
plot_roc_curve(
  data,
  response = NULL,
  predictor = NULL,
  .condition = NULL,
  .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories, those other than the condition of interest are combined into a new category, which represents the "absence" of the condition to predict. |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
.condition |
A value from response that represents the class, category or condition of interest to be predicted. Once the class of interest is selected, the remaining classes are collapsed into a common category representing the "absence" of the condition to be predicted. |
.label |
A string representing the name used in labels. |
A ggplot object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width)
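The collapsing of a multi-class response described for .condition can be pictured with a few lines of base R. This is only a sketch of the idea, not the package's code: the label "other" for the combined category is an assumption made here for illustration.

```r
# iris$Species has three classes; pick one as the condition of interest.
response  <- iris$Species
condition <- "virginica"

# Every class other than the condition is merged into a single category
# ("other" is a hypothetical label for the "absence" of the condition).
binary <- factor(
  ifelse(response == condition, condition, "other"),
  levels = c("other", condition)
)

print(table(binary))
```

After this step the problem is a plain two-class one, which is what a ROC curve needs.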
Create a scatter plot using ROC curve points.
plot_roc_points(
  data,
  response = NULL,
  predictor = NULL,
  .condition = NULL,
  .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories, those other than the condition of interest are combined into a new category, which represents the "absence" of the condition to predict. |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
.condition |
A value from response that represents the class, category or condition of interest to be predicted. Once the class of interest is selected, the remaining classes are collapsed into a common category representing the "absence" of the condition to be predicted. |
.label |
A string representing the name used in labels. |
A ggplot object.
plot_roc_points(iris, response = Species, predictor = Sepal.Width)
This dataset contains gene expression levels obtained from healthy and diseased tissue samples from patients with prostate cancer. The data includes the expression values for each selected gene, as well as clinical variables derived from the direct observation of the tissue samples.
prost
A tibble with 554 observations and 2654 variables:
Gene expression levels. Column names correspond to the measured gene identifier.
Score derived from tissue observation, which indicates disease severity and progression.
Categorical variable indicating whether the sample comes from diseased ("1") or healthy ("0") tissue.
Categorical variable indicating whether a poor ("1") or a good ("0") prognosis is expected for the patient. Diseased cases in the dataset are assumed to have a poor prognosis when their Gleason score is equal to or above 8. Non-diseased cases are labelled as "Normal".
Gene identifiers used in columns (e.g. ENSG00000113924, ENSG00000109182, ...) correspond to Ensembl gene identifiers.
Gene expression values are reported as Transcripts Per Million (TPM).
Raw data obtained from The Cancer Genome Atlas (TCGA) through the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/). Data have been preprocessed and curated by the authors (e.g., gene selection and variable creation, etc.) to create the final dataset.
The data shown here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
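The prognosis rule described above (poor prognosis when the Gleason score is 8 or higher, "Normal" for non-diseased samples) can be written down as a short base R sketch. The column names disease, gleason and prognosis, and all values, are hypothetical and only serve the illustration; they are not taken from the dataset.

```r
# Made-up samples; column names are assumptions for this sketch.
samples <- data.frame(
  disease = c("1", "1", "1", "0"),
  gleason = c(6, 8, 9, NA)
)

# Non-diseased samples are labelled "Normal"; diseased samples get "1"
# (poor prognosis) when the Gleason score is >= 8 and "0" otherwise.
samples$prognosis <- ifelse(
  samples$disease == "0",
  "Normal",
  ifelse(samples$gleason >= 8, "1", "0")
)

print(samples)
```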
Calculates a series of (FPR, TPR) pairs, which correspond to the points displayed by the ROC curve. The false positive ratio (FPR) is represented on the x axis and the true positive ratio (TPR) on the y axis.
roc_points(data = NULL, response, predictor, .condition = NULL)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories, those other than the condition of interest are combined into a new category, which represents the "absence" of the condition to predict. |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
.condition |
A value from response that represents the class, category or condition of interest to be predicted. Once the class of interest is selected, the remaining classes are collapsed into a common category representing the "absence" of the condition to be predicted. |
A tibble with two columns:
"tpr". Containing values for "true positive ratio", or y axis.
"fpr". Containing values for "false positive ratio", or x axis.
# Calc ROC points of Sepal.Width as a classifier of setosa species
roc_points(iris, Species, Sepal.Width)
# Change class to predict to virginica
roc_points(iris, Species, Sepal.Width, .condition = "virginica")
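To make the (FPR, TPR) pairs concrete, here is a minimal base R sketch that derives them by hand from a numeric score and a binary outcome. It is not how roc_points() is implemented: the helper roc_points_sketch() is hypothetical, the data are made up, and the sketch assumes that higher scores point to the condition of interest.

```r
# Toy data (made up). Assumption: higher scores point to the condition.
score  <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
is_pos <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Hypothetical helper: one (FPR, TPR) pair per distinct threshold.
roc_points_sketch <- function(score, is_pos) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  # Share of positives / negatives classified as positive at each threshold.
  tpr <- vapply(thresholds, function(t) mean(score[is_pos] >= t), numeric(1))
  fpr <- vapply(thresholds, function(t) mean(score[!is_pos] >= t), numeric(1))
  # Prepend (0, 0): the point where nothing is classified as positive.
  data.frame(tpr = c(0, tpr), fpr = c(0, fpr))
}

pts <- roc_points_sketch(score, is_pos)
print(pts)
```

Each row corresponds to one classification threshold; the last row is always (1, 1), because the lowest threshold classifies every observation as positive.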
Sensitivity indexes provide different ways of calculating the area under the ROC curve in a specific TPR region. Two approaches to calculate this area are available:
fp_auc() applies the fitted partial area under curve index (FpAUC), which calculates the area under the curve adjusted to the points defined by the curve in the selected region.
np_auc() applies the normalized partial area under curve index (NpAUC), which calculates the area under the curve over the whole specified region.
fp_auc(data = NULL, response, predictor, lower_tpr, .condition = NULL)
np_auc(data, response, predictor, lower_tpr, .condition = NULL)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories, those other than the condition of interest are combined into a new category, which represents the "absence" of the condition to predict. |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
lower_tpr |
A numeric value between 0 and 1, inclusive, which represents the lower TPR value of the region where the partial area under the curve is calculated. By definition of the sensitivity indexes, the upper bound of the region is set to 1. |
.condition |
A value from response that represents the class, category or condition of interest to be predicted. Once the class of interest is selected, the remaining classes are collapsed into a common category representing the "absence" of the condition to be predicted. |
A numeric value representing the index score for the partial area under ROC curve.
Franco M. and Vivo J.-M. Evaluating the Performances of Biomarkers over a Restricted Domain of High Sensitivity. Mathematics 9, 2826 (2021).
Jiang Y., Metz C. E. and Nishikawa R. M. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 201, 745-750 (1996).
# Calculate fp_auc of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
fp_auc(iris, response = Species, predictor = Sepal.Width, lower_tpr = 0.9)
# Calculate np_auc of Sepal.Width as a classifier of setosa species
# in TPR = (0.9, 1)
np_auc(iris, response = Species, predictor = Sepal.Width, lower_tpr = 0.9)
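The normalization behind a partial area index over a high-sensitivity region can be sketched in base R. The sketch follows one common reading of Jiang et al. (1996): the area under the ROC curve and above the line TPR = lower_tpr, divided by the area 1 - lower_tpr of the whole region. The helper names and toy points are hypothetical, and np_auc() may differ in its details.

```r
# Made-up ROC points, ordered from (0, 0) to (1, 1).
fpr <- c(0, 0, 0, 1/3, 1/3, 2/3, 1)
tpr <- c(0, 1/3, 2/3, 2/3, 1, 1, 1)

# Area between the (piecewise linear) ROC curve and the line TPR = lower_tpr.
area_above <- function(fpr, tpr, lower_tpr) {
  area <- 0
  for (i in seq_len(length(fpr) - 1)) {
    w  <- fpr[i + 1] - fpr[i]
    h0 <- tpr[i] - lower_tpr
    h1 <- tpr[i + 1] - lower_tpr
    if (w <= 0 || h1 <= 0) next   # vertical step, or segment below the line
    if (h0 < 0) {                 # segment crosses the line: keep the part above
      w  <- w * h1 / (h1 - h0)
      h0 <- 0
    }
    area <- area + w * (h0 + h1) / 2
  }
  area
}

# Hypothetical helper: partial area divided by the area of the whole region.
np_auc_sketch <- function(fpr, tpr, lower_tpr) {
  stopifnot(lower_tpr >= 0, lower_tpr < 1)
  area_above(fpr, tpr, lower_tpr) / (1 - lower_tpr)
}

np_half <- np_auc_sketch(fpr, tpr, lower_tpr = 0.5)
np_full <- np_auc_sketch(fpr, tpr, lower_tpr = 0)
print(c(np_half = np_half, np_full = np_full))
```

With lower_tpr = 0 the region is the whole unit square, so the sketch reduces to the ordinary area under the curve.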
Specificity indexes provide different ways of calculating the area under the ROC curve in a specific FPR region. Two approaches to calculate this area are available:
tp_auc() applies the tighter partial area under curve index (TpAUC), which calculates the area under the curve adjusted to the points defined by the curve in the selected region.
sp_auc() applies the standardized partial area under curve index (SpAUC), which calculates the area under the curve over the whole specified region.
sp_auc(
  data = NULL,
  response,
  predictor,
  lower_fpr,
  upper_fpr,
  .condition = NULL,
  .invalid = FALSE
)
tp_auc(
  data = NULL,
  response,
  predictor,
  lower_fpr,
  upper_fpr,
  .condition = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories, those other than the condition of interest are combined into a new category, which represents the "absence" of the condition to predict. |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
lower_fpr, upper_fpr |
Two numbers between 0 and 1, inclusive, which represent the lower and upper FPR values of the region where the partial area under the curve is calculated. |
.condition |
A value from response that represents the class, category or condition of interest to be predicted. Once the class of interest is selected, the remaining classes are collapsed into a common category representing the "absence" of the condition to be predicted. |
.invalid |
If |
A numeric value representing the index score for the partial area under ROC curve.
McClish D. K. Analyzing a Portion of the ROC Curve. Medical Decision Making 9, 190-195 (1989).
Vivo J.-M., Franco M. and Vicari D. Rethinking an ROC partial area index for evaluating the classification performance at a high specificity range. Advances in Data Analysis and Classification 12, 683-704 (2018).
# Calculate sp_auc of Sepal.Width as a classifier of setosa species
# in FPR = (0, 0.1)
sp_auc(
  iris,
  response = Species,
  predictor = Sepal.Width,
  lower_fpr = 0,
  upper_fpr = 0.1
)
# Calculate tp_auc of Sepal.Width as a classifier of setosa species
# in FPR = (0, 0.1)
tp_auc(
  iris,
  response = Species,
  predictor = Sepal.Width,
  lower_fpr = 0,
  upper_fpr = 0.1
)
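The standardization proposed by McClish (1989) can be sketched in base R as follows. The partial area over an FPR range is rescaled between the area under the chance line and the area of the whole region, so that 0.5 means chance-level performance and 1 a perfect classifier in that range. The helper names and the toy points are hypothetical; this is an illustration of the formula, not the code behind sp_auc().

```r
# Made-up ROC points, ordered from (0, 0) to (1, 1).
fpr <- c(0, 0, 0, 1/3, 1/3, 2/3, 1)
tpr <- c(0, 1/3, 2/3, 2/3, 1, 1, 1)

# Partial area under the piecewise linear curve for FPR in [lower, upper].
pauc_fpr <- function(fpr, tpr, lower, upper) {
  area <- 0
  for (i in seq_len(length(fpr) - 1)) {
    x0 <- fpr[i]
    x1 <- fpr[i + 1]
    if (x1 <= x0) next                  # vertical step: no width
    lo <- max(x0, lower)
    hi <- min(x1, upper)
    if (hi <= lo) next                  # segment outside the region
    slope <- (tpr[i + 1] - tpr[i]) / (x1 - x0)
    y_lo <- tpr[i] + slope * (lo - x0)  # curve height at the clipped ends
    y_hi <- tpr[i] + slope * (hi - x0)
    area <- area + (hi - lo) * (y_lo + y_hi) / 2
  }
  area
}

# McClish standardization: 0.5 * (1 + (pAUC - min) / (max - min)).
sp_auc_sketch <- function(fpr, tpr, lower, upper) {
  pauc     <- pauc_fpr(fpr, tpr, lower, upper)
  min_area <- (upper^2 - lower^2) / 2   # area under the chance line
  max_area <- upper - lower             # area of the whole region
  0.5 * (1 + (pauc - min_area) / (max_area - min_area))
}

pauc  <- pauc_fpr(fpr, tpr, lower = 0, upper = 0.5)
spauc <- sp_auc_sketch(fpr, tpr, lower = 0, upper = 0.5)
print(c(pauc = pauc, spauc = spauc))
```

Over the full range (lower = 0, upper = 1) the standardized value coincides with the ordinary AUC, which is a quick sanity check for the formula.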
Calculate and plot lower bound defined by SpAUC specificity index.
add_spauc_lower_bound(
  data,
  response = NULL,
  predictor = NULL,
  lower_threshold,
  upper_threshold,
  .condition = NULL,
  .label = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories, those other than the condition of interest are combined into a new category, which represents the "absence" of the condition to predict. |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
lower_threshold, upper_threshold |
Two numbers between 0 and 1, inclusive, which represent the lower and upper bounds of the region where calculations are applied. |
.condition |
A value from response that represents the class, category or condition of interest to be predicted. Once the class of interest is selected, the remaining classes are collapsed into a common category representing the "absence" of the condition to be predicted. |
.label |
A string representing the name used in labels. |
SpAUC presents some limitations regarding its lower bound: the lower bound defined by this index cannot be applied to sections where the ROC curve lies under the chance line.
add_spauc_lower_bound() does not check whether the index can be safely applied. Consequently, it allows the bound to be drawn even when SpAUC cannot be calculated in the region.
A ggplot layer instance object.
plot_roc_curve(iris, response = Species, predictor = Sepal.Width) +
  add_spauc_lower_bound(
    iris,
    response = Species,
    predictor = Sepal.Width,
    lower_threshold = 0,
    upper_threshold = 0.1
  )
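Since the limitation noted above depends on whether the ROC curve lies under the chance line, a simple check on the ROC points in the region can be run before drawing the bound. The base R sketch below uses a hypothetical helper, above_chance(), and made-up points; it only inspects the points inside the region, not the segments that cross its boundaries, so it is a first screen rather than a guarantee.

```r
# Made-up ROC points with a hook under the chance line near the origin.
fpr <- c(0, 0.10, 0.30, 0.60, 1)
tpr <- c(0, 0.05, 0.50, 0.80, 1)

# Hypothetical helper: TRUE when every ROC point in the FPR region lies
# on or above the chance line (TPR >= FPR).
above_chance <- function(fpr, tpr, lower, upper) {
  in_region <- fpr >= lower & fpr <= upper
  all(tpr[in_region] >= fpr[in_region])
}

ok_low  <- above_chance(fpr, tpr, lower = 0, upper = 0.2)
ok_high <- above_chance(fpr, tpr, lower = 0.3, upper = 1)
print(c(ok_low = ok_low, ok_high = ok_high))
```

In this toy case the region FPR in (0, 0.2) contains a point under the chance line, so a lower bound drawn there would not be meaningful.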
Transforms a SummarizedExperiment into a data.frame which can be used as input for other functions.
sumexp_to_df(se, .n = NULL)
se |
A SummarizedExperiment object. |
.n |
An integer or string representing the index or name of the assay to use. By default, the function combines every assay in the SummarizedExperiment. |
A data.frame created from combining assays and colData in a SummarizedExperiment.
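The shape of such a combined data.frame can be pictured without Bioconductor by mimicking the two pieces with plain base R objects: an assay matrix with features in rows and samples in columns, and a table of sample annotations playing the role of colData. The object names, values and column order are assumptions of this sketch, not the output format of sumexp_to_df().

```r
# Assay stand-in: rows are features (genes), columns are samples (made up).
assay_mat <- matrix(
  c(1.2, 3.4, 0.5, 2.2, 4.1, 0.9),
  nrow = 2,
  dimnames = list(c("gene_a", "gene_b"), c("s1", "s2", "s3"))
)

# colData stand-in: one row of annotations per sample.
col_data <- data.frame(
  disease = c("1", "0", "1"),
  row.names = c("s1", "s2", "s3")
)

# Transpose the assay so that samples become rows, align it with the
# annotations by sample name, and bind both by column.
expr <- as.data.frame(t(assay_mat))[rownames(col_data), , drop = FALSE]
combined <- cbind(col_data, expr)
print(combined)
```

The result has one row per sample, which is the layout expected by functions that take a response and one or more predictors as columns.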
Calculate a series of metrics describing global and local performance for selected classifiers in a dataset.
summarize_dataset(
  data,
  predictors = NULL,
  response,
  ratio,
  threshold,
  .condition = NULL,
  .progress = FALSE
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
predictors |
A vector of numeric data variables which represent the different classifiers or predictors in data to be summarized. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories, those other than the condition of interest are combined into a new category, which represents the "absence" of the condition to predict. |
ratio |
Ratio or axis where to apply calculations, either "tpr" or "fpr". |
threshold |
A number between 0 and 1, both inclusive, which bounds the region where the partial area under the curve is calculated. If ratio = "tpr", it is the lower bound of the TPR region, whose upper bound is 1. If ratio = "fpr", it is the upper bound of the FPR region, whose lower bound is 0. |
.condition |
A value from response that represents the class, category or condition of interest to be predicted. Once the class of interest is selected, the remaining classes are collapsed into a common category representing the "absence" of the condition to be predicted. |
.progress |
If |
A list with different elements:
Performance metrics for each of the evaluated classifiers.
Overall description of performance metrics in the dataset.
summarize_dataset(iris, response = Species, ratio = "tpr", threshold = 0.9)
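The idea of scoring every numeric predictor in a dataset can be sketched in base R with a single global metric. The example computes the AUC of each numeric column of iris as a classifier of setosa, using the rank (Mann-Whitney) form of the AUC. The helper auc_rank() is hypothetical, the sketch assumes that higher values point to the condition, and it covers only one of the metrics a full summary would report.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) statistic.
auc_rank <- function(score, is_pos) {
  r  <- rank(score)          # average ranks are used for ties
  n1 <- sum(is_pos)
  n0 <- sum(!is_pos)
  (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

is_setosa    <- iris$Species == "setosa"
numeric_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]

# One AUC per numeric predictor, then an overall description.
aucs <- vapply(iris[numeric_cols], auc_rank, numeric(1), is_pos = is_setosa)
print(aucs)
print(summary(aucs))
```

Under the "higher means positive" assumption, an AUC close to 0 signals a predictor that separates the classes well but in the opposite direction, as happens here with the petal measurements.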
Calculates a series of metrics describing global and local classifier performance.
summarize_predictor(
  data = NULL,
  predictor,
  response,
  ratio,
  threshold,
  .condition = NULL
)
data |
A data.frame or extension (e.g. a tibble) containing values for predictors and response variables. |
predictor |
A data variable which must be numeric, representing values of a classifier or predictor for each observation. |
response |
A data variable which must be a factor, integer or character vector representing the prediction outcome on each observation (Gold Standard). If the variable presents more than two possible outcomes, classes or categories, those other than the condition of interest are combined into a new category, which represents the "absence" of the condition to predict. |
ratio |
Ratio or axis where to apply calculations, either "tpr" or "fpr". |
threshold |
A number between 0 and 1, both inclusive, which bounds the region where the partial area under the curve is calculated. If ratio = "tpr", it is the lower bound of the TPR region, whose upper bound is 1. If ratio = "fpr", it is the upper bound of the FPR region, whose lower bound is 0. |
.condition |
A value from response that represents the class, category or condition of interest to be predicted. Once the class of interest is selected, the remaining classes are collapsed into a common category representing the "absence" of the condition to be predicted. |
A single-row tibble for the predictor, with the following metrics as columns:
Area under curve (AUC) as a metric of global performance.
Partial area under curve (pAUC) as a metric of local performance.
Indexes derived from pAUC, depending on the selected ratio. Sensitivity indexes will be used for TPR and specificity indexes for FPR.
Curve shape in the specified region.
# Summarize Sepal.Width as a classifier of setosa species
# and local performance in TPR (0.9, 1)
summarize_predictor(
  data = iris,
  predictor = Sepal.Width,
  response = Species,
  ratio = "tpr",
  threshold = 0.9
)
# Summarize Sepal.Width as a classifier of setosa species
# and local performance in FPR (0, 0.1)
summarize_predictor(
  data = iris,
  predictor = Sepal.Width,
  response = Species,
  ratio = "fpr",
  threshold = 0.1
)
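For reference, the global metric listed above (AUC) can be obtained from a set of ROC points with the trapezoidal rule. The base R sketch below uses made-up points and a hypothetical helper, auc_trapezoid(); it illustrates the calculation only and makes no claim about how summarize_predictor() computes it.

```r
# Made-up ROC points, ordered from (0, 0) to (1, 1).
fpr <- c(0, 0, 0, 1/3, 1/3, 2/3, 1)
tpr <- c(0, 1/3, 2/3, 2/3, 1, 1, 1)

# Hypothetical helper: sum of trapezoids between consecutive points.
# Vertical steps have zero width and therefore add nothing.
auc_trapezoid <- function(fpr, tpr) {
  widths  <- diff(fpr)
  heights <- (head(tpr, -1) + tail(tpr, -1)) / 2
  sum(widths * heights)
}

auc <- auc_trapezoid(fpr, tpr)
print(auc)
```

A value of 0.5 corresponds to the chance line and 1 to a classifier that separates both classes perfectly.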