What is Tweedie Regression?

The Tweedie distribution is a family of distributions that, for power parameters between 1 and 2, combines a point mass at exactly 0 with a continuous, right-skewed density over the positive values. Analogous to Zero-Inflated Poisson regression for count data, Tweedie regression is suited to continuous, non-negative data that contains many exact zeros.
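To make the shape of such data concrete, here is a minimal simulation sketch using NumPy. It draws a Poisson number of events and sums gamma-distributed sizes, which is the compound Poisson-Gamma construction that corresponds to a Tweedie distribution with power between 1 and 2. The function name and all parameter values (`lam`, `shape`, `scale`) are made up for illustration.

```python
import numpy as np

def simulate_compound_poisson_gamma(n, lam=0.3, shape=2.0, scale=500.0, seed=0):
    """Draw n totals: a Poisson number of events, each with a gamma-distributed size."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam, size=n)
    totals = np.zeros(n)
    positive = counts > 0
    # The sum of k i.i.d. Gamma(shape, scale) draws is Gamma(k * shape, scale).
    totals[positive] = rng.gamma(shape * counts[positive], scale)
    return totals

y = simulate_compound_poisson_gamma(10_000)
print("share of exact zeros:", np.mean(y == 0))        # near exp(-lam)
print("mean of positive values:", y[y > 0].mean())
```

Most of the simulated values are exactly 0, and the rest form a right-skewed continuous tail, which is the pattern described above.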

A common use case of the Tweedie distribution is modeling the pure premium of insurance claims, that is, the total claim amount per unit of exposure. The pure premium combines the frequency of claims (count data with many 0’s) and the amount per claim (continuous, right-skewed data). One approach is to model the frequency of claims with a Poisson-like model and the amount per claim with a Gamma-like model, and then multiply the two predictions to obtain the pure premium.
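The two-model approach can be sketched with scikit-learn's `PoissonRegressor` and `GammaRegressor`. This is only an illustration on synthetic data: the features, coefficients, exposure values, and regularization strength are all assumptions, not taken from any real portfolio.

```python
import numpy as np
from sklearn.linear_model import GammaRegressor, PoissonRegressor

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 2))                    # two made-up rating factors
exposure = rng.uniform(0.5, 1.0, size=n)       # hypothetical policy-years

# Synthetic claims: Poisson counts, gamma-distributed amount per claim.
n_claims = rng.poisson(np.exp(-1.5 + 0.4 * X[:, 0]) * exposure)
mean_severity = np.exp(6.0 + 0.3 * X[:, 1])
has_claim = n_claims > 0
total = np.zeros(n)
total[has_claim] = rng.gamma(2.0 * n_claims[has_claim], mean_severity[has_claim] / 2.0)

# Frequency model: claims per unit of exposure, weighted by exposure.
freq_model = PoissonRegressor(alpha=1e-4, max_iter=1000)
freq_model.fit(X, n_claims / exposure, sample_weight=exposure)

# Severity model: average amount per claim, fitted only on policies with claims.
sev_model = GammaRegressor(alpha=1e-4, max_iter=1000)
sev_model.fit(
    X[has_claim],
    total[has_claim] / n_claims[has_claim],
    sample_weight=n_claims[has_claim],
)

# Pure premium per unit of exposure = predicted frequency * predicted severity.
pure_premium_pred = freq_model.predict(X) * sev_model.predict(X)
print("predicted total:", np.sum(pure_premium_pred * exposure))
print("observed total: ", total.sum())
```

The severity model sees only the rows with at least one claim, which is why this approach needs two separate fits.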

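However, the Tweedie distribution can model the pure premium directly, which removes the need to fit a separate model for each component. When performing a Tweedie regression, the user must specify a power parameter p. This parameter sets how the variance scales with the mean (variance proportional to mean^p) and thereby selects the target distribution: p = 0 gives the normal distribution, p = 1 the Poisson, p = 2 the Gamma, and values between 1 and 2 give a compound Poisson-Gamma distribution suited to zero-heavy targets such as the pure premium. The power can be tuned using cross-validation.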
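As a rough sketch of tuning the power by cross-validation, the example below runs scikit-learn's `TweedieRegressor` through `GridSearchCV` on synthetic data. The data, the power grid, and the regularization strength are assumptions. The scoring metric is also a choice made here: a single fixed metric (Poisson deviance) is used so that candidates with different powers are compared on the same scale, since the estimator's default score depends on its own power.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
n = 2_000
X = rng.normal(size=(n, 2))                    # two made-up features

# Synthetic zero-heavy target: Poisson counts with gamma-distributed sizes.
counts = rng.poisson(np.exp(-1.0 + 0.5 * X[:, 0]))
y = np.zeros(n)
pos = counts > 0
y[pos] = rng.gamma(2.0 * counts[pos], 50.0)

search = GridSearchCV(
    TweedieRegressor(link="log", alpha=1e-4, max_iter=1000),
    param_grid={"power": [1.1, 1.3, 1.5, 1.7, 1.9]},   # hypothetical grid
    scoring="neg_mean_poisson_deviance",                # fixed metric for all powers
    cv=5,
)
search.fit(X, y)

print("selected power:", search.best_params_["power"])
pred = search.predict(X)                       # predictions from the refitted best model
```

A single model fitted on all rows, zeros included, replaces the separate frequency and severity fits.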