What is the Jaccard Index / Distance?

The Jaccard Index measures the similarity between two sets by computing the ratio of the number of items present in both sets (the intersection) to the number of distinct items present in either set (the union). It ranges from 0 (no shared items) to 1 (identical sets). Because a larger Jaccard Index indicates more similar sets, it can be converted to a distance metric by subtracting the index from 1. It is commonly used to measure text similarity, as documents can be decomposed into sets of the words they contain. For two sets X1 and X2, the Jaccard Distance is given by

d_J(X1, X2) = 1 - |X1 ∩ X2| / |X1 ∪ X2|
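
As a rough illustration, the index and the corresponding distance (one minus the index) can be sketched in Python. The function names `jaccard_index` and `jaccard_distance`, the two sample sentences, and the convention of treating two empty sets as identical are assumptions made for this example, not part of any particular library.

```python
def jaccard_index(a, b):
    """Ratio of shared items to distinct items across both collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical (index 1.0),
        # since the ratio 0/0 is otherwise undefined.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Distance form: one minus the Jaccard Index."""
    return 1.0 - jaccard_index(a, b)


# Two short "documents", decomposed into sets of words.
doc1 = "the cat sat on the mat".split()
doc2 = "the dog sat on the log".split()

# Shared words: {the, sat, on} -> 3; distinct words overall: 7.
print(jaccard_index(doc1, doc2))     # 3/7, roughly 0.43
print(jaccard_distance(doc1, doc2))  # 4/7, roughly 0.57
```

Note that converting each document to a set discards word counts and word order, so repeated words such as "the" contribute only once.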