Ancillary to the hypermatrix software is the document similarity software. At its simplest, it analyses the hypermatrix output and counts the bonds between sentences in the exemplar texts and documents within the testbed.
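As a rough illustration of the counting step, the sketch below assumes the hypermatrix output can be read as a grid of link counts between sentence pairs, and that a pair counts as bonded once its link count reaches a threshold. The function name `count_bonds`, the matrix layout and the default threshold of 3 are assumptions made for this example; they are not taken from the software itself.

```python
from typing import Sequence


def count_bonds(link_matrix: Sequence[Sequence[int]], threshold: int = 3) -> int:
    """Count the sentence pairs whose link count reaches the bond threshold.

    link_matrix[i][j] is assumed to hold the number of lexical links between
    sentence i of an exemplar text and sentence j of a testbed document.
    The threshold value is an assumption, not a documented setting.
    """
    return sum(1 for row in link_matrix for links in row if links >= threshold)


# Hypothetical link counts: 2 exemplar sentences against 3 testbed sentences.
example_links = [
    [0, 3, 1],
    [4, 2, 0],
]
raw_bond_count = count_bonds(example_links)
```

With the hypothetical data above, only the pairs with 3 and 4 links reach the threshold, so the raw bond count is 2.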

These raw counts are then transformed into similarity coefficients on a standardised scale of 0 to 1, where a score of 0 implies no bonds between the documents under comparison and a score of 1 implies that the two documents are as strongly bonded with each other as each is with itself. The final match weight is calculated by applying weighting measures based on factors including term frequency, expected and observed frequency, document length, sentence length, and z-scores of the raw bond counts. We have also examined factors such as sentence position, and theme and rheme (Peng, 1999).
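The exact transformation is not given here, so the following is only a minimal sketch of one normalisation that satisfies the two stated endpoints. It divides the bond count for a document pair by the geometric mean of each document's bond count with itself, a cosine-style scaling chosen purely for illustration. The function name and parameters are hypothetical, and none of the weighting factors listed above are applied.

```python
import math


def standardised_similarity(bonds_ab: float, bonds_aa: float, bonds_bb: float) -> float:
    """Scale a raw bond count to the range 0..1.

    bonds_ab: raw bond count between documents A and B
    bonds_aa: raw bond count of document A compared with itself
    bonds_bb: raw bond count of document B compared with itself

    Assumed formula (not the documented one):
        bonds_ab / sqrt(bonds_aa * bonds_bb)
    """
    if bonds_aa <= 0 or bonds_bb <= 0:
        # A document with no self-bonds cannot be scaled; treat as unrelated.
        return 0.0
    score = bonds_ab / math.sqrt(bonds_aa * bonds_bb)
    # Clamp in case a cross-document count exceeds the self-comparison counts.
    return min(1.0, max(0.0, score))


# Hypothetical counts: 6 bonds between A and B, 9 within A, 16 within B.
example_score = standardised_similarity(6, 9, 16)
```

Under this assumption, a pair with no bonds scores 0, a document compared with itself scores 1, and the hypothetical counts above give 6 / sqrt(9 × 16) = 0.5.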

To illustrate the granularity of the results, our online SHARES demo shows the raw similarity score rather than the standardised score described above. When the raw scores are ranked by magnitude, articles on the same topic have much higher match weights (in the hundreds or thousands) than those on different topics. (Note that raw scores rarely drop to zero, as articles on different topics may share common, non-topic words which are not classed as stopwords.)

The standardised and unstandardised scores for every document pair comparison in our test corpus (in Excel format) can be seen by clicking on the respective links.

