Links & Bonds
In SHARES, a link is an instance of lexical repetition between
a pair of sentences, one from text A and one from text B. In the example
below there are 3 links between the two sentences, on the words Indonesia,
economic and growth. Stopwords
are excluded from linking. If the stemmed corpus is selected, then links
are also formed between words with the same base (e.g. economic would
link with economy).
An individual token in a
sentence is only allowed to link once with another sentence. In the
example below, if the word economic had appeared twice in sentence
B, the single economic in sentence A would only link with the first
occurrence of economic in sentence B. However, if economic
had appeared twice in each sentence then 2 links would be formed on
economic.
Consecutive links between two sentences are treated as a phrase and,
thus, a single link. For example, if the phrase economic retrenchment
had appeared in both sentences then it would count as only one link.
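The linking rules above can be sketched in a few lines of Python. This is only an illustration, not the SHARES implementation: the stopword list, the function names and the greedy "first unused occurrence" matching are assumptions, matching is on exact lowercase words (no stemming), and "consecutive" is assumed to mean adjacent tokens in both sentences.

```python
import re

# Hypothetical stopword list for illustration; the list SHARES uses is not given here.
STOPWORDS = {"a", "an", "and", "as", "in", "is", "of", "the", "to", "was"}

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

def find_links(tokens_a, tokens_b):
    """Pair each non-stopword token in A with the first unused matching token in B."""
    used_b = set()
    pairs = []
    for i, word in enumerate(tokens_a):
        if word in STOPWORDS:
            continue
        for j, other in enumerate(tokens_b):
            if j not in used_b and other == word:
                used_b.add(j)  # each token may link only once
                pairs.append((i, j))
                break
    return pairs

def count_links(sentence_a, sentence_b):
    """Count links, treating a run of consecutive links as one phrase link."""
    pairs = set(find_links(tokenize(sentence_a), tokenize(sentence_b)))
    # A pair continues a phrase if the preceding tokens in both sentences are linked too,
    # so only the first pair of each run is counted.
    return sum(1 for (i, j) in pairs if (i - 1, j - 1) not in pairs)
```

With this sketch, "economic policy" against "economic gains and economic losses" gives 1 link, "economic news and economic data" against "economic woes economic fears" gives 2, and "economic retrenchment continues" against "analysts fear economic retrenchment" gives 1, since the shared phrase collapses into a single link.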
The links between each pair of sentences (one from each text) are recorded in a connectivity matrix.
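As a rough sketch, such a matrix can be held as a list of rows, one row per sentence of text A and one column per sentence of text B. The names below are hypothetical, and `simple_link_count` is a simplified stand-in that counts shared non-stopword words (each token used once) without merging phrases or stemming.

```python
from collections import Counter

# Hypothetical stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def simple_link_count(sentence_a, sentence_b):
    """Simplified link count: shared non-stopword words, each token used once."""
    words_a = Counter(w for w in sentence_a.lower().split() if w not in STOPWORDS)
    words_b = Counter(w for w in sentence_b.lower().split() if w not in STOPWORDS)
    # Counter intersection keeps the minimum count of each shared word.
    return sum((words_a & words_b).values())

def connectivity_matrix(text_a, text_b, link_count=simple_link_count):
    """Return matrix[i][j] = number of links between sentence i of A and sentence j of B."""
    return [[link_count(sa, sb) for sb in text_b] for sa in text_a]

text_a = ["Indonesia sees economic growth", "The rupiah fell sharply"]
text_b = ["Economic growth in Indonesia slowed", "Exports rose", "The rupiah fell again"]
matrix = connectivity_matrix(text_a, text_b)
```

Here `matrix` is `[[3, 0, 0], [0, 0, 2]]`: the first sentences share Indonesia, economic and growth, and the rupiah sentences share two words.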
If the number of links between a pair of sentences (from different
articles) reaches the selected link threshold, the sentences are said to have
a bond.
Sentences over a given bond threshold (i.e. bonded with more than a specified number of other
sentences) are considered to be core bearers of information. The number of
bonded sentences between a pair of texts is aggregated and this count is higher for texts on similar topics.
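A minimal sketch of the two thresholds, starting from a connectivity matrix of link counts (rows for text A, columns for text B). The function names are hypothetical. Two assumptions are made: a pair bonds when its link count is at least the link threshold, following the figure's example of 3 links giving a bond at a threshold of 3, and the aggregate is taken to be the number of bonded sentence pairs.

```python
def bond_matrix(links, link_threshold):
    """Mark each sentence pair as bonded when its link count reaches the threshold."""
    return [[count >= link_threshold for count in row] for row in links]

def core_sentences(bonds, bond_threshold):
    """Indices of text-A sentences bonded with more than bond_threshold text-B sentences."""
    return [i for i, row in enumerate(bonds) if sum(row) > bond_threshold]

def total_bonds(bonds):
    """Aggregate number of bonded sentence pairs between the two texts."""
    return sum(sum(row) for row in bonds)

# Illustrative link counts for 3 sentences of text A against 3 sentences of text B.
links = [
    [3, 0, 4],
    [1, 3, 0],
    [0, 0, 2],
]
bonds = bond_matrix(links, link_threshold=3)
```

With these made-up counts, `core_sentences(bonds, bond_threshold=1)` returns `[0]`, because only the first sentence of text A is bonded with more than one sentence of text B, and `total_bonds(bonds)` is 3.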
This method of calculating aboutness is particularly
relevant when comparing newspaper articles which, as relatively short texts,
typically express just one main idea or proposition and develop it
sentence by sentence with little redundancy.
Links in SHARES: 3 Links, 1 Bond (if Link
Threshold = 3)