Natural Language Processing with Java Cookbook

Performing SBD on specialized text

Specialized text, such as medical writing or text in unusual languages, poses challenges for sentence boundary detection (SBD). Because such text makes heavy use of specialized words and numeric values, a model trained on general text will not always yield good results. For this reason, a number of models have been trained on specialized datasets. In this recipe, we will demonstrate the use of a LingPipe model that has been trained to handle medical text.

The model will be demonstrated against an actual paragraph from a medical research article published in the Journal of Biomedical Science in 2018: Association between heavy metal levels and acute ischemic stroke, by Ching-Huang Lin, Yi-Ting Hsu, Cheng-Chung Yen, Hsin-Hung Chen, Ching-Jiunn Tseng, Yuk-Keung Lo, and Julie Y. H. Chan. The article is available at https://jbiomedsci.biomedcentral.com/articles/10.1186/s12929-018-0446-0.

Specifically, we will use the paragraph found in the Results section of the article. It is shown here:

"In total, 33 patients with AIS and 39 healthy controls were enrolled in this study. The major findings were as follows: (1) The stroke group had a significantly lower level of serum Hg (6.4 ± 4.3 µg/L vs. 9.8 ± 7.0 µg/L, P = 0.032, OR = 0.90, 95% CI = 0.81–0.99) and a lower level of urine Hg (0.7 ± 0.7 µg/L vs. 1.2 ± 0.6 µg/L, P = 0.006, OR = 0.27, 95% CI = 0.11–0.68) than the control group. (2) No significant difference in serum Pb (S-Pb), As (S-As), and Cd (S-Cd) levels and urine Pb (U-Pb), As (U-As) and Cd (U-Cd) levels was observed in either group."
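To see why text like this is hard for a general-purpose approach, the following minimal sketch runs a deliberately naive splitter over an abridged excerpt of the paragraph. It is not the LingPipe model used in this recipe. The class name NaiveSentenceSplitter and its single rule (break after a period, exclamation mark, or question mark that is followed by whitespace) are assumptions made purely for illustration, and the excerpt has been shortened from the paragraph above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of LingPipe.
final class NaiveSentenceSplitter {

    // Naive rule: a boundary is any whitespace run that follows '.', '!' or '?'.
    private static final Pattern BOUNDARY = Pattern.compile("(?<=[.!?])\\s+");

    // Splits the text at every naive boundary and returns the non-empty pieces.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String piece : BOUNDARY.split(text.trim())) {
            if (!piece.isEmpty()) {
                sentences.add(piece);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Abridged excerpt of the sample paragraph (two real sentences).
        // \u00B1 is the plus-minus sign and \u00B5 is the micro sign.
        String excerpt =
            "The stroke group had a significantly lower level of serum Hg "
            + "(6.4 \u00B1 4.3 \u00B5g/L vs. 9.8 \u00B1 7.0 \u00B5g/L, P = 0.032) "
            + "than the control group. "
            + "(2) No significant difference in serum Pb (S-Pb) levels "
            + "was observed in either group.";

        // Print each piece in brackets so the break points are easy to see.
        for (String sentence : split(excerpt)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

Although the excerpt contains only two sentences, this rule returns three pieces, because it treats the period in the abbreviation "vs." as the end of a sentence. The decimal points in values such as 6.4 survive only because no whitespace follows them. Errors of this kind are what a model trained on medical text is intended to avoid.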