dc.contributor.author | Jain, Shubham | |
dc.contributor.author | de Buitléir, Amy | |
dc.contributor.author | Fallon, Enda | |
dc.date.accessioned | 2021-10-19T10:56:56Z | |
dc.date.available | 2021-10-19T10:56:56Z | |
dc.date.copyright | 2020 | |
dc.date.issued | 2020-11-30 | |
dc.identifier.citation | Jain, S., de Buitléir, A., Fallon, E. (2020) Unsupervised Noise Detection in Unstructured Data for Automatic Parsing. 16th International Conference on Network and Service Management (CNSM), 2020, pp. 1-5, doi: 10.23919/CNSM50824.2020.9269096. | en_US |
dc.identifier.uri | http://research.thea.ie/handle/20.500.12065/3722 | |
dc.description.abstract | The telecommunications industry makes extensive use of data extracted from logs, alarms, traces, diagnostics, and other monitoring devices. Analyzing the generated data requires that the data be parsed, re-structured, and re-formatted. Developing custom parsers for each input format is labor-intensive and requires domain knowledge. In this paper, we describe a novel unsupervised text processing pipeline to automatically detect and label relevant data and eliminate noise using Levenshtein similarity and Agglomerative clustering. We experiment with different similarity and clustering algorithms on a selection of common data formats to verify the accuracy of the proposed technique. The results suggest that the proposed methodology has higher accuracy. | en_US |
dc.format | PDF | en_US |
dc.language.iso | eng | en_US |
dc.publisher | IEEE | en_US |
dc.relation.ispartof | 2020 16th International Conference on Network and Service Management (CNSM) | en_US |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Unsupervised data mining | en_US |
dc.subject | Information extraction | en_US |
dc.subject | Clustering | en_US |
dc.subject | Similarity | en_US |
dc.title | Unsupervised noise detection in unstructured data for automatic parsing | en_US |
dc.conference.date | 2020-11-02 | |
dc.conference.host | IEEE | en_US |
dc.conference.location | Izmir, Turkey | en_US |
dc.contributor.affiliation | Athlone Institute of Technology | en_US |
dc.contributor.sponsor | Irish Research Council Enterprise Partnership Scheme Postgraduate Scholarship 2020 | en_US |
dc.description.peerreview | yes | en_US |
dc.identifier.doi | 10.23919/CNSM50824.2020.9269096 | en_US |
dc.identifier.orcid | https://orcid.org/0000-0002-0913-3948 | en_US |
dc.identifier.orcid | https://orcid.org/0000-0001-8359-0920 | en_US |
dc.identifier.orcid | https://orcid.org/0000-0002-8300-5813 | en_US |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | en_US |
dc.subject.department | Software Research Institute AIT | en_US |
dc.type.version | info:eu-repo/semantics/acceptedVersion | en_US |
dc.relation.projectid | Project EPSPG/2020/7 | en_US |