Auto-Generate UML Actors & Use Cases Models from Business Requirements, Vision & Scope, Business Case, Project Charters and more, using Python & Natural Language Processing: Part 1.1
Further to Part 1 (Auto-Generate UML Actors & Use Cases Models from Business Requirements, Vision & Scope, Business Case, Project Charters and more, using Python & Natural Language Processing: Part 1, by Shalabh Bhatnagar, Dec 2022, Medium), here is a little piece of implementation that some readers have asked for my help with. They want to extract all the requirements, which are usually numbered or sub-numbered.
I think it is a useful addition to this series so I thank you one and all for making my lazy bones work on it.
As I said in Part 1, the documents listed below contain a lot of information about end-user needs and requirements. Requirements frequently appear as text prefixed by numbers, and they often nest into sub-sections that are themselves numbered (a section or sub-section would not be numbered if it were not a key concept or topic in the domain space). It makes sense to pull out such requirements.
1. Vision & Scope
2. Project Charter
3. Business Case
4. User Requirements
5. Business Requirements
6. Product Requirements
7. Software Requirements Specification
From Numbered Requirements
· We can find Actors and Use Cases faster.
· Code becomes efficient, because later steps can work on the numbered lines alone instead of the whole document.
Applies To
· Any text document. What makes it great is the power of RegEx: just 4 characters (\d.+) are all it takes to find heaps of numbered text. RegEx, as we know, offers billions of possibilities.
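To make those four characters concrete, here is a minimal sketch of what the pattern picks up. The sample text is made up for illustration and is not the input used in this post.

```python
import re

# Hypothetical sample text (an assumption, not the document used in this post)
sample = (
    "Introduction\n"
    "1. The system shall let a customer register.\n"
    "1.1 The customer provides an email address.\n"
    "Some narrative text without a number.\n"
    "2. The system shall send a confirmation.\n"
)

# \d matches one digit; .+ matches one or more characters up to the end of the line
numbered_lines = re.findall(r"\d.+", sample)

for line in numbered_lines:
    print(line)
```

The two lines without a digit are skipped; the three numbered lines come back as-is.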
Input
· Text versions of any of the above documents that contain numbered text lines. I have used the text in this paper as input.
· Text lines that start with a number. For example, “9. Quick brown fox jumped over the silly lazy dog.” is prefixed by 9 and is a valid input for the search expression. My feeling is that it should work in any combination. Please holler if it doesn’t.
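One thing worth knowing about the input: the pattern \d.+ is not anchored to the start of a line, so a line with a digit in the middle also matches, starting from that digit. If that bothers you, a stricter pattern is one option. The sketch below is my assumption of what such a variant could look like (the anchored pattern and the sample text are hypothetical and not part of the original code).

```python
import re

# Hypothetical sample text for illustration
sample = (
    "9. Quick brown fox jumped over the silly lazy dog.\n"
    "The release is planned for 2023 at the earliest.\n"
    "9.1 The dog shall remain lazy.\n"
)

# The pattern used in this post: any digit, then the rest of the line
loose = re.findall(r"\d.+", sample)

# Assumed stricter variant: the line must START with a number such as 9, 9. or 9.1,
# followed by a space or tab and then some text
strict = re.findall(r"^\d+(?:\.\d+)*\.?[ \t].+", sample, flags=re.MULTILINE)

print(loose)   # includes the fragment that begins at "2023"
print(strict)  # only the two lines that begin with a number
```

Whether the looser or the stricter behaviour is better depends on your documents; the loose pattern catches more, the strict one produces less noise.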
Output
· Numbered lines of text with any of the suffixes: new line, space.
Benefits
· Send them all to the greedy testers that are itching to write their test cases (and harass you).
· You can quickly create a Traceability Matrix and leave some fluff behind.
· You can summarize a document.
· You can make a file faster for a tool such as JIRA or Confluence or any other tool you use.
· Unstructured text becomes structured.
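As a rough illustration of the "unstructured becomes structured" point, the sketch below splits each numbered line into an id, a nesting level, and the requirement text, which is the shape a Traceability Matrix or an import file usually wants. The pattern, the function name to_rows, and the column names are my assumptions for illustration, not something the code in this post produces.

```python
import re

# Assumed pattern: a dotted number, an optional trailing dot, whitespace, then the text
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?[ \t]+(.+)$", re.MULTILINE)

def to_rows(text):
    """Return one dict per numbered line: its number, nesting depth, and text."""
    rows = []
    for match in NUMBERED.finditer(text):
        number = match.group(1)
        rows.append({
            "id": number,
            "level": number.count(".") + 1,  # 1 -> level 1, 1.1 -> level 2
            "requirement": match.group(2).strip(),
        })
    return rows

# Hypothetical sample text for illustration
sample = (
    "1. The system shall let a customer register.\n"
    "1.1 The customer provides an email address.\n"
    "2. The system shall send a confirmation.\n"
)

rows = to_rows(sample)
for row in rows:
    print(row)
```

A list of dicts like this can be handed straight to pd.DataFrame(rows) if you want it in a CSV.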
When to Use
· Your call!
Code
import os
import re
import pandas as pd

# Folder the script is run from, and the script's own name without its extension
cwd = os.getcwd()
filename = os.path.splitext(os.path.basename(__file__))[0]

# Text converted to lower case for standardization
with open(os.path.join(cwd, "some.txt"), "r") as text_file:
    raw_text = text_file.read().lower()

# List to help store the outcome in a CSV file
overall_matches = []

# Regex criterion: one digit, then the rest of the line (raw string keeps \d intact)
find_criteria = [r'\d.+']

for to_find in find_criteria:
    print(f"\nFinding '{to_find}'...")
    pattern = re.compile(to_find)
    matches = pattern.finditer(raw_text)
    for match in matches:
        # match.group() is the matched text, same as raw_text[match.start(): match.end()]
        print(match.group())
        overall_matches.append(match.group())

# One column of matches, written next to the script as <script name>.csv
schema = {"overall_matches": overall_matches}
df = pd.DataFrame(schema)
df.to_csv(os.path.join(cwd, filename + ".csv"), index_label="serial")