Class PersonPageData
- java.lang.Object
-
- romanemperorsscraper.scraping.people.PersonPageData
-
public class PersonPageData extends java.lang.Object
Class used to get and represent information about people related to Roman Emperors Dynasties.
- Author:
- Radu Ionut Barbalata
- See Also:
Person, PersonNameUrl, PersonPageDataSerializer
-
-
Field Summary
Fields
Modifier and Type, Field, Description
private java.util.ArrayList<PersonNameUrl>
adoptedChildren
private PersonNameUrl
adoptiveFatherNameUrl
private java.lang.String
birthDate
private java.util.ArrayList<PersonNameUrl>
children
private java.lang.String
deathDate
private static java.util.HashMap<java.lang.String,java.util.HashMap<java.lang.String,java.lang.String>>
dynastiesPeopleList
private PersonNameUrl
fatherNameUrl
private java.lang.String
imageUrl
private static java.util.HashSet<java.lang.String>
months
private PersonNameUrl
motherNameUrl
private java.lang.String
personDynastyPageUrl
private PersonNameUrl
personNameUrl
private java.lang.String
reignBeginningDate
private java.lang.String
reignEndDate
private java.lang.String
role
private java.util.ArrayList<PersonNameUrl>
spouses
private java.util.ArrayList<PersonNameUrl>
successors
private static java.util.HashMap<java.lang.String,PersonPageData>
urlPersonPageDataMatches
-
Constructor Summary
Constructors
Constructor, Description
PersonPageData(java.lang.String personPageUrl, org.json.simple.JSONObject serializedPersonPageData)
Populate the fields of a PersonPageData object from a JSON object.
PersonPageData(org.openqa.selenium.WebDriver webDriver, PersonNameUrl personNameUrl, java.lang.String dynastyPageUrl)
Populate the fields of a PersonPageData object with the information obtained while scraping a dynasty member's Wikipedia page.
-
Method Summary
Modifier and Type, Method, Description
static void
addToUrlPersonPageDataMatches(java.lang.String url, PersonPageData personPageData)
Add a PersonPageData instance to the urlPersonPageDataMatches HashMap.
private static java.lang.StringBuilder
clearBrackets(java.lang.String information)
Clean a given information string by removing brackets and the characters inside them.
private static java.lang.StringBuilder
extractDates(java.lang.String[] possibleDates)
Check whether each element of the input can be part of a date. The output contains only numbers, month names, and the era markers a.C or d.C. It also handles the particular case of an "or" between two dates: for the input "date1 or date2", the output is "date1 or date2".
java.util.ArrayList<PersonNameUrl>
getAdoptedChildren()
PersonNameUrl
getAdoptiveFatherNameUrl()
java.lang.String
getBirthDate()
static PersonPageData
getCachedPersonPageData(java.lang.String personPageUrl, java.lang.String dynastyPageUrl)
Return an already created PersonPageData object, or null.
java.util.ArrayList<PersonNameUrl>
getChildren()
java.lang.String
getDeathDate()
static java.util.HashMap<java.lang.String,java.util.HashMap<java.lang.String,java.lang.String>>
getDynastiesPeopleList()
PersonNameUrl
getFatherNameUrl()
java.lang.String
getImageUrl()
PersonNameUrl
getMotherNameUrl()
java.lang.String
getPersonDynastyPageUrl()
PersonNameUrl
getPersonNameUrl()
private static java.util.ArrayList<PersonNameUrl>
getPersonNameUrls(java.util.ArrayList<java.lang.String> peopleNames, org.openqa.selenium.WebElement informationDataElement)
For each link contained in the informationDataElement, check whether its text is also contained in the peopleNames ArrayList of strings and, if so, add it to an output ArrayList.
static PersonPageData
getPersonPageData(org.openqa.selenium.WebDriver webDriver, PersonNameUrl personNameUrl, java.lang.String dynastyPageUrl)
Construct a PersonPageData object, or return the existing one if it was already constructed.
java.lang.String
getReignBeginningDate()
java.lang.String
getReignEndDate()
java.lang.String
getRole()
java.util.ArrayList<PersonNameUrl>
getSpouses()
java.util.ArrayList<PersonNameUrl>
getSuccessors()
static java.util.HashMap<java.lang.String,PersonPageData>
getUrlPersonPageDataMatches()
boolean
isEmperorOrDictator()
static void
setUrlPersonPageDataMatches(java.util.HashMap<java.lang.String,PersonPageData> urlPersonPageDataMatches)
Replace urlPersonPageDataMatches with the given one.
static boolean
textImpliesDictatorRole(java.lang.String textLine)
Check if a given line of text contains something which implies the dictator role.
static boolean
textImpliesEmperorRole(java.lang.String textLine)
Check if a given line of text contains something which implies the emperor role
-
-
-
Field Detail
-
personNameUrl
private PersonNameUrl personNameUrl
-
personDynastyPageUrl
private java.lang.String personDynastyPageUrl
-
imageUrl
private java.lang.String imageUrl
-
role
private java.lang.String role
-
birthDate
private java.lang.String birthDate
-
deathDate
private java.lang.String deathDate
-
reignBeginningDate
private java.lang.String reignBeginningDate
-
reignEndDate
private java.lang.String reignEndDate
-
motherNameUrl
private PersonNameUrl motherNameUrl
-
fatherNameUrl
private PersonNameUrl fatherNameUrl
-
adoptiveFatherNameUrl
private PersonNameUrl adoptiveFatherNameUrl
-
successors
private java.util.ArrayList<PersonNameUrl> successors
-
spouses
private java.util.ArrayList<PersonNameUrl> spouses
-
children
private java.util.ArrayList<PersonNameUrl> children
-
adoptedChildren
private java.util.ArrayList<PersonNameUrl> adoptedChildren
-
urlPersonPageDataMatches
private static java.util.HashMap<java.lang.String,PersonPageData> urlPersonPageDataMatches
-
dynastiesPeopleList
private static java.util.HashMap<java.lang.String,java.util.HashMap<java.lang.String,java.lang.String>> dynastiesPeopleList
-
months
private static java.util.HashSet<java.lang.String> months
-
-
Constructor Detail
-
PersonPageData
public PersonPageData(java.lang.String personPageUrl, org.json.simple.JSONObject serializedPersonPageData)
Populate the fields of a PersonPageData object from a JSON object.
- Parameters:
personPageUrl - the person's Wikipedia page URL
serializedPersonPageData - the JSONObject to deserialize data from
-
PersonPageData
public PersonPageData(org.openqa.selenium.WebDriver webDriver, PersonNameUrl personNameUrl, java.lang.String dynastyPageUrl)
Populate the fields of a PersonPageData object with the information obtained while scraping a dynasty member's Wikipedia page.
- Parameters:
webDriver - the WebDriver instance to be used to scrape data
personNameUrl - the person's PersonNameUrl object
dynastyPageUrl - the Wikipedia page URL of the dynasty currently being scraped
-
-
Method Detail
-
getPersonPageData
public static PersonPageData getPersonPageData(org.openqa.selenium.WebDriver webDriver, PersonNameUrl personNameUrl, java.lang.String dynastyPageUrl)
Construct a PersonPageData object, or return the existing one if it was already constructed.
- Parameters:
webDriver - the WebDriver instance to be used to scrape data
personNameUrl - the PersonNameUrl object of the person
dynastyPageUrl - the dynasty's Wikipedia page URL
- Returns:
- the constructed PersonPageData object
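The "construct or return if already constructed" behaviour described above is a cache lookup keyed by page URL. The following is only a minimal sketch of that idea, not the class's actual code: `PageCacheSketch`, `getOrScrape`, `scrape` and the `String` payload are hypothetical stand-ins for `urlPersonPageDataMatches`, `getPersonPageData` and the scraping constructor, and the URL is a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

class PageCacheSketch {
    // Hypothetical stand-in for urlPersonPageDataMatches: page URL -> scraped data.
    private static final Map<String, String> cache = new HashMap<>();

    // Counts how many times the simulated scrape actually ran.
    static int scrapeCount = 0;

    // Hypothetical stand-in for the expensive scraping constructor.
    private static String scrape(String url) {
        scrapeCount++;
        return "data for " + url;
    }

    // Return the cached entry if present; otherwise scrape once and remember the result.
    static String getOrScrape(String url) {
        String cached = cache.get(url);
        if (cached != null) {
            return cached;
        }
        String fresh = scrape(url);
        cache.put(url, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        String url = "https://example.org/wiki/Augustus"; // placeholder URL
        getOrScrape(url);
        getOrScrape(url); // second call is served from the cache
        System.out.println("scrapes performed: " + scrapeCount);
    }
}
```

Keeping the lookup and the insertion in one static method means callers never need to know whether a page was already visited.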
-
getCachedPersonPageData
public static PersonPageData getCachedPersonPageData(java.lang.String personPageUrl, java.lang.String dynastyPageUrl)
Return an already created PersonPageData object, or null.
- Parameters:
personPageUrl - the person's Wikipedia page URL
dynastyPageUrl - the dynasty's Wikipedia page URL
- Returns:
- the PersonPageData object related to the given URL
-
addToUrlPersonPageDataMatches
public static void addToUrlPersonPageDataMatches(java.lang.String url, PersonPageData personPageData)
Add a PersonPageData instance to the urlPersonPageDataMatches HashMap.
- Parameters:
url - the Wikipedia page URL used to retrieve the instance later
personPageData - the PersonPageData instance
-
setUrlPersonPageDataMatches
public static void setUrlPersonPageDataMatches(java.util.HashMap<java.lang.String,PersonPageData> urlPersonPageDataMatches)
Replace urlPersonPageDataMatches with the given one. Used to replace all the stored PersonPageData instances with new ones (e.g. when importing data from JSON files).
- Parameters:
urlPersonPageDataMatches - the new urlPersonPageDataMatches HashMap content
-
getUrlPersonPageDataMatches
public static java.util.HashMap<java.lang.String,PersonPageData> getUrlPersonPageDataMatches()
- Returns:
- the urlPersonPageDataMatches HashMap of Wikipedia page URL : PersonPageData entries
-
textImpliesEmperorRole
public static boolean textImpliesEmperorRole(java.lang.String textLine)
Check if a given line of text contains something which implies the emperor role.
- Parameters:
textLine - the line of text to check
- Returns:
- true if the line implies the emperor role, false otherwise
-
textImpliesDictatorRole
public static boolean textImpliesDictatorRole(java.lang.String textLine)
Check if a given line of text contains something which implies the dictator role.
- Parameters:
textLine - the line of text to check
- Returns:
- true if the line implies the dictator role, false otherwise
-
extractDates
private static java.lang.StringBuilder extractDates(java.lang.String[] possibleDates)
Check whether each element of the input can be part of a date. The output contains only numbers, month names, and the era markers a.C or d.C. It also handles the particular case of an "or" between two dates: for the input "date1 or date2", the output is "date1 or date2".
- Parameters:
possibleDates - an array of Strings
- Returns:
- a StringBuilder object containing the date
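The token filtering described for extractDates can be illustrated as follows. This is a hedged sketch under stated assumptions, not the private method's real body: the class name `DateTokenSketch`, the English month vocabulary, the accepted era spellings and the space-separated output are all assumptions, since the contents of the `months` set and the exact output format are not shown on this page.

```java
import java.util.Locale;
import java.util.Set;

class DateTokenSketch {
    // Assumed month vocabulary; the real `months` HashSet is not documented here.
    private static final Set<String> MONTHS = Set.of(
            "january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december");

    // Assumed era spellings, with and without the trailing period.
    private static final Set<String> ERAS = Set.of("a.C", "a.C.", "d.C", "d.C.");

    // A token is kept if it is a number, a month name, an era marker, or the word "or".
    static boolean isDateToken(String token) {
        return token.matches("\\d+")
                || MONTHS.contains(token.toLowerCase(Locale.ROOT))
                || ERAS.contains(token)
                || token.equals("or");
    }

    // Keep only date-like tokens, joined by single spaces.
    // Limitation of this sketch: an "or" that is not between two dates is kept as well.
    static StringBuilder extractDates(String[] possibleDates) {
        StringBuilder dates = new StringBuilder();
        for (String token : possibleDates) {
            if (isDateToken(token)) {
                if (dates.length() > 0) {
                    dates.append(' ');
                }
                dates.append(token);
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        String[] tokens = "born 23 September 63 a.C. in Rome".split(" ");
        System.out.println(extractDates(tokens)); // prints "23 September 63 a.C."
    }
}
```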
-
clearBrackets
private static java.lang.StringBuilder clearBrackets(java.lang.String information)
Clean a given information string by removing brackets and the characters inside them.
- Parameters:
information - the given information string
- Returns:
- the cleaned result as a StringBuilder instance
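One way to remove brackets together with their contents is a single pass with a nesting-depth counter. The sketch below is illustrative only: the class name `BracketCleanerSketch` is hypothetical, and treating both round and square brackets (with nesting) as removable is an assumption, because the page does not say which bracket types the real method handles.

```java
class BracketCleanerSketch {
    // Copy every character that is outside any bracket pair.
    // Assumption: both () and [] count as brackets, and they may nest.
    static StringBuilder clearBrackets(String information) {
        StringBuilder cleaned = new StringBuilder();
        int depth = 0; // how many brackets are currently open
        for (char c : information.toCharArray()) {
            if (c == '(' || c == '[') {
                depth++;
            } else if (c == ')' || c == ']') {
                if (depth > 0) {
                    depth--; // an unmatched closing bracket is simply dropped
                }
            } else if (depth == 0) {
                cleaned.append(c);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Prints "Augustus " (the space before the round bracket is kept).
        System.out.println(clearBrackets("Augustus[1] (Octavian)"));
    }
}
```

Whitespace next to a removed bracket is left in place, so a caller may still want to trim the result.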
-
getPersonNameUrls
private static java.util.ArrayList<PersonNameUrl> getPersonNameUrls(java.util.ArrayList<java.lang.String> peopleNames, org.openqa.selenium.WebElement informationDataElement)
For each link contained in the informationDataElement, check whether its text is also contained in the peopleNames ArrayList of strings and, if so, add it to an output ArrayList.
- Parameters:
peopleNames - an ArrayList containing the people's names
informationDataElement - a WebElement containing the people's anchor elements, with their text and target page URL
- Returns:
- an ArrayList containing the PersonNameUrl(s) of all the people with a Wikipedia page URL
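The matching step can be sketched without Selenium by modelling the anchors as a map from link text to link target. Everything in this example is an assumption made for illustration: `LinkFilterSketch`, `matchLinks` and the nested `NameUrl` class are hypothetical stand-ins for getPersonNameUrls and PersonNameUrl, and the real method reads the links from a WebElement rather than from a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LinkFilterSketch {
    // Hypothetical stand-in for PersonNameUrl: a name paired with a page URL.
    static final class NameUrl {
        final String name;
        final String url;

        NameUrl(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    // Keep only the links whose text is one of the expected people names.
    static ArrayList<NameUrl> matchLinks(List<String> peopleNames, Map<String, String> linkTextToHref) {
        ArrayList<NameUrl> matches = new ArrayList<>();
        for (Map.Entry<String, String> link : linkTextToHref.entrySet()) {
            if (peopleNames.contains(link.getKey())) {
                matches.add(new NameUrl(link.getKey(), link.getValue()));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> links = new LinkedHashMap<>();
        links.put("Livia", "https://example.org/wiki/Livia"); // placeholder URLs
        links.put("edit", "https://example.org/edit");
        for (NameUrl match : matchLinks(List.of("Livia", "Scribonia"), links)) {
            System.out.println(match.name + " -> " + match.url);
        }
    }
}
```

Names without a matching link (here "Scribonia") produce no entry, which is consistent with the documented return value covering only people who have a page URL.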
-
isEmperorOrDictator
public boolean isEmperorOrDictator()
- Returns:
- true if the person's role is Emperor or Dictator, false otherwise
-
getPersonNameUrl
public PersonNameUrl getPersonNameUrl()
- Returns:
- the PersonNameUrl instance related to this PersonPageData
-
getMotherNameUrl
public PersonNameUrl getMotherNameUrl()
- Returns:
- the PersonNameUrl instance related to this person's mother
-
getFatherNameUrl
public PersonNameUrl getFatherNameUrl()
- Returns:
- the PersonNameUrl instance related to this person's father
-
getAdoptiveFatherNameUrl
public PersonNameUrl getAdoptiveFatherNameUrl()
- Returns:
- the PersonNameUrl instance related to this person's adoptive father
-
getSuccessors
public java.util.ArrayList<PersonNameUrl> getSuccessors()
- Returns:
- an ArrayList containing the PersonNameUrl instance of each successor
-
getSpouses
public java.util.ArrayList<PersonNameUrl> getSpouses()
- Returns:
- an ArrayList containing the PersonNameUrl instance of each spouse
-
getChildren
public java.util.ArrayList<PersonNameUrl> getChildren()
- Returns:
- an ArrayList containing the PersonNameUrl instance of each child
-
getAdoptedChildren
public java.util.ArrayList<PersonNameUrl> getAdoptedChildren()
- Returns:
- an ArrayList containing the PersonNameUrl instance of each adopted child
-
getBirthDate
public java.lang.String getBirthDate()
- Returns:
- the person's birthdate
-
getDeathDate
public java.lang.String getDeathDate()
- Returns:
- the person's death date
-
getReignBeginningDate
public java.lang.String getReignBeginningDate()
- Returns:
- the person's reign beginning date. It may be null if the person isn't an emperor or a dictator.
-
getReignEndDate
public java.lang.String getReignEndDate()
- Returns:
- the person's reign end date. It may be null if the person isn't an emperor or a dictator.
-
getPersonDynastyPageUrl
public java.lang.String getPersonDynastyPageUrl()
- Returns:
- the Wikipedia page URL of the person's dynasty
-
getRole
public java.lang.String getRole()
- Returns:
- the person's role (Emperor or Dictator)
-
getImageUrl
public java.lang.String getImageUrl()
- Returns:
- the person's image URL
-
getDynastiesPeopleList
public static java.util.HashMap<java.lang.String,java.util.HashMap<java.lang.String,java.lang.String>> getDynastiesPeopleList()
- Returns:
- a HashMap whose keys are the dynasties' Wikipedia page URLs and whose values are HashMaps of the dynasty's people, with name-birthdate as key and the Wikipedia page URL as value
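The nested map returned here can be pictured as a two-level index: dynasty URL first, then person key. The sketch below is a hedged illustration only. `DynastyIndexSketch`, `register` and `lookup` are hypothetical helpers, and joining name and birth date with a hyphen is an assumption about the "name-birthdate" key format, which this page does not specify.

```java
import java.util.HashMap;

class DynastyIndexSketch {
    // Same shape as dynastiesPeopleList: dynasty URL -> (person key -> person URL).
    static final HashMap<String, HashMap<String, String>> dynastiesPeopleList = new HashMap<>();

    // Assumed key format: name and birth date joined by a hyphen.
    static String personKey(String name, String birthDate) {
        return name + "-" + birthDate;
    }

    // Create the inner map on first use, then store the person's page URL.
    static void register(String dynastyUrl, String name, String birthDate, String personUrl) {
        dynastiesPeopleList
                .computeIfAbsent(dynastyUrl, key -> new HashMap<>())
                .put(personKey(name, birthDate), personUrl);
    }

    // Return the person's page URL, or null if the dynasty or the person is unknown.
    static String lookup(String dynastyUrl, String name, String birthDate) {
        HashMap<String, String> people = dynastiesPeopleList.get(dynastyUrl);
        return people == null ? null : people.get(personKey(name, birthDate));
    }

    public static void main(String[] args) {
        String dynasty = "https://example.org/wiki/Dynasty"; // placeholder URLs
        register(dynasty, "Augustus", "63 a.C.", "https://example.org/wiki/Augustus");
        System.out.println(lookup(dynasty, "Augustus", "63 a.C."));
    }
}
```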
-
-