`Index()` method.

The `Index()` method in HomeController is the default action called when you first open an MVC web application.

`Index()` method in the HomeController file:

`CallUrl()` method.

`HttpClient` variable, which is an object provided by the .NET framework itself.

`Bearer` authorization token. In such a scenario, you would add a header to the request. For example:

`Index()` method where it was called. The following code is what your `Index()` method should contain (for now):

`Index()` method at the following line:

`tocsection` class attribute. With the HTML Agility Pack, we can filter them out of the list.

`ParseHtml()` and add the following code to it:

`foreach` loop to parse the first anchor tag, which contains the link to the programmer's profile. Because Wikipedia uses relative links in the `href` attribute, we manually build the absolute URL so that a reader can click each link in the list directly.

`WriteToCsv()` to write data from the generic list to a file. The following code is the full method that writes the extracted links to a file named “links.csv” on the local disk.

`Index()` method. This requires a small change to the default `Index()` method shown in the code below.

`Index()` method changes, you must also add the library reference to the top of your HomeController code. Before you can use Puppeteer, you must first install the library from NuGet and then add the following line to your `using` statements:

`Index()` method, which replaces the same method in the previous section example):

`programmerLinks`. Notice that the path to `chrome.exe` is added to the `options` variable. If you don’t specify the executable path, Puppeteer will be unable to initialize Headless Chrome.

`using` statements:

`selenium` package

`driver.page_source` returns the full HTML code of the page.
`driver.title` gets the page's title.
`driver.current_url` gets the current URL (useful when the website redirects and you need the final URL).

`find_elements` (note the plural) to return a list of elements.

`WebElement` is a Selenium object representing an HTML element.

`element.text`
`element.click()`
`element.get_attribute('class')`
`element.send_keys('mypassword')`
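As an illustration of `find_elements` combined with the `WebElement` methods listed above, here is a minimal sketch that collects the `href` of every link on the current page. It assumes a Selenium 4 driver, where `find_elements(by, value)` accepts a locator strategy string. Here `"tag name"` is the value of `By.TAG_NAME`. The helper name `collect_links` is hypothetical.

```python
from typing import List, Optional


def collect_links(driver) -> List[str]:
    """Return the href of every <a> element on the page the driver is showing.

    Hypothetical helper: `driver` is assumed to be a Selenium 4 WebDriver,
    or any object exposing the same find_elements/get_attribute interface.
    """
    links: List[str] = []
    # "tag name" is the string value of Selenium's By.TAG_NAME locator
    for element in driver.find_elements("tag name", "a"):
        href: Optional[str] = element.get_attribute("href")
        # Skip anchors with a missing (None) or empty href
        if href:
            links.append(href)
    return links
```

With a real browser, you would call `collect_links(driver)` after loading the page with `driver.get(...)`. Taking the driver as a parameter keeps the helper independent of how the browser was launched.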
`is_displayed()`. This returns True if an element is visible to the user.

`type=hidden` like this:

`driver.get()`
`driver.find_element_by_*` and then `element.send_keys()` to send text to the input
`element.click()`
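The login steps above can be sketched as a small function. This sketch assumes Selenium 4, where the `find_element_by_*` helpers were deprecated (and later removed) in favor of `find_element(by, value)`. The strings `"name"` and `"css selector"` are the values of `By.NAME` and `By.CSS_SELECTOR`. The function name `log_in`, the input names `username`/`password`, and the submit-button selector are hypothetical and must be adapted to the real page.

```python
def log_in(driver, url: str, username: str, password: str) -> None:
    """Fill in and submit a login form (hypothetical helper).

    Assumes `driver` is a Selenium 4 WebDriver; the field names and the
    button selector below are assumptions about the target page.
    """
    # Step 1: open the login page
    driver.get(url)
    # Step 2: locate each input by its name attribute ("name" == By.NAME) and type into it
    driver.find_element("name", "username").send_keys(username)
    driver.find_element("name", "password").send_keys(password)
    # Step 3: click the submit button ("css selector" == By.CSS_SELECTOR)
    driver.find_element("css selector", "button[type=submit]").click()
```

For example, after `driver = webdriver.Chrome()`, you could call `log_in(driver, login_url, "alice", "s3cret")` and then check `driver.current_url` to see where the site redirected you.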
`None` because all of the `find_element_by_*` methods raise an exception if the element is not found in the DOM. So we have to use a try/except block and catch the `NoSuchElementException` exception:

`time.sleep(ARBITRARY_TIME)` before taking the screenshot.

`WebDriverWait` object.

`time.sleep()`, you will probably use an arbitrary value. The problem is that you end up waiting either too long or not long enough. The website may also load slowly on your local Wi-Fi connection but 10 times faster on your cloud server. With `WebDriverWait`, you wait only as long as it takes for your element or data to load.

`element_to_be_clickable`
`text_to_be_present_in_element`
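To show why an explicit wait beats a fixed `time.sleep()`, here is a standard-library sketch of the polling idea behind `WebDriverWait`. It re-checks a condition at a short interval and returns as soon as the condition holds, or raises once a timeout expires. This is an illustration written for this section, not Selenium's implementation. The names `wait_until`, `timeout` and `poll_interval` are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` expires.

    Stdlib illustration of an explicit wait; not Selenium's own code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return immediately instead of sleeping for a fixed, arbitrary time
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Short pause before checking the condition again
        time.sleep(poll_interval)
```

With Selenium itself, the equivalent explicit wait is `WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "submit")))`, where `EC` is `selenium.webdriver.support.expected_conditions` and the `"submit"` id is a placeholder. `WebDriverWait` polls every 0.5 seconds by default and raises `TimeoutException` if the condition is never met.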