Ferenc "Frank" LENGYEL

Author

Former teacher of Maths, Chemistry and Computer Science in Hungary, former college professor in Hungary, Learning Technologist in Scotland.


Exhausted frog (how to find broken links on Moodle)

29/6/2020

Broken links are bad for business and reputation. Most visitors consider stale hyperlinks to be a sign of laziness, carelessness and disrespect towards them. Unfortunately, if you have a large website with thousands of external links, or let's say a VLE with hundreds of courses and lots of external references, it's really hard to identify which external links have gone dead, and it's even harder to fix them because you need to know the exact location of the broken links.
I am going to give you a possible solution specially designed for our Moodle site, but you can use it on other websites as well. It is not perfect and it needs some human supervision (it is not 100% automated), but it is free and highly customisable.


If you are looking for a tool to do the job, you will find quite a lot, but when I tried them (even paid versions) they were not good enough for our purpose.
Let me explain.
The problem we have to deal with is:
  • we have got a lot of courses (hundreds)
  • the courses have got a lot of versions with almost the same (or very similar) content
  • the professional body keeps changing the URLs on their website
    • the content is still there, but you need a brand new URL because the old one is not redirected
    • the content is no longer available
  • the broken links often don't give you a proper (easy-to-recognise) 404 error message (or any other error)
  • we use different filters on our Moodle site to generate hyperlinks on the fly; they are not all stored in the database and become available only when the page is rendered in your browser
  • we want our students to do some research, read further studies and papers, or watch videos on those sites, and we try to provide the resources, but we have no capacity to go through thousands of pages and click on every single link
It is really hard to write a script to scan the content in the database. Our database is huge (I mean huge) and we use different activities (pages, books, lessons, H5Ps, SCORMs), but the biggest problem is the filters: they make it almost impossible to find every single link just by scanning through the database.
There is an existing plugin for Moodle sites, but we cannot install plugins if they are not on our provider's matrix, so, unfortunately, it is not a valid option for us.
We also tried one of the industry-leading SEO tools (and now you hopefully understand the title). It was very promising, but it was a little bit too much. The tool was very accurate, I would say too accurate, as it 'clicked' on every single possible item on the page to scan the whole VLE. Therefore it generated thousands of log entries and we had to be very careful about what kind of permission we gave to the 'robot' user. It was also complicated to use as it belongs to the marketing department, and we wanted something we could use on our own.
I also tried plenty of freeware products, but most of them only worked with static websites or with sites that are not password protected. No good for us.
And on a random Sunday afternoon, I realised that I had all the tools and skills I needed to create my own solution.
This is what I came up with:
  1. A special read-only student type of user account with permission to access any course on the VLE, but strictly no create/write/delete permission.
  2. MySQL code to generate the list of URLs on the VLE to feed the 'frog'
  3. Browser automation with the free UI Vision (formerly known as Kantu), an open-source browser extension that works with Chrome or Firefox
  4. Patience (a lot)

Step 1: The user account

You will most probably need a brand new role. You can start from the student archetype and remove 'dangerous' capabilities, or you can start from a blank role and add the necessary capabilities. I prefer the second method as you add only the capabilities that are really necessary.
You need at least the following settings:
  • Context types where this role may be assigned: System
  • Allow role assignments: None (indifferent)
  • Allow role overrides: None
  • Allow role switches: None
  • Allow role to view: None
  • every mod/activity:view, mod/activity:read or similar capability needed to see/access the activities you would like to check (page, book and lesson in my case)
    • mod/book:read
    • mod/lesson:view
    • mod/page:view
  • moodle/course:view to be able to access all the courses without being enrolled
Now we need an actual user with this role at the system (site) level, and you need to know the password to be able to log in. It is better to log in directly as the 'frog' user than to use the 'Log in as' functionality, as some of the features behave very differently if you log in as someone else.


Step 2: Generate URLs to check

This is a MySQL SELECT statement to give you all the pages, books (with their chapters) and lessons (with their subpages) within a course or a course category.
MySQL

And the result is something like this:

https://YOUR_OWN_MOODLE_URL/mod/book/view.php?id=78238&chapterid=32344
https://YOUR_OWN_MOODLE_URL/mod/book/view.php?id=78238&chapterid=32345
https://YOUR_OWN_MOODLE_URL/mod/book/view.php?id=78238&chapterid=32346
https://YOUR_OWN_MOODLE_URL/mod/book/view.php?id=78238&chapterid=32347
https://YOUR_OWN_MOODLE_URL/mod/book/view.php?id=78238&chapterid=32348
https://YOUR_OWN_MOODLE_URL/mod/book/view.php?id=78238&chapterid=32349

https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15829
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15830
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15831
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15832
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15833
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15834
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15835
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15836
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15837
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15838
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15839
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15840
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15841
https://YOUR_OWN_MOODLE_URL/mod/lesson/view.php?id=78253&pageid=15842

https://YOUR_OWN_MOODLE_URL/mod/page/view.php?id=78254
https://YOUR_OWN_MOODLE_URL/mod/page/view.php?id=78257
As you can see, the script gives you all the pages, books and lessons in all courses under category 51. You can add more categories or use specific courses; it is up to you. For the first run, I would recommend a small(ish) course where you have around 100 links only, so you can practise and gain a better understanding of the process. The links are ordered by the course module id, because sometimes you have to go and see which link was checked last without an issue and do some troubleshooting, so it is good to know the order. For example, if you see the 'frog' is struggling with id=78253, you can presume that all the smaller ID numbers have been checked without any problem.
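A list like the one above can also be put together outside the database client. What follows is a minimal sketch in Python and not the query used for this post: it assumes the default Moodle table prefix (mdl_) and the standard course_modules, modules, course, book_chapters and lesson_pages tables, and it runs against a tiny in-memory SQLite database with made-up rows purely to keep the example self-contained. The names build_activity_urls and demo_connection are hypothetical.

```python
import sqlite3

BASE_URL = "https://YOUR_OWN_MOODLE_URL"  # placeholder, replace with your own site

# Assumed schema: default 'mdl_' prefix, only the columns this sketch needs.
SCHEMA = """
CREATE TABLE mdl_course (id INTEGER, category INTEGER);
CREATE TABLE mdl_modules (id INTEGER, name TEXT);
CREATE TABLE mdl_course_modules (id INTEGER, course INTEGER, module INTEGER, instance INTEGER);
CREATE TABLE mdl_book_chapters (id INTEGER, bookid INTEGER);
CREATE TABLE mdl_lesson_pages (id INTEGER, lessonid INTEGER);
"""

# One row per page, per book chapter and per lesson page, ordered by course module id.
QUERY = """
SELECT cm.id AS cmid, m.name AS modname, NULL AS subid
  FROM mdl_course_modules cm
  JOIN mdl_modules m ON m.id = cm.module
  JOIN mdl_course c ON c.id = cm.course
 WHERE m.name = 'page' AND c.category = :category
UNION ALL
SELECT cm.id, m.name, bc.id
  FROM mdl_course_modules cm
  JOIN mdl_modules m ON m.id = cm.module
  JOIN mdl_course c ON c.id = cm.course
  JOIN mdl_book_chapters bc ON bc.bookid = cm.instance
 WHERE m.name = 'book' AND c.category = :category
UNION ALL
SELECT cm.id, m.name, lp.id
  FROM mdl_course_modules cm
  JOIN mdl_modules m ON m.id = cm.module
  JOIN mdl_course c ON c.id = cm.course
  JOIN mdl_lesson_pages lp ON lp.lessonid = cm.instance
 WHERE m.name = 'lesson' AND c.category = :category
 ORDER BY cmid, subid
"""

SUB_PARAM = {"book": "chapterid", "lesson": "pageid"}


def build_activity_urls(conn, category_id, base_url=BASE_URL):
    """Return the view.php URLs of pages, book chapters and lesson pages in a category."""
    urls = []
    for cmid, modname, subid in conn.execute(QUERY, {"category": category_id}):
        url = f"{base_url}/mod/{modname}/view.php?id={cmid}"
        if subid is not None:
            url += f"&{SUB_PARAM[modname]}={subid}"
        urls.append(url)
    return urls


def demo_connection():
    """In-memory database with a few made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO mdl_course VALUES (?, ?)", [(10, 51), (11, 7)])
    conn.executemany("INSERT INTO mdl_modules VALUES (?, ?)",
                     [(1, "book"), (2, "lesson"), (3, "page")])
    conn.executemany("INSERT INTO mdl_course_modules VALUES (?, ?, ?, ?)",
                     [(78238, 10, 1, 5), (78253, 10, 2, 9),
                      (78254, 10, 3, 1), (99999, 11, 3, 2)])
    conn.executemany("INSERT INTO mdl_book_chapters VALUES (?, ?)",
                     [(32344, 5), (32345, 5)])
    conn.executemany("INSERT INTO mdl_lesson_pages VALUES (?, ?)", [(15829, 9)])
    return conn


if __name__ == "__main__":
    for line in build_activity_urls(demo_connection(), 51):
        print(line)
```

Against a real site you would run the SELECT part on the Moodle database itself (with your own prefix) and save the single column as the CSV file for the macro.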

Step 3: Automation

UI Vision is fun. It helps a lot when you have to do repetitive, boring tasks. You can write scripts (macros) to do the same boring steps over and over again. You can create stable robotic process automation (RPA) scripts with image and text recognition on Windows, Mac and Linux. Real fun. And most importantly, it works very well.
Visit their website for more: https://ui.vision/
I use the Chrome extension.
This is the script/macro I wrote, and I am going to explain it in detail so you can amend it to your needs.
JSON

{
    "Name": "BrokenLinksToFind_Pages",
    "CreationDate": "2020-6-30",
    "Commands": [
        {
            "Command": "store",
            "Target": "fast",
            "Value": "!replayspeed"
        },
        {
            "Command": "store",
            "Target": "true",
            "Value": "!ErrorIgnore"
        },
        {
            "Command": "csvRead",
            "Target": "activitiesToCheck.csv",
            "Value": ""
        },
        {
            "Command": "store",
            "Target": "\"YOUR_OWN_MOODLE_URL\"",
            "Value": "avoid"
        },
        {
            "Command": "open",
            "Target": "${!COL1}",
            "Value": ""
        },
        {
            "Command": "waitForPageToLoad",
            "Target": "5000",
            "Value": ""
        },
        {
            "Command": "storeTitle",
            "Target": "",
            "Value": "mytitle"
        },
        {
            "Command": "storeXpathCount",
            "Target": "xpath=//*[@id=\"region-main\"]/descendant-or-self::div[@class=\"no-overflow\"]/descendant-or-self::a[not(contains(@href,${avoid}))]",
            "Value": "links"
        },
        {
            "Command": "gotoIf_v2",
            "Target": "${links}==0",
            "Value": "LabelZeroLinks"
        },
        {
            "Command": "store",
            "Target": "1",
            "Value": "loop"
        },
        {
            "Command": "while_v2",
            "Target": "(${loop}<=${links})",
            "Value": ""
        },
        {
            "Command": "store",
            "Target": "${mytitle}",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "${!URL}",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "${links}",
            "Value": "!csvLine"
        },
        {
            "Command": "storeAttribute",
            "Target": "xpath=//*[@id=\"region-main\"]/descendant-or-self::div[@class=\"no-overflow\"]/descendant-or-self::a[not(contains(@href,${avoid}))][${loop}]@href",
            "Value": "linktovisit"
        },
        {
            "Command": "storeText",
            "Target": "xpath=//*[@id=\"region-main\"]/descendant-or-self::div[@class=\"no-overflow\"]/descendant-or-self::a[not(contains(@href,${avoid}))][${loop}]",
            "Value": "linktovisittext"
        },
        {
            "Command": "store",
            "Target": "${linktovisit}",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "${linktovisittext}",
            "Value": "!csvLine"
        },
        {
            "Command": "gotoIf_v2",
            "Target": "${linktovisit}==\"#\"",
            "Value": "NotOpenedNewTab"
        },
        {
            "Command": "gotoIf_v2",
            "Target": "${linktovisit}.includes(\"A_SPECIFIC_URL_WE_DONT_WANT_TO_CHECK\")",
            "Value": "NotOpenedNewTab"
        },
        {
            "Command": "gotoIf_v2",
            "Target": "${linktovisit}.startsWith(\"mailto\")",
            "Value": "NotOpenedNewTab"
        },
        {
            "Command": "selectWindow",
            "Target": "tab=open",
            "Value": "${linktovisit}"
        },
        {
            "Command": "gotoIf_v2",
            "Target": "!${!LastCommandOK}",
            "Value": "NotOpenedNewTab"
        },
        {
            "Command": "storeTitle",
            "Target": "",
            "Value": "visitedtitle"
        },
        {
            "Command": "store",
            "Target": "${visitedtitle}",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "${!URL}",
            "Value": "!csvLine"
        },
        {
            "Command": "selectWindow",
            "Target": "tab=close",
            "Value": ""
        },
        {
            "Command": "label",
            "Target": "NotOpenedNewTab",
            "Value": ""
        },
        {
            "Command": "executeScript",
            "Target": "return Number (${loop}) + 1",
            "Value": "loop"
        },
        {
            "Command": "csvSave",
            "Target": "checkedURLs",
            "Value": ""
        },
        {
            "Command": "end",
            "Target": "",
            "Value": ""
        },
        {
            "Command": "gotoLabel",
            "Target": "RealEnd",
            "Value": ""
        },
        {
            "Command": "label",
            "Target": "LabelZeroLinks",
            "Value": ""
        },
        {
            "Command": "store",
            "Target": "${mytitle}",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "${!URL}",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "${links}",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "na",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "na",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "na",
            "Value": "!csvLine"
        },
        {
            "Command": "store",
            "Target": "na",
            "Value": "!csvLine"
        },
        {
            "Command": "csvSave",
            "Target": "checkedURLs",
            "Value": ""
        },
        {
            "Command": "label",
            "Target": "RealEnd",
            "Value": ""
        }
    ]
}

OK, let me explain it; hopefully it won't be too difficult. The line numbers below refer to the lines of the JSON listing above.

  • Line 6: when we replay the macro, make it fast
  • Line 11: do not stop on errors; we expect errors to happen, but we would like to deal with them on our own
  • Line 16: we provide the links by uploading a CSV file called 'activitiesToCheck.csv'; it has only 1 column and every row represents an activity (page, book or lesson) on our Moodle instance
  • Line 21: we don't want to check internal links, e.g. glossary or activity name auto-links, so we need to avoid every URL containing the VLE's root URL
  • Line 26: visit the next URL from the provided CSV file, using the value in the first (and only) column
  • Line 31: wait a little bit (5 sec) to make sure that the page is fully loaded
  • Line 36: remember the title of that page (we are going to write it to another CSV later)
  • Line 41: find out how many external links are on the recently opened page. It works with the BOOST or CLASSIC theme and it checks only the main section of the page, so nothing from the navigation bars or blocks. (See the picture below, highlighted in yellow.)
  • Line 46: if we don't have external links, jump to Line 166 and add a new line to the result CSV file, mostly 'na' as we have nothing to check
  • Line 51: if we have external links, check them one by one using a loop
  • Line 56: have we finished yet?
  • Line 61: add the title of the current page to the CSV row
  • Line 66: add the URL of the page we are on
  • Line 71: add the number of links we are about to check
  • Line 76: store the HREF attribute of the hyperlink (the actual link)
  • Line 81: store the visible text on the web page (the text between <a> and </a>)
  • Line 86 and 91: add them to the CSV row

Now we need to handle some exceptions. This part is flexible: you might want to remove some of them or add even more, it is up to you.

  • Line 96: sometimes the HREF is only a # character as the link's behaviour is provided by JavaScript; we are going to ignore these
  • Line 101: company-specific links we are going to ignore; even though they are external links, we don't want to check them
  • Line 106: ignore the link if it is an email address

That was the exception-handling part, now back to the normal links.

  • Line 111: try to open the link in a new tab
  • Line 116: if there was an error and the new tab could not be opened, skip the next steps
  • Line 121 & 126: add the newly opened page's title to the CSV row
  • Line 131: add the new URL to the CSV row. Please note: this step gives us the possibility to see if the page was redirected
  • Line 136: close the extra tab and go back to the original page
  • Line 141: we also jump here if we had nothing to open (see the exception handling in lines 96, 101 and 106)
  • Line 146: increase the loop counter so we can go back and check the next link on the same page
  • Line 151: we have finished with the current link, so we save the new row to the CSV file before we go back to check the next one
  • Line 156: end of the loop
  • Line 161: we can finish now, so go to the very end
  • Line 166 - 206: we jump here only if we couldn't find any external link on the page, so we have to save a lot of NAs to the CSV file
  • Line 211: this is the end of the macro
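The skip rules of the macro can be summarised in a few lines of Python, which is handy if you want to try out your own exceptions before editing the JSON. This is only a sketch: the function name should_open is hypothetical, and it folds two separate steps of the macro into one (internal links are already excluded by the XPath in line 42, while '#', company-specific and mailto links are written to the CSV but never opened).

```python
def should_open(href, avoid, skip_fragments=()):
    """Return True only for links the macro would open in a new tab.

    href           -- the HREF attribute of the link
    avoid          -- root URL of the VLE (the 'avoid' variable of the macro)
    skip_fragments -- substrings of external URLs we do not want to check
    """
    if avoid in href:
        return False  # internal link, e.g. a glossary auto-link
    if href == "#":
        return False  # behaviour provided by JavaScript, nothing to open
    if any(fragment in href for fragment in skip_fragments):
        return False  # company-specific exception
    if href.startswith("mailto"):
        return False  # email address
    return True


if __name__ == "__main__":
    # Made-up values for illustration only.
    root = "https://YOUR_OWN_MOODLE_URL"
    for link in (root + "/mod/glossary/showentry.php?eid=1",
                 "#",
                 "mailto:someone@example.com",
                 "https://example.com/article"):
        print(link, should_open(link, root))
```

The order of the checks follows the order of the commands in the macro, so adding a new exception to the JSON corresponds to adding one more condition here.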
 
[Picture: Actual Learning Content only]

Phew, that was interesting. I really hope you are still with me and you can create your own little Frog based on mine.
If you have several pages, books and lessons, put all the URLs into a CSV file called 'activitiesToCheck.csv' and use it with UI.Vision, but instead of Play Macro click on the arrow next to the Play Macro button and choose Play Loop. You can set how many times you want to repeat the macro, and it will use the rows from your CSV file one by one.
Sorted.

A working example:

I can offer one more thing: we can try everything on my dummy Moodle site.
Download the following files:
  • activitiestocheck.csv
  • testfrogonlengyelkemoodlelts.json

  • Visit the following URL: 
    https://lengyelke.com/moodlelts/mod/page/view.php?id=32
  • To be able to see the content you have to log in as a GUEST
  • Open the UI.Vision tool and create a new macro using the provided JSON file
  • On the CSV tab, import the provided CSV file (activitiesToCheck.csv)
You should see something like this:
[Picture: UI.Vision and the Frog]
Before you click on PLAY MACRO, it is worth having a look at the page content and analysing it a wee bit.
  1. The very first link is a glossary auto-link, so it will be ignored.
  2. https://lengyelke.com/broken is really broken and it gives you a 404 error.
  3. https://www.cipd.co.uk/whatever is also broken and it gives you a 404 error.
  4. http://lengyelk.com/whatever is, unfortunately, so broken (see image on the left) that it doesn't give you a proper error. The macro won't recognise it, and if you don't close the tab manually, it will start an endless loop and the macro will stop after a while. If you know how to solve this issue in UI.Vision, please let me know.
    Normally I just leave the macro running on my second screen, or even in a tiny window, so I can keep an eye on it, and if I see the loop happening, I just jump in and close the tab. If you cannot do that, check the result CSV: you will see thousands of repeats at the end of the file. Delete them, go back to the very last good row, delete the pages that you have already checked from the 'activitiesToCheck.csv' file and continue.
  5. https://lengyelke.com/ is a good one ;)
  6. https://www.youtube.com/watch?v=SGLz0hCpEcW is a wrong URL. In the result you will see just 'YouTube' in the Visited Title column, which is always suspicious; normally I double-check the YouTube links if they don't have the expected title.
  7. https://www.facebook.com/revolutapp/ is a proper link, but I added Facebook to the exception part (line 102).
  8. https://www.youtube.com/watch?v=SGLz0hCpEcw is a working link.
  9. And finally, there is an email address, which will be ignored.
OK, now click on PLAY MACRO

And this is the result you should get:
checkedurls.csv

See a wee explanation below. The Title, URL and number of links are the same, so let's just focus on the other fields: the link, its text, the visited title and the visited URL.
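Going through the result file by hand gets tedious on a big course, so here is a small Python sketch that does a first pass for you. It rests on a few assumptions: the file has no header row, the columns are in the order in which the macro fills the CSV line (page title, page URL, number of links, link, link text, visited title, visited URL), and the title checks ('404', 'not found', a bare 'YouTube') are only rough hints of my own, not a reliable test. The function names and the sample rows are made up.

```python
import csv
import io

# Assumed column order, following the order in which the macro fills the CSV line.
PAGE_TITLE, PAGE_URL, LINK_COUNT, HREF, LINK_TEXT, VISITED_TITLE, VISITED_URL = range(7)


def drop_trailing_repeats(rows):
    """Collapse a run of identical rows at the end of the file (endless loop) into one."""
    rows = list(rows)
    while len(rows) >= 2 and rows[-1] == rows[-2]:
        rows.pop()
    return rows


def triage(rows):
    """Return (page URL, link, reasons) for every visited link that looks suspicious."""
    findings = []
    for row in rows:
        # Shorter rows are links that were never opened; 'na' rows are pages without links.
        if len(row) <= VISITED_URL or row[HREF] == "na":
            continue
        href, title, final_url = row[HREF], row[VISITED_TITLE], row[VISITED_URL]
        reasons = []
        if final_url.rstrip("/") != href.rstrip("/"):
            reasons.append("redirected")
        if "404" in title or "not found" in title.lower():
            reasons.append("error page title")
        if "youtube.com/watch" in href and title.strip() == "YouTube":
            reasons.append("bare YouTube title")
        if reasons:
            findings.append((row[PAGE_URL], href, reasons))
    return findings


# Made-up rows in the assumed format, for illustration only.
SAMPLE = """\
Test page,https://example.com/mod/page/view.php?id=32,4,https://example.org/broken,broken,404 Not Found,https://example.org/broken
Test page,https://example.com/mod/page/view.php?id=32,4,https://www.youtube.com/watch?v=XXXX,video,YouTube,https://www.youtube.com/watch?v=XXXX
Test page,https://example.com/mod/page/view.php?id=32,4,mailto:someone@example.org,email
Test page,https://example.com/mod/page/view.php?id=32,4,https://example.org/old,moved,New home,https://example.org/new
Test page,https://example.com/mod/page/view.php?id=32,4,https://example.org/old,moved,New home,https://example.org/new
"""

if __name__ == "__main__":
    rows = drop_trailing_repeats(csv.reader(io.StringIO(SAMPLE)))
    for page, link, reasons in triage(rows):
        print(page, link, ", ".join(reasons))
```

To use it on your own results, replace io.StringIO(SAMPLE) with the opened result file. Anything the script flags still needs a human to look at it.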
Good luck and enjoy your new toy :)
