HTML: Remove & and replace with &amp;

It is often necessary to replace the & character with the entity &amp; to get validated (x)html. This clip replaces every stray ampersand and leaves alone those that begin a defined character entity.

This one has a really really long line at the start, so what it may do with the pre tag to your browser is anyone’s guess.

; a clip to run a file looking for
; stray & characters that aren't part
; of &nbsp; &amp; etc type of special
; characters and replace them with &amp;
; by Don Passenger
; email: don at
; comments welcome

^!SetWordWrap Off
^!Jump Doc_Start

; assign special characters to array
^!SetListDelimiter ";"
; super long line to follow
^!SetArray %allspecialchars%="&©>< " ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ—™™áâ´æàåãä¦ç¸¢¤°÷éêèðë½¼¾íî¡ì¿ï«¯µ·¬ñóôòªºøÕõö¶±£»®§­¹²³ßþ×úûù¨üý¥ÿ"
; end of super long line

:Loop
; find next &
^!Find "&" TIS

; select 8 characters starting with an &
^!Jump Select_Start
^!Select +8

^!Set %word%=^$GetSelection$
;^!Info x^%word%x
; split the 8 characters selected on ;
; if there is no semi-colon, then we do
; not have a special character
; if we do, then we may
^!SetListDelimiter ";"
^!SetArray %specialchar%=^%word%

; if ^%specialchar0% is 1 then we had no ; so
; goto replacement phase
^!If "^%specialchar0%" = "1" Replace&

; we have a ; so we want to test the first part up to
; the ; to see if it is a known entity

^!Set %counter%=1
:TestSpecials
; if all specials exhausted, then need to replace
^!If "^%counter%" > "^%allspecialchars0%" Replace&
; test each special character and on match go to next
^!If "^$StrLower(^%specialchar1%)$" = "^%allspecialchars^%counter%%" MatchSpecial

; that one didn't match, try again
^!Inc %counter%
^!GoTo TestSpecials

:Replace&
; we found a loose & so we need to replace it
; with &amp;
^!Jump Select_Start
^!Replace "&" >> "&amp;" TIHS
; having replaced it go look for next
^!GoTo Loop

:MatchSpecial
; known entity, so step past its & and keep looking
^!Jump Select_Start
^!MoveCursor +1
^!GoTo Loop
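
For readers who do not use NoteTab, the same idea can be sketched in Python. This is only an illustration of the logic described in the clip's comments, not a port of it: the function name escape_stray_ampersands, the KNOWN_ENTITIES set, and the window parameter are all made up for this example, and the entity list is deliberately abbreviated. Like the clip, it looks at 8 characters starting at each &, checks for a semicolon, and compares the lowercased name in front of it with a list of known entities.

```python
# Hypothetical, abbreviated entity list; a real run would need the full set.
KNOWN_ENTITIES = {"amp", "copy", "gt", "lt", "nbsp", "quot", "mdash", "trade"}


def escape_stray_ampersands(text, known=KNOWN_ENTITIES, window=8):
    """Return text with every & that does not start a known entity
    rewritten as &amp;. Only the & itself is ever changed."""
    out = []
    for i, ch in enumerate(text):
        if ch != "&":
            out.append(ch)
            continue
        # Look at the same 8-character window the clip selects.
        chunk = text[i:i + window]
        head, sep, _ = chunk.partition(";")
        # head[1:] drops the leading &; sep is empty when no ; was found.
        if sep and head[1:].lower() in known:
            out.append("&")       # part of a known entity, keep as is
        else:
            out.append("&amp;")   # stray ampersand, escape it
    return "".join(out)


if __name__ == "__main__":
    print(escape_stray_ampersands("Tom & Jerry &copy; 2005, a&b"))
```

The sketch only recognises the names in its own set, so anything else followed by a semicolon (an unknown name, for example) is treated as a stray & and escaped.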

Keywords: replace, ampersand, &, &amp;, valid, html, xhtml, url
