Developing Tag Sources

Mp3tag provides an internal Web Sources Framework which is parameterized through web sources description files.

With these description files you can, in principle, import tag data from any website that displays artist/album information as plain HTML (no JavaScript-rendered pages or PWAs) or that provides an XML or JSON API.

You can find many examples in the Web Sources Scripts category of the Mp3tag Community Forums. Q&A and helpful resources are available via the Web Sources Discussion category.

Specification of the Web Source file format

[Name]
Name of the web source, e.g., Discogs.com

[BasedOn]
Base URL of the web source, e.g., https://www.discogs.com

[IndexUrl]
Search URL where %s is replaced by the search criteria entered by the user, e.g., https://www.discogs.com/artist/%s

[AlbumUrl]
Result base URL (URL result from first search pass will be appended), e.g., https://www.discogs.com

[WordSeparator]
Character/string used instead of blanks within the search criteria entered by the user, e.g., %20 or +

[IndexFormat]
Format string for splitting the output buffer from the first search pass into different fields. %_url% is mandatory, e.g., %_url%|%album%|%type%|%label%

[SearchBy]
Field(s) which are offered as search criteria by the web source. You can use up to three different fields defined as triples of Field Name||%field%||&query=%s

Mp3tag displays one search field for each of the specified fields. Examples of this feature can be found with the standard Discogs web sources.

[Encoding]
Encoding used for all URLs. Can be utf-8, iso-8859-1, url, url-utf-8, or ansi (system codepage will be used).

[UserAgent]
If set to 1, Mp3tag uses its name and version number as user agent, e.g., Mp3tag/3.14.

[ParserScriptIndex]
This key contains a multi-line parser script (start with ...) which parses the search results page for different albums.

[ParserScriptAlbum]
This key contains a multi-line parser script (start with ...) which parses a web page found by the first search pass.

[Include]
This key references another web source description file which is included into this file. The keys from the referenced file overwrite the keys from the referencing file. Examples of this feature can be found with the standard Discogs.src which references Discogs.inc to reduce code duplication.

[MinAppVersionWin]
[MinAppVersionMac]
These keys reference the minimum app version numbers, e.g., 3.22 or 1.8.4, for Mp3tag for Windows and Mac that are required by the web source. As the format and language are ever-evolving, this helps to ensure that the running app supports everything that’s required by the web source to function properly.

[Settings]
This optional key references a .settings configuration schema that steers the configuration dialog for this web source. For details see Configuration Settings.
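
As a rough illustration of how [IndexUrl], [WordSeparator], and [IndexFormat] fit together, the following Python sketch builds a search URL and splits one line of the first-pass output into named fields. It is only a simplified model, not Mp3tag's implementation: the function names and sample values are made up for this example, and the URL encoding selected via [Encoding] is ignored.

```python
def build_index_url(index_url: str, criteria: str, word_separator: str) -> str:
    """Replace blanks with the word separator, then substitute %s."""
    # Assumption: runs of whitespace are treated as a single blank.
    joined = word_separator.join(criteria.split())
    return index_url.replace("%s", joined)


def split_index_line(index_format: str, line: str) -> dict:
    """Map one '|'-separated output line onto the names in the format string."""
    names = [part.strip("%") for part in index_format.split("|")]
    values = line.split("|")
    return dict(zip(names, values))


url = build_index_url("https://www.discogs.com/artist/%s", "Some Artist", "+")
print(url)

fields = split_index_line(
    "%_url%|%album%|%type%|%label%",
    "/release/1|Some Album|Album|Some Label",  # made-up sample line
)
print(fields["_url"], fields["album"])
```

The value captured for %_url% is the part that gets appended to [AlbumUrl] for the second pass.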

Parser Scripts

How to Start

First, you need to find a way to identify the lines which contain the interesting bits of information, for example the year of a release or the artist and album names. The Mp3tag parser sees output as lines and characters and you tell the parser how to move from the start to the places you want to extract as information.

Mp3tag uses a pointer which is positioned at the beginning of the file and which can be moved with several commands. There are different ways of moving the pointer to, e.g., the position where the year is referenced on a website.

We can either move it down N lines or tell it to move down until it finds the text Year:. The first approach uses the command MoveLine N (go down N lines from where you are) or GotoLine N (go to the Nth line from the top of the text).

To perform a search, the command would be FindLine "Year:", which finds the next line from where you are that contains the text Year:.

In all cases the pointer would be moved to the first character of the target line. From there we could tell the pointer to move N steps to the right with MoveChar N, or to move to the Nth character in the line using GotoChar N, or to position the pointer after the text Year: as in FindInLine "Year:".

Please note the difference between the FindLine and FindInLine commands: the former goes through lines from where you are to find a text and places the pointer at the beginning of the line, while the latter looks within the current line where the pointer is and positions it after the found text.

Now that we are where we want to be, we need to tell Mp3tag to store the data. To do this, a Say command is used, but we have to tell it what to say and what not. To output the rest of the line we would use the SayRest command. An alternative would be the SayUntil " " command, which outputs everything until a space character is found. Since the year is a number, the script below uses SayNextNumber, which outputs only the next numeric value.

So, a script to find, extract, and output the year to a buffer named YEAR would look like this:

OutputTo "YEAR"
FindLine "Year:"
FindInLine "Year:"
SayNextNumber

You will notice that the example used the Find-Commands rather than the Move- or Goto-Commands. Whenever you have a chance to use a Find command, please do so, because websites tend to change over time. Using a script that relies on Find-Commands is more likely to survive a change in the raw data than one that relies on absolute positions.
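
To make the pointer idea more concrete, here is a small Python model of the four commands used in the script above. It is a sketch under stated assumptions, not how Mp3tag is implemented: the class and method names are invented, the sample page is made up, and details such as whether FindLine also inspects the current line are guesses.

```python
import re


class ParserModel:
    """Toy model of the line/character pointer; not Mp3tag's implementation."""

    def __init__(self, text: str):
        self.lines = text.splitlines()
        self.line = 0        # index of the current line
        self.char = 0        # index of the current character in that line
        self.outputs = {}    # output buffers by name
        self.current = None  # name of the active output buffer

    def output_to(self, name: str) -> None:
        self.current = name
        self.outputs.setdefault(name, "")

    def find_line(self, needle: str) -> None:
        # Assumption: the search starts at (and includes) the current line.
        for i in range(self.line, len(self.lines)):
            if needle in self.lines[i]:
                self.line, self.char = i, 0  # pointer at start of the line
                return
        raise LookupError(f"line containing {needle!r} not found")

    def find_in_line(self, needle: str) -> None:
        pos = self.lines[self.line].find(needle, self.char)
        if pos < 0:
            raise LookupError(f"{needle!r} not found in current line")
        self.char = pos + len(needle)  # pointer just after the found text

    def say_next_number(self) -> None:
        match = re.search(r"\d+", self.lines[self.line][self.char:])
        if match:
            self.outputs[self.current] += match.group()
            self.char += match.end()


page = "Artist: Some Artist\nAlbum: Some Album\nYear: 1999 (reissue)\n"
p = ParserModel(page)
p.output_to("YEAR")
p.find_line("Year:")
p.find_in_line("Year:")
p.say_next_number()
print(p.outputs)  # {'YEAR': '1999'}
```

Because the model searches for the text Year: instead of counting lines, inserting extra lines above it in the sample page does not change the result.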

For fields that have varying contents for tracks, e.g., TITLE, TRACK, or ISRC, Mp3tag expects the output buffer to contain the individual values separated and finalized by the pipe character |, e.g., Title 1|Title 2|Title 3|. Use $verticalBar() in case you need to emit the pipe character as content of a field.
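
The shape of such a per-track buffer can be sketched in a few lines of Python. The helper names are hypothetical and the handling of empty values is an assumption of this model; the point is only that every value, including the last one, is followed by a pipe character.

```python
def join_track_values(values: list) -> str:
    """Build a per-track buffer: each value is followed by '|'."""
    return "".join(value + "|" for value in values)


def split_track_values(buffer: str) -> list:
    """Recover the per-track values from a '|'-terminated buffer."""
    parts = buffer.split("|")
    # The final '|' leaves one empty trailing part; drop only that one.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


buffer = join_track_values(["Title 1", "Title 2", "Title 3"])
print(buffer)                      # Title 1|Title 2|Title 3|
print(split_track_values(buffer))  # ['Title 1', 'Title 2', 'Title 3']
```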

JSON

The Web Sources Framework supports parsing JSON-formatted data. So whenever the online tag source provides a JSON-based API, it’s recommended to use the API over the rendered website.
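
The advantage of a JSON API is that values are addressed by name rather than by position in the text. The Python sketch below shows the idea with a made-up response: it walks an array and emits one pipe-separated line per entry, which is roughly the job that json_foreach, json_select, and the Say commands do in a parser script. The field names, the response, and the function name are assumptions for illustration only.

```python
import json

# Hypothetical API response; the field names are made up for this sketch.
response = """
{"results": [
  {"title": "Some Album", "year": 1999, "url": "/releases/1"},
  {"title": "Other Album", "year": 2004, "url": "/releases/2"}
]}
"""


def index_lines(raw: str) -> list:
    """Emit one 'url|title|year' line per entry of the 'results' array."""
    data = json.loads(raw)
    lines = []
    for item in data["results"]:  # comparable to json_foreach "results"
        # Comparable to json_select on each element, followed by Say commands.
        fields = [item["url"], item["title"], str(item["year"])]
        lines.append("|".join(fields))
    return lines


for line in index_lines(response):
    print(line)
```

A layout change on the rendered website would not affect this kind of access as long as the API keeps its field names.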

Debugging

Working all of this out on paper can be tricky: if you miscount lines or characters, you might end up with quite unexpected results. To check what the parser is doing, you can add a Debug "on" "debug.out" command to the top of your script. This will give you an output file which shows step by step what the parser is doing and why you end up with a given output.

You can use DebugWriteInput "debug-input.out" to see what data you are actually parsing before you build your script step by step.

List of Parser Commands

Command Parameters Description
FindLine S n Find line with first or Nth occurrence of S (starting from the current position)
FindLineNoCase S n Find line with first or Nth occurrence of S (ignoring case and starting from the current position)
FindInLine S n n Find the next/Nth occurrence of S within the current line. If the 3rd parameter is set to 1, no error is produced if S is not found.
GotoChar N Skip to the Nth character in the current line
GotoLine N Go to Nth line (counting from top)
MoveChar N Move right/left N characters
MoveLine N n Move down/up N lines (starting from current position). If the second parameter is set to 1, possible errors are ignored.
Say S Send S to output
SayUntil S Send everything until S to output
SayUntilML S Send everything until S to output searching across multiple lines
SayRest Send everything to the end of the current line to output
SayNChars N Send next N characters to output
SayNextNumber Outputs the next numeric value from the input.
SayNextWord Outputs the next word from the input.
SayOutput S Send the content of output S to the current output. The output CurrentUrl is always generated at runtime.
SayNewline Outputs a carriage return, line feed (CR LF) sequence.
SayDate s Takes the current input, interprets it as a time interval since 1970-01-01 and outputs a formatted date string in YYYY-MM-DD. The input is assumed to be in seconds, but can be switched to milliseconds by setting the optional parameter to ms.
SayDuration s n n Takes the current input, interprets it as a numerical value and outputs a formatted duration string. The input is assumed to be in seconds, but can be switched to milliseconds by setting the optional first parameter to ms, or to minutes by setting it to m. Setting the optional second parameter to 1 enables the hours part in the duration string. Setting the optional third parameter to 1 enables the milliseconds part in the duration string.
SayRegexp S s s Outputs all matches of the regular expression in the first parameter separated by the string in the second parameter. If the third parameter is provided, the match is only performed till the third parameter content is reached. If the third parameter content cannot be found, the line is ignored.
Set S s Sets the content of output S to the value s. Resets the content if s is omitted.
SkipChars S Skip any characters contained in S
If S Check for occurrence of S on current position.
IfNot S Check for absence of S on current position.
IfOutput S Check for content in output buffer named S.
IfNotOutput S Check for absence of content in output buffer named S.
IfGreater N Check for a number greater than N on current position.
IfLess N Checks for a number less than N on current position.
IfVar S S Checks if the contents of a configuration variable identified by the key in the first parameter matches what is given in the second parameter.
IfNotVar S S Checks if the contents of a configuration variable identified by the key in the first parameter does not match what is given in the second parameter.
Else Alternative branch of an If operation.
Endif End of an If/IfNot/IfOutput/IfNotOutput/IfVar/IfNotVar operation.
OutputTo S Sets the name of the output buffer of the Say commands to S.
Do ... While S n Executes the commands enclosed by the two commands while S occurs on current position. The optional second parameter limits the loop to at most n iterations. Nesting of Do commands is not allowed.
Replace S S Replaces all occurrences of the first parameter by the second parameter.
RegexpReplace S S Replaces everything that is matched by the regular expression in the first parameter by the string in the second parameter.
JoinUntil S Joins the current line to the next occurrence of S.
JoinLines N Joins the current line with the next N lines. If N is -1 or exceeds input range, all available lines are joined.
KillTag S s Replaces tag S with s in current line (or blank if omitted).
Trim S Enables or disables default auto-trimming of whitespace characters, S= "on" or "off"
Unspace Removes leading and trailing spaces from the current line.
Debug S s n Debug output, S= "on" or "off", s is an optional file name. n is an optional maximum file size for the debug file in MB.
json S s Enables or disables JSON input mode. S is set to either "on" or "off". If in JSON input mode, the input is parsed as JSON data structure and can be accessed using the following JSON-related functions. The optional second parameter can be set to "current" to use the current, possibly transformed input as JSON input.
json_foreach S If in JSON input mode, starts iteration over a JSON array accessed by S. This scripting function emits the size of the accessed array on the input which can then be used inside the web source script. Iteration must be ended by json_foreach_end.
json_foreach_reverse S If in JSON input mode, starts iteration over a JSON array accessed by S in reversed order. This scripting function emits the size of the accessed array on the input which can then be used inside the web source script. Iteration must be ended by json_foreach_end.
json_foreach_counter n If in JSON input mode, emits the current counter of iteration via json_foreach or json_foreach_reverse with an optional parameter for the number of digits padded with leading zeros if necessary.
json_foreach_end If in JSON input mode, ends iteration of the last iteration started by json_foreach.
json_select_object S If in JSON input mode, selects JSON object denoted by S and makes its fields available to json_select. If successful, the object name S is available at the current position.
json_unselect_object If in JSON input mode, leaves current JSON object and returns to the previous one.
json_select S If in JSON input mode, selects JSON element denoted by S and emits its content to the input.
json_select_array S n s If in JSON input mode, selects the nth element of the JSON array denoted by S to the input. If n is -1, all elements are emitted delimited by s.
json_select_many S S s s n n If in JSON input mode, selects all objects from the JSON array denoted by the first parameter and emits their corresponding elements denoted by the second parameter to the input, delimited by the third parameter. The optional 4th parameter provides the delimiter for the last element. The optional 5th parameter denotes the maximum number of items to select. If the optional 6th parameter is set to 1, empty elements are included in the output.
json_select_many_count S s n If in JSON input mode, emits the number of objects in the JSON array denoted by the first parameter and optionally restricts selection further to elements denoted by the second parameter. If the optional 3rd parameter is set to 1, empty elements are included when counting.
N Required numeric parameter
S Required string parameter (in quotes)
n Optional numeric parameter
s Optional string parameter (in quotes)
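
As one worked example from the table, the conversion described for SayDate can be modelled in Python as follows. This is a sketch only: the function name is invented, and treating the timestamp as UTC is an assumption, since the description does not say which time zone is used.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def say_date(value: str, unit: str = "s") -> str:
    """Format a time interval since 1970-01-01 as YYYY-MM-DD.

    unit is "s" for seconds (default) or "ms" for milliseconds.
    Assumption: the interval is interpreted in UTC.
    """
    number = float(value)
    seconds = number / 1000 if unit == "ms" else number
    return (EPOCH + timedelta(seconds=seconds)).strftime("%Y-%m-%d")


print(say_date("86400"))           # 1970-01-02
print(say_date("86400000", "ms"))  # 1970-01-02
```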

Configuration Settings

Starting with Mp3tag v3.22 and Mp3tag for Mac 1.8.4, a Tag Source can reference a configuration schema that is used to dynamically create a configuration dialog for the Tag Source.

The configuration schema needs to be stored in a .settings file, e.g., My Tag Source#Settings.settings and is referenced in the corresponding Tag Source via the [Settings] key, e.g., [Settings]=My Tag Source#Settings.settings.

This makes the configuration schema known to the Tag Source.

Configuration Schema Definition

The following example schema illustrates the overall structure and required keys for a configuration schema and lists all supported settings types and means for localization of configuration settings.

{
  "key": "test",
  "title": "Test Settings",
  "settings": [
	{
	  "type": "string",
	  "key": "testChoices",
	  "title": "Something to choose from",
	  "description": "Select the preferred item",
	  "choices": [
		"Item 1",
		"Item 2",
		"Item 3"
	  ],
	  "default": "Item 1"
	},
	{
	  "type": "string",
	  "key": "testEnter",
	  "title": "Something to enter text",
	  "description": "Enter some text",
	  "default": "Default Text"
	},
	{
	  "type": "number",
	  "key": "testNumber",
	  "title": "Something to enter a number",
	  "description": "Enter a number",
	  "default": 42
	},
	{
	  "type": "bool",
	  "key": "testCheck",
	  "title": "Something to check",
	  "description": "Check to enable",
	  "default": false
	},
	{
	  "type": "heading",
	  "key": "testHeading",
	  "title": "A bold-rendered heading text"
	},
	{
	  "type": "separator",
	  "key": "testSeparator",
	  "title": "An optional text that appears below the separator line"
	}
  ],
  "localizations": {
	"de": [
	  {
		"key": "test",
		"title": "Test Einstellungen"
	  },
	  {
		"key": "testChoices",
		"title": "Etwas zur Auswahl",
		"description": "Wählen Sie den gewünschten Eintrag"
	  },
	  {
		"key": "testEnter",
		"title": "Etwas zum Eingeben",
		"description": "Geben Sie den gewünschten Text ein"
	  },
	  {
		"key": "testNumber",
		"title": "Etwas zum Eingeben einer Zahl",
		"description": "Geben Sie die gewünschte Zahl ein"
	  },
	  {
		"key": "testCheck",
		"title": "Etwas zum Abhaken",
		"description": "Häkchen setzen zum Aktivieren"
	  },
	  {
		"key": "testHeading",
		"title": "Ein Überschriftentext in fetter Schrift"
	  },
	  {
		"key": "testSeparator",
		"title": "Ein optionaler Text, der unterhalb der Trennlinie erscheint"
	  }
	]
  }
}

Please note that localization items are optional and are referenced using the key from the individual items in the settings definition. The window title is referenced using the root key from the settings definition.

The language keys use ISO 639-1 language codes, e.g., de for German.

Editing the Tag Source Configuration

To configure the settings, the user would use the dedicated menu item that is created at the end of the Tag Sources menu, separated from the actual Tag Sources. The example name My Tag Source#Settings.settings results in a menu item Settings in the My Tag Source submenu.

Choosing this menu item shows a dedicated dialog that is generated based on the individual settings schema.

The actual values that are configured by the user are stored in a separate file settings.json as simple key-value pairs, e.g.,

{
  "test": {
	"testChoices": "Item 1",
	"testCheck": false
  }
}

Note that this settings file is used for the user settings of all Tag Sources. The individual objects use the key from the respective settings schema definition.

This ensures that you can add new configuration options with future versions of the Tag Source without the need to ask users to copy the additions to their settings file. If the settings.json file has no value for an added key, the default from the schema is used.
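
The fallback to defaults can be pictured with the following Python sketch. It assumes a shortened version of the example schema and a hypothetical helper name; it models the described behaviour and is not taken from Mp3tag itself.

```python
import json


def effective_settings(schema: dict, user_settings: dict) -> dict:
    """Start from the schema defaults, then overlay the user's stored values."""
    values = {
        item["key"]: item["default"]
        for item in schema["settings"]
        if "default" in item  # headings and separators carry no value
    }
    # settings.json groups values under the root key of each schema.
    values.update(user_settings.get(schema["key"], {}))
    return values


schema = json.loads("""
{
  "key": "test",
  "title": "Test Settings",
  "settings": [
    {"type": "string", "key": "testEnter", "default": "Default Text"},
    {"type": "number", "key": "testNumber", "default": 42},
    {"type": "bool", "key": "testCheck", "default": false},
    {"type": "heading", "key": "testHeading", "title": "A heading"}
  ]
}
""")
user = json.loads('{"test": {"testCheck": true}}')
print(effective_settings(schema, user))
# {'testEnter': 'Default Text', 'testNumber': 42, 'testCheck': True}
```

Here only testCheck was stored by the user, so testEnter and testNumber fall back to the defaults from the schema.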

Accessing Configuration Settings from a Tag Source

Two dedicated commands give a Tag Source access to the individual settings. The key identifies the setting, and its value is compared to the given value.

IfVar "key" "value"
IfNotVar "key" "value"

With the examples above, you’d use

IfVar "testCheck" "true"
  ...
Else
  ...
Endif

or

IfNotVar "testChoices" "Item 1"
  ...
Else
  ...
Endif
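
Since both commands take the expected value as a quoted string, the comparison can be thought of as a text comparison. The Python model below illustrates this reading. The helper names are hypothetical, and the way boolean and numeric settings are rendered as text (true/false, plain digits) is an assumption based on the IfVar "testCheck" "true" example above.

```python
def as_text(value) -> str:
    """Render a stored setting as text (formatting rules are assumed)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def if_var(settings: dict, key: str, expected: str) -> bool:
    """Model of IfVar: true when the setting's text equals the expected text."""
    return as_text(settings.get(key, "")) == expected


def if_not_var(settings: dict, key: str, expected: str) -> bool:
    """Model of IfNotVar: the negation of if_var."""
    return not if_var(settings, key, expected)


settings = {"testChoices": "Item 1", "testCheck": False}
print(if_var(settings, "testCheck", "true"))        # False, so the Else branch runs
print(if_not_var(settings, "testChoices", "Item 1"))  # False, so the Else branch runs
```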