Regular Expression Sample (VB Version)
VS SDK Regular Expression Language Service Example Deep Dive (VB)
István Novák (DiveDeeper), Grepton Ltd.
May, 2008
Introduction
This example implements a small language service for demonstration purposes. It is called the Regular Expression Language Service because it tokenizes text with RegEx patterns (lowercase letters, capital letters, digits) and applies its own syntax coloring to each token. The sample is far from a full language service, but it illustrates the basics. Its source files contain only about three hundred lines of essential code. Reading through this deep dive, you will become familiar with the following concepts:
How should language services be registered with Visual Studio?
What lifecycle management tasks does a simple language service have?
How to create a very simple language service?
How to implement a scanner supporting syntax coloring?
To understand the concepts treated here, you should be familiar with the idea of VSPackages and know how to build and register very simple (even non-functional) packages. For more information about packages, have a look at the Package Reference Sample (VisualBasic Reference.Package sample). Very basic knowledge of regular expressions is also expected.
Regular Expression Language Service
Open the Microsoft Visual Studio 2008 SDK Browser and select the Samples tab. In the top middle list, search for the “VisualBasic Example.RegExLangServ” sample. Use the “Open this sample in Visual Studio” link in the top right panel of the browser application to prepare the sample. The application opens in Visual Studio 2008.
Running the sample
Rebuild the package and start it with the Experimental Hive! Without creating a new solution, add a new text file with the File|New|File... menu function. Use the File|Save As menu function to store the text file with the RegexFile.rgx name. To avoid attaching the .txt extension to the end of the file name, set the “Save as type” to “All files (*.*)” as illustrated in Figure 1:
Figure 1: Save the file with .rgx extension
Type a few words, numbers and punctuation characters into the editor and see how they are colored! You can see an illustration in Figure 2:
Figure 2: Our language service has effect on syntax coloring
Now try to save the file again with the File|Save As menu function. This time the Save As dialog shows “RegEx File (*.rgx)” in its “Save as type” field, indicating that it recognizes the file type with the .rgx extension.
The structure of the sample
The solution contains a VSPackage project named RegExLangServ that uses a few reference assemblies for VS interop starting with name “Microsoft.VisualStudio”. The project’s source files are the following:
The essential code of this sample is in the RegExLangServ.vb, RegExScanner.vb and VsPkg.vb files; the following scenarios focus on them. In the code extracts used in this deep dive I omit or change comments for better readability and remove Imports statements, namespace declarations and other non-relevant elements.
Scenario: Registering the Language Service with an associated file extension
The language service this sample implements is intended to be used by the Visual Studio Shell and by any third party packages that want to use its functionality. For example, the code window of Visual Studio uses this service for syntax coloring. Just like any other service, a language service has to be registered with Visual Studio. The registration information is provided by attributes decorating the package class (VsPkg.vb):
' --- Other attributes omitted
Public NotInheritable Class RegularExpressionLanguageServicePackage
    Inherits Shell.Package
    Implements IDisposable
    ' ...
End Class
Note that a few attributes are not shown in the code extract above. If you are not familiar with them, take a look at the Package Reference Sample Deep Dive. Language service registration uses the following two attributes:
With these attributes we have registered the regular expression language service. However, to use the service we also have to take care of service instantiation.
Scenario: Lifecycle management of a language service
Just as with other local or proffered services, our package must manage the lifecycle of the regular expression language service. For most services created with the Managed Package Framework, lifecycle management is only about creating the service instance. For language services we must also take care of cleanup, since behind the scenes language services use unmanaged code and unmanaged resources. Our package class uses the standard pattern for managing the lifecycle of the language service instance:
Public NotInheritable Class RegularExpressionLanguageServicePackage
    Inherits Shell.Package
    Implements IDisposable

    Private langService As RegularExpressionLanguageService

    ' Called when the package is sited: create, site and proffer the service.
    Protected Overrides Sub Initialize()
        MyBase.Initialize()
        langService = New RegularExpressionLanguageService()
        langService.SetSite(Me)
        Dim sc As IServiceContainer = CType(Me, IServiceContainer)
        ' The last argument (True) promotes the service to the parent container.
        sc.AddService(GetType(RegularExpressionLanguageService), langService, True)
    End Sub

    ' Releases the language service first, then the package's own resources.
    Protected Overrides Overloads Sub Dispose(ByVal disposing As Boolean)
        Try
            If disposing Then
                If langService IsNot Nothing Then
                    langService.Dispose()
                End If
            End If
        Finally
            MyBase.Dispose(disposing)
        End Try
    End Sub

    Public Overloads Sub Dispose() Implements IDisposable.Dispose
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub
End Class
Since our package’s goal is to provide the regular expression language service, we create the service instance as soon as the package is loaded into memory and sited (this is when the overridden Initialize method is called). The language service is sited in our package, then added to the package’s service container and also promoted to the parent container.
In the overridden Dispose method we release the resources held by the language service, then clean up the other resources held by the package. The overridden Dispose is called from the public Dispose method, which implements the IDisposable interface. Since the package is cleaned up here, we call GC.SuppressFinalize to avoid a second cleanup of the package instance by the finalizer. The lifecycle management pattern used here should be applied to your own language services.
Scenario: Implementing a small language service
The code editor built into Visual Studio can be customized by language services. These customization features include brace matching, syntax coloring, IntelliSense and many others. For the code editor to make use of a language service, it must be able to access a few of the service’s functions.
One such function is access to the so-called scanner and parser of the language service. The scanner is responsible for retrieving tokens such as keywords, identifiers, double precision numbers, strings, comments and many others from the source text. The parser is responsible for understanding what the sequence of tokens means, whether it matches the expected language syntax, and so on.
Syntax coloring basically uses only the scanner, but can use the parser, for example to use different colors for value and reference types. Brace matching generally uses the parser to find the matching pairs of opening and closing braces.
In this example we use a small language service based on regular expressions that uses only the scanner for syntax coloring and has no parser for more complex tasks.
To be a language service, we must create a COM object implementing a few interfaces with the IVsLanguage prefix in their names. The Managed Package Framework provides the LanguageService class that is the best type to start with instead of implementing the interfaces from scratch. To create a language service of our own, we must create a LanguageService derived class and override a few methods as the following code extract shows:
Friend Class RegularExpressionLanguageService
    Inherits LanguageService

    Private scanner As RegularExpressionScanner
    Private preference As LanguagePreferences

    ' This sample has no parser, so parse requests are not supported.
    Public Overrides Function ParseSource(ByVal req As ParseRequest) As AuthoringScope
        Throw New NotImplementedException()
    End Function

    Public Overrides ReadOnly Property Name() As String
        Get
            Return "Regular Expression Language Service"
        End Get
    End Property

    Public Overrides Function GetFormatFilterList() As String
        Return VSPackage.RegExFormatFilter
    End Function

    ' The scanner is created on first use and then reused.
    Public Overrides Function GetScanner(ByVal buffer As _
        Microsoft.VisualStudio.TextManager.Interop.IVsTextLines) As IScanner
        If scanner Is Nothing Then
            scanner = New RegularExpressionScanner()
        End If
        Return scanner
    End Function

    ' The preferences object is created on first use with default settings.
    Public Overrides Function GetLanguagePreferences() As LanguagePreferences
        If preference Is Nothing Then
            preference = New LanguagePreferences(Me.Site, _
                GetType(RegularExpressionLanguageService).GUID, _
                "Regular Expression Language Service")
        End If
        Return preference
    End Function
End Class
(This is the full code of the class; I have only changed indenting and omitted comments.)
Our RegularExpressionLanguageService must be visible to COM and so must have an explicit GUID. The overridden Name property is used to obtain the name of our language service. The GetFormatFilterList method retrieves the file filter expression used by the Save As dialog (“RegEx File (*.rgx)”).
The overridden ParseSource method is meant to parse the specified source code according to a ParseRequest instance. Since our language service does not implement a parser, we throw a NotImplementedException here.
Visual Studio supports language preference settings. Examples of such preferences are IntelliSense support (supported or not), line numbers (displayed or not), the tab size used by the language, and so on. By overriding the GetLanguagePreferences method we can tell which preferences are used by our service. In this implementation we use the default settings.
The GetScanner method is the most important one in our language service. This method retrieves an object implementing the IScanner interface. As its name suggests, the returned object represents the scanner used to tokenize the source code text. The responsibility of the scanner object is delegated to a RegularExpressionScanner instance, which I treat in the next scenario.
Scenario: Creating the scanner to support syntax coloring
The scanner object is crucial for our regular expression language service. It implements the IScanner interface that has only two methods:
Public Interface IScanner
    Sub SetSource(source As String, offset As Integer)
    Function ScanTokenAndProvideInfoAboutIt(tokenInfo As TokenInfo, _
        ByRef state As Integer) As Boolean
End Interface
The SetSource method sets the line to be scanned together with the offset at which scanning starts. The ScanTokenAndProvideInfoAboutIt method obtains the next token from the current line. The TokenInfo parameter passed in is an object to be filled in by the method; it represents the token scanned. The state parameter is an integer value representing the scanner state (it is used for so-called context-dependent scanning).
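The calling sequence described above can be sketched from the caller's side. This is only an illustration: SimpleTokenInfo, ISimpleScanner, CharScanner and ScanLine are hypothetical stand-ins invented for this sketch so that it compiles without the Visual Studio SDK assemblies; they mirror the shape of IScanner and TokenInfo but are not part of the sample or of the Managed Package Framework.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for TokenInfo, reduced to the two index fields.
Public Class SimpleTokenInfo
    Public StartIndex As Integer
    Public EndIndex As Integer
End Class

' Hypothetical stand-in with the same two-method shape as IScanner.
Public Interface ISimpleScanner
    Sub SetSource(ByVal source As String, ByVal offset As Integer)
    Function ScanTokenAndProvideInfoAboutIt(ByVal tokenInfo As SimpleTokenInfo, _
        ByRef state As Integer) As Boolean
End Interface

' A trivial scanner that reports every character as a separate token.
Public Class CharScanner
    Implements ISimpleScanner

    Private lineText As String = ""
    Private pos As Integer

    Public Sub SetSource(ByVal source As String, ByVal offset As Integer) _
        Implements ISimpleScanner.SetSource
        lineText = source
        pos = offset
    End Sub

    Public Function ScanTokenAndProvideInfoAboutIt(ByVal tokenInfo As SimpleTokenInfo, _
        ByRef state As Integer) As Boolean _
        Implements ISimpleScanner.ScanTokenAndProvideInfoAboutIt
        ' Returning False tells the caller that the line is exhausted.
        If pos >= lineText.Length Then
            Return False
        End If
        tokenInfo.StartIndex = pos
        tokenInfo.EndIndex = pos
        pos += 1
        Return True
    End Function
End Class

Public Module ScannerProtocolDemo
    ' Hands one line to the scanner, then pulls tokens until it returns False.
    Public Function ScanLine(ByVal scanner As ISimpleScanner, _
        ByVal line As String) As List(Of String)
        Dim tokens As New List(Of String)()
        Dim state As Integer = 0
        Dim info As New SimpleTokenInfo()
        scanner.SetSource(line, 0)
        While scanner.ScanTokenAndProvideInfoAboutIt(info, state)
            ' EndIndex is inclusive, hence the + 1 when computing the length.
            tokens.Add(line.Substring(info.StartIndex, info.EndIndex - info.StartIndex + 1))
        End While
        Return tokens
    End Function

    Public Sub Main()
        For Each token As String In ScanLine(New CharScanner(), "ab1")
            Console.WriteLine(token)
        Next
    End Sub
End Module
```

The point of the sketch is the loop in ScanLine: the line is set once, and the same token object is refilled on every call until the scanner reports that nothing is left. The state value is passed through untouched here because this trivial scanner is not context dependent.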
The RegularExpressionScanner class implements the IScanner interface:
Friend Class RegularExpressionScanner
    Implements IScanner

    Private sourceString As String
    Private currentPos As Integer

    ' Patterns are tried in this order; each one carries the color of its token.
    Private Shared patternTable As RegularExpressionTableEntry() = _
        New RegularExpressionTableEntry(3) _
        { _
            New RegularExpressionTableEntry("[A-Z]?", TokenColor.Comment), _
            New RegularExpressionTableEntry("[a-z]?", TokenColor.Keyword), _
            New RegularExpressionTableEntry("[0-9]?", TokenColor.Number), _
            New RegularExpressionTableEntry(".", TokenColor.Text) _
        }

    Private Shared Sub MatchRegEx(ByVal source As String, ByRef charsMatched As Integer, _
        ByRef color As TokenColor)
        ' --- Implementation omitted from this code extract
    End Sub

    Public Function ScanTokenAndProvideInfoAboutIt(ByVal tokenInfo As TokenInfo, _
        ByRef state As Integer) As Boolean _
        Implements IScanner.ScanTokenAndProvideInfoAboutIt
        ' Nothing left on the current line.
        If sourceString.Length = 0 Then
            Return False
        End If
        Dim color As TokenColor = TokenColor.Text
        Dim charsMatched As Integer = 0
        MatchRegEx(sourceString, charsMatched, color)
        If tokenInfo IsNot Nothing Then
            tokenInfo.Color = color
            tokenInfo.Type = TokenType.Text
            tokenInfo.StartIndex = currentPos
            tokenInfo.EndIndex = Math.Max(currentPos, currentPos + charsMatched - 1)
        End If
        ' Move past the token just scanned.
        currentPos += charsMatched
        sourceString = sourceString.Substring(charsMatched)
        Return True
    End Function

    Public Sub SetSource(ByVal source As String, ByVal offset As Integer) _
        Implements IScanner.SetSource
        sourceString = source
        currentPos = offset
    End Sub
End Class
The implementation of the SetSource method is trivial. The ScanTokenAndProvideInfoAboutIt method uses MatchRegEx to obtain the next token. Based on the token it retrieves, the TokenInfo object is filled in and the position where the next token starts is set.
The scanner defines a nested class called RegularExpressionTableEntry that describes a token represented by a RegEx and also assigns a token color to it. The static patternTable array demonstrates how this structure is set up. The MatchRegEx method uses this array to obtain the next token.
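Since the body of MatchRegEx is omitted from the extract, the following is a minimal sketch of how such a table-driven match could work, not the sample's actual implementation. SketchColor and PatternEntry are hypothetical stand-ins for TokenColor and RegularExpressionTableEntry so that the code runs without the SDK assemblies. Two behaviors are assumptions of this sketch: each pattern is anchored to the start of the remaining text, and a single character is consumed as plain text when no pattern matches.

```vb
Imports System
Imports System.Text.RegularExpressions

Public Module MatchRegExSketch
    ' Hypothetical stand-in for the TokenColor values used by the sample.
    Public Enum SketchColor
        Text
        Keyword
        Comment
        Number
    End Enum

    ' Hypothetical stand-in for RegularExpressionTableEntry.
    Public Class PatternEntry
        Public ReadOnly Expression As Regex
        Public ReadOnly Color As SketchColor

        Public Sub New(ByVal patternText As String, ByVal tokenColor As SketchColor)
            ' Anchor the pattern so it can only match at the start of the text.
            Expression = New Regex("^(?:" & patternText & ")")
            Color = tokenColor
        End Sub
    End Class

    ' Same patterns and colors as the patternTable shown in the scanner.
    Private ReadOnly patternTable As PatternEntry() = New PatternEntry() { _
        New PatternEntry("[A-Z]?", SketchColor.Comment), _
        New PatternEntry("[a-z]?", SketchColor.Keyword), _
        New PatternEntry("[0-9]?", SketchColor.Number), _
        New PatternEntry(".", SketchColor.Text) _
    }

    ' Tries the patterns in table order and reports the first non-empty match.
    Public Sub MatchRegEx(ByVal source As String, ByRef charsMatched As Integer, _
        ByRef color As SketchColor)
        For Each entry As PatternEntry In patternTable
            Dim m As Match = entry.Expression.Match(source)
            ' Patterns such as "[A-Z]?" also match zero characters; skip those.
            If m.Success AndAlso m.Length > 0 Then
                charsMatched = m.Length
                color = entry.Color
                Return
            End If
        Next
        ' Nothing matched: consume one character so the scanner always advances.
        charsMatched = 1
        color = SketchColor.Text
    End Sub

    Public Sub Main()
        Dim rest As String = "Ab1!"
        While rest.Length > 0
            Dim matched As Integer = 0
            Dim color As SketchColor = SketchColor.Text
            MatchRegEx(rest, matched, color)
            Console.WriteLine("{0} -> {1}", rest.Substring(0, matched), color)
            rest = rest.Substring(matched)
        End While
    End Sub
End Module
```

Skipping empty matches matters with these patterns: because of the trailing ?, an expression like "[A-Z]?" succeeds on any input, so without the length check the first table entry would always win and the scanner would never advance.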
Summary
The Regular Expression Language Service sample demonstrates how easy it is to create a very simple language service. In this case, the service implements a scanner that is able to tokenize the source text according to regular expression patterns. The language service also supports syntax coloring: each token accepted has a distinguishing color.
Language services must be registered in order to be accessible to the VS Shell and third party packages. With a simple attribute decorating the package that owns a language service, the service can be associated with a file extension. When a file with the specified extension is opened in the code editor, the corresponding language service is used to edit it.