Friday, August 21, 2009

The regular expression "\w" means different things in XML Schema and .NET RegEx

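When validating an e-mail format in XML with an XSD, the pattern below rejects addresses that contain "_" (the .NET RegEx class accepts them). Note that "-" is not matched by \w in either engine; it is accepted only where the pattern lists it explicitly.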

"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"

The reason is as follows:



https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=471388&wa=wsignin1.0
Description
The matching of the simple regular expression "^\w+$" fails for the underscore character in an xml schema pattern facet. The error does not occur when using the Regex class directly.
Comments
Note: This problem did not occur in .NET 1.1.
Posted by rkahlert on 2009/6/30 at 07:43 AM
Thanks for your feedback.

We are rerouting this issue to the appropriate group within the Visual Studio Product Team for triage and resolution. These specialized experts will follow-up with your issue.

Thank you
Posted by Microsoft on 2009/6/30 at 06:46 PM
Hi,

The Xml Schema implementation follows the W3C schema spec for handling regular expressions and hence may behave slightly differently from the .Net RegEx implementation.

From the specs,
• The RegEx class defines \w as [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}\p{Lm}]. The underscore character is in the Pc Unicode category, so it is included in this definition of \w.
o Reference: http://msdn.microsoft.com/en-us/library/20bw873z.aspx

• The XSD spec defines \w as [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters). The P category consists of all punctuation (which includes Pc). Therefore the underscore is excluded from the XSD definition of \w.
o Reference: http://www.w3.org/TR/xmlschema11-2/#regexs

Thanks
Nithya Sampathkumar
Program Manager
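
The two definitions of \w quoted above can be compared with a small Python sketch. This is only an approximation, not the .NET or XSD engine: Python's own \w stands in for the .NET behaviour (both accept "_"), and the XSD-style \w is emulated for ASCII input only by excluding every character whose Unicode general category starts with P, Z or C. The helper names (is_xsd_word_char, matches_xsd_style, and so on) are made up for this illustration. Whole-string matching is used because an XSD pattern facet must match the entire value.

```python
import re
import unicodedata

# The e-mail pattern from the post, unchanged.
EMAIL_PATTERN = r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"


def is_xsd_word_char(ch: str) -> bool:
    """XSD-style \\w: any character whose general category is not P, Z or C."""
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


def xsd_word_class_ascii() -> str:
    """Build an explicit character class for the XSD-style \\w (ASCII only)."""
    chars = (chr(cp) for cp in range(128))
    return "[" + "".join(re.escape(c) for c in chars if is_xsd_word_char(c)) + "]"


def matches_with_underscore_in_w(value: str) -> bool:
    """Whole-string match using Python's \\w, which accepts '_'."""
    return re.fullmatch(EMAIL_PATTERN, value) is not None


def matches_xsd_style(value: str) -> bool:
    """Whole-string match after swapping \\w for the emulated XSD-style class."""
    pattern = EMAIL_PATTERN.replace(r"\w", xsd_word_class_ascii())
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    # The underscore is connector punctuation, so the XSD-style \w excludes it.
    print("category of '_':", unicodedata.category("_"))
    for address in ("john.doe@example.com", "john_doe@example.com"):
        print(address,
              matches_with_underscore_in_w(address),
              matches_xsd_style(address))
```

Under this emulation the address with a dot is accepted by both variants, while the address with an underscore is accepted only when \w includes the Pc category.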
