XML Basics
XML Syntax
1. The document declaration, when present, must be on the first line
<?xml version='1.0' encoding="utf-8" ?>
2. An XML document must have exactly one root element
<?xml version='1.0' ?>
<Users><!-- root element -->
<User id='1'>
<name>alex</name>
</User>
</Users>
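3. Attribute values must be quoted, and the value of an id attribute should be unique within the document, e.g. id='1'
4. CDATA section: data inside this section is treated as literal text and shown verbatim
<![CDATA[ data ]]>
5. XML entities
An entity is a variable that defines a shortcut for referring to plain text or special characters. An entity reference is a reference to an entity. Entities can be declared internally or externally.
External entity: <!ENTITY entity-name SYSTEM "URI/URL">
Internal entity: <!ENTITY entity-name "entity value">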
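To make the internal-entity declaration concrete, here is a minimal sketch that parses a small document with the JDK's DOM parser and reads back the expanded value. The class name InternalEntityDemo and the entity name author are made up for illustration, and the default-configured factory used here is only appropriate for trusted input.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class InternalEntityDemo {

    // A document whose internal DTD subset declares one internal entity (hypothetical name: author).
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<!DOCTYPE Users [ <!ENTITY author \"alex\"> ]>\n"
        + "<Users><User id=\"1\"><name>&author;</name></User></Users>";

    // Parses XML and returns the text of the first <name> element,
    // in which the parser has replaced &author; with the entity's value.
    static String firstName() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("name").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstName()); // prints "alex"
    }
}
```

The entity reference &author; never reaches the application; the parser substitutes the declared value while building the tree. The same substitution mechanism is what an external entity abuses when the value comes from a URI instead of a literal.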
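XML Constraints
1. Definition: constraints specify the rules an XML document must follow
2. Types of constraints:
DTD: a simple constraint technology
<!-- ELEMENT declares an element: the students element may contain any number of student child elements -->
<!ELEMENT students (student*)>
<!-- the student element contains name, age and sex child elements, in that order -->
<!ELEMENT student (name,age,sex)>
<!-- the name, age and sex elements hold character data -->
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
<!-- ATTLIST declares attributes: student has a required attribute named number, of type ID -->
<!ATTLIST student number ID #REQUIRED>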
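Attaching a DTD to an XML document. There are two kinds of DTD:
- Internal DTD: the constraint rules are defined inside the XML document itself
<!DOCTYPE students [ constraint rules ]>
- External DTD: the constraint rules are defined in a separate .dtd file
<!DOCTYPE students SYSTEM "student.dtd">
Local file: <!DOCTYPE root-element SYSTEM "location of the dtd file">
Network: <!DOCTYPE root-element PUBLIC "name of the dtd" "URL of the dtd file">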
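As a rough illustration of how these constraints are enforced, the sketch below embeds the students DTD as an internal subset and asks the JDK's DOM parser to validate against it. The class name DtdValidationDemo and the sample data are assumptions made for this example. Note that an attribute of type ID must be an XML name, so the value s001 is used rather than a bare number.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {

    // The DTD from the text, written as an internal subset.
    static final String DTD =
        "<!DOCTYPE students [\n"
        + "<!ELEMENT students (student*)>\n"
        + "<!ELEMENT student (name,age,sex)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT age (#PCDATA)>\n"
        + "<!ELEMENT sex (#PCDATA)>\n"
        + "<!ATTLIST student number ID #REQUIRED>\n"
        + "]>\n";

    // Validates the given document body against the DTD and returns
    // the number of validity errors the parser reported.
    static int countValidationErrors(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(
                new StringReader("<?xml version=\"1.0\"?>\n" + DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors.size();
    }

    public static void main(String[] args) {
        String valid = "<students><student number=\"s001\">"
            + "<name>alex</name><age>20</age><sex>male</sex></student></students>";
        String invalid = "<students><student><name>alex</name></student></students>";
        System.out.println(countValidationErrors(valid));
        System.out.println(countValidationErrors(invalid));
    }
}
```

Validity errors are reported through the ErrorHandler rather than thrown, which is why the sketch collects them in a list; without a handler the default behaviour is to print them and carry on.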
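XML Parsing
1. Parsing: operating on an XML document by reading its data into memory
2. Operations on an XML document:
  1. Parse: read the document's data into memory
  2. Write: save in-memory data to an XML document for persistent storage
3. Ways to parse XML:
  DOM: loads the whole markup document into memory at once and builds a DOM tree
  SAX: reads the document sequentially and is event-driven
  Digester/JAXB: suited to cases where an XML document needs to be converted directly into JavaBeans
Common XML Parsers
1. JAXP: the parser API provided by Sun; supports both DOM and SAX parsing
2. DOM4J: a very capable parser
3. Jsoup: an HTML parser
4. Pull: the parser built into the Android operating system; a streaming parser in the style of SAX
Using the Jsoup Parser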
public static void main(String[] args) throws IOException {
    // 1. Obtain a Document object from the XML file
    // 1.1 Locate the XML file on the classpath
    String path = XMlXXEFormat.class.getClassLoader().getResource("student.xml").getPath();
    // 1.2 Parse the file and load it into memory as a DOM tree (Document)
    Document document = Jsoup.parse(new File(path), "utf-8");
    // 1.3 Get the matching elements from the Document
    Elements elements = document.getElementsByTag("name");
    // 1.4 Read the text of the first element
    Element element = elements.get(0);
    System.out.println(element.text());
}
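Using the Three XML Parsing Approaches
Parsing XML with DOM
Overview of the DOM interfaces in Java: the DOM API in the JDK follows the W3C DOM specification. The org.w3c.dom package provides interfaces such as Document, DocumentType, Node, NodeList and Element, which are what you need to access a DOM document. They can be used to create, traverse and modify a DOM document.
DocumentBuilder and DocumentBuilderFactory in the javax.xml.parsers package parse an XML document into the corresponding DOM Document object.
The DOMSource class in javax.xml.transform.dom and the StreamResult class in javax.xml.transform.stream are used to write an updated DOM document back to an XML file.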
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DOMParser {
    // Obtain a factory for DOM parsers via newInstance()
    DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();

    // Load and parse an XML file into a DOM tree; returns null if parsing fails
    public Document parse(String filePath) {
        Document document = null;
        try {
            // Ask the factory for a DOM parser instance
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            // Parse the XML file into a DOM tree
            document = builder.parse(new File(filePath));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
        return document;
    }
}
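Once a Document has been obtained, the org.w3c.dom interfaces mentioned earlier can be used to walk it. The following is a small sketch of such a traversal over the Users document from the syntax section; the class name DomTraversalDemo, the second user and the id=name output format are all illustrative assumptions, and the default factory settings are only suitable for trusted input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTraversalDemo {

    // Returns one "id=name" string per <User> element, in document order.
    static List<String> describeUsers(String xml) {
        List<String> result = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList users = doc.getElementsByTagName("User");
            for (int i = 0; i < users.getLength(); i++) {
                Element user = (Element) users.item(i);
                // Look up the <name> child and read its text content
                String name = user.getElementsByTagName("name").item(0).getTextContent();
                result.add(user.getAttribute("id") + "=" + name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Users>"
            + "<User id='1'><name>alex</name></User>"
            + "<User id='2'><name>bob</name></User>"
            + "</Users>";
        System.out.println(describeUsers(xml)); // prints [1=alex, 2=bob]
    }
}
```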
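Parsing XML with SAX
The SAX parser interfaces and event-handler interfaces are defined in the org.xml.sax package. SAX reads the XML document sequentially and reports what it encounters as events, which is why it is described as event-driven. The main interfaces are ContentHandler, DTDHandler, EntityResolver and ErrorHandler. ContentHandler is the primary handler interface and handles the basic document parsing events; DTDHandler and EntityResolver handle events related to DTD validation and entity resolution; ErrorHandler is the basic error-handling interface. The DefaultHandler class implements all four of these interfaces. A custom handler, such as a BookHandler that extends DefaultHandler, typically overrides five callbacks, startDocument(), endDocument(), startElement(), endElement() and characters(), to add its own event-handling logic.
1. Create a SAXParserFactory
SAXParserFactory factory = SAXParserFactory.newInstance();
2. Obtain a parser
SAXParser parser = factory.newSAXParser();
3. Call a parse method. The first argument can be a File, an InputStream, an InputSource or a URI string; note the second argument, the handler (new DefaultHandler())
File file = new File("girls.xml");
// Parse the XML file
parser.parse(file, new DefaultHandler());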
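Passing a bare DefaultHandler parses the file but discards every event. The sketch below shows what overriding the five callbacks might look like; SaxHandlerDemo, NameHandler and the girls/girl element names are assumptions for illustration, and the handler simply records each event in a list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxHandlerDemo {

    // Hypothetical handler that records the events it receives.
    static class NameHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override public void startDocument() { events.add("startDocument"); }

        @Override public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // start collecting text for this element
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per text node, so accumulate
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            events.add(value.isEmpty() ? "end:" + qName : "end:" + qName + "=" + value);
            text.setLength(0);
        }
    }

    // Parses the XML string and returns the recorded event trace.
    static List<String> trace(String xml) {
        NameHandler handler = new NameHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(trace("<girls><girl>lily</girl></girls>"));
    }
}
```

For the sample input the trace is startDocument, start:girls, start:girl, end:girl=lily, end:girls, endDocument. Text is buffered in characters() and only consumed in endElement() because a parser is allowed to deliver one text node in several chunks.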
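Parsing XML with Digester
// Path of the XML file to parse, and the Digester instance
File input = new File("books.xml");
Digester digester = new Digester();
// Parse the XML file (rules that map elements onto the Books class must be registered on the digester beforehand)
Books books = (Books) digester.parse(input);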
Common XXE-Prone Patterns in Java and Their Defenses
In Apache OFBiz, XML parsing is done by readXmlDocument() in UtilXml.java:
public static Document readXmlDocument(InputStream is, boolean validate, String docDescription)
throws SAXException, ParserConfigurationException, java.io.IOException {
//omit java code
Document document = null;
/* Standard JAXP (mostly), but doesn't seem to be doing XML Schema validation, so making sure that is on... */
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(validate);
factory.setNamespaceAware(true);
factory.setAttribute("http://xml.org/sax/features/validation", validate);
factory.setAttribute("http://apache.org/xml/features/validation/schema", validate);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
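This gives good reason to believe that the XXE vulnerability came from a misconfigured DocumentBuilderFactory; the code shown above is, of course, the patched version.
In JavaMelody, XML is parsed by parseSoapMethodName in PayloadNameRequestWrapper.java: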
private static String parseSoapMethodName(InputStream stream, String charEncoding) {
try {
// newInstance() et pas newFactory() pour java 1.5 (issue 367)
final XMLInputFactory factory = XMLInputFactory.newInstance();
final XMLStreamReader xmlReader;
if (charEncoding != null) {
xmlReader = factory.createXMLStreamReader(stream, charEncoding);
} else {
xmlReader = factory.createXMLStreamReader(stream);
}
// omit java code
}
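According to the analysis of the JavaMelody XXE vulnerability, the cause was that the factory did not restrict external lookups. Likewise, the XXE vulnerabilities in the WeChat Pay SDK and in Spring-data-XMLBean were both caused by using a DocumentBuilderFactory that did not restrict external lookups.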
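For a StAX factory like the one in parseSoapMethodName, a commonly recommended hardening is to switch off DTD support and external entities through the standard XMLInputFactory properties. The sketch below is not JavaMelody's actual patch; StaxHardeningDemo, the temporary file and the Envelope payload are assumptions used only to show that a hardened factory does not read the referenced file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxHardeningDemo {

    static final String SECRET = "top-secret-token";

    // Collects all character data; returns "rejected" if the parser refuses the input.
    static String readText(XMLInputFactory factory, String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    out.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            return "rejected";
        }
        return out.toString();
    }

    // Builds a payload whose external entity points at a local file,
    // then parses it with a hardened factory.
    static String parseHardened() {
        try {
            Path secretFile = Files.createTempFile("xxe-demo", ".txt");
            secretFile.toFile().deleteOnExit();
            Files.write(secretFile, SECRET.getBytes(StandardCharsets.UTF_8));
            String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE Envelope [ <!ENTITY xxe SYSTEM \"" + secretFile.toUri() + "\"> ]>\n"
                + "<Envelope>&xxe;</Envelope>";

            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Do not process DTDs at all, and never resolve external entities
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            return readText(factory, xml);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseHardened());
    }
}
```

Whether the hardened parser reports an error or silently drops the reference can depend on the StAX implementation in use; the property that matters is that the file's contents never appear in the parsed text.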
Java XXE Vulnerabilities in Different Libraries
DocumentBuilderFactory
An incorrect fix
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
String FEATURE = null;
FEATURE = "http://javax.xml.XMLConstants/feature/secure-processing";
dbf.setFeature(FEATURE, true);
FEATURE = "http://apache.org/xml/features/disallow-doctype-decl";
dbf.setFeature(FEATURE, true);
FEATURE = "http://xml.org/sax/features/external-parameter-entities";
dbf.setFeature(FEATURE, false);
FEATURE = "http://xml.org/sax/features/external-general-entities";
dbf.setFeature(FEATURE, false);
FEATURE = "http://apache.org/xml/features/nonvalidating/load-external-dtd";
dbf.setFeature(FEATURE, false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
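// Read the XML file
FileInputStream fis = new FileInputStream("path/to/xxexml");
InputSource is = new InputSource(fis);
builder.parse(is);
This looks thorough, but the parser can still be attacked: DocumentBuilder builder = dbf.newDocumentBuilder(); is called before the dbf.setFeature() calls, so the builder is created without those settings. The builder must be created after the features are set for them to take effect.
The correct fix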
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
String FEATURE = null;
FEATURE = "http://javax.xml.XMLConstants/feature/secure-processing";
dbf.setFeature(FEATURE, true);
FEATURE = "http://apache.org/xml/features/disallow-doctype-decl";
dbf.setFeature(FEATURE, true);
FEATURE = "http://xml.org/sax/features/external-parameter-entities";
dbf.setFeature(FEATURE, false);
FEATURE = "http://xml.org/sax/features/external-general-entities";
dbf.setFeature(FEATURE, false);
FEATURE = "http://apache.org/xml/features/nonvalidating/load-external-dtd";
dbf.setFeature(FEATURE, false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder builder = dbf.newDocumentBuilder();
// Read the XML file
FileInputStream fis = new FileInputStream("path/to/xxexml");
InputSource is = new InputSource(fis);
Document doc = builder.parse(is);
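The ordering issue can be checked with a short experiment. The sketch below, which assumes the JDK's built-in parser, creates the builder either before or after setting disallow-doctype-decl and then feeds it a document containing a harmless internal DOCTYPE. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BuilderOrderDemo {

    static final String DISALLOW_DOCTYPE =
        "http://apache.org/xml/features/disallow-doctype-decl";

    // Contains a DOCTYPE with a harmless internal entity.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE root [ <!ENTITY e \"x\"> ]>\n"
        + "<root>&e;</root>";

    // Returns true if the builder parsed the DOCTYPE, false if it was rejected.
    static boolean acceptsDoctype(boolean createBuilderFirst) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Incorrect order: the builder is created before the feature is set
            DocumentBuilder early = createBuilderFirst ? dbf.newDocumentBuilder() : null;
            dbf.setFeature(DISALLOW_DOCTYPE, true);
            // Correct order: the builder is created after the feature is set
            DocumentBuilder builder = createBuilderFirst ? early : dbf.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(XML)));
            return true;
        } catch (SAXException e) {
            return false; // the parser refused the DOCTYPE
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("builder created first: " + acceptsDoctype(true));
        System.out.println("features set first:    " + acceptsDoctype(false));
    }
}
```

A builder takes a snapshot of the factory's configuration when it is created, so later setFeature() calls on the factory only affect builders created afterwards.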
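SAXBuilder
This library does not seem to be widely used. With its default configuration, SAXBuilder is vulnerable to XXE, as shown here:
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(InputSource);
Fix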
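Option 1: enable validation (taken from the referenced article; validation by itself is not normally considered to disable external entities, so option 2 is the more explicit defence)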
SAXBuilder builder = new SAXBuilder(true);
Document doc = builder.build(InputSource);
Option 2
SAXBuilder builder = new SAXBuilder();
builder.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
builder.setFeature("http://xml.org/sax/features/external-general-entities", false);
builder.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
builder.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Document doc = builder.build(InputSource);
SAXParserFactory
Likewise, it is vulnerable to XXE with its default configuration.
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
parser.parse(InputSource, (HandlerBase) null);
Fix
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
SAXParser parser = spf.newSAXParser();
parser.parse(InputSource, (HandlerBase) null);
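To see the effect of these features, the sketch below builds a SAXParser with the same four settings and tries it on two inputs: a plain document and one carrying a DOCTYPE. SaxHardeningDemo and its method names are assumptions for illustration, and the behaviour described is that of the JDK's built-in parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxHardeningDemo {

    // Creates a parser from a factory configured with the four features above.
    static SAXParser hardenedParser() throws ParserConfigurationException, SAXException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return spf.newSAXParser(); // created only after the features are set
    }

    // Returns true if the hardened parser accepts the document, false if it rejects it.
    static boolean parses(String xml) {
        SAXParser parser;
        try {
            parser = hardenedParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e); // configuration problem, not a rejection
        }
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<girls><girl>lily</girl></girls>"));
        System.out.println(parses("<!DOCTYPE girls [ <!ENTITY e \"x\"> ]><girls>&e;</girls>"));
    }
}
```

Ordinary documents still parse; only input that declares a DOCTYPE is refused, which is usually an acceptable trade-off for services that never need DTDs.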
SAXReader
It is vulnerable to XXE by default.
SAXReader saxReader = new SAXReader();
saxReader.read(InputSource);
Fix
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
reader.parse(new InputSource(InputSource));
SAXTransformerFactory
SAXTransformerFactory sf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
StreamSource source = new StreamSource(InputSource);
sf.newTransformerHandler(source);
Fix
SAXTransformerFactory sf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
sf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
sf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
StreamSource source = new StreamSource(InputSource);
sf.newTransformerHandler(source);
SchemaFactory
It is also vulnerable to XXE by default.
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
StreamSource source = new StreamSource(InputSource);
Schema schema = factory.newSchema(source);
Fix
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
StreamSource source = new StreamSource(InputSource);
Schema schema = factory.newSchema(source);
Validator
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();
Validator validator = schema.newValidator();
StreamSource source = new StreamSource(InputSource);
validator.validate(source);
Fix
SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();
Validator validator = schema.newValidator();
validator.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
validator.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
StreamSource source = new StreamSource(InputSource);
validator.validate(source);
TransformerFactory
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource source = new StreamSource(InputSource);
tf.newTransformer().transform(source, new DOMResult());
Fix
TransformerFactory tf = TransformerFactory.newInstance();
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
StreamSource source = new StreamSource(InputSource);
tf.newTransformer().transform(source, new DOMResult());
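The two ACCESS_EXTERNAL_* attributes can be exercised in a self-contained way. The following sketch assumes the JDK's built-in transformer; TransformerHardeningDemo, the temporary file and the payload are made up for illustration. It runs an identity transform over a document whose external entity points at a local file and checks that the file's contents do not end up in the output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformerHardeningDemo {

    static final String SECRET = "top-secret-token";

    // Returns the transform output, or "rejected" if the transformer refused the input.
    static String transformHardened() {
        try {
            Path secretFile = Files.createTempFile("xxe-demo", ".txt");
            secretFile.toFile().deleteOnExit();
            Files.write(secretFile, SECRET.getBytes(StandardCharsets.UTF_8));
            String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE root [ <!ENTITY xxe SYSTEM \"" + secretFile.toUri() + "\"> ]>\n"
                + "<root>&xxe;</root>";

            TransformerFactory tf = TransformerFactory.newInstance();
            // An empty string means no protocol is allowed for external DTDs/entities or stylesheets
            tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");

            StringWriter out = new StringWriter();
            tf.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            return "rejected";
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transformHardened());
    }
}
```

Unlike disallow-doctype-decl, these attributes leave DOCTYPE declarations legal and restrict only which protocols may be used to fetch external resources, so documents with purely internal DTD subsets keep working.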
References
http://blog.spoock.com/2018/10/23/java-xxe/