How to Build a Malware Detection System with Machine Learning

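In this post we'll talk about two topics I love, which have been the core of my (private) research for the past 7 years: machine learning and malware detection.

Having a rather empirical and decidedly non-academic education, I know the struggle of a passionate developer who wants to approach machine learning and is trying to make sense of formal definitions, linear algebra and whatnot. So I'll keep this as practical as possible, so that even readers with less formal education can understand it and start using neural networks.

Moreover, most resources focus on well-known problems such as handwritten digit recognition on the MNIST dataset (the "hello world" of machine learning), leaving the reader to imagine how more complex feature engineering is supposed to work, and in general what to do with inputs that are not images.

TL;DR: I'm bad at math, MNIST is boring, and detecting malware is more fun :D

I'll also use this as an example use case for some new features of ergo, a project that chiconara and I started some time ago to automate machine learning model creation, data encoding, GPU training, benchmarking and deployment at scale.

The source code related to this post can be found here.

Important note: this project alone does not constitute a valid replacement for a commercial antivirus.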

Problem Definition and Dataset

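Traditional malware detection engines rely on signatures: unique values that a malware researcher has manually selected to identify the presence of malicious code, while making sure there are no collisions in the set of non-malicious samples (such a collision is called a "false positive").

This approach has several problems. Signatures are usually easy to bypass (depending on the type of signature, changing a single bit or a few bytes of the malicious code can make the malware undetectable), and the approach does not scale when the number of researchers is orders of magnitude smaller than the number of unique malware families they need to manually reverse engineer, identify and write signatures for.

Our goal is to teach a computer, more specifically an artificial neural network, to detect Windows malware without relying on any explicit signature database that we would need to create. Instead, the network simply ingests the dataset of malicious files we want to be able to detect and learns from it to tell malicious code from benign code, both inside the dataset itself and, most importantly, when processing new, unseen samples. The only thing we know is which files are malicious and which are not, not what makes them so; we'll let the ANN (artificial neural network) do the rest.

To do this, I collected approximately 200,000 Windows PE samples, evenly divided into malicious (10+ detections on VirusTotal) and clean (known files with 0 detections on VirusTotal). Since training and testing the model on the same dataset wouldn't make much sense (it could perform extremely well on the training set while being unable to generalize to new samples at all), this dataset is automatically split by ergo into 3 subsets: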

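  • A training set, 70% of the samples, used for training.
  • A validation set, 15% of the samples, used to benchmark the model at each training epoch.
  • A test set, 15% of the samples, used to benchmark the model after training.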
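
To make the 70/15/15 split concrete, here is a minimal NumPy sketch. It is only an illustration and not ergo's actual implementation; the function name `split_dataset` and the fixed seed are assumptions made for this example.

```python
import numpy as np

def split_dataset(samples, train=0.70, validation=0.15, seed=42):
    """Shuffle the rows and split them into train / validation / test subsets."""
    samples = np.asarray(samples)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))        # shuffled row indices
    n_train = int(round(len(samples) * train))
    n_val = int(round(len(samples) * validation))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]           # whatever is left (about 15%)
    return samples[train_idx], samples[val_idx], samples[test_idx]

rows = np.arange(1000)                           # stand-in for 1000 encoded samples
train_set, val_set, test_set = split_dataset(rows)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before splitting matters: if the CSV were ordered by label, an unshuffled split would leave one class out of the test set entirely.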

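Needless to say, the number of (correctly labeled) samples in the dataset is key to the model's accuracy, that is, its ability to correctly separate the two classes and generalize to unseen samples: the more samples you use during training, the better. Ideally, the dataset should also be periodically updated with newer samples and the model retrained, so that accuracy stays high over time even as new, unique samples appear in the wild (namely: wget + crontab + ergo).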

Due to the size of the specific dataset I used for this post, I can't share it without killing my bandwidth.

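However, I uploaded the dataset.csv file to Google Drive. It is about 340MB once extracted, and you can use it to reproduce the results of this post.

The Portable Executable Format

The Windows PE format is extensively documented, and there are many great resources for understanding its internals, such as Ange Albertini's "Exploring the Portable Executable format" 44CON 2013 presentation (from which I took the picture below), all freely available online, so I won't spend too much time on the details.

The key facts we must keep in mind are:

  • A PE has several headers describing its properties and various addressing details, such as the base address at which the PE will be loaded in memory and the location of the entry point.
  • A PE has several sections, each containing data (constants, global variables, etc.), code (in which case the section is marked as executable), or sometimes both.
  • A PE contains a declaration of which APIs it imports and from which system libraries.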

[Figure: overview of the PE format. Credit: Ange Albertini]
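
To make the header fields mentioned above less abstract, the following sketch reads a few of them straight from the raw bytes using only the standard library. The post itself relies on LIEF for parsing; the helper name `read_pe_headers` is hypothetical, and the code handles only the handful of fields it names (no validation beyond the two magic values).

```python
import struct

def read_pe_headers(data: bytes) -> dict:
    """Extract a few header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' magic")
    # e_lfanew (at offset 0x3C) holds the file offset of the PE signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF file header starts right after the 4-byte signature
    coff = pe_offset + 4
    machine, num_sections = struct.unpack_from("<HH", data, coff)
    # The optional header follows the 20-byte COFF header
    opt = coff + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:        # PE32: 32-bit ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:      # PE32+: 64-bit ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return {
        "machine": machine,
        "num_sections": num_sections,
        "entry_point_rva": entry_rva,
        "image_base": image_base,
    }
```

Calling it on the contents of a real executable (for example `read_pe_headers(open(path, "rb").read())`) returns the load base address and the entry point location described in the first bullet.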

For example, this is what the sections of the Firefox PE look like:

[Figure: sections of the Firefox PE. Credit: the "Machines Can Think" blog]

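In some cases, though, if the PE has been processed with a packer such as UPX, its sections may look a bit different, since the main code and data sections are compressed and a code stub that decompresses them at runtime is added: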

[Figure: sections of a UPX-packed PE. Credit: the "Machines Can Think" blog]

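What we're going to do now is see how to encode these values, which are very heterogeneous in nature (numbers in all kinds of ranges and strings of variable length), into a vector of scalars, each normalized to the interval [0.0, 1.0], and of constant length. This is the type of input our machine learning model can understand.

Deciding which characteristics of the PE to consider is probably the most important part of designing any machine learning system. It is called feature engineering, while the act of reading those values and encoding them is called feature extraction.

Feature Engineering

After creating the project: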

ergo create ergo-pe-av

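I started implementing the feature extraction algorithm in the encode.py file. It is a very simple starting point (150 lines, including comments and multi-line strings) that gives us enough information to reach an interesting level of accuracy, and it can easily be extended in the future with additional features.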

cd ergo-pe-av
vim encode.py

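The first 11 scalars of our vector encode a set of boolean properties that LIEF, the amazing library from QuarksLab I'm using, parses from the PE. Each property is encoded as 1.0 if true and 0.0 if false:

Property               Description
pe.has_configuration   True if the PE has a Load Configuration.
pe.has_debug           True if the PE has a Debug section.
pe.has_exceptions      True if the PE uses exceptions.
pe.has_exports         True if the PE has any exported symbols.
pe.has_imports         True if the PE imports any symbols.
pe.has_nx              True if the PE has the NX bit set.
pe.has_relocations     True if the PE has relocation entries.
pe.has_resources       True if the PE has any resources.
pe.has_rich_header     True if a Rich header is present.
pe.has_signature       True if the PE is digitally signed.
pe.has_tls             True if the PE uses TLS.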
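
The post does not show the function that encodes these properties, so the following is only a sketch of how it could look. The property names come from the table above; the `PROPERTIES` list and the `SimpleNamespace` stand-in (used so the example runs without LIEF or a real PE file) are assumptions.

```python
import numpy as np
from types import SimpleNamespace

# Property names in the same order as the table above
PROPERTIES = [
    "has_configuration", "has_debug", "has_exceptions", "has_exports",
    "has_imports", "has_nx", "has_relocations", "has_resources",
    "has_rich_header", "has_signature", "has_tls",
]

def encode_properties(pe):
    """Map each boolean property to 1.0 (true) or 0.0 (false)."""
    return np.array([1.0 if getattr(pe, name) else 0.0 for name in PROPERTIES])

# Stand-in for a parsed PE that only has imports, resources and a Rich header
fake_pe = SimpleNamespace(**{name: False for name in PROPERTIES})
fake_pe.has_imports = True
fake_pe.has_resources = True
fake_pe.has_rich_header = True

properties_vector = encode_properties(fake_pe)
print(properties_vector)
```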

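Then come 64 elements representing the first 64 bytes of the PE entry point function, each normalized to [0.0, 1.0] by dividing it by 255. This helps the model detect executables with very distinctive entry points that vary only slightly among different samples of the same family (you can think of it as a very basic signature):

ep_bytes = [0] * 64
try:
    # entry point address relative to the image base (an RVA), used here
    # directly as an offset into the raw file contents
    ep_offset = pe.entrypoint - pe.optional_header.imagebase
    ep_bytes = [int(b) for b in raw[ep_offset:ep_offset + 64]]
except Exception as e:
    log.warning("can't get entrypoint bytes from %s: %s", filepath, e)
# ...
# ...
def encode_entrypoint(ep):
    while len(ep) < 64:  # pad with zeros up to 64 elements
        ep += [0.0]
    return np.array(ep) / 255.0  # normalize to [0.0, 1.0]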

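Then comes a histogram of the occurrences of each byte value (hence a size of 256) in the binary file. This data point encodes basic statistical information about the raw contents of the file: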

# the 'raw' argument holds the entire contents of the file
def encode_histogram(raw):
    histo = np.bincount(np.frombuffer(raw, dtype=np.uint8), minlength=256)
    total = histo.sum()
    # normalize, guarding against an empty file
    return histo / total if total > 0 else histo.astype(float)

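The next thing I decided to encode in the feature vector is the import table, since the APIs used by a PE are quite relevant information :D To do this, I manually selected the 150 most common libraries in my dataset. For each API used by the PE, the column of the corresponding library is incremented by one, creating another histogram of 150 values, which is then normalized by the total number of imported APIs: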

# the 'pe' argument holds the PE object parsed by LIEF
def encode_libraries(pe):
    global libraries

    imports = {dll.name.lower():[api.name if not api.is_ordinal else api.iat_address \
                           for api in dll.entries] for dll in pe.imports}

    libs = np.array([0.0] * len(libraries))
    for idx, lib in enumerate(libraries):
        calls = 0
        dll   = "%s.dll" % lib
        if lib in imports:
            calls = len(imports[lib])
        elif dll in imports:
            calls = len(imports[dll])
        libs[idx] += calls
    tot = libs.sum()
    return ( libs / tot ) if tot > 0 else libs # normalize

We continue by encoding the ratio of the PE size on disk to the size it will have in memory (its virtual size):

min(sz, pe.virtual_size) / max(sz, pe.virtual_size)

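Next, we want to encode some information about the PE sections, such as the ratio of sections containing code to those containing data, the ratio of sections marked as executable, the average Shannon entropy of the sections, and the average ratio of their size to their virtual size. These data points tell the model whether and how the PE is packed, compressed or obfuscated: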

def encode_sections(pe):
    sections = [{ \
        'characteristics': ','.join(map(str, s.characteristics_lists)),
        'entropy': s.entropy,
        'name': s.name,
        'size': s.size,
        'vsize': s.virtual_size } for s in pe.sections]

    num_sections = len(sections)
    max_entropy  = max([s['entropy'] for s in sections]) if num_sections else 0.0
    max_size     = max([s['size'] for s in sections]) if num_sections else 0.0 
    min_vsize    = min([s['vsize'] for s in sections]) if num_sections else 0.0
    norm_size    = (max_size / min_vsize) if min_vsize > 0 else 0.0

    return [ \
        # code_sections_ratio
        (len([s for s in sections if 'SECTION_CHARACTERISTICS.CNT_CODE' in s['characteristics']]) / num_sections) if num_sections else 0,
        # pec_sections_ratio
        (len([s for s in sections if 'SECTION_CHARACTERISTICS.MEM_EXECUTE' in s['characteristics']]) / num_sections) if num_sections else 0,
        # sections_avg_entropy
        ((sum([s['entropy'] for s in sections]) / num_sections) / max_entropy) if max_entropy > 0 else 0.0,
        # sections_vsize_avg_ratio
        ((sum([s['size'] / s['vsize'] for s in sections]) / num_sections) / norm_size) if norm_size > 0 else 0.0,
    ]
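
The code above takes each section's entropy from LIEF (`s.entropy`). For readers unfamiliar with the measure, here is a small standalone sketch of the Shannon entropy of a byte string; it illustrates the general formula and is not LIEF's implementation.

```python
import numpy as np

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
    probs = counts[counts > 0] / len(data)     # drop byte values that never occur
    return float(np.sum(probs * np.log2(1.0 / probs)))

low = shannon_entropy(b"\x00" * 1024)           # a single repeated byte value
high = shannon_entropy(bytes(range(256)) * 4)   # every byte value equally likely
print(low, high)
```

Values close to 8 bits per byte are typical of compressed or encrypted data, which is why section entropy is a useful hint that a PE has been packed.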

Finally, we glue all the pieces together into a single vector of size 486:

v = np.concatenate([ \
    encode_properties(pe),
    encode_entrypoint(ep_bytes),
    encode_histogram(raw),
    encode_libraries(pe),
    [ min(sz, pe.virtual_size) / max(sz, pe.virtual_size)],
    encode_sections(pe)
    ])

return v
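
As a quick sanity check, the vector length follows from the sizes of the pieces described above. The dictionary keys below are just labels chosen for this example.

```python
# Number of scalars contributed by each encoder described above
feature_sizes = {
    "properties": 11,    # boolean PE properties
    "entrypoint": 64,    # first 64 bytes at the entry point
    "histogram": 256,    # byte value histogram
    "libraries": 150,    # import histogram over the selected libraries
    "size_ratio": 1,     # disk size versus virtual size
    "sections": 4,       # section statistics
}

total_features = sum(feature_sizes.values())
print(total_features)
```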

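The only thing left to do is to tell our model how to encode the input samples, by customizing the prepare_input function in the prepare.py file previously created by ergo. The following implementation supports encoding a file given its path, encoding a file given its contents (sent as a file upload to the ergo API), or simply evaluating a raw vector of scalar features: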

# used by `ergo encode <path> <folder>` to encode a PE in a vector of scalar features
# used by `ergo serve <path>` to parse the input query before running the inference
def prepare_input(x, is_encoding = False):
    # file upload
    if isinstance(x, werkzeug.datastructures.FileStorage):
        return encoder.encode_pe(x)
    # file path
    elif os.path.isfile(x) :
        return encoder.encode_pe(x)
    # raw vector
    else:
        return x.split(',')

We now have everything we need to turn a PE file into a vector like the following:

`0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.333333333333,0.545098039216,0.925490196078,0.41568627451,1.0,0.407843137255,0.596078431373,0.192156862745,0.250980392157,0.0,0.407843137255,0.188235294118,0.149019607843,0.250980392157,0.0,0.392156862745,0.63137254902,0.0,0.0,0.0,0.0,0.313725490196,0.392156862745,0.537254901961,0.145098039216,0.0,0.0,0.0,0.0,0.513725490196,0.925490196078,0.407843137255,0.325490196078,0.337254901961,0.341176470588,0.537254901961,0.396078431373,0.909803921569,0.2,0.858823529412,0.537254901961,0.364705882353,0.988235294118,0.41568627451,0.0078431372549,1.0,0.0823529411765,0.972549019608,0.188235294118,0.250980392157,0.0,0.349019607843,0.513725490196,0.0509803921569,0.0941176470588,0.270588235294,0.250980392157,0.0,1.0,0.513725490196,0.0509803921569,0.109803921569,0.270588235294,0.250980392157,0.870149739583,0.00198567708333,0.00146484375,0.000944010416667,0.000830078125,0.00048828125,0.000162760416667,0.000325520833333,0.000569661458333,0.000130208333333,0.000130208333333,8.13802083333e-05,0.000553385416667,0.000390625,0.000162760416667,0.00048828125,0.000895182291667,8.13802083333e-05,0.000179036458333,8.13802083333e-05,0.00048828125,0.001611328125,0.000162760416667,9.765625e-05,0.000472005208333,0.000146484375,3.25520833333e-05,8.13802083333e-05,0.000341796875,0.000130208333333,3.25520833333e-05,1.62760416667e-05,0.001171875,4.8828125e-05,0.000130208333333,1.62760416667e-05,0.00372721354167,0.000699869791667,6.51041666667e-05,8.13802083333e-05,0.000569661458333,0.0,0.000113932291667,0.000455729166667,0.000146484375,0.000211588541667,0.000358072916667,1.62760416667e-05,0.00208333333333,0.00087890625,0.000504557291667,0.000846354166667,0.000537109375,0.000439453125,0.000358072916667,0.000276692708333,0.000504557291667,0.000423177083333,0.000276692708333,3.25520833333e-05,0.000211588541667,0.000146484375,0.000130208333333,0.0001953125,0.00577799479167,0.00109049479167,0.000227864583333,0.000927734375,0.002294921875,0.000732421875,0.00034179
6875,0.000244140625,0.000276692708333,0.000211588541667,3.25520833333e-05,0.000146484375,0.00135091145833,0.000341796875,8.13802083333e-05,0.000358072916667,0.00193684895833,0.0009765625,0.0009765625,0.00123697916667,0.000699869791667,0.000260416666667,0.00078125,0.00048828125,0.000504557291667,0.000211588541667,0.000113932291667,0.000260416666667,0.000472005208333,0.00029296875,0.000472005208333,0.000927734375,0.000211588541667,0.00113932291667,0.0001953125,0.000732421875,0.00144856770833,0.00348307291667,0.000358072916667,0.000260416666667,0.00206705729167,0.001171875,0.001513671875,6.51041666667e-05,0.00157877604167,0.000504557291667,0.000927734375,0.00126953125,0.000667317708333,1.62760416667e-05,0.00198567708333,0.00109049479167,0.00255533854167,0.00126953125,0.00109049479167,0.000325520833333,0.000406901041667,0.000325520833333,8.13802083333e-05,3.25520833333e-05,0.000244140625,8.13802083333e-05,4.8828125e-05,0.0,0.000406901041667,0.000602213541667,3.25520833333e-05,0.00174153645833,0.000634765625,0.00068359375,0.000130208333333,0.000130208333333,0.000309244791667,0.00105794270833,0.000244140625,0.003662109375,0.000244140625,0.00245768229167,0.0,1.62760416667e-05,0.002490234375,3.25520833333e-05,1.62760416667e-05,9.765625e-05,0.000504557291667,0.000211588541667,1.62760416667e-05,4.8828125e-05,0.000179036458333,0.0,3.25520833333e-05,3.25520833333e-05,0.000211588541667,0.000162760416667,8.13802083333e-05,0.0,0.000260416666667,0.000260416666667,0.0,4.8828125e-05,0.000602213541667,0.000374348958333,3.25520833333e-05,0.0,9.765625e-05,0.0,0.000113932291667,0.000211588541667,0.000146484375,6.51041666667e-05,0.000667317708333,4.8828125e-05,0.000276692708333,4.8828125e-05,8.13802083333e-05,1.62760416667e-05,0.000227864583333,0.000276692708333,0.000146484375,3.25520833333e-05,0.000276692708333,0.000244140625,8.13802083333e-05,0.0001953125,0.000146484375,9.765625e-05,6.51041666667e-05,0.000358072916667,0.00113932291667,0.000504557291667,0.000504557291667,0.0005859375,0.0
00813802083333,4.8828125e-05,0.000162760416667,0.000764973958333,0.000244140625,0.000651041666667,0.000309244791667,0.0001953125,0.000667317708333,0.000162760416667,4.8828125e-05,0.0,0.000162760416667,0.000553385416667,1.62760416667e-05,0.000130208333333,0.000146484375,0.000179036458333,0.000276692708333,9.765625e-05,0.000406901041667,0.000162760416667,3.25520833333e-05,0.000211588541667,8.13802083333e-05,1.62760416667e-05,0.000130208333333,8.13802083333e-05,0.000276692708333,0.000504557291667,9.765625e-05,1.62760416667e-05,9.765625e-05,3.25520833333e-05,1.62760416667e-05,0.0,0.00138346354167,0.000732421875,6.51041666667e-05,0.000146484375,0.000341796875,3.25520833333e-05,4.8828125e-05,4.8828125e-05,0.000260416666667,3.25520833333e-05,0.00068359375,0.000960286458333,0.000227864583333,9.765625e-05,0.000244140625,0.000813802083333,0.000179036458333,0.000439453125,0.000341796875,0.000146484375,0.000504557291667,0.000504557291667,9.765625e-05,0.00760091145833,0.0,0.370786516854,0.0112359550562,0.168539325843,0.0,0.0,0.0337078651685,0.0,0.0,0.0,0.303370786517,0.0112359550562,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0561797752809,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0449438202247,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.25,0.25,0.588637653212,0.055703845605`

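Assuming you have a folder containing malicious samples in a pe-malicious subfolder and clean samples in pe-legit (feel free to give them any name, but the folder names will become the labels associated with each sample), you can start encoding them into a dataset.csv file that our model can use for training: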

ergo encode /path/to/ergo-pe-av /path/to/dataset --output /path/to/dataset.csv

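Grab a coffee and relax: this process may take a while :) depending on the size of your dataset and the speed of the disk it is stored on.

A Useful Property of the Vectors

While ergo is encoding our dataset, let's take a break and discuss an interesting property of these vectors and how to use it.

By now it should be clear that structurally or behaviourally similar executables will have similar vectors, and that the distance between one vector and another can be measured, for instance, with the cosine similarity, defined as: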

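$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$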
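
Cosine similarity is the dot product of two vectors divided by the product of their norms. A minimal NumPy sketch follows; the function name is chosen for this example, and returning 0.0 for an all-zero vector is an assumption.

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

same_direction = cosine_similarity([1, 0, 1], [2, 0, 2])   # parallel vectors
orthogonal = cosine_similarity([1, 0, 1], [0, 1, 0])       # nothing in common
print(same_direction, orthogonal)
```

Because every feature is non-negative, the similarity between two of these vectors always lies between 0 and 1.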

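Among other things, this metric can be used to extract all the samples of a given family from the dataset (which, let me remind you, is a huge set of files about which you know nothing except whether they are malicious or not), given a known "pivot" sample. Say, for instance, that you have a Mirai sample for MIPS and you want to extract every Mirai variant, for any architecture, from a dataset of thousands of different unlabeled samples.

The algorithm, which I implemented inside the sum database as an "oracle" (a fancy name for a stored procedure) named findSimilar, is quite simple: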

// Given the vector with id="id", return a list of
// other vectors whose cosine similarity to the reference
// one is greater than or equal to the threshold.
// Results are given as a dictionary of:
//      "vector_id => similarity"
function findSimilar(id, threshold) {
    var v = records.Find(id);
    if( v.IsNull() == true ) {
        return ctx.Error("Vector " + id + " not found.");
    }

    var results = {};
    records.AllBut(v).forEach(function(record){
        var similarity = v.Cosine(record);
        if( similarity >= threshold ) {
           results[record.ID] = similarity
        }
    });

    return results;
}

Yet it is quite effective.
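
For readers who do not use sum, the same lookup can be sketched in plain Python over an in-memory dictionary. This is a port written for illustration, not code from the post: the `find_similar` name, the dictionary layout and the toy vectors are all assumptions.

```python
import numpy as np

def find_similar(vectors: dict, ref_id, threshold: float) -> dict:
    """Return {vector_id: similarity} for every other vector whose cosine
    similarity to the reference vector is at least the threshold."""
    if ref_id not in vectors:
        raise KeyError(f"Vector {ref_id} not found.")
    ref = np.asarray(vectors[ref_id], dtype=float)
    results = {}
    for vec_id, vec in vectors.items():
        if vec_id == ref_id:
            continue                               # skip the reference itself
        vec = np.asarray(vec, dtype=float)
        denom = np.linalg.norm(ref) * np.linalg.norm(vec)
        similarity = float(ref @ vec / denom) if denom > 0 else 0.0
        if similarity >= threshold:
            results[vec_id] = similarity
    return results

# Toy 3-dimensional vectors standing in for the 486-dimensional ones
db = {"pivot": [1.0, 0.0, 0.0], "variant": [0.9, 0.1, 0.0], "other": [0.0, 1.0, 0.0]}
matches = find_similar(db, "pivot", 0.9)
print(matches)
```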

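ANN as a Black Box, and Training

Meanwhile, our encoder should have finished its job, and the resulting dataset.csv file, containing all the labeled vectors extracted from each sample, should be ready to be used to train our model... but what does "training our model" actually mean? And what is this "model" in the first place?

The model we're using is a computational structure called an artificial neural network, which we train with the Adam optimization algorithm. Online you'll find very detailed and formal definitions of both, but the bottom line is:

An ANN is a "box" containing a large number of numerical parameters (the "weights" of the "neurons", organized in layers; tens of thousands in the network used here) that are multiplied with the inputs (our vectors) and combined to produce an output prediction. The training process consists of feeding the system the dataset, checking its predictions against the known labels, changing those parameters by a small amount, observing whether and how those changes affect the model's accuracy, and repeating this process a given number of times (epochs) until the overall performance reaches the required minimum we defined.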

[Figure omitted. Credit: nature.com]
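
The loop described above (predict, compare with the labels, nudge the parameters, repeat for a number of epochs) can be shown on a deliberately tiny problem. The sketch below is a strong simplification: it trains a single sigmoid "neuron" with plain gradient descent on made-up data, whereas the post uses a multi-layer network trained with Adam. All names and values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 200 vectors of 4 features, labeled 1 when the features sum above 2
X = rng.random((200, 4))
y = (X.sum(axis=1) > 2.0).astype(float)

weights = np.zeros(4)       # the parameters the training adjusts
bias = 0.0
learning_rate = 1.0

def predict(inputs):
    """Multiply inputs by the weights, add the bias, squash with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(inputs @ weights + bias)))

history = []
for epoch in range(2000):
    error = predict(X) - y                                # prediction versus known label
    weights -= learning_rate * (X.T @ error) / len(X)     # small change to the parameters
    bias -= learning_rate * error.mean()
    accuracy = ((predict(X) > 0.5) == (y > 0.5)).mean()   # did the change help?
    history.append(accuracy)

print(f"accuracy after the first epoch: {history[0]:.2f}, after the last: {history[-1]:.2f}")
```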

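The main assumption is that there is a numerical correlation among the data points of our dataset that is unknown to us but that, if known, would allow us to divide the dataset into the output classes. What we do is ask this black box to ingest the dataset and approximate such a function by iteratively adjusting its internal parameters.

In the model.py file you'll find the definition of our ANN: a fully connected network with two hidden layers of 70 neurons each, ReLU as the activation function, and a dropout of 30% during training: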

n_inputs = 486

return Sequential([
    Dense(70, input_shape=(n_inputs,), activation='relu'),
    Dropout(0.3),
    Dense(70, activation='relu'),
    Dropout(0.3),
    Dense(2, activation='softmax')
])
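
To see what this architecture computes at inference time, here is a NumPy-only sketch of the forward pass with random, untrained weights. It mirrors the layer sizes above but is not Keras code; dropout is left out because it is only active during training.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [486, 70, 70, 2]     # input, two hidden layers, output

# Random (untrained) weights and zero biases for each fully connected layer
params = [(rng.normal(0.0, 0.05, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Run one input vector through the network and return class probabilities."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)     # ReLU on the hidden layers
    exp = np.exp(x - x.max())          # numerically stable softmax on the output
    return exp / exp.sum()

n_params = sum(w.size + b.size for w, b in params)
probs = forward(rng.random(486))
print(n_params, probs)
```

Counting weights and biases this way gives 39,202 trainable parameters for the network above.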

We can now start the training process:

ergo train /path/to/ergo-pe-av --dataset /path/to/dataset.csv

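Depending on the total number of vectors in the CSV file, this process may take from a few minutes to hours or even days. If you have GPUs on your machine, ergo will automatically use them instead of the CPU cores to significantly speed up training (check this post if you're curious why).

Once done, you can inspect the model performance statistics with: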

ergo view /path/to/ergo-pe-av

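This shows the training history, where we can verify that the model's accuracy did increase over time (in our case, it reached 97% accuracy around epoch 30), and the ROC curve, which tells us how effectively the model discriminates between malicious and benign samples (an AUC, or area under the curve, of 0.994 means the model is pretty good):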

[Figure: training history and ROC curve]
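
The AUC can be read as the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen clean one. The sketch below computes it that way by comparing all pairs; it is an illustration with made-up scores, not how ergo computes its curve, and the pairwise comparison is only practical for small inputs.

```python
import numpy as np

def roc_auc(labels, scores) -> float:
    """Probability that a random positive is scored above a random negative."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive with every negative; ties count as half a win
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

labels = [1, 1, 1, 0, 0, 0]                  # 1 = malicious, 0 = clean
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]      # made-up "malicious" probabilities
auc = roc_auc(labels, scores)
print(auc)
```

In this toy case 8 of the 9 malicious/clean pairs are ranked correctly, so the AUC is 8/9.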

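Additionally, a confusion matrix is shown for each of the training, validation and test sets. The values on the diagonal starting from the top left (dark red) represent the number of correct predictions, while the other values (pink) are the wrong ones (our model has a 1.4% false positive rate on a test set of about 30,000 samples):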

[Figure: confusion matrices for the training, validation and test sets]
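
A confusion matrix and the rates derived from it are easy to compute by hand. The sketch below uses eight made-up predictions; the false positive rate is taken here as FP / (FP + TN), and the post does not state which exact definition is behind its 1.4% figure.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are the true classes, columns the predicted ones."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# 0 = clean, 1 = malicious (made-up labels and predictions)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()                  # row-major order: [[TN, FP], [FN, TP]]
false_positive_rate = fp / (fp + tn)         # clean files flagged as malicious
accuracy = (tp + tn) / cm.sum()              # the diagonal over the total
print(cm, false_positive_rate, accuracy)
```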

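Considering the simplicity of our feature extraction algorithm, 97% accuracy on such a large dataset is quite an interesting result. Many of the misdetections are caused by packers such as UPX (or even just self-extracting zip/msi archives), which affect some of the data points we're encoding. Adding an unpacking strategy (such as emulating the unpacking stub until the real PE is in memory) and more features (a larger entry point vector, dynamic analysis to trace the APIs being called; imagination is the limit!) is the key to reaching 99% :)

Conclusion

We can now remove the temporary files: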

ergo clean /path/to/ergo-pe-av

Load the model and serve it as an API:

ergo serve /path/to/ergo-pe-av --classes "clean, malicious"

And request a classification from a client:

curl -F "x=@/path/to/file.exe" "http://localhost:8080/"

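You'll receive a response like the following (here is the file being scanned):

[Figure: the API response for the scanned file]

The model detected the sample as malicious with over 99% confidence.

Now you can use the model to scan anything you want, enjoy! :)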
