08/31
2017

一直在用flask写一些小型的后台服务。有余力之下,去研究了一下flask的源码。不得不赞叹flask对于python的运用之炉火纯青。下面开始分析。

flask框架使用了库werkzeug,werkzeug是基于WSGI的,WSGI是什么呢?(Web Server Gateway Interface),WSGI是一个协议,通俗翻译Web服务器网关接口,该协议把接收自客户端的所有请求都转交给这个对象处理。 基于如上背景知识,这里不只是研究了flask的源码,也会把涉及到的werkzeug库的源码,以及werkzeug库使用到的python基本库的源码列出来,力求能够把整个逻辑给陈列清楚。

##服务类

###Flask:

应用程序类,该类直接面向用户。所有的Flask程序都必须创建这样一个程序实例。简单点儿可以app=Flask(__name__),Flask可以利用这个参数决定程序的根目录。

  • 类成员:

    1. Request

    2. Response

  • 类方法:

    1. run

run方法调用werkzeug库中的一个run_simple函数来启动BaseWSGIServer

def run_simple(hostname, port, application, use_reloader=False,
           use_debugger=False, use_evalex=True,
           extra_files=None, reloader_interval=1,
           reloader_type='auto', threaded=False,
           processes=1, request_handler=None, static_files=None,
           passthrough_errors=False, ssl_context=None):
    if use_debugger:
        from werkzeug.debug import DebuggedApplication
        application = DebuggedApplication(application, use_evalex)
    if static_files:
        from werkzeug.wsgi import SharedDataMiddleware
        application = SharedDataMiddleware(application, static_files)

    def inner():
        try:
            fd = int(os.environ['WERKZEUG_SERVER_FD'])
        except (LookupError, ValueError):
            fd = None
        make_server(hostname, port, application, threaded,
                processes, request_handler,
                passthrough_errors, ssl_context,
                fd=fd).serve_forever()

    if use_reloader:
        # If we're not running already in the subprocess that is the
        # reloader we want to open up a socket early to make sure the
        # port is actually available.
        if os.environ.get('WERKZEUG_RUN_MAIN') != 'true':
            if port == 0 and not can_open_by_fd:
                raise ValueError('Cannot bind to a random port with enabled '
                             'reloader if the Python interpreter does '
                             'not support socket opening by fd.')

            # Create and destroy a socket so that any exceptions are
            # raised before we spawn a separate Python interpreter and
            # lose this ability.
            address_family = select_ip_version(hostname, port)
            s = socket.socket(address_family, socket.SOCK_STREAM)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind((hostname, port))
            if hasattr(s, 'set_inheritable'):
                s.set_inheritable(True)

            # If we can open the socket by file descriptor, then we can just
            # reuse this one and our socket will survive the restarts.
            if can_open_by_fd:
                os.environ['WERKZEUG_SERVER_FD'] = str(s.fileno())
                s.listen(LISTEN_QUEUE)
            else:
                s.close()

        from ._reloader import run_with_reloader
        run_with_reloader(inner, extra_files, reloader_interval,
                          reloader_type)
    else:
        inner()

make_server创建http服务器,server_forever启动服务监听端口。使用select异步处理。

####BaseWSGIServer

WSGI服务器,继承自HTTPServer。

  • 类成员:

    app;初始化的时候将Flask的实例传入。

  • 类方法:

    __init__ ;初始化完成后启动HTTPServer,并将请求处理类`WSGIRequestHandler传入。

####HTTPServer

HTTP服务器。python基础库(BaseHTTPServer.py)。该类实现了一个server_bind方法用来绑定服务器地址和端口,可不必理会。继承自SocketServer.TCPServer。可以看到,HTTP实际上是基于TCP服务的。

####SocketServer.TCPServer

继承自BaseServer。实现了一个fileno的函数,用来获取socket套接字的描述符。这一个是预留给select模块使用的。

##请求处理类

####BaseRequestHandler 最基础的请求处理类。类成员包括server(HTTPServer)和request。然后在初始化的时候,依次调用setup、handle、finish方法。

####SocketServer.StreamRequestHandler 继承自BaseRequestHandler。重写setup和finish方法。setup主要是建立读和写两个描述符,finish用来关闭连接。关键的是handle方法。

####BaseHTTPRequestHandler 继承自SocketServer.StreamRequestHandler。利用这个类基本上可以处理简单的http请求了。但Flask只使用了它的几个方法。最重要的是handle方法。

def handle(self):
    """Handle multiple requests if necessary."""
    self.close_connection = 1

    self.handle_one_request()
    while not self.close_connection:
        self.handle_one_request()

可以看到,它主要是去调用了handle_one_request方法,并且可以在同一个连接中处理多个request。这个类的handle_one_request方法我们不用管,在WSGIRequestHandler中我们会重写这个方法。

重点说一下WSGIRequestHandler。

####WSGIRequestHandler

继承自BaseHTTPRequestHandler。Flask没有自定义请求处理类,使用了WSGI库的WSGIRequestHandler

#####类方法make_environ 一个重要的方法,用来创建上下文环境。这个暂且不说。

#####类方法handle_one_request def handle_one_request(self): “"”Handle a single HTTP request.””” self.raw_requestline = self.rfile.readline() if not self.raw_requestline: self.close_connection = 1 elif self.parse_request(): return self.run_wsgi()

调用run_wsgi方法来处理

def run_wsgi(self):
    if self.headers.get('Expect', '').lower().strip() == '100-continue':
        self.wfile.write(b'HTTP/1.1 100 Continue\r\n\r\n')

    self.environ = environ = self.make_environ()
    headers_set = []
    headers_sent = []

    def write(data):
        assert headers_set, 'write() before start_response'
        if not headers_sent:
            status, response_headers = headers_sent[:] = headers_set
            try:
                code, msg = status.split(None, 1)
            except ValueError:
                code, msg = status, ""
            self.send_response(int(code), msg)
            header_keys = set()
            for key, value in response_headers:
                self.send_header(key, value)
                key = key.lower()
                header_keys.add(key)
            if 'content-length' not in header_keys:
                self.close_connection = True
                self.send_header('Connection', 'close')
            if 'server' not in header_keys:
                self.send_header('Server', self.version_string())
            if 'date' not in header_keys:
                self.send_header('Date', self.date_time_string())
            self.end_headers()

        assert isinstance(data, bytes), 'applications must write bytes'
        self.wfile.write(data)
        self.wfile.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    reraise(*exc_info)
            finally:
                exc_info = None
        elif headers_set:
            raise AssertionError('Headers already set')
        headers_set[:] = [status, response_headers]
        return write

    def execute(app):
        application_iter = app(environ, start_response)
        try:
            for data in application_iter:
                write(data)
            if not headers_sent:
                write(b'')
        finally:
            if hasattr(application_iter, 'close'):
                application_iter.close()
            application_iter = None

    try:
        execute(self.server.app)
    except (socket.error, socket.timeout) as e:
        self.connection_dropped(e, environ)
    except Exception:
        if self.server.passthrough_errors:
            raise
        from werkzeug.debug.tbtools import get_current_traceback
        traceback = get_current_traceback(ignore_system_exceptions=True)
        try:
            # if we haven't yet sent the headers but they are set
            # we roll back to be able to set them again.
            if not headers_sent:
                del headers_set[:]
            execute(InternalServerError())
        except Exception:
            pass
        self.server.log('error', 'Error on request:\n%s',
                        traceback.plaintext)

这个函数比较长,我们只看程序的主干部分,execute(self.server.app)==》application_iter = app(environ, start_response)。app之前已经说过,在创建BaseWSGIServer的时候就把Flask的实例对象传入。这里你可能会疑惑,一个对象怎么还能带参数的调用?这是属于python的语法,只要你在Flask类中定义了__call__方法,当你这样使用的时候,实际上调用的是Flask的__call__方法,类似的是,所有的函数都有__call__方法,这里不再赘述。那么,Flask.__call__到底做什么了呢?我们来看。

def __call__(self, environ, start_response):
    """Shortcut for :attr:`wsgi_app`."""
    return self.wsgi_app(environ, start_response)
def wsgi_app(self, environ, start_response):
    ctx = self.request_context(environ)
    ctx.push()
    error = None
    try:
        try:
            response = self.full_dispatch_request()
        except Exception as e:
            error = e
            response = self.make_response(self.handle_exception(e))
        return response(environ, start_response)
    finally:
        if self.should_ignore_error(error):
            error = None
        ctx.auto_pop(error)

#####分析wsgi_app方法。

前边两行是创建上下文变量,self.request_context(environ)主要做两件事:

  • 创建url适配器
  • url适配器去匹配客户端请求路径self.path,然后存放到environ中的PATH_INFO变量中。其中会执行到WSGIRequestHandlermake_environ方法,这里不再叙述。

最重要的处理请求的环节,response = self.full_dispatch_request()

def full_dispatch_request(self):
    """Dispatches the request and on top of that performs request
    pre and postprocessing as well as HTTP exception catching and
    error handling.

    .. versionadded:: 0.7
    """
    self.try_trigger_before_first_request_functions()
    try:
        request_started.send(self)
        rv = self.preprocess_request()
        if rv is None:
            rv = self.dispatch_request()
    except Exception as e:
        rv = self.handle_user_exception(e)
    response = self.make_response(rv)
    response = self.process_response(response)
    request_finished.send(self, response=response)
    return response

self.try_trigger_before_first_request_functions()是在正式处理请求之前执行before_first_request请求钩子注册的函数,在处理第一个请求之前运行。

然后self.preprocess_request执行before_request请求钩子注册的函数,在每次请求之前运行。

self.dispatch_request重头戏来了,就是这个方法来处理http请求的。怎么处理呢?

def dispatch_request(self):
    """Does the request dispatching.  Matches the URL and returns the
    return value of the view or error handler.  This does not have to
    be a response object.  In order to convert the return value to a
    proper response object, call :func:`make_response`.

    .. versionchanged:: 0.7
       This no longer does the exception handling, this code was
       moved to the new :meth:`full_dispatch_request`.
    """
    req = _request_ctx_stack.top.request
    if req.routing_exception is not None:
        self.raise_routing_exception(req)
    rule = req.url_rule
    # if we provide automatic options for this URL and the
    # request came with the OPTIONS method, reply automatically
    if getattr(rule, 'provide_automatic_options', False) \
       and req.method == 'OPTIONS':
        return self.make_default_options_response()
    # otherwise dispatch to the handler for that endpoint
    return self.view_functions[rule.endpoint](**req.view_args)

首先自然是获取请求参数req。self.view_functions是什么东西呢?就是存放视图与函数的映射表,即你用@app.route('/', methods=['GET','POST'])的时候存放的映射信息。我们可以来看一下app.route这个装饰器。

def route(self, rule, **options):
    """A decorator that is used to register a view function for a
    given URL rule.  This does the same thing as :meth:`add_url_rule`
    but is intended for decorator usage::

        @app.route('/')
        def index():
            return 'Hello World'

    For more information refer to :ref:`url-route-registrations`.

    :param rule: the URL rule as string
    :param endpoint: the endpoint for the registered URL rule.  Flask
                     itself assumes the name of the view function as
                     endpoint
    :param options: the options to be forwarded to the underlying
                    :class:`~werkzeug.routing.Rule` object.  A change
                    to Werkzeug is handling of method options.  methods
                    is a list of methods this rule should be limited
                    to (`GET`, `POST` etc.).  By default a rule
                    just listens for `GET` (and implicitly `HEAD`).
                    Starting with Flask 0.6, `OPTIONS` is implicitly
                    added and handled by the standard request handling.
    """
    def decorator(f):
        endpoint = options.pop('endpoint', None)
        self.add_url_rule(rule, endpoint, f, **options)
        return f
    return decorator
def add_url_rule(self, rule, endpoint=None, view_func=None, **options):
    if endpoint is None:
        endpoint = _endpoint_from_view_func(view_func)
    options['endpoint'] = endpoint
    methods = options.pop('methods', None)

    # if the methods are not given and the view_func object knows its
    # methods we can use that instead.  If neither exists, we go with
    # a tuple of only `GET` as default.
    if methods is None:
        methods = getattr(view_func, 'methods', None) or ('GET',)
    methods = set(methods)

    # Methods that should always be added
    required_methods = set(getattr(view_func, 'required_methods', ()))

    # starting with Flask 0.8 the view_func object can disable and
    # force-enable the automatic options handling.
    provide_automatic_options = getattr(view_func,
        'provide_automatic_options', None)

    if provide_automatic_options is None:
        if 'OPTIONS' not in methods:
            provide_automatic_options = True
            required_methods.add('OPTIONS')
        else:
            provide_automatic_options = False

    # Add the required methods now.
    methods |= required_methods

    # due to a werkzeug bug we need to make sure that the defaults are
    # None if they are an empty dictionary.  This should not be necessary
    # with Werkzeug 0.7
    options['defaults'] = options.get('defaults') or None

    rule = self.url_rule_class(rule, methods=methods, **options)
    rule.provide_automatic_options = provide_automatic_options

    self.url_map.add(rule)
    if view_func is not None:
        old_func = self.view_functions.get(endpoint)
        if old_func is not None and old_func != view_func:
            raise AssertionError('View function mapping is overwriting an '
                                 'existing endpoint function: %s' % endpoint)
        self.view_functions[endpoint] = view_func

请求处理类的实例化: 当select监听到连接到来时,调用http服务的_handle_request_noblock方法(由BaseServer实现),==》process_request==》finish_request==》self.RequestHandlerClass(request, client_address, self)实例化请求处理类。

下载源码:git clone git@github.com:mitsuhiko/flask.git

注:本文旨在叙述flask的工作流程,所以只讲了最基础的BaseWSGIServer,而没有讲ThreadedWSGIServer等。
本站总访问量 本站访客数人次 本文总阅读量